Compare commits

...

3961 Commits

Author SHA1 Message Date
Tony Hutter 33174af151 Tag zfs-2.2.5
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2024-08-02 18:03:09 -07:00
Tony Hutter 6f27c4cadd [2.2.5-only] Make 'rmmod zfs' work after zfs-2.2.4 (#16406)
db65272ae was added to zfs-2.2.4 to stub in the
VDEV_PROP_RAIDZ_EXPANDING enum without adding the RAIDz expansion
feature.  This was needed to provide the right enum count for when the
VDEV_PROP_SLOW_IO proprieties got added.  This had the unfortunate side
effect of breaking module removal though.

Specifically, with the VDEV_PROP_RAIDZ_EXPANDING stub added,
the module would correctly omit making kobjects for the RAIDz expansion
vdev property, but then would try to blindly remove its non-existent
kobjects during module unload.

This commit fixes the issue by checking for an uninitialized kobject.

Fixes: #16249

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
2024-08-02 18:03:09 -07:00
Alexander Motin dd5de55eba ZTS: Make do_vol_test() more deterministic (#16379)
- Explicitly disable compression since mkfile uses a zero buffer.
 - Explicitly sync file systems instead of waiting for timeout.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-30 11:36:52 -07:00
Tony Hutter b5835ed137 Linux 6.9: Fix UBSAN errors in sa.c (#16380)
This is a follow-on to 156a64161b
that ignores UBSAN errors in sa.c.

Thank you @thwalker3 for the fix.

Original-patch-by: @thwalker3
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16278
Closes #16330
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-07-23 17:13:57 -07:00
Chunwei Chen ef08cb26da Fix long_free_dirty accounting for small files (#16264)
For files smaller than recordsize, it's most likely that they don't have
L1 blocks. However, current calculation will always return at least 1 L1
block.

In this change, we check dnode level to figure out if it has L1 blocks
or not, and return 0 if it doesn't. This will reduce the chance of
unnecessary throttling when deleting a large number of small files.

Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Co-authored-by: Chunwei Chen <david.chen@nutanix.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-07-23 12:02:10 -07:00
Rob Norris 9ad205ecde AUTHORS: refresh with recent new contributors (#16362)
Sponsored-by: https://despairlabs.com/sponsor/

Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
2024-07-23 11:58:49 -07:00
Mark Johnston 14cce09a65 FreeBSD: Use a statement expression to implement SET_ERROR() (#16284)
This way we can avoid making assumptions about the SDT probe
implementation.  No functional change intended.

Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-23 11:58:49 -07:00
Rob Norris 9835255f5d zdb: dump ZAP_FLAG_UINT64_KEY ZAPs properly (#16334)
These are used for DDT and BRT stores. There's limited information
available to produce meaningful output, but at least we can put
something on screen rather than crashing.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-17 14:54:47 -07:00
Rob Norris 4d2f7f9839 vdev_open: clear async fault flag after reopen
After c3f2f1aa2, vdev_fault_wanted is set on a vdev after a probe fails.
An end-of-txg async task is charged with actually faulting the vdev.

In a single-disk pool, the probe failure will degrade the last disk, and
then suspend the pool. However, vdev_fault_wanted is not cleared. After
the pool returns, the transaction finishes and the async task runs and
faults the vdev, which suspends the pool again.

The fix is simple: when reopening a vdev, clear the async fault flag. If
the vdev is still failed, the startup probe will quickly notice and
degrade/suspend it again. If not, all is well!

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Co-authored-by: Don Brady <don.brady@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Don Brady <don.brady@klarasystems.com>
2024-07-17 14:54:47 -07:00
Rob Norris 25c4271d2f zts: test single-disk pool resumes properly after disk pull
A single disk pool should suspend when its disk fails and hold the IO.
When the disk is returned, the pool should return and the IO be
reissued, leaving everything in good shape.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Don Brady <don.brady@klarasystems.com>
2024-07-17 14:54:47 -07:00
Martin Wagner c950c5d369 disable automatic dependency tracking for dkms builds
Previously the dkms build left some unwanted files
in `/usr/lib/modules` which could cause package
managers to not properly clean up old kernels.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Wagner <martin.wagner.dev@gmail.com>
Closes #16221 
Closes #16241
2024-07-17 14:54:47 -07:00
Alexander Motin 13ccbbb47a Some improvements to metaslabs eviction
- Add old eviction for special and dedup metaslab classes. Those
vdevs may be potentially big and fragmented with large metaslabs,
while their asynchronous write pattern is not really different
from normal class. It seems an omission to not evict old metaslabs
from them.
 - If we have metaslab preload enabled, which means we are not too
low on memory, do not evict active metaslabs even if they are not
used for some time.  Eviction of active metaslabs means we won't
be able to write anything until we load them, that may take some
time, that is straight opposite to metaslab preload goals.  For
small systems the memory saving should be less important after
recent reduction in number of allocators and so open metaslabs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16214
2024-07-17 14:54:47 -07:00
Alexander Motin ba3c7692cd Destroy ARC buffer in case of fill error
In case of error dmu_buf_fill_done() returns the buffer back into
DB_UNCACHED state.  Since during transition from DB_UNCACHED into
DB_FILL state dbuf_noread() allocates an ARC buffer, we must free
it here, otherwise it will be leaked.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15665
Closes #15802
Closes #16216
2024-07-17 14:54:47 -07:00
Rob N 27cc6df760 Use memset to zero stack allocations containing unions
C99 6.7.8.17 says that when an undesignated initialiser is used, only
the first element of a union is initialised. If the first element is not
the largest within the union, how the remaining space is initialised is
up to the compiler.

GCC extends the initialiser to the entire union, while Clang treats the
remainder as padding, and so initialises according to whatever
automatic/implicit initialisation rules are currently active.

When Linux is compiled with CONFIG_INIT_STACK_ALL_PATTERN,
-ftrivial-auto-var-init=pattern is added to the kernel CFLAGS. This flag
sets the policy for automatic/implicit initialisation of variables on
the stack.

Taken together, this means that when compiling under
CONFIG_INIT_STACK_ALL_PATTERN on Clang, the "zero" initialiser will only
zero the first element in a union, and the rest will be filled with a
pattern. This is significant for aes_ctx_t, which in
aes_encrypt_atomic() and aes_decrypt_atomic() is initialised to zero,
but then used as a gcm_ctx_t, which is the fifth element in the union,
and thus gets pattern initialisation. Later, it's assumed to be zero,
resulting in a hang.

As confusing and undiscoverable as it is, by the spec, we are at fault
when we initialise a structure containing a union with the zero
initializer. As such, this commit replaces these uses with an explicit
memset(0).

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16135
Closes #16206
2024-07-17 14:54:47 -07:00
Rob Norris d06c8de748 zdb: bring crash handling over from ztest
ztest has a very nice ability to show a backtrace when there's an
unexpected crash. zdb is used often enough on corrupted data and can
blow up too, so nice output is useful there too.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16181
2024-07-17 14:54:47 -07:00
Rob N 2a2e358475 libspl_assert: always link -lpthread on FreeBSD
The pthread_* functions are in -lpthread on FreeBSD. Some of them are
implicitly linked through libc, but on FreeBSD 13 at least
pthread_getname_np() is not. Just be explicit, since -lpthread is the
documented interface anyway.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16168
2024-07-17 14:54:46 -07:00
Martin Matuška bc42d96d66 Unbreak FreeBSD cross-build on MacOS broken in 051460b8b
MacOS used FreeBSD-compatible getprogname() and pthread_getname_np().
But pthread_getthreadid_np() does not exist on MacOS. This implements
libspl_gettid() using pthread_threadid_np() to get the thread id
of the current thread.

Tested with FreeBSD GitHub actions
freebsd-src/.github/workflows/cross-bootstrap-tools.yml

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #16167
2024-07-17 14:54:46 -07:00
Rob Norris 88686213c3 libspl/assert: use libunwind for backtrace when available
libunwind seems to do a better job of resolving a symbols than
backtrace(), and is also useful on platforms that don't have backtrace()
(eg musl). If it's available, use it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140
2024-07-17 14:54:46 -07:00
Rob Norris 21f66db674 libspl/assert: dump backtrace in assert
Adds a check for the backtrace() function. If available, uses it to show
a stack backtrace in the assertion output.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140
2024-07-17 14:54:46 -07:00
Rob Norris 3ca305f873 libspl/assert: add lock around assertion output
If multiple threads trip an assertion at the same moment (quite common),
they can be printing at the same time, and their output gets messy.

This adds a simple lock around the whole thing, to prevent a second task
printing assert output before the first has finished.

Additionally, if libspl_assert_ok is not set, abort() is called without
dropping the lock, so that any other asserting tasks will be killed
before starting any output, rather than only getting part-way through.
This is a tradeoff; it's assumed that multiple threads asserting at the
same moment are likely the same fault in different instances of a
thread, and so there won't be any more useful information from the other
tasks anyway.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140
2024-07-17 14:54:46 -07:00
Rob Norris 96cad4ca4c libspl/assert: show process/task details in assert output
Makes it much easier to see what thing complained.

Getting thread id, program name and thread name vary wildly between
Linux and FreeBSD, so those are set up in macros. pthread_getname_np()
did not appear in musl until very recently, but the same info has always
been available via prctl(PR_GET_NAME), so we use that instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140
2024-07-17 14:54:46 -07:00
Brooks Davis 5668411713 Only provide execvpe(3) when needed
Check for the existence of execvpe(3) and only provide the FreeBSD
compat version if required.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Brooks Davis <brooks.davis@sri.com>
Closes #15609
2024-07-17 14:54:46 -07:00
Rob Norris 32cd2da551 find_system_library: fix var cleanup when library not found
The "not found" path is attempting to clear SOMELIB_CFLAGS and
SOMELIB_LIBS by resetting them in AC_SUBST(). However, the second arg to
AC_SUBST is expanded in autoconf with `m4_ifvaln([$2], [[$1]=$2])`,
which is defined as "if the first arg is non-empty". The m4 "empty"
construction is [], therefore, the existing AC_SUBST calls never modify
the variables at all.

The effect of this is that leftovers from the library test can leak out.
At least, if a library header is found in the first stage, but the
library itself is not, -lsomelib is added to SOMELIB_LIBS and further
tests done. If that library is not found, SOMELIB_LIBS will not be
cleared.

For most of our library tests this hasn't been a problem, as they're
either always found properly via pkg-config or set directly, or the
calling test immediately aborts configure. For an optional dependency
however, an apparent "partial" result where the header is found but no
corresponding library causes link errors later.

I think a complete fix should probably not be setting SOMELIB_xxx until
the final result is known, but for now, adjusting the AC_SUBST calls to
explictly set the empty shell string (which is not "empty" to m4) at
least restores the intent.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140
2024-07-17 14:54:46 -07:00
Rob N fa2480f5b3 abd_iter_page: rework to handle multipage scatterlists
Previously, abd_iter_page() would assume that every scatterlist would
contain a single page (compound or no), because that's all we ever
create in abd_alloc_chunks(). However, scatterlists can contain multiple
pages of arbitrary provenance, and if we get one of those, we'd get all
the math wrong.

This reworks things to handle multiple pages in a scatterlist, by
properly finding the right page within it for the given offset, and
understanding better where the end of the page is and not crossing it.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reported-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16108
2024-07-17 14:54:46 -07:00
Rob N ad8c8c1e31 zts: add a debug option to get full test output
The test runner accumulates output from individual tests, then writes it
to the log at the end. If a test hangs or crashes the system half way
through, we get no insight into how it got to where it did.

This adds a -D option for "debug". When set, all test output is written
to stdout.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16096
2024-07-17 14:54:46 -07:00
Rob N f14a62ebbe zts: allow running a single test by name only
Specifying a single test is kind of a hassle, because the full relative
path under the test suite dir has to be included, but it's not always
clear what that path even is.

This change allows `-t` to take the name of a single test instead of a
full path. If the value has no `/` characters, we search for a file of
that name under the test root, and if found, use that as the full test
path instead.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16088
2024-07-17 14:54:46 -07:00
Daniel Berlin dfdac38afb Fix missing semicolon in trace_dbuf.h (#16281)
On fedora 40, on the 6.9.4 kernel (in updates-testing), assign_str
expands to a "do {<stuff> } while(0)" loop.  Without this semicolon,
the while(0) is unterminated, causing a cascade of useless errors.
With this semicolon, it compiles fine.  It also compiles fine on 6.8.11
(the previous kernel).  I have not tested earlier kernels than that, but
at worst it should add a pointless semicolon.

All other instances in the source tree are already terminated with
semicolons.

Signed-off-by: Daniel Berlin <dberlin@dberlin.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-16 16:34:07 -07:00
a1ea321 08da054005 one-word manpage correction: snapshot->rollback (#16294)
This commit fixes what is probably a copy-paste mistake. The
`dracut.zfs` manpage claims that the `bootfs.rollback` option executes
`zfs snapshot -Rf`. `zfs snapshot` does not have a `-R` option. `zfs
rollback` does.

Signed-off-by: Alphan Yılmaz <alphanyilmaz@gmail.com>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-16 16:34:07 -07:00
Tony Hutter bb401c02fc Linux 6.9 compat: META (#16358)
Update the META file to reflect compatibility with the 6.9
kernel.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-07-16 16:29:26 -07:00
Rob Norris da9da6aea6 ZTS: handle FreeBSD version numbers correctly (#16340)
FreeBSD patchlevel versions are optional and, if present, in a different
location in the version string.

Sponsored-by: https://despairlabs.com/sponsor/

Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-16 15:47:10 -07:00
Tony Hutter 97f1eb8052 ZTS: Fix redacted_send failures on FreeBSD
We're seeing failures for redacted_deleted and redacted_mount
on FreeBSD 13-15:

    09:58:34.74 diff: /dev/fd/3: No such file or directory
    09:58:34.74 ERROR: diff /dev/fd/3 /dev/fd/4 exited 2

The test was trying to diff the file listings between two directories to
see if they are the same.  The workaround is to do a string comparison
of the directory listings instead of using `diff`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16224
2024-07-16 15:46:30 -07:00
Rob Norris 7d8e2a7f73 Linux 5.16: use bdev_nr_bytes() to get device capacity
This helper was introduced long ago, in 5.16. Since 6.10, bd_inode no
longer exists, but the helper has been updated, so detect it and use it
in all versions where it is available.

Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-07-16 15:40:29 -07:00
Rob Norris 3ea3649755 Linux 6.10: work harder to avoid kmem_cache_alloc reuse
Linux 6.10 change kmem_cache_alloc to be a macro, rather than a
function, such that the old #undef for it in spl-kmem-cache.c would
remove its definition completely, breaking the build.

This inverts the model used before. Rather than always defining the
kmem_cache_* macro, then undefining then inside spl-kmem-cache.c,
instead we make a special tag to indicate we're currently inside
spl-kmem-cache.c, and not defining those in macros in the first place,
so we can use the kernel-supplied kmem_cache_* functions to implement
spl_kmem_cache_*, as we expect.

For all other callers, we create the macros as normal and remove access
to the kernel's own conflicting names.

Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-07-16 15:33:46 -07:00
Rob Norris 0342c4a6b2 Linux 6.10: rework queue limits setup
Linux has started moving to a model where instead of applying block
queue limits through individual modification functions, a complete
limits structure is built up and applied atomically, either when the
block device or open, or some time afterwards. As of 6.10 this
transition appears only partly completed.

This commit matches that model within OpenZFS in a way that should work
for past and future kernels. We set up a queue limits structure with any
limits that have had their modification functions removed. For newer
kernels that can have limits applied at block device open
(HAVE_BLK_ALLOC_DISK_2ARG), we have a conversion function to turn the
OpenZFS queue limits structure into Linux's queue_limits structure,
which can then be passed in. For older kernels, we provide an
application function that just calls the old functions for each limit in
the structure.

Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-07-16 15:33:37 -07:00
Tony Hutter d7bf0e5259 Linux 6.9: Fix UBSAN errors in zap_micro.c
You can use the UBSAN_SANITIZE_* Kbuild options to exclude certain
kernel objects from the UBSAN checks.  We previously excluded
zap_micro.o with:

UBSAN_SANITIZE_zap_micro.o := n

For some reason that didn't work for the 6.9 kernel, which wants us
to use:

UBSAN_SANITIZE_zfs/zap_micro.o := n

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16278
Closes #16330
2024-07-16 15:33:31 -07:00
Tony Hutter c24a039042 Linux 6.9: Call add_disk() from workqueue to fix zfs_allow_010_pos (#16282)
The 6.9 kernel behaves differently in how it releases block devices.  In
the common case it will async release the device only after the return
to userspace.  This is different from the 6.8 and older kernels which
release the block devices synchronously.  To get around this, call
add_disk() from a workqueue so that the kernel uses a different
codepath to release our zvols in the way we expect.  This stops
zfs_allow_010_pos from hanging.

Fixes: #16089

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
2024-07-16 15:33:23 -07:00
Rob N f4e2aed42a Linux 6.7 compat: detect if kernel defines intptr_t
Since Linux 6.7 the kernel has defined intptr_t. Clang has
-Wtypedef-redefinition by default, which causes the build to fail
because we also have a typedef for intptr_t.

Since its better to use the kernel's if it exists, detect it and skip
our own.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16201
2024-07-16 15:33:17 -07:00
George Amanakis 54ef0fdf60 head_errlog: fix use-after-free
In the commit of the head_errlog feature we introduced a bug in
dsl_dataset_promote_sync(): we may dereference origin_head and hds, both
dereferencing ddpa after calling promote_sync() on ddpa.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@nutanix.com>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #16272
Closes #16273
2024-07-15 09:07:33 -07:00
George Amanakis 2eab4f7b39 Fix assertion in Persistent L2ARC
At the end of l2arc_evict() fix an assertion in the case that l2ad_hand
+ distance == l2ad_end.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #16202
Closes #16207
2024-05-29 13:35:14 -07:00
Alexander Motin 4c0fbd8d6d FreeBSD: Add zfs_link_create() error handling
Originally Solaris didn't expect errors there, but they may happen
if we fail to add entry into ZAP.  Linux fixed it in #7421, but it
was never fully ported to FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13215
Closes #16138
2024-05-29 08:54:19 -07:00
Alexander Motin fa4b1a404e ZAP: Fix leaf references on zap_expand_leaf() errors
Depending on kind of error zap_expand_leaf() may return with or
without valid leaf reference held.  Make sure it returns NULL if
due to error it has no leaf to return.  Make its callers to check
the returned leaf pointer, and release the leaf if it is not NULL.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #12366 
Closes #16159
2024-05-29 08:54:19 -07:00
Alexander Motin 4c484d66b7 Fix ZIL clone records for legacy holes
Previous code overengineered cloned range calculation by using
BP_GET_LSIZE(). The problem is that legacy holes don't have the
logical size, so result will be wrong.  But we also don't need
to look on every block size, since they all must be identical.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16165
2024-05-29 08:54:19 -07:00
Alexander Motin 41f2a9c81f Fix scn_queue races on very old pools
Code for pools before version 11 uses dmu_objset_find_dp() to scan
for children datasets/clones.  It calls enqueue_clones_cb() and
enqueue_cb() callbacks in parallel from multiple taskq threads.
It ends up bad for scan_ds_queue_insert(), corrupting scn_queue
AVL-tree.  Fix it by introducing a mutex to protect those two
scan_ds_queue_insert() calls.  All other calls are done from the
sync thread and so serialized.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16162
2024-05-29 08:54:19 -07:00
Alexander Motin 6724746596 Slightly improve dnode hash
As I understand just for being less predictable dnode hash includes
8 bits of objset pointer, starting at 6.  But since objset_t is
more than 1KB in size, its allocations are likely aligned to 2KB,
that means 11 lower bits provide no entropy. Just take the 8 bits
starting from 11.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16131
2024-05-29 08:54:19 -07:00
Alexander Motin 938d1588eb Make more taskq parameters writable
There is no reason for these module parameters to be read-only.
Being modified they just apply on next pool import/creation, that
is useful for testing different values.

Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16118
2024-05-29 08:54:19 -07:00
Alexander Motin 0f1e8ba2f8 L2ARC: Cleanup buffer re-compression
When compressed ARC is disabled, we may have to re-compress when
writing into L2ARC.  If doing so we can't fit it into the original
physical size, we should just fail immediately, since even if it
may still fit into allocation size, its checksum will never match.

While there, refactor the code similar to other compression places
without using abd_return_buf_copy().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16038
2024-05-29 08:54:19 -07:00
Alexander Motin b474dfad0d Refactor dbuf_read() for safer decryption
In dbuf_read_verify_dnode_crypt():
 - We don't need original dbuf locked there. Instead take a lock
on a dnode dbuf, that is actually manipulated.
 - Block decryption for a dnode dbuf if it is currently being
written.  ARC hash lock does not protect anonymous buffers, so
arc_untransform() is unsafe when used on buffers being written,
that may happen in case of encrypted dnode buffers, since they
are not copied by dbuf_dirty()/dbuf_hold_copy().

In dbuf_read():
 - If the buffer is in flight, recheck its compression/encryption
status after it is cached, since it may need arc_untransform().

Tested-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16104
2024-05-29 08:54:19 -07:00
chenqiuhao1997 9edf6af4ae Replace P2ALIGN with P2ALIGN_TYPED and delete P2ALIGN.
In P2ALIGN, the result would be incorrect when align is unsigned
integer and x is larger than max value of the type of align.
In that case, -(align) would be a positive integer, which means
high bits would be zero and finally stay zero after '&' when
align is converted to a larger integer type.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Youzhong Yang <yyang@mathworks.com>
Signed-off-by: Qiuhao Chen <chenqiuhao1997@gmail.com>
Closes #15940
2024-05-13 10:27:38 -05:00
Tony Hutter 2566592045 Tag zfs-2.2.4
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2024-04-30 10:01:15 -07:00
Alan Somers 3d4d61988a Fix updating the zvol_htable when renaming a zvol
When renaming a zvol, insert it into zvol_htable using the new name, not
the old name.  Otherwise some operations won't work.  For example,
"zfs set volsize" while the zvol is open.

Sponsored by:	Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Signed-off-by:	Alan Somers <asomers@FreeBSD.org>
Closes #16127
Closes #16128
2024-04-30 10:01:15 -07:00
Brian Behlendorf 61f3638a34 Add prefetch property
ZFS prefetch is currently governed by the zfs_prefetch_disable
tunable. However, this is a module-wide settings - if a specific
dataset benefits from prefetch, while others have issue with it,
an optimal solution does not exists.

This commit introduce the "prefetch" tri-state property, which enable
granular control (at dataset/volume level) for prefetching.

This patch does not remove the zfs_prefetch_disable, which remains
a system-wide switch for enable/disable prefetch. However, to avoid
duplication, it would be preferable to deprecate and then remove
the module tunable.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Co-authored-by: Gionatan Danti <g.danti@assyoma.it>
Closes #15237 
Closes #15436
2024-04-30 10:01:15 -07:00
Don Brady 706307445e vdev probe to slow disk can stall mmp write checker
Simplify vdev probes in the zio_vdev_io_done context to
avoid holding the spa config lock for a long duration.

Also allow zpool clear if no evidence of another host
is using the pool.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #15839
2024-04-30 10:01:15 -07:00
Don Brady ea3f7c12a9 Extend import_progress kstat with a notes field
Detail the import progress of log spacemaps as they can take a very
long time.  Also grab the spa_note() messages to, as they provide
insight into what is happening

Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Co-authored-by: Allan Jude <allan@klarasystems.com>
Closes #15539
2024-04-29 17:45:53 -07:00
George Wilson 6f323353d2 Add ashift validation when adding devices to a pool
Currently, zpool add allows users to add top-level vdevs that have
different ashifts but doing so prevents users from being able to
perform a top-level vdev removal. Often times consumers may not realize
that they have mismatched ashifts until the top-level removal fails.

This feature adds ashift validation to the zpool add command and will
fail the operation if the sector size of the specified vdev does not
match the existing pool. This behavior can be disabled by using the -f
flag. In addition, new flags have been added to provide fine-grained
control to disable specific checks. These flags
are:

--allow-in-use
--allow-ashift-mismatch
--allow-replicaton-mismatch

The force flag will disable all of these checks.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Mark Maybee <mmaybee@delphix.com>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes #15509
2024-04-29 13:50:05 -07:00
Ameer Hamza b3b37b84e8 Fix arcstats for FreeBSD after zfetch support
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16141
2024-04-29 13:50:05 -07:00
Ameer Hamza 4d17e200dd Add zfetch stats in arcstats
arc_summary also reports zfetch stats but it's inconvenient to monitor
contiguously incrementing numbers. Adding them in arcstats allows us to
observe streams more conveniently.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16094
2024-04-29 13:50:05 -07:00
Dag-Erling Smørgrav 5972bb856c Use ASSERT0P() to check that a pointer is NULL.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Dag-Erling Smørgrav <des@FreeBSD.org>
Closes #15225
2024-04-29 13:50:05 -07:00
Tony Hutter ef3fea63eb GCC: Fixes for gcc 14 on Fedora 40
- Workaround dangling pointer in uu_list.c (#16124)
- Fix calloc() transposed arguments in zpool_vdev_os.c
- Make some temp variables unsigned to prevent triggering a
  '-Werror=alloc-size-larger-than' error.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16124
Closes #16125
2024-04-29 13:50:05 -07:00
Brian Behlendorf 71216b91d2 Python 3.12 deprecated python3-distutils
As for python-3.12 the distutils package has been deprecated.
The latest ax_python_devel.m4 macro from the autoconf archive
has been updated accordingly so let's pull in the new version.

We can also drop the changes made to our customized version
to continue if the development version is not installed since
this functionality has been included upstream.

Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16126
Closes #16129
2024-04-29 13:50:05 -07:00
Todd 284489893b zfs-kmod: fix empty rpm requires/conflicts
Fix an error in zfs-kmod.spec that causes kmod-zfs packages not to
include the correct RPM requires/conflicts relationships.  With this
change applied, RPM correctly no longer allows kmod-zfs & zfs-dkms
packages to be installed together.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Todd Seidelmann <18294602+seidelma@users.noreply.github.com>
Closes #16121
2024-04-29 13:50:05 -07:00
Seth Troisi 6581b17842 ZTS: user_namespace_004.ksh avoid error in cleanup if unsupported
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Seth Troisi <sethtroisi@google.com>
Closes #16114
2024-04-29 13:50:05 -07:00
Seth Troisi 51d3c23150 Add newline to two zpool messages
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Seth Troisi <sethtroisi@google.com>
Closes #16113
2024-04-29 13:50:05 -07:00
Tino Reichardt 16c223eec9 Do no use .cfi_negate_ra_state within the assembly on Arm64
Compiling openzfs on aarch64 with gcc-8 and gcc-9 is failing currently.
See issue #14965 for deeper context.

On platforms without pointer authentication, .cfi_negate_ra_state can be
defined to a no-op:
https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/aarch64-tdep.c#l1413

I have tested this on Arm64 FreeBSD 13.2 and AlmaLinux-8.

Reviewed-by: Andrew Turner <andrew.turner4@arm.com>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #14965
Closes #15784
2024-04-29 13:50:05 -07:00
Andrew Turner 7aaf6ce9d8 Add the BTI elf note to the AArch64 SHA2 assembly
On ELF platforms there is a note to specify when an application or
library supports BTI. When linking one of these the linker needs
all input object files to have the note. If not it will not include
it in the output file.

Normally the compiler would generate it, but for assembly files we
need to do it our selves.

Add the note to the aarch64 sha256 and sha512 assembly files.

Tested by building with BTI enabled and using the -zbti-report=error
flag to lld that makes it an error if the note is missing.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrew Turner <andrew.turner4@arm.com>
Closes #16086
2024-04-29 13:50:05 -07:00
Rob N 3f817debb4 AUTHORS: refresh with recent new contributors
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16079
2024-04-29 13:50:05 -07:00
Jason Lee 97889c037a return NULL at end of send_progress_thread
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jason Lee <jasonlee@lanl.gov>
Closes #16074
2024-04-29 13:50:05 -07:00
Maxim Filimonov 86b39b41a0 Fix locale-specific time
In `zpool status -t`, scrub date/time is reported using the C locale,
while trim time is reported using the current one. This is inconsistent.
This patch fixes that.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Maxim Filimonov <che@bein.link>
Closes #15878
Closes #15879
2024-04-29 13:50:05 -07:00
Pavel Snajdr 531572b590 Fix panics when truncating/deleting files
There's an union in dbuf_dirty_record_t; dr_brtwrite could evaluate
to B_TRUE if the dirty record is of another type than dl. Adding
more explicit dr type check before trying to access dr_brtwrite.

Fixes two similar panics:

[ 1373.806119] VERIFY0(db->db_level) failed (0 == 1)
[ 1373.807232] PANIC at dbuf.c:2549:dbuf_undirty()
[ 1373.814979]  dump_stack_lvl+0x71/0x90
[ 1373.815799]  spl_panic+0xd3/0x100 [spl]
[ 1373.827709]  dbuf_undirty+0x62a/0x970 [zfs]
[ 1373.829204]  dmu_buf_will_dirty_impl+0x1e9/0x5b0 [zfs]
[ 1373.831010]  dnode_free_range+0x532/0x1220 [zfs]
[ 1373.833922]  dmu_free_long_range+0x4e0/0x930 [zfs]
[ 1373.835277]  zfs_trunc+0x75/0x1e0 [zfs]
[ 1373.837958]  zfs_freesp+0x9b/0x470 [zfs]
[ 1373.847236]  zfs_setattr+0x161a/0x3500 [zfs]
[ 1373.855267]  zpl_setattr+0x125/0x320 [zfs]
[ 1373.856725]  notify_change+0x1ee/0x4a0
[ 1373.859207]  do_truncate+0x7f/0xd0
[ 1373.859968]  do_sys_ftruncate+0x28e/0x2e0
[ 1373.860962]  do_syscall_64+0x38/0x90
[ 1373.861751]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8

[ 1822.381337] VERIFY0(db->db_level) failed (0 == 1)
[ 1822.382376] PANIC at dbuf.c:2549:dbuf_undirty()
[ 1822.389232]  dump_stack_lvl+0x71/0x90
[ 1822.389920]  spl_panic+0xd3/0x100 [spl]
[ 1822.399567]  dbuf_undirty+0x62a/0x970 [zfs]
[ 1822.400583]  dmu_buf_will_dirty_impl+0x1e9/0x5b0 [zfs]
[ 1822.401752]  dnode_free_range+0x532/0x1220 [zfs]
[ 1822.402841]  dmu_object_free+0x74/0x120 [zfs]
[ 1822.403869]  zfs_znode_delete+0x75/0x120 [zfs]
[ 1822.404906]  zfs_rmnode+0x3f6/0x7f0 [zfs]
[ 1822.405870]  zfs_inactive+0xa3/0x610 [zfs]
[ 1822.407803]  zpl_evict_inode+0x3e/0x90 [zfs]
[ 1822.408831]  evict+0xc1/0x1c0
[ 1822.409387]  do_unlinkat+0x147/0x300
[ 1822.410060]  __x64_sys_unlinkat+0x33/0x60
[ 1822.410802]  do_syscall_64+0x38/0x90
[ 1822.411458]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #15983
2024-04-29 13:50:05 -07:00
Alek P 74101f7e2a vdev props comment and manpage should include zfsd and FreeBSD mentions
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Alek Pinchuk <apinchuk@axcient.com>
Closes #15968
2024-04-29 13:50:05 -07:00
Don Brady c1c26a77ff Add slow disk diagnosis to ZED
Slow disk response times can be indicative of a failing drive. ZFS
currently tracks slow I/Os (slower than zio_slow_io_ms) and generates
events (ereport.fs.zfs.delay).  However, no action is taken by ZED,
like is done for checksum or I/O errors.  This change adds slow disk
diagnosis to ZED which is opt-in using new VDEV properties:
  VDEV_PROP_SLOW_IO_N
  VDEV_PROP_SLOW_IO_T

If multiple VDEVs in a pool are undergoing slow I/Os, then it skips
the zpool_vdev_degrade().

Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Rob Wing <rob.wing@klarasystems.com>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #15469
2024-04-29 13:50:05 -07:00
Tony Hutter db65272aef [2.2.4-only] Stub RAIDZ enums to prevent conflicts
Stub in the RAIDZ expansions enums for now so that the slow IO
commit merges cleanly.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2024-04-29 13:50:05 -07:00
Rob N da88fc4ac9 zap_leaf: make l_hash[] variable length to silence UBSAN
When UBSAN is active and OpenZFS is a debug build, the l_hash assert at
the bottom of zap_open_leaf() causes UBSAN to complain.

This follows the example in 786641dcf to shut it up.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15964
2024-04-29 13:50:05 -07:00
Paul Dagnelie 889152ce4a Give a better message from 'zpool get' with invalid pool name
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Don Brady <don.brady@klarasystems.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #15942
2024-04-29 13:50:05 -07:00
Rob N 5d859a2e22 xdr: header cleanup
#16047 notes that include/os/freebsd/spl/rpc/xdr.h carried an
(apparently) incompatible license. While looking into it, it seems that
this file is actually unnecessary these days - FreeBSD's kernel XDR has
XDR_CONTROL, xdrmem_control and XDR_GET_BYTES_AVAIL, while userspace has
XDR_CONTROL and xdrmem_control, and our implementation of
XDR_GET_BYTES_AVAIL for libspl works nicely with it. So this removes
that file outright.

To keep the includes in nvpair.c tidy, I've made a few small adjustments
to the Linux headers. By definition, rpc/types.h provides bool_t and is
included before rpc/xdr.h, so I've created rpc/types.h for Linux. This
isn't necessary for userspace; both FreeBSD native and tirpc on Linux
already have these headers set up correctly.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16047 
Closes #16051
2024-04-29 13:50:05 -07:00
Robert Evans e0cfa1592d Fix buffer underflow if sysfs file is empty
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Jason Lee <jasonlee@lanl.gov>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #16028
Closes #16035
2024-04-29 13:50:05 -07:00
Robert Evans d088fb7d24 ZTS: fix flakiness in cp_files_002_pos
Fix RANDOM to not return zero.

Overwriting with `dd ... count=0` does not test anything.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #16029
2024-04-29 13:50:05 -07:00
Cameron Harr 67995229a8 Fix option string, adding -e and fixing order
The recently added '-e' option (PR #15769) missed adding the
new option in the online `zpool status` help command. This
adds the options and reorders a couple of the other options
that were not listed alphabetically.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Cameron Harr <harr1@llnl.gov>
Closes #16008
2024-04-29 13:50:05 -07:00
Rob N 2ff09e8fed freebsd: fix missing headers in distribution tarball
arc_os.h and freebsd_event.h aren't included in release tarballs, so the
build fails on FreeBSD. This fixes it.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15963
2024-04-29 13:50:05 -07:00
Brian Behlendorf 9f1d3db730 Check for minimum partition size
On Linux block devices used for vdevs will by partitioned.  The block
device must be large enough for an 64M partition starting at offset
of 2048 sectors (part1), and a second 64M reserved partition at the
end of the device (part9).

This commit adds a capacity check when creating the GPT label to
immediately detect a device which is too small.  With the existing
code this would be caught slightly latter when attempting to use
the partition.  Catching it sooner let's us print a more useful error.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15898
2024-04-29 13:50:05 -07:00
Dag-Erling Smørgrav 5dda8c0910 Add VERIFY0P() and ASSERT0P() macros.
These macros are similar to VERIFY0() and ASSERT0() but are intended
for pointers, and therefore use uintptr_t instead of int64_t.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Dag-Erling Smørgrav <des@FreeBSD.org>
Closes #15225
2024-04-29 13:50:05 -07:00
Dag-Erling Smørgrav d6da6cbd74 Clean up existing VERIFY*() macros.
Chiefly:

- Remove unnecessary parentheses around variable names.
- Remove spaces between the type and variable in casts.
- Make the panic message for VERIFY0() reflect how the macro is used.
- Use %p to format pointers, except in Linux kernel code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Dag-Erling Smørgrav <des@FreeBSD.org>
Closes #15225
2024-04-22 13:32:33 -07:00
Benda Xu 6732e223bf etc/init.d: decide which variant to use at build time.
Let Debian use the sysv-rc variant of the script, even when OpenRC is
installed. Unlike on Gentoo, OpenRC on Debian consumes both the
sysv-rc scripts and OpenRC ones. ZFS initscripts on Debian should be
the sysv-rc version to provide most compatibility and to integrate
with the rest of initscripts for dependency tracking.

Restrict the substitution in the Makefile to the dedicated list.

This construct is inspired by Mo Zhou's detection of the execution
shell and follows the strategy of Peter in 6ef28c526b.

As of 2024, the initscripts are mostly relevant on Debian, Gentoo and
their derivatives.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Benda Xu <orv@debian.org>
Issue #8063
Issue #8204
Issue #8359
Closes #15977
2024-04-22 09:28:06 -07:00
Benda Xu baaac31655 config/Substfiles.am: restrict to the dedicated list.
We recover the scope of $(SUBSTFILES) to explicitly control what files
are being generated from the corresponding .in.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Benda Xu <orv@debian.org>
Closes #15980
2024-04-22 09:28:06 -07:00
Shengqi Chen b0b0d07b13 man: move zfs_prepare_disk.8 to nodist_man_MANS
The commit b53077a added zfs_prepare_disk.8 to the wrong list
dist_man_MANS, in which @zfsexecdir@ will not be properly substituted.
This leads to wrong path in the manpage in generated release tarballs.

Reported-by: Benda Xu <orv@debian.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #15979
2024-04-22 09:28:06 -07:00
Umer Saleem 8a56047135 Add support for zfs mount -R <filesystem>
This commit adds support for mounting a dataset along with all of
it's children with '-R' flag for zfs mount. There can be scenarios
where we want to mount all datasets under one hierarchy instead of
mounting all datasets present on system with '-a' flag.

'-R' flag should work on all root and non-root datasets. Usage
information and man page has been updated for zfs mount. A test
for verifying the behavior for '-R' flag is also added.

Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16015
2024-04-22 09:28:06 -07:00
Rob Norris 9a7ef02f4d Linux 6.9 compat: blk_alloc_disk() now takes two args
There's an extra nullable arg for queue limits. Detect it, and set it to
NULL. Similar change for blk_mq_alloc_disk(), now three args, same
treatment.

Error return now has error encoded in the return, so detect with
IS_ERR() and explicitly NULL our own return.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16027
Closes #16033
2024-04-22 09:23:23 -07:00
Rob Norris 3bd7cd06b7 Linux 6.9 compat: bdev handles are now struct file
bdev_open_by_path() is replaced by bdev_file_open_by_path(), which
returns a plain old struct file*. Release function is gone entirely; the
regular file release function fput() will take care of the bdev
specifics.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16027
Closes #16033
2024-04-22 09:23:23 -07:00
Rob N b9c3040b10 vdev_disk: clean up spa/bdev mode conversion
43e8f6e37 introduced a subtle API misuse, in that it passed the output
from vdev_bdev_mode() back into itself. Fortunately, the
SPA_MODE_(READ|WRITE) bit values exactly map to the FMODE_(READ|WRITE) &
BLK_OPEN_(READ|WRITE) bit values, so it didn't result in a bug, but it
was hard to read and understand, so I cleaned it up.

In doing so, I noticed that the only call to vdev_bdev_mode() without
the "exclusive" flag set was in that misuse, and actually, we never do a
non-exclusive blkdev_get_by_path(). So I've just made exclusive be
always-on.


Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15995
2024-04-22 09:23:23 -07:00
Robert Evans 5dbed50429 Linux 5.18+ compat: Detect filemap_range_has_page
In v5.18 `filemap_range_has_page` moved to `pagemap.h`

`pagemap.h` has been around since 3.10 so just include both

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #16034
2024-04-22 09:23:23 -07:00
Fabian-Gruenbichler 3fb0942cc5 udev: correctly handle partition #16 and later
If a zvol has more than 15 partitions, the minor device number exhausts
the slot count reserved for partitions next to the zvol itself. As a
result, the minor number cannot be used to determine the partition
number for the higher partition, and doing so results in wrong named
symlinks being generated by udev.

Since the partition number is encoded in the block device name anyway,
let's just extract it from there instead.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Closes #15904
Closes #15970
2024-04-22 09:23:23 -07:00
Fabian-Gruenbichler fa2cbd4007 zvols: prevent overflow of minor device numbers
currently, the linux kernel allows 2^20 minor devices per major device
number.  ZFS reserves blocks of 2^4 minors per zvol: 1 for the zvol
itself, the other 15 for the first partitions of that zvol. as a result,
only 2^16 such blocks are available for use.

there are no checks in place to avoid overflowing into the major device
number when more than 2^16 zvols are allocated (with volmode=dev or
default). instead of ignoring this limit, which comes with all sorts of
weird knock-on effects, detect this situation and simply fail allocating
the zvol block device early on.

without this safeguard, the kernel will reject the attempt to create an
already existing block device, but ZFS doesn't handle this error and
gets confused about which zvol occupies which minor slot, potentially
resulting in kernel NULL derefs and other issues later on.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Closes #16006
2024-04-22 09:23:23 -07:00
Tony Hutter bb9542a2a0 Linux 6.8 compat: META (#16099)
Update the META file to reflect compatibility with the 6.8 kernel.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
2024-04-22 09:23:23 -07:00
Rob N 72e4996a54 bdev_discard_supported: understand discard_granularity=0
Kernel documentation for the discard_granularity property says:

    A discard_granularity of 0 means that the device does not support
    discard functionality.

Some older kernels had drivers (notably loop, but also some USB-SATA
adapters) that would set the QUEUE_FLAG_DISCARD capability flag, but
have discard_granularity=0. Since 5.10 (torvalds/linux@b35fd7422c) the
discard entry point blkdev_issue_discard() has had a check for this,
which would immediately reject the call with EOPNOTSUPP, and throw a
scary diagnostic message into the log. See #16068.

Since 6.8, the block layer sets a non-zero default for
discard_granularity (torvalds/linux@3c407dc723), and a future kernel
will remove the check entirely[1].

As such, there's no good reason for us to enable discard when
discard_granularity=0. The kernel will never let the request go in
anyway; better that we just disable it so we can report it properly to
the user.

1. https://patchwork.kernel.org/project/linux-block/patch/20240312144826.1045212-2-hch@lst.de/

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
(cherry picked from commit b181b2e604)
2024-04-19 10:19:53 -07:00
Alexander Motin 575872cc37 L2ARC: Relax locking during write
Previous code held ARC state sublist lock throughout all L2ARC
write process, which included number of allocations and even ZIO
issues.  Being blocked in any of those places the code could also
block ARC eviction, that could cause OOM activation or even dead-
lock if system is low on memory or one is too fragmented.

Fix it by dropping the lock as soon as we see a block eligible
for L2ARC writing and pick it up later using earlier inserted
marker.  While there, also reduce scope of hash lock, moving
ZIO allocation and other operations not requiring header access
out of it.  All operations requiring header access move under
hash lock, since L2_WRITING flag does not prevent header eviction
only transition to arc_l2c_only state with L1 header.

To be able to manipulate sublist lock and marker as needed add few
more multilist functions and modify one.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16040
2024-04-19 10:13:38 -07:00
Alexander Motin f4ce02ae42 Small fix to prefetch ranges aggregation
When after #16022 adding new range we aggregate more than two
existing ranges, that should be very rare, only if several streams
overlap, we may need to zero not the last range, but some earlier.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16072
2024-04-19 10:13:38 -07:00
Alexander Motin 97d7228f42 Remove db_state DB_NOFILL checks from syncing context
Syncing context should not depend on current state of dbuf, which
could already change several times in later transaction groups,
but rely solely on dirty record for the transaction group being
synced. Some of the checks seem already impossible, while instead
of others I think we should better check for absence of data in
the specific dirty record rather than DB_NOFILL.

Reviewed-by: Robert Evans <evansr@google.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16057
2024-04-19 10:13:38 -07:00
Alexander Motin 026fe79646 Speculative prefetch for reordered requests
Before this change speculative prefetcher was able to detect a stream
only if all of its accesses are perfectly sequential.  It was easy to
implement and is perfectly fine for single-threaded applications.
Unfortunately multi-threaded network servers, such as iSCSI, SMB or
NFS usually have plenty of threads and may often reorder requests,
preventing successful speculation and prefetch.

This change allows speculative prefetcher to detect streams even if
requests are reordered by introducing a list of 9 non-contiguous
ranges up to 16MB ahead of current stream position and filling the
gaps as more requests arrive.  It also allows stream to proceed
even with holes up to a certain configurable threshold (25%).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16022
2024-04-19 10:13:38 -07:00
Alexander Motin 602b5dca7b Fix read errors race after block cloning
Investigating read errors triggering panic fixed in #16042 I've
found that we have a race in a sync process between the moment
dirty record for cloned block is removed and the moment dbuf is
destroyed.  If dmu_buf_hold_array_by_dnode() take a hold on a
cloned dbuf before it is synced/destroyed, then dbuf_read_impl()
may see it still in DB_NOFILL state, but without the dirty record.
Such case is not an error, but equivalent to DB_UNCACHED, since
the dbuf block pointer is already updated by dbuf_write_ready().
Unfortunately it is impossible to safely change the dbuf state
to DB_UNCACHED there, since there may already be another cloning
in progress, that dropped dbuf lock before creating a new dirty
record, protected only by the range lock.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Robert Evans <evansr@google.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16052
2024-04-19 10:13:38 -07:00
Alexander Motin d5fb6abd36 Improve dbuf_read() error reporting
Previous code reported non-ZIO errors only via return value, but
not via parent ZIO.  It could cause NULL-dereference panics due
to dmu_buf_hold_array_by_dnode() ignoring the return value,
relying solely on parent ZIO status.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reported by:	Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16042
2024-04-19 10:13:38 -07:00
Alexander Motin 39993c3dfe BRT: Check pool clone stats in more tests
This should allow to catch some leaks, if those happen.

While there fix some cosmetic issues.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16007
2024-04-19 10:13:38 -07:00
Alexander Motin e3c1c9153f BRT: Fix tests to work on non-empty pools
It should not normally happen, but if it does, better to not fail
everything for no good reason, or it may be hard to debug.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16007
2024-04-19 10:13:38 -07:00
Alexander Motin 2ea370a4e3 BRT: Fix holes cloning.
- When reading L0 block pointers handle buffers without ones and
without dirty records as a holes.  Those appear when dnode size
was increased, but the end was never written, so there are no new
indirection levels to store the pointers.  It makes no sense to
return EAGAIN here, since sync won't create new indirection levels
until there will be actual writes.
 - When cloning blocks set destination hole logical birth time
to the current TXG.  Otherwise if we are cloning over existing
data, newly created holes may not be properly replicated later.
Use BP_SET_BIRTH() when possible to not replicate its logic.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15994
Closes #16007
2024-04-19 10:13:38 -07:00
Alexander Motin 3e91a9c525 BRT: Skip getting length in brt_entry_lookup()
Unlike DDT, where ZAP values may have different lengths due to
compression, all BRT entries are identical 8-byte counters.  It
does not make sense to first fetch the length only to assert it.
zap_lookup_uint64() is specifically designed to work with counters
of different size and should return error if something odd found.
Calling it straight allows to save some measurable CPU time.

Reviewed-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15950
2024-04-19 10:13:38 -07:00
Alexander Motin c94f730078 BRT: Make BRT block sizes configurable
Similar to DDT make BRT data and indirect block sizes configurable
via module parameters.  I am not sure what would be the best yet,
but similar to DDT 4KB blocks kill all chances of compression on
vdev with ashift=12 or more, that on my tests reaches 3x.

While here, fix documentation for respective DDT parameters.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15967
2024-04-19 10:13:38 -07:00
Alexander Motin 457e62d7ca BRT: Relax brt_pending_apply() locking
Since brt_pending_apply() is running in syncing context, no other
brt_pending_tree accesses are possible for the TXG.  We don't need
to acquire brt_pending_lock here.

Reviewed-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15955
2024-04-19 10:13:38 -07:00
Alexander Motin 19bf54b764 ZAP: Massively switch to _by_dnode() interfaces
Before this change ZAP called dnode_hold() for almost every block
access, that was clearly visible in profiler under heavy load, such
as BRT.  This patch makes it always hold the dnode reference between
zap_lockdir() and zap_unlockdir().  It allows to avoid most of dnode
operations between those.  It also adds several new _by_dnode() APIs
to ZAP and uses them in BRT code.  Also adds dmu_prefetch_by_dnode()
variant and uses it in the ZAP code.

After this there remains only one call to dmu_buf_dnode_enter(),
which seems to be unneeded.  So remove the call and the functions.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15951
2024-04-19 10:13:38 -07:00
Alexander Motin fdd8c0aea1 BRT: Skip duplicate BRT prefetches
If there is a pending entry for this block, then we've already
issued BRT prefetch for it within this TXG, so don't do it again.
BRT vdev lookup and following zap_prefetch_uint64() call can be
pretty expensive and should be avoided when not necessary.

Reviewed-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15941
2024-04-19 10:13:38 -07:00
Alexander Motin dced953b62 ZAP: Some cleanups/micro-optimizations
- Remove custom zap_memset(), use regular memset().
- Use PANIC() instead of opaque cmn_err(CE_PANIC).
- Provide entry parameter to zap_leaf_rehash_entry().
- Reduce branching in zap_leaf_array_create() inner loop.
- Remove signedness where it should not be.

Should be no function changes.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15976
2024-04-19 10:13:38 -07:00
Alexander Motin f7c1db6366 BRT: Change brt_pending_tree sorting order
It does not look important how exactly brt_pending_tree is sorted.
When cloning large file, it is quite likely that all of its blocks
have identical physical birth times, so comparing them first does
not provide useful entropy, while accesses additional cache line.
In most cases combination of vdev and offset provides unique result
and physical birth time comparison is not even needed.  Meanwhile,
when traversing the tree inside brt_pending_apply(), it can be
beneficial for dbuf cache and CPU cache hits to group processing
by vdev and so by the per-VDEV BRT ZAPs.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15954
2024-04-19 10:13:38 -07:00
Alexander Motin fa5de0c5cd Update resume token at object receive.
Before this change resume token was updated only on data receive.
Usually it is enough to resume replication without much overlap.
But we've got a report of a curios case, where replication source
was traversed with recursive grep, which through enabled atime
modified every object without modifying any data.  It produced
several gigabytes of replication traffic without a single data
write and so without a single resume point.

While the resume token was not designed to resume from an object,
I've found that the send implementation always sends object before
any data. So by requesting resume from offset 0 we are effectively
resuming from the object, followed (or not) by the data at offset
0, just as we need it.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15927
2024-04-19 10:13:38 -07:00
Alexander Motin 793a2cff2a Linux: Cleanup taskq threads spawn/exit
This changes taskq_thread_should_stop() to limit maximum exit rate
for idle threads to one per 5 seconds.  I believe the previous one
was broken, not allowing any thread exits for tasks arriving more
than one at a time and so completing while others are running.

Also while there:
 - Remove taskq_thread_spawn() calls on task allocation errors.
 - Remove extra taskq_thread_should_stop() call.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15873
2024-04-19 10:13:38 -07:00
Alexander Motin fdd97e0093 Refactor dmu_prefetch().
- Split dmu_prefetch_dnode() from dmu_prefetch() into a separate
function.  It is quite inconvenient to read the code where len = 0
means dnode prefetch instead indirect/data prefetch.  One function
doing both has no benefits, since the code paths are independent.
 - Improve dmu_prefetch() handling of long block ranges.  Instead
of limiting L0 data length to prefetch for to dmu_prefetch_max,
make dmu_prefetch_max limit the actual amount of prefetch at the
specified level, and, if there is more, prefetch all the rest at
higher indirection level.  It should improve random access times
within the prefetched range of any length, reducing importance of
specific dmu_prefetch_max value.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15076
2024-04-19 10:13:38 -07:00
Alexander Motin 3b8817db96 ZIL: Update Linux tracing after #15635
While picking parts from #14909 I've missed Linux tracing specific
ones, that went unnoticed in default configurations, but breaks the
build in some.

Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15730
2024-04-19 10:13:38 -07:00
Alexander Motin 25ea8ce94b ZIL: Improve next log block size prediction
Track history in context of bursts, not individual log blocks. It
allows to not blow away all the history by single large burst of
many block, and same time allows optimizations covering multiple
blocks in a burst and even predicted following burst.  For each
burst account its optimal block size and minimal first block size.
Use that statistics from the last 8 bursts to predict first block
size of the next burst.

Remove predefined set of block sizes. Allocate any size we see fit,
multiple of 4KB, as required by ZIL now.  With compression enabled
by default, ZFS already writes pretty random block sizes, so this
should not surprise space allocator any more.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15635
2024-04-19 10:13:38 -07:00
Alexander Motin 8b1a132de7 ZIO: Optimize zio_flush()
- Generalize vdev_nowritecache handling by traversing through the
VDEV tree and skipping children ZIOs where not supported.
 - Remove intermediate zio_null() in case of several VDEV children.
 - Remove children handling from zio_ioctl().  There are no other
use cases for this code beside DKIOCFLUSHWRITECACHED, and would there
be, I doubt they would so straightforward apply to all VDEV children.

Comparing to removed previous optimization this should improve cases
of redundant ZILs/SLOGs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15515
2024-04-19 10:13:38 -07:00
Alexander Motin 7ea8331009 ZIL: Detect single-threaded workloads
... by checking that previous block is fully written and flushed.
It allows to skip commit delays since we can give up on aggregation
in that case.  This removes zil_min_commit_timeout parameter, since
for single-threaded workloads it is not needed at all, while on very
fast devices even some multi-threaded workloads may get detected as
single-threaded and still bypass the wait.  To give multi-threaded
workloads more aggregation chances increase zfs_commit_timeout_pct
from 5 to 10%, as they should suffer less from additional latency.

Also single-threaded workloads detection allows in perspective better
prediction of the next block size.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15381
2024-04-19 10:13:38 -07:00
Rob N 3c5f354a8c zvol_os: fix compile with blk-mq on Linux 4.x
99741bde5 accesses a cached blk-mq hardware context through the mq_hctx
field of struct request. However, this field did not exist until 5.0.
Before that, the private function blk_mq_map_queue() was used to dig it
out of broader queue context. This commit detects this situation, and
handles it with a poor-man's simulation of that function.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16069
2024-04-17 10:10:24 -07:00
Rob N 5c0fe099ec zvol_os: fix build on Linux <3.13
99741bde5 introduced zvol_num_taskqs, but put it behind the HAVE_BLK_MQ
define, preventing builds on versions of Linux that don't have it
(<3.13, incl EL7).

Nothing about it seems dependent on blk-mq, so this just moves it out
from behind that define and so fixes the build.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16062
2024-04-17 10:10:24 -07:00
Ameer Hamza 5fc134ff2f zvol: use multiple taskq
Currently, zvol uses a single taskq, resulting in throughput bottleneck
under heavy load due to lock contention on the single taskq. This patch
addresses the performance bottleneck under heavy load conditions by
utilizing multiple taskqs, thus mitigating lock contention. The number
of taskqs scale dynamically based on the available CPUs in the system,
as illustrated below:

                taskq   total
cpus    taskqs  threads threads
------- ------- ------- -------
1       1       32       32
2       1       32       32
4       1       32       32
8       2       16       32
16      3       11       33
32      5       7        35
64      8       8        64
128     11      12       132
256     16      16       256

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15992
2024-04-17 10:10:24 -07:00
Rob Norris 7ad2616d37 vdev_disk: fix alignment check when buffer has non-zero starting offset
If a linear buffer spans multiple pages, and the first page has a
non-zero starting offset, the checker would not include the offset, and
so would think there was an alignment gap at the end of the first page,
rather than at the start.

That is, for a 16K buffer spread across five pages with an initial 512B
offset:

    [.XXXXXXX][XXXXXXXX][XXXXXXXX][XXXXXXXX][XXXXXXX.]

It would be interpreted as:

    [XXXXXXX.][XXXXXXXX]...

And be rejected as misaligned.

Since it's already a linear ABD, the "linearising" copy would just reuse
the buffer as-is, and the second check would failing, tripping the
VERIFY in vdev_disk_io_rw().

This commit fixes all this by including the offset in the check for
end-of-page alignment.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
(cherry picked from commit 1bf649cb0a)
2024-04-12 08:53:48 -07:00
Rob N d0d9dccc61 vdev_disk: ensure trim errors are returned immediately
After 08fd5ccc3, the discard issuing code was organised such that if
requesting an async discard or secure erase failed before the IO was
issued (that is, calling __blkdev_issue_discard() returned an error),
the failed zio would never be executed, resulting in txg_sync hanging
forever waiting for IO to finish.

This commit fixes that by immediately executing a failed zio on error.
To handle the successful synchronous op case, we fake an async op by,
when not using an asynchronous submission method, queuing the successful
result zio as part of the discard handler.

Since it was hard to understand the differences between discard and
secure erase, and sync and async, across different kernel versions, I've
commented and reorganised the code a bit to try and make everything more
contained and linear.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
(cherry picked from commit ba9f587a77)
2024-04-11 12:25:40 -07:00
Rob Norris 28520cad25 vdev_disk: don't touch vbio after its handed off to the kernel
After IO is unplugged, it may complete immediately and vbio_completion
be called on interrupt context. That may interrupt or deschedule our
task. If its the last bio, the vbio will be freed. Then, we get
rescheduled, and try to write to freed memory through vbio->.

This patch just removes the the cleanup, and the corresponding assert.
These were leftovers from a previous iteration of vbio_submit() and were
always "belt and suspenders" ops anyway, never strictly required.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc
Reported-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
(cherry picked from commit 917ff75e95)
2024-04-08 10:13:55 -07:00
Robert Evans deb7a84231 Fix corruption caused by mmap flushing problems
1) Make mmap flushes synchronous. Linux may skip flushing dirty pages
   already in writeback unless data-integrity sync is requested.

2) Change zfs_putpage to use TXG_WAIT. Otherwise dirty pages may be
   skipped due to DMU pushing back on TX assign.

3) Add missing mmap flush when doing block cloning.

4) While here, pass errors from putpage to writepage/writepages.

This change fixes corruption edge cases, but unfortunately adds
synchronous ZIL flushes for dirty mmap pages to llseek and bclone
operations. It may be possible to avoid these sync writes later
but would need more tricky refactoring of the writeback code.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #15933 
Closes #16019
2024-03-29 17:10:04 -07:00
Rob Norris eebf00bee9 vdev_disk: default to classic submission for 2.2.x
We don't want to change to brand-new code in the middle of a stable
series, but we want it available to test for people running into page
splitting issues.

This commits make zfs_vdev_disk_classic=1 the default, and updates the
documentation to better explain what's going on.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
2024-03-28 13:29:46 -07:00
Rob Norris d0b3be763f abd_iter_page: don't use compound heads on Linux <4.5
Before 4.5 (specifically, torvalds/linux@ddc58f2), head and tail pages
in a compound page were refcounted separately. This means that using the
head page without taking a reference to it could see it cleaned up later
before we're finished with it. Specifically, bio_add_page() would take a
reference, and drop its reference after the bio completion callback
returns.

If the zio is executed immediately from the completion callback, this is
usually ok, as any data is referenced through the tail page referenced
by the ABD, and so becomes "live" that way. If there's a delay in zio
execution (high load, error injection), then the head page can be freed,
along with any dirty flags or other indicators that the underlying
memory is used. Later, when the zio completes and that memory is
accessed, its either unmapped and an unhandled fault takes down the
entire system, or it is mapped and we end up messing around in someone
else's memory. Both of these are very bad.

The solution on these older kernels is to take a reference to the head
page when we use it, and release it when we're done. There's not really
a sensible way under our current structure to do this; the "best" would
be to keep a list of head page references in the ABD, and release them
when the ABD is freed.

Since this additional overhead is totally unnecessary on 4.5+, where
head and tail pages share refcounts, I've opted to simply not use the
compound head in ABD page iteration there. This is theoretically less
efficient (though cleaning up head page references would add overhead),
but its safe, and we still get the other benefits of not mapping pages
before adding them to a bio and not mis-splitting pages.

There doesn't appear to be an obvious symbol name or config option we
can match on to discover this behaviour in configure (and the mm/page
APIs have changed a lot since then anyway), so I've gone with a simple
version check.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588
(cherry picked from commit c6be6ce175)
2024-03-28 13:29:46 -07:00
Rob Norris cb599d27ed vdev_disk: use bio_chain() to submit multiple BIOs
Simplifies our code a lot, so we don't have to wait for each and
reassemble them.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588
(cherry picked from commit 72fd834c47)
2024-03-28 13:29:46 -07:00
Rob Norris af3a5bb40d vdev_disk: add module parameter to select BIO submission method
This makes the submission method selectable at module load time via the
`zfs_vdev_disk_classic` parameter, allowing this change to be backported
to 2.2 safely, and disabled in favour of the "classic" submission method
if new problems come up.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588
(cherry picked from commit df2169d141)
2024-03-28 13:29:46 -07:00
Rob Norris 51c2bd0def vdev_disk: rewrite BIO filling machinery to avoid split pages
This commit tackles a number of issues in the way BIOs (`struct bio`)
are constructed for submission to the Linux block layer.

The kernel has a hard upper limit on the number of pages/segments that
can be added to a BIO, as well as a separate limit for each device
(related to its queue depth and other scheduling characteristics).

ZFS counts the number of memory pages in the request ABD
(`abd_nr_pages_off()`, and then uses that as the number of segments to
put into the BIO, up to the hard upper limit. If it requires more than
the limit, it will create multiple BIOs.

Leaving aside the fact that page count method is wrong (see below), not
limiting to the device segment max means that the device driver will
need to split the BIO in half. This is alone is not necessarily a
problem, but it interacts with another issue to cause a much larger
problem.

The kernel function to add a segment to a BIO (`bio_add_page()`) takes a
`struct page` pointer, and offset+len within it. `struct page` can
represent a run of contiguous memory pages (known as a "compound page").
In can be of arbitrary length.

The ZFS functions that count ABD pages and load them into the BIO
(`abd_nr_pages_off()`, `bio_map()` and `abd_bio_map_off()`) will never
consider a page to be more than `PAGE_SIZE` (4K), even if the `struct
page` is for multiple pages. In this case, it will load the same `struct
page` into the BIO multiple times, with the offset adjusted each time.

With a sufficiently large ABD, this can easily lead to the BIO being
entirely filled much earlier than it could have been. This is also
further contributes to the problem caused by the incorrect segment limit
calculation, as its much easier to go past the device limit, and so
require a split.

Again, this is not a problem on its own.

The logic for "never submit more than `PAGE_SIZE`" is actually a little
more subtle. It will actually never submit a buffer that crosses a 4K
page boundary.

In practice, this is fine, as most ABDs are scattered, that is a list of
complete 4K pages, and so are loaded in as such.

Linear ABDs are typically allocated from slabs, and for small sizes they
are frequently not aligned to page boundaries. For example, a 12K
allocation can span four pages, eg:

     -- 4K -- -- 4K -- -- 4K -- -- 4K --
    |        |        |        |        |
          :## ######## ######## ######:    [1K, 4K, 4K, 3K]

Such an allocation would be loaded into a BIO as you see:

    [1K, 4K, 4K, 3K]

This tends not to be a problem in practice, because even if the BIO were
filled and needed to be split, each half would still have either a start
or end aligned to the logical block size of the device (assuming 4K at
least).

---

In ideal circumstances, these shortcomings don't cause any particular
problems. Its when they start to interact with other ZFS features that
things get interesting.

Aggregation will create a "gang" ABD, which is simply a list of other
ABDs. Iterating over a gang ABD is just iterating over each ABD within
it in turn.

Because the segments are simply loaded in order, we can end up with
uneven segments either side of the "gap" between the two ABDs. For
example, two 12K ABDs might be aggregated and then loaded as:

    [1K, 4K, 4K, 3K, 2K, 4K, 4K, 2K]

Should a split occur, each individual BIO can end up either having an
start or end offset that is not aligned to the logical block size, which
some drivers (eg SCSI) will reject. However, this tends not to happen
because the default aggregation limit usually keeps the BIO small enough
to not require more than one split, and most pages are actually full 4K
pages, so hitting an uneven gap is very rare anyway.

If the pool is under particular memory pressure, then an IO can be
broken down into a "gang block", a 512-byte block composed of a header
and up to three block pointers. Each points to a fragment of the
original write, or in turn, another gang block, breaking the original
data up over and over until space can be found in the pool for each of
them.

Each gang header is a separate 512-byte memory allocation from a slab,
that needs to be written down to disk. When the gang header is added to
the BIO, its a single 512-byte segment.

Pulling all this together, consider a large aggregated write of gang
blocks. This results a BIO containing lots of 512-byte segments. Given
our tendency to overfill the BIO, a split is likely, and most possible
split points will yield a pair of BIOs that are misaligned. Drivers that
care, like the SCSI driver, will reject them.

---

This commit is a substantial refactor and rewrite of much of `vdev_disk`
to sort all this out.

`vdev_bio_max_segs()` now returns the ideal maximum size for the device,
if available. There's also a tuneable `zfs_vdev_disk_max_segs` to
override this, to assist with testing.

We scan the ABD up front to count the number of pages within it, and to
confirm that if we submitted all those pages to one or more BIOs, it
could be split at any point with creating a misaligned BIO.  If the
pages in the BIO are not usable (as in any of the above situations), the
ABD is linearised, and then checked again. This is the same technique
used in `vdev_geom` on FreeBSD, adjusted for Linux's variable page size
and allocator quirks.

`vbio_t` is a cleanup and enhancement of the old `dio_request_t`. The
idea is simply that it can hold all the state needed to create, submit
and return multiple BIOs, including all the refcounts, the ABD copy if
it was needed, and so on. Apart from what I hope is a clearer interface,
the major difference is that because we know how many BIOs we'll need up
front, we don't need the old overflow logic that would grow the BIO
array, throw away all the old work and restart. We can get it right from
the start.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588
(cherry picked from commit 06a196020e)
2024-03-28 13:29:46 -07:00
Rob Norris 03ff875e09 vdev_disk: make read/write IO function configurable
This is just setting up for the next couple of commits, which will add a
new IO function and a parameter to select it.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588
(cherry picked from commit c4a13ba483)
2024-03-28 13:29:46 -07:00
Rob Norris 13b5348848 vdev_disk: reorganise vdev_disk_io_start
Light reshuffle to make it a bit more linear to read and get rid of a
bunch of args that aren't needed in all cases.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588
(cherry picked from commit 867178ae1d)
2024-03-28 13:29:46 -07:00
Rob Norris 4820185031 vdev_disk: rename existing functions to vdev_classic_*
This is just renaming the existing functions we're about to replace and
grouping them together to make the next commits easier to follow.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588
(cherry picked from commit f3b85d706b)
2024-03-28 13:29:46 -07:00
Rob Norris 52a2af6fd1 abd: add page iterator
The regular ABD iterators yield data buffers, so they have to map and
unmap pages into kernel memory. If the caller only wants to count
chunks, or can use page pointers directly, then the map/unmap is just
unnecessary overhead.

This adds adb_iterate_page_func, which yields unmapped struct page
instead.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588
(cherry picked from commit 390b448726)
2024-03-28 13:29:46 -07:00
Rob Norris 220bb7341e linux 5.4 compat: page_size()
Before 5.4 we have to do a little math.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588
(cherry picked from commit df04efe321)
2024-03-28 13:29:46 -07:00
Rob N 58211157bf Linux 6.8 compat: use splice_copy_file_range() for fallback
Linux 6.8 removes generic_copy_file_range(), which had been reduced to a
simple wrapper around splice_copy_file_range(). Detect that function
directly and use it if generic_ is not available.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15930
Closes #15931
(cherry picked from commit ef08a4d406)
2024-03-21 09:35:17 -07:00
Tony Hutter c883088df8 Tag zfs-2.2.3
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2024-02-21 09:26:51 -08:00
Alexander Motin c0c4866f8a dmu: Allow buffer fills to fail
When ZFS overwrites a whole block, it does not bother to read the
old content from disk. It is a good optimization, but if the buffer
fill fails due to page fault or something else, the buffer ends up
corrupted, neither keeping old content, nor getting the new one.

On FreeBSD this is additionally complicated by page faults being
blocked by VFS layer, always returning EFAULT on attempt to write
from mmap()'ed but not yet cached address range.  Normally it is
not a big problem, since after original failure VFS will retry the
write after reading the required data.  The problem becomes worse
in specific case when somebody tries to write into a file its own
mmap()'ed content from the same location.  In that situation the
only copy of the data is getting corrupted on the page fault and
the following retries only fixate the status quo.  Block cloning
makes this issue easier to reproduce, since it does not read the
old data, unlike traditional file copy, that may work by chance.

This patch provides the fill status to dmu_buf_fill_done(), that
in case of error can destroy the corrupted buffer as if no write
happened.  One more complication in case of block cloning is that
if error is possible during fill, dmu_buf_will_fill() must read
the data via fall-back to dmu_buf_will_dirty().  It is required
to allow in case of error restoring the buffer to a state after
the cloning, not not before it, that would happen if we just call
dbuf_undirty().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15665
2024-02-20 15:53:02 -08:00
Tony Hutter b62fd2cef9 ZTS: Skip cross-fs bclone tests if FreeBSD < 14.0
Skip cross filesystem block cloning tests on FreeBSD if running
less than version 14.0.  Cross filesystem copy_file_range() was
added in FreeBSD 14.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15901
2024-02-16 09:33:26 -08:00
Tony Hutter d92fbe2150 [zfs-2.2.3] ZTS: Use correct bclone module param name on FreeBSD
The bclone module names are not prefixed with 'zfs' on FreeBSD.
This was causing test failues.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2024-02-16 09:33:05 -08:00
Bi11 a4978d2605 zdb: Fix false leak report for BRT objects
Fix a misreport in 'zdb -d' where it falsely marked
BRT objects as leaked.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Yuxin Wang <yuxinwang9999@gmail.com>
Closes #15882
2024-02-12 17:03:17 -08:00
Dex Wood a6f6c881ff Add Ntfy notification support to ZED
This commit adds the zed_notify_ntfy() function and hooks it
into zed_notify(). This will allow ZED to send notifications
to ntfy.sh or a self-hosted Ntfy service, which can be received
on a desktop or mobile device. It is configured with ZED_NTFY_TOPIC,
ZED_NTFY_URL, and ZED_NTFY_ACCESS_TOKEN variables in zed.rc.

Reviewed-by: @classabbyamp
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Dex Wood <slash2314@gmail.com>
Closes #15584
2024-02-12 14:32:11 -08:00
Bi11 fc3d34bd08 BRT: Fix slop space calculation with block cloning
Similar to deduplication, the size of data duplicated by block cloning
should not be included in the slop space calculation.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Yuxin Wang <yuxinwang9999@gmail.com>
Closes #15874
2024-02-12 14:04:27 -08:00
Rob N 36116b4612 zfs list: add '-t fs' and '-t vol' options (#15883)
Because "filesystem" and "volume" are just too long!

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15864
(cherry picked from commit a5a725440b)
2024-02-12 14:04:27 -08:00
Tony Hutter b699dacb4a [zfs-2.2.3] Enable zfs_bclone_enabled on cp_files tests
cp_files_002_pos uses BRT, so enable block cloning in setup/cleanup.
This is only something we need to do in zfs-2.2.3, since 2.2.x ships
with block cloning disabled by default.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2024-02-12 14:04:21 -08:00
the-Chain-Warden-thresh d22bf6a9bd LUA: Backport CVE-2020-24370's patch
CVE-2020-24370 is a security vulnerability in lua. Although the CVE
description in CVE-2020-24370 said that this CVE only affected lua
5.4.0, according to lua this CVE actually existed since lua 5.2. The
root cause of this CVE is the negation overflow that occurs when you
try to take the negative of 0x80000000. Thus, this CVE also exists in
openzfs. Try to backport the fix to the lua in openzfs since the
original fix is for 5.4 and several functions have been changed.

https://github.com/advisories/GHSA-gfr4-c37g-mm3v
https://nvd.nist.gov/vuln/detail/CVE-2020-24370
https://www.lua.org/bugs.html#5.4.0-11
https://github.com/lua/lua/commit/a585eae6e7ada1ca9271607a4f48dfb1786

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: ChenHao Lu <18302010006@fudan.edu.cn>
Closes #15847
2024-02-08 15:22:16 -08:00
Cameron Harr 40e20d808c Add 'zpool status -e' flag to see unhealthy vdevs
When very large pools are present, it can be laborious to find
reasons for why a pool is degraded and/or where an unhealthy vdev
is. This option filters out vdevs that are ONLINE and with no errors
to make it easier to see where the issues are. Root and parents of
unhealthy vdevs will always be printed.

Testing:
ZFS errors and drive failures for multiple vdevs were simulated with
zinject.

Sample vdev listings with '-e' option
- All vdevs healthy
    NAME        STATE     READ WRITE CKSUM
    iron5       ONLINE       0     0     0

- ZFS errors
    NAME        STATE     READ WRITE CKSUM
    iron5       ONLINE       0     0     0
      raidz2-5  ONLINE       1     0     0
        L23     ONLINE       1     0     0
        L24     ONLINE       1     0     0
        L37     ONLINE       1     0     0

- Vdev faulted
    NAME        STATE     READ WRITE CKSUM
    iron5       DEGRADED     0     0     0
      raidz2-6  DEGRADED     0     0     0
        L67     FAULTED      0     0     0  too many errors

- Vdev faults and data errors
    NAME        STATE     READ WRITE CKSUM
    iron5       DEGRADED     0     0     0
      raidz2-1  DEGRADED     0     0     0
        L2      FAULTED      0     0     0  too many errors
      raidz2-5  ONLINE       1     0     0
        L23     ONLINE       1     0     0
        L24     ONLINE       1     0     0
        L37     ONLINE       1     0     0
      raidz2-6  DEGRADED     0     0     0
        L67     FAULTED      0     0     0  too many errors

- Vdev missing
    NAME        STATE     READ WRITE CKSUM
    iron5       DEGRADED     0     0     0
      raidz2-6  DEGRADED     0     0     0
        L67     UNAVAIL      3     1     0

- Slow devices when -s provided with -e
    NAME        STATE     READ WRITE CKSUM  SLOW
    iron5       DEGRADED     0     0     0     -
      raidz2-5  DEGRADED     0     0     0     -
        L10     FAULTED      0     0     0     0  external device fault
        L51     ONLINE       0     0     0    14

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Cameron Harr <harr1@llnl.gov>
Closes #15769
2024-02-08 15:22:16 -08:00
Mauricio Faria de Oliveira 9bb8d26bd5 zed: fix typo in variable ZED_POWER_OFF_ENCLO*US*RE_SLOT_ON_FAULT
Replace ENCLO_US_RE with ENCLO_SU_RE in the name of the variable.

Note this changes the user-visible string in zed.rc, thus might
break current users with the wrong string, but it's ~2 months
since zfs-2.2.0 tag is out, thus should not be widespread yet.

Mechanical change:

    $ grep -rl ZED_POWER_OFF_ENCLOUSRE_SLOT_ON_FAULT
    cmd/zed/zed.d/zed.rc
    cmd/zed/zed.d/statechange-slot_off.sh

    $ sed -i 's/ZED_POWER_OFF_ENCLOUSRE_SLOT_ON_FAULT/<linebreak>
                ZED_POWER_OFF_ENCLOSURE_SLOT_ON_FAULT/g' \
      cmd/zed/zed.d/zed.rc \
      cmd/zed/zed.d/statechange-slot_off.sh

    $ grep -rl ZED_POWER_OFF_ENCLOUSRE_SLOT_ON_FAULT
    $

Fixes 11fbcacf37
("zed: Add zedlet to power off slot when drive is faulted")

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Closes #15651
2024-02-08 15:22:16 -08:00
Umer Saleem 08fd5ccc38 Improve performance for zpool trim on linux
On Linux, ZFS uses blkdev_issue_discard in vdev_disk_io_trim to issue
trim command which is synchronous.

This commit updates vdev_disk_io_trim to use __blkdev_issue_discard,
which is asynchronous. Unfortunately there isn't any asynchronous
version for blkdev_issue_secure_erase, so performance of secure trim
will still suffer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15843
2024-02-06 12:58:55 -08:00
Tony Hutter 00d85a98ea BRT: Fix FICLONE/FICLONERANGE shortened copy
On Linux the ioctl_ficlonerange() and ioctl_ficlone() system calls
are expected to either fully clone the specified range or return an
error.  The range may be for an entire file.  While internally ZFS
supports cloning partial ranges there's no way to return the length
cloned to the caller so we need to make this all or nothing.

As part of this change support for the REMAP_FILE_CAN_SHORTEN flag
has been added.  When REMAP_FILE_CAN_SHORTEN is set zfs_clone_range()
will return a shortened range when encountering pending dirty records.
When it's clear zfs_clone_range() will block and wait for the records
to be written out allowing the blocks to be cloned.

Furthermore, the file range lock is held over the region being cloned
to prevent it from being modified while cloning.  This doesn't quite
provide an atomic semantics since if an error is encountered only a
portion of the range may be cloned.  This will be converted to an
error if REMAP_FILE_CAN_SHORTEN was not provided and returned to the
caller.  However, the destination file range is left in an undefined
state.

A test case has been added which exercises this functionality by
verifying that `cp --reflink=never|auto|always` works correctly.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15728
Closes #15842
2024-02-06 10:01:15 -08:00
Mark Johnston 9ef15845f5 Fix the FreeBSD userspace build (#15716)
- Mark some parameters to zpool_power*() as unused.
- Add a stub zpool_disk_wait().

Fixes: a9520e6e5 ("zpool: Add slot power control, print power status")

Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-01-30 13:33:36 -08:00
Tony Hutter 69142125d7 zpool: Add slot power control, print power status
Add `zpool` flags to control the slot power to drives.  This assumes
your SAS or NVMe enclosure supports slot power control via sysfs.

The new `--power` flag is added to `zpool offline|online|clear`:

    zpool offline --power <pool> <device>    Turn off device slot power
    zpool online --power <pool> <device>     Turn on device slot power
    zpool clear --power <pool> [device]      Turn on device slot power

If the ZPOOL_AUTO_POWER_ON_SLOT env var is set, then the '--power'
option is automatically implied for `zpool online` and `zpool clear`
and does not need to be passed.

zpool status also gets a --power option to print the slot power status.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mart Frauenlob <AllKind@fastest.cc>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15662
2024-01-29 15:12:06 -08:00
Tony Hutter 59112ca27d zed: misc vdev_enc_sysfs_path fixes
There have been rare cases where the VDEV_ENC_SYSFS_PATH value that zed
gets passed is stale.  To mitigate this, dynamically check the sysfs
path at the time of zed event processing, and use the dynamic value if
possible.  Note that there will be other times when we can not
dynamically detect the sysfs path (like if a disk disappears) and have
to rely on the old value for things like turning on the fault LED.  That
is to say, we can't just blindly use the dynamic path in every case.

Also:
	- Add enclosure sysfs entry when running 'zpool add'
	- Fix 'slot' and 'enc' zpool.d scripts for nvme

Reviewed-by: Don Brady <dev.fs.zfs@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15462
2024-01-29 15:11:56 -08:00
Tony Hutter 992d8871eb ZTS: Add dirty dnode stress test
Add a test for the dirty dnode SEEK_HOLE/SEEK_DATA bug described in
https://github.com/openzfs/zfs/issues/15526

The bug was fixed in https://github.com/openzfs/zfs/pull/15571 and
was backported to 2.2.2 and 2.1.14.  This test case is just to
make sure it does not come back.

seekflood.c originally written by Rob Norris.

Reviewed-by: Graham Perrin <grahamperrin@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15608
2024-01-29 15:06:14 -08:00
Rob Norris e6ca28c970 Linux 6.8 compat: handle mnt_idmap user_namespace change
struct mnt_idmap no longer has a struct user_namespace within it. Work
around this by creating a temporary with the copy of the map we need
taken from the idmap.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Youzhong Yang <yyang@mathworks.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15805
2024-01-29 14:53:29 -08:00
Rob Norris cbd51c5f24 Linux 6.8 compat: fix inode permission tests
The name inode_permission is now defined in the kernel. Rename ours to
test_permission, in line with most of our other tests.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15805
2024-01-29 14:53:29 -08:00
Rob Norris 09e6724e1e Linux 6.8 compat: replace MAX_ORDER define
MAX_ORDER has been renamed to MAX_PAGE_ORDER. Rather than just
redefining it, instead define our own name and set it consistently from
the start.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15805
2024-01-29 14:53:29 -08:00
Rob Norris 7466e09a49 Linux 6.8 compat: implement strlcpy fallback
Linux has removed strlcpy in favour of strscpy. This implements a
fallback implementation of strlcpy for this case.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15805
2024-01-29 14:53:29 -08:00
Rob Norris ce782d0804 Linux 6.8 compat: update for new bdev access functions
blkdev_get_by_path() and blkdev_put() have been replaced by
bdev_open_by_path() and bdev_release(), which return a "handle" object
with the bdev object itself inside.

This adds detection for the new functions, and macros to handle the old
and new forms consistently.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15805
2024-01-29 14:53:29 -08:00
Rob Norris 64afc4e66e Linux 6.8 compat: make test functions static
The kernel is now being compiled with -Wmissing-prototypes. Most of our
test stub functions had no prototype, and failed to compile. Since they
don't need to be visible anywhere else, just make them all static.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15805
2024-01-29 14:53:29 -08:00
Brian Behlendorf 621dfaff5c Linux 6.7 compat: META
Update the META file to reflect compatibility with the 6.7 kernel.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15833
2024-01-29 14:53:29 -08:00
Paul Dagnelie ab653603f8 Don't assert mg_initialized due to device addition race
During device removal stress tests, we noticed that we were tripping 
the assertion that mg_initialized was true. After investigation, it was 
determined that the mg in question was the embedded log metaslab 
group for a newly added vdev; the normal mg had been initialized (by 
metaslab_sync_reassess, via vdev_sync_done). However, because the spa 
config alloc lock is not held as writer across both calls to 
metaslab_sync_reassess, it is possible for an allocation to happen 
between the two metaslab_groups being initialized. Because the metaslab 
code doesn't check the group in question, just the vdev's main mg, it 
is possible to get past the initial check in vdev_allocatable and 
later fail due to the assertion.

We simply remove the assertions. We could also consider locking the 
ALLOC lock around the reassess calls in vdev_sync_done, but that risks 
deadlocks. We could check the actual target mg in vdev_allocatable, 
but that risks racing with a passivation that comes in after that 
check but before the assertion. We still won't be able to actually 
allocate from the metaslab group if no metaslabs are ready, so this 
change shouldn't break anything.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #15818
2024-01-29 14:53:29 -08:00
Chris Davidson acc7cd8e99 Update man pages to time(1) from time(2)
zpool-iostat.8: Updated time(2) -> time(1) to align to manual page
zpool-list.8: Updated time(2) -> time(1) to align to manual page
zpool-status.8: Updated time(2) -> time(1) to align to manual page
zpool-wait.8: Update time(2) -> time(1) to align to manual page

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christopher Davidson <christopher.davidson@gmail.com>
Closes #15823
2024-01-29 14:53:29 -08:00
Brian Behlendorf dd0874cf7e ZTS: Allow longer run time for zdb_args_pos
The zdb_args_pos test may take slightly longer than 600 seconds to run
on some of the CI builders.  To prevent this from causing failures allow
up to 1200 seconds for tests in this group.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15826
2024-01-29 14:53:29 -08:00
Andrew Innes 7cd666d54b Move nodes into correct subgraphs
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Andrew Innes <andrew.c12@gmail.com>
Closes #15828
2024-01-29 14:53:29 -08:00
Rob N 0606ce2055 zpool wait: print timestamp before the header
list, status and iostat all display the -T timestamp before the header,
but wait showed it after. Make it be like the others.

Reported-by: Kyle Evans <kevans@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15825
2024-01-29 14:53:29 -08:00
Ameer Hamza dd3a0a2715 Update vdev devid and physpath if changed between imports
If devid or physpath for a vdev changes between imports, ensure it is
updated to the new value.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15816
2024-01-29 14:53:29 -08:00
Tino Reichardt 9ad150446f ZTS: Update deprecated Github Action version numbers
GitHub Actions is transitioning from Node 16 to Node 20.

So we need to update these:
- actions/checkout@v3 -> v4
- actions/download-artifact@v3 -> v4
- actions/upload-artifact@v3 -> v4 and some minor changes

Update also the documentation of the testings workflow.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Andrew Innes <andrew.c12@gmail.com>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #15820
2024-01-29 14:53:29 -08:00
Richard Yao 9da745f5de Switch to CodeQL to detect prohibited function use
The LLVM/Clang developers pointed out that using the CPP to detect use
of functions that our QA policies prohibit risks invoking undefined
behavior. To resolve this, we configure CodeQL to detect forbidden
function usage.

Note that cpp in the context of CodeQL refers to C/C++, rather than the
C PreProcessor, which C++ also uses. It really should have been written
cxx, but that ship sailed a long time ago. This misuse of the term cpp
is retained in the CodeQL configuration for consistency with upstream
CodeQL.

As a side benefit, verbose make no longer is a wall of text showing a
bunch of CPP macros, which can make debugging slightly easier.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #15819 
Closes #14134
2024-01-29 14:53:29 -08:00
Tino Reichardt cfa29b9945 ZTS: Apply small changes for speeding up the tests
The Github Action Runner got some new hardware metrics.  We should use
the provided and empty disk which is pre-mounted at /mnt now.

Disk1: 89GiB -> rootfs + bootfs with ~80MB/s -> don't care
Disk2: 64GiB -> /mnt with 420MB/s -> new testing ssd

This commit will mount the new disk to /var/tmp and provide hopefully
some speedups within our testings.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Andrew Innes <andrew.c12@gmail.com>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #15811
2024-01-29 14:53:29 -08:00
Val Packett 09a7961364 FreeBSD: Fix bootstrapping tools under Linux/musl
musl libc has deprecated LFS64 aliases, so bootstrapping FreeBSD tools
under musl distros has been failing with stat64 errors.

Apply the aliases under non-glibc Linux to fix this problem.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Val Packett <val@packett.cool>
Closes #15780
2024-01-29 14:53:29 -08:00
Tino Reichardt 276be5357c linux spl: fix typo in top comment of spl-condvar.c
Credential Implementation -> Condition Variables Implementation

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #15782
2024-01-29 14:53:29 -08:00
Lalufu 424d06a298 Make sure all necessary RPM path macros are defined
When building (s)rpm files through the Makefile, a directory structure
is created in /tmp to hold the various files.

In case the user running the command has overridden some of the RPM path
settings through their user profile (for example in `~/.rpmmacros`),
these paths do not line up with the configuration, and the build fails.

Make sure all paths used are properly defined.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ralf Ertzinger <ralf@skytale.net>
Closes #15756
2024-01-29 14:53:29 -08:00
youzhongyang 6b64acc157 Make spl_kmem_cache size check consistent
On Linux x86_64, kmem cache can have size up to 4M,
however increasing spl_kmem_cache_slab_limit can lead
to crash due to the size check inconsistency.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #15757
2024-01-29 14:53:29 -08:00
Ameer Hamza a2e71db664 Add path handling for aux vdevs in label_path
If the AUX vdev is added using UUID, importing the pool falls back AUX
vdev to open it with disk name instead of UUID due to the absence of
path information for AUX vdevs. Since AUX label now have path
information, this PR adds path handling for it in `label_path`.

Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15737
2024-01-29 14:53:29 -08:00
Ameer Hamza eb4a36bcef Extend aux label to add path information
Pool import logic uses vdev paths, so it makes sense to add path
information on AUX vdev as well.

Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15737
2024-01-29 14:53:29 -08:00
Ameer Hamza 52cee9a3eb fix: Uber block label not always found for aux vdevs
When spare or l2cache (aux) vdev is added during pool creation,
spa->spa_uberblock is not dumped until that point. Subsequently,
the aux label is never synchronized after its initial creation,
resulting in the uberblock label remaining undumped. The uberblock
is crucial for lib_blkid in identifying the ZFS partition type. To
address this issue, we now ensure sync of the uberblock label once
if it's not dumped initially.

Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15737
2024-01-29 14:53:29 -08:00
Brian Behlendorf 2006ac1f4a Fix "out of memory" error
Drop the no_memory() call from zpool_in_use() when reading the
label fails and instead return the error to the caller.  This
prevents a misleading "internal error: out of memory" error
when the label can't be read.  This will result in is_spare()
returning B_FALSE instead of aborting, which is already safely
handled.

Furthermore, on Linux it's possible for EREMOTEIO to returned
by an NVMe device if the device has been low-level formatted
and not rescanned.  In this case we want to fallback to the
legacy scanning method and read any of the labels we can.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #13538
Closes #15747
2024-01-29 14:53:29 -08:00
Benjamin Sherman 509526ad21 fix: preserve linux kmod signature in zfs-kmod rpm spec
This change provides rpm spec macros to sign the zfs and spl kmods as
the final step after the %install scriptlet. This is needed since the
find-debuginfo.sh script strips out debug symbols plus signatures.

Kernel module signing only occurs when the required files are present
as typically required in the Linux source tree:
- certs/signing_key.pem
- certs/signing_key.x509

The method for overriding the default __spec_install_post macro is
inspired by (and largely copied from) the Fedora kernel.spec.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Benjamin Sherman <benjamin@holyarmy.org>
Closes #15744
2024-01-29 14:53:29 -08:00
Stefan Lendl 4db88c37cc fix(mount): do not truncate shares not zfs mount
When running zfs share -a resetting the exports.d/zfs.exports makes
sense the get a clean state.
Truncating was also called with zfs mount which would not populate the
file again.
Add test to verify shares persist after mount -a.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Stefan Lendl <s.lendl@proxmox.com>
Closes #15607 
Closes #15660
2024-01-29 14:53:29 -08:00
Mark Johnston 8b1c6db3d2 Fix a potential use-after-free in zfs_setsecattr()
In general, VOPs must not load the "z_log" field until having called
zfs_enter_verify_zp().

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #15752
2024-01-29 14:53:29 -08:00
Mark Johnston 22e4f08c30 Linux: Defer loading the object set in zfs_setattr()
We need to wait until after having done a zfs_enter() to load some
fields from the zfsvfs structure.  Otherwise a use-after-free is
possible in the face of a concurrent rollback.

Other functions in this file are careful to avoid this bug, I believe
this is the only instance.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #15752
2024-01-29 14:53:29 -08:00
Rich Ercolani 7bccf98a73 Make zdb -R scale less poorly
zdb -R with :d tries to use gzip decompression 9 times per size.
There's absolutely no reason for that, they're all the same
decompressor.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #15726
2024-01-29 14:53:29 -08:00
Rich Ercolani 4d4972ed98 Stop wasting time on malloc in snprintf_zstd_header
Profiling zdb -vvvvv on datasets with a lot of zstd blocks, we find
ourselves spending quite a lot of time on malloc/free, because we
allocate a 16M abd each call, and never free it, so we're leaking
16M per call as well.

This seems sub-optimal. So let's just keep the buffer around and
reuse it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #15721
2024-01-29 14:53:29 -08:00
Pawel Jakub Dawidek 3425484eb9 Fix file descriptor leak on pool import.
Descriptor leak can be easily reproduced by doing:

	# zpool import tank
	# sysctl kern.openfiles
	# zpool export tank; zpool import tank
	# sysctl kern.openfiles

We were leaking four file descriptors on every import.

Similar leak most likely existed when using file-based VDEVs.

External-issue: https://reviews.freebsd.org/D43529
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #15630
2024-01-26 13:38:25 -08:00
Brian Behlendorf 9e0304c363 ZTS: Apply zfs_bclone_enabled to bclone tests
If block cloning is disabled by default then enable it when running
the bclone tests.  Follow up to #15529.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15796
2024-01-22 16:15:03 -08:00
Tino Reichardt c1161e2851 fix: variable type with zfs-tests/cmd/clonefile.c
Compiling on arm64 freebsd-13.2 and arm64 almalinux-8 brings currently
this error:

```
  CC       tests/zfs-tests/cmd/clonefile.o
tests/zfs-tests/cmd/clonefile.c:166:43: error: result of comparison of \
constant -1 with expression of type 'char' is always true \
[-Werror,-Wtautological-constant-out-of-range-compare]
        while ((c = getopt(argc, argv, "crfdq")) != -1) {
               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^  ~~
1 error generated.
gmake[2]: *** [Makefile:8675: tests/zfs-tests/cmd/clonefile.o] Error 1
```

Fix: use correct variable type `int`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #15783
2024-01-19 12:28:02 -08:00
Pawel Jakub Dawidek ef527958c6 Fix cloning into mmaped and cached file.
If the destination file is mmaped and the mmaped region was already
read, so it is cached, we need to update mmaped pages after successful
clone using update_pages().

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Pointed out by: Ka Ho Ng <khng@freebsd.org>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #15772
2024-01-19 12:28:02 -08:00
Umer Saleem d2f7b2e557 ZTS: Test for clone, mmap and write for block cloning
For block cloning, if we mmap the cloned file and write from the
map into the file, it triggers a panic in dbuf_redirty() on Linux.

The same scenario causes data corruption on FreeBSD. Both these
issues are fixed under PR#15656 and PR#15665.

It would be good to add a test for this scenario in ZTS. The test
program and issue was produced by @robn.

Reviewed-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15717
2024-01-19 12:28:02 -08:00
Brian Behlendorf 83c0ccc7cf Enable block_cloning tests on FreeBSD
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #15749
2024-01-19 12:28:02 -08:00
Pawel Jakub Dawidek c16d103422 Block cloning tests.
The test mostly focus on testing various corner cases.
The tests take a long time to run, so for the common.run runfile
we randomly select a hundred tests.
To run all the bclone tests, bclone.run runfile should be used.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #15631
2024-01-19 12:28:02 -08:00
Umer Saleem f94a77951d Test LWB buffer overflow for block cloning
PR#15634 removes 128K into 2x68K LWB split optimization, since it
was found to cause LWB buffer overflow while trying to write 128KB
TX_CLONE_RANGE record with 1022 block pointers into 68KB buffer,
with multiple VDEVs ZIL.

This commit adds a test for this particular scenario by writing
maximum sizes TX_CLONE_RANE record with 1022 block pointers into
68KB buffer, with two SLOG devices.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15672
2024-01-19 12:28:02 -08:00
Ameer Hamza d8b0b6032b ZTS: Add test cases for block cloning replay
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15614
2024-01-19 12:28:02 -08:00
Ameer Hamza 387f003be3 ZTS: block_cloning: Use numeric sort for get_same_blocks
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15614
2024-01-19 12:28:02 -08:00
Kevin Jin 07cf973fe9 Autotrim High Load Average Fix
Switch from cv_wait() to cv_wait_idle() in vdev_autotrim_wait_kick(),
which should mitigate the high load average while waiting.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes #15781
2024-01-18 11:33:29 -08:00
Rob N 2ecc2dfe42 Linux 6.7 compat: zfs_setattr fix atime update
In db4fc559c I messed up and changed this bit of code to set the inode
atime to an uninitialised value, when actually it was just supposed to
loading the atime from the inode to be stored in the SA. This changes it
to what it should have been.

Ensure times change by the right amount Previously, we only checked
if the times changed at all, which missed a bug where the atime was
being set to an undefined value.

Now ensure the times change by two seconds (or thereabouts), ensuring
we catch cases where we set the time to something bonkers

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15762
Closes #15773
2024-01-17 08:59:28 -08:00
Shengqi Chen 9ecd112dc1 compact: workaround for GPL-only symbols on riscv from Linux 6.2
Since Linux 6.2, the implementation of flush_dcache_page on riscv
references GPL-only symbol `PageHuge`, breaking the build of zfs.

This patch uses existing mechanism to override flush_dcache_page,
removing the call to `PageHuge`. According to comments in kernel,
it is only used to do some check against HugeTLB pages, which only
exist in userspace. ZFS uses flush_dcache_page only on kernel pages,
thus this patch will not introduce any behaviour change.

See also: torvalds/linux@d33deda, openzfs/zfs@589f59b

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #14974 
Closes #15627
2024-01-16 13:27:29 -08:00
Mark Johnston a00231a3fc spa: Let spa_taskq_param_get()'s addition of a newline be optional
For FreeBSD sysctls, we don't want the extra newline, since the
sysctl(8) utility will format strings appropriately.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reported-by: Peter Holm <pho@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #15719
2024-01-16 11:32:19 -08:00
Mark Johnston 9181e94f0b spa: Fix FreeBSD sysctl handlers
sbuf_cpy() resets the sbuf state, which is wrong for sbufs allocated by
sbuf_new_for_sysctl().  In particular, this code triggers an assertion
failure in sbuf_clear().

Simplify by just using sysctl_handle_string() for both reading and
setting the tunable.

Fixes: 6930ecbb7 ("spa: make read/write queues configurable")
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reported-by: Peter Holm <pho@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #15719
2024-01-16 11:32:19 -08:00
Rob Norris 3bd23fd78d freebsd: fix compile for spa_taskq_read/spa_taskq_write params
Missed in #15695, backporting #15675.

Signed-off-by: Rob Norris <robn@despairlabs.com>
2024-01-16 11:32:19 -08:00
Alexander Motin ac592318b8 Fix livelist assertions for dedup and cloning
Two block pointers in livelist pointing to the same location may
be caused not only by dedup, but also by block cloning. We should
not assert D bit set in them.

Two block pointers in livelist pointing to the same location may
have different logical birth time in case of dedup or cloning. We
should assert identical physical birth time instead.

Assert identical physical block size between pointers in addition
to checksum, since that is what checksums are calculated on.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15732
2024-01-12 12:53:00 -08:00
Alexander Motin 152a775eac Improve block sizes checks during cloning
- Fail if source block is smaller than destination.  We can only
grow blocks, not shrink them.
 - Fail if we do not have full znode range lock.  In that case grow
is not even called.  We should improve zfs_rangelock_cb() somehow
to know when cloning needs to grow the block size unlike write.
 - Fail of we tried to resize, but failed.  There are many reasons
for it to fail that we can not predict at this level, so be ready
for them.  Unlike write, that may proceed after growth failure,
block cloning can't and must return error.

This fixes assertion inside dmu_brt_clone() when it sees different
number of blocks held in destination than it got block pointers.
Builds without ZFS_DEBUG returned EXDEV, so are not affected much.

Reviewed-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15724 
Closes #15735
2024-01-12 12:53:00 -08:00
Shengqi Chen 976bf9b6a6 Linux 6.2 compat: add check for kernel_neon_* availability
This patch adds check for `kernel_neon_*` symbols on arm and arm64
platforms to address the following issues:

1. Linux 6.2+ on arm64 has exported them with `EXPORT_SYMBOL_GPL`, so
   license compatibility must be checked before use.
2. On both arm and arm64, the definitions of these symbols are guarded
   by `CONFIG_KERNEL_MODE_NEON`, but their declarations are still
   present. Checking in configuration phase only leads to MODPOST
   errors (undefined references).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #15711 
Closes #14555 
Closes: #15401
2024-01-12 12:38:27 -08:00
chrisperedun f71c16a661 Don't panic on unencrypted block in encrypted dataset
While 763ca47 closes the situation of block cloning creating
unencrypted records in encrypted datasets, existing data still causes
panic on read. Setting zfs_recover bypasses this but at the cost of
potentially ignoring more serious issues.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Peredun <chris.peredun@ixsystems.com>
Closes #15677
2024-01-08 16:11:39 -08:00
Alexander Motin 9c40ae0219 dbuf: Set dr_data when unoverriding after clone
Block cloning normally creates dirty record without dr_data.  But if
the block is read after cloning, it is moved into DB_CACHED state and
receives the data buffer.  If after that we call dbuf_unoverride()
to convert the dirty record into normal write, we should give it the
data buffer from dbuf and release one.

Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15654
Closes #15656
2024-01-08 16:11:39 -08:00
Alexander Motin a701548eb4 dbuf: Handle arcbuf assignment after block cloning
In some cases dbuf_assign_arcbuf() may be called on a block that
was recently cloned.  If it happened in current TXG we must undo
the block cloning first, since the only one dirty record per TXG
can't and shouldn't mean both cloning and overwrite same time.

Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15653
2024-01-08 16:11:39 -08:00
Alexander Motin b13c91bb29 DMU: Fix lock leak on dbuf_hold() error
dmu_assign_arcbuf_by_dnode() should drop dn_struct_rwlock lock in
case dbuf_hold() failed.  I don't have reproduction for this, but
it looks inconsistent with dmu_buf_hold_noread_by_dnode() and co.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15644
2024-01-08 16:11:39 -08:00
Alexander Motin e09356fa05 BRT: Limit brt_vdev_dump() to only one vdev
Without this patch on pool of 60 vdevs with ZFS_DEBUG enabled clone
takes much more time than copy, while heavily trashing dbgmsg for
no good reason, repeatedly dumping all vdevs BRTs again and again,
even unmodified ones.

I am generally not sure this dumping is not excessive, but decided
to keep it for now, just restricting its scope to more reasonable.

Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15625
2024-01-08 16:11:39 -08:00
Alexander Motin 1e1d748cae ZIL: Remove 128K into 2x68K LWB split optimization
To improve 128KB block write performance in case of multiple VDEVs
ZIL used to spit those writes into two 64KB ones.  Unfortunately it
was found to cause LWB buffer overflow, trying to write maximum-
sizes 128KB TX_CLONE_RANGE record with 1022 block pointers into
68KB buffer, since unlike TX_WRITE ZIL code can't split it.

This is a minimally-invasive temporary block cloning fix until the
following more invasive prediction code refactoring.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15634
2024-01-08 16:11:39 -08:00
Alexander Motin dea2d3c6cd zdb: Dump encrypted write and clone ZIL records
Block pointers are not encrypted in TX_WRITE and TX_CLONE_RANGE
records, so we can dump them, that may be useful for debugging.

Related to #15543.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15629
2024-01-08 16:11:39 -08:00
oromenahar 121924575e Allow block cloning across encrypted datasets
When two datasets share the same master encryption key, it is safe
to clone encrypted blocks. Currently only snapshots and clones
of a dataset share with it the same encryption key.

Added a test for:
- Clone from encrypted sibling to encrypted sibling with
  non encrypted parent
- Clone from encrypted parent to inherited encrypted child
- Clone from child to sibling with encrypted parent
- Clone from snapshot to the original datasets
- Clone from foreign snapshot to a foreign dataset
- Cloning from non-encrypted to encrypted datasets
- Cloning from encrypted to non-encrypted datasets

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Original-patch-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Signed-off-by: Kay Pedersen <mail@mkwg.de>
Closes #15544
2024-01-08 16:11:39 -08:00
Alexander Motin e11b3eb1c6 ZIL: Do not clone blocks from the future
ZIL claim can not handle block pointers cloned from the future,
since they are not yet allocated at that point.  It may happen
either if the block was just written when it was cloned, or if
the pool was frozen or somehow else rewound on import.

Handle it from two sides: prevent cloning of blocks with physical
birth time from not yet synced or frozen TXG, and abort ZIL claim
if we still detect such blocks due to rewind or something else.

While there, assert that any cloned blocks we claim are really
allocated by calling metaslab_check_free().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15617
2024-01-08 16:11:39 -08:00
Alexander Motin 3b8f227362 ZIL: Remove TX_CLONE_RANGE replay for ZVOLs.
zil_claim_clone_range() takes references on cloned blocks before ZIL
replay.  Later zil_free_clone_range() drops them after replay or on
dataset destroy.  The total balance is neutral.  It means we do not
need to do anything (drop the references) for not implemented yet
TX_CLONE_RANGE replay for ZVOLs.

This is a logical follow up to #15603.

Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15612
2024-01-08 16:11:39 -08:00
Alexander Motin e48195c816 ZIO: Add overflow checks for linear buffers
Since we use a limited set of kmem caches, quite often we have unused
memory after the end of the buffer.  Put there up to a 512-byte canary
when built with debug to detect buffer overflows at the free time.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15553
2024-01-08 16:11:39 -08:00
Alexander Motin ad47eca195 ZIL: Assert record sizes in different places
This should make sure we have log written without overflows.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15517
2024-01-08 16:11:39 -08:00
Alexander Motin 2e259c6f00 L2ARC: Restrict write size to 1/4 of the device
PR #15457 exposed weird logic in L2ARC write sizing. If it appeared
bigger than device size, instead of liming write it reset all the
system-wide tunables to their default.  Aside of being excessive,
it did not actually help with the problem, still allowing infinite
loop to happen.

This patch removes the tunables reverting logic, but instead limits
L2ARC writes (or at least eviction/trim) to 1/4 of the capacity.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15519
2024-01-08 16:11:39 -08:00
Alexander Motin a8c29a79df Linux: Reclaim unused spl_kmem_cache_reclaim
It is unused for 3 years since #10576.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15507
2024-01-08 16:11:39 -08:00
Alexander Motin f13593619b FreeBSD: Optimize large kstat outputs
- Use sbuf_new_for_sysctl() to reduce double-buffering on sysctl
output.
- Use much faster sbuf_cat() instead of sbuf_printf("%s").

Together it reduces `sysctl kstat.zfs.misc.dbufs` time from minutes
to seconds, making dbufstat almost usable.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15495
2024-01-08 16:11:39 -08:00
Alan Somers c34fe8dcbc Update the kstat dataset_name when renaming a zvol
Add a dataset_kstats_rename function, and call it when renaming
a zvol on FreeBSD and Linux.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Sponsored-by: Axcient
Closes #15482
Closes #15486
2024-01-08 16:11:39 -08:00
Alexander Motin 2a59b6bfa9 ABD: Be more assertive in iterators
Once we verified the ABDs and asserted the sizes we should never
see premature ABDs ends.  Assert that and remove extra branches
from production builds.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15428
2024-01-08 16:11:39 -08:00
Rob Norris db2db50e37 spa: make read/write queues configurable
We are finding that as customers get larger and faster machines
(hundreds of cores, large NVMe-backed pools) they keep hitting
relatively low performance ceilings. Our profiling work almost always
finds that they're running into bottlenecks on the SPA IO taskqs.
Unfortunately there's often little we can advise at that point, because
there's very few ways to change behaviour without patching.

This commit adds two load-time parameters `zio_taskq_read` and
`zio_taskq_write` that can configure the READ and WRITE IO taskqs
directly.

This achieves two goals: it gives operators (and those that support
them) a way to tune things without requiring a custom build of OpenZFS,
which is often not possible, and it lets us easily try different config
variations in a variety of environments to inform the development of
better defaults for these kind of systems.

Because tuning the IO taskqs really requires a fairly deep understanding
of how IO in ZFS works, and generally isn't needed without a pretty
serious workload and an ability to identify bottlenecks, only minimal
documentation is provided. Its expected that anyone using this is going
to have the source code there as well.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
2023-12-22 13:25:07 -08:00
Brian Behlendorf d530d5d8a5 Linux 6.5 compat: check BLK_OPEN_EXCL is defined
On some systems we already have blkdev_get_by_path() with 4 args
but still the old FMODE_EXCL and not BLK_OPEN_EXCL defined.
The vdev_bdev_mode() function was added to handle this case
but there was no generic way to specify exclusive access.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15692
2023-12-21 16:19:48 -08:00
Brian Behlendorf 3c502e376b ZTS: Disable io_uring test on CentOS 9
The io_uring test fails on CentOS 9 with the following fio error.
Disable the test for the benefit of the CI until this can be fully
investigated.  This basic test passes as expected on newer kernels.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15636
2023-12-21 15:44:43 -08:00
Rob Norris 03b84099d9 linux 6.7 compat: rework shrinker setup for heap allocations
6.7 changes the shrinker API such that shrinkers must be allocated
dynamically by the kernel. To accomodate this, this commit reworks
spl_register_shrinker() to do something similar against earlier kernels.

Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://github.com/sponsors/robn
2023-12-21 11:03:08 -08:00
Rob Norris 18a9185165 linux 6.7 compat: handle superblock shrinker member change
In 6.7 the superblock shrinker member s_shrink has changed from being an
embedded struct to a pointer. Detect this, and don't take a reference if
it already is one.

Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://github.com/sponsors/robn
2023-12-21 11:03:08 -08:00
Rob Norris 3c13601a12 linux 6.7 compat: use inode atime/mtime accessors
6.6 made i_ctime inaccessible; 6.7 has done the same for i_atime and
i_mtime. This extends the method used for ctime in b37f29341 to atime
and mtime as well.

Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://github.com/sponsors/robn
2023-12-21 11:03:08 -08:00
Rob Norris b3626f0a35 linux 6.7 compat: simplify current_time() check
6.7 changed the names of the time members in struct inode, so we can't
assign back to it because we don't know its name. In practice this
doesn't matter though - if we're missing current_time(), then we must be
on <4.9, and we know our fallback will need to return timespec.

Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://github.com/sponsors/robn
2023-12-21 11:03:08 -08:00
Tony Hutter 494aaaed89 Tag zfs-2.2.2
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2023-11-29 14:08:46 -08:00
rmacklem 522414da3b FreeBSD: Fix ZFS so that snapshots under .zfs/snapshot are NFS visible
Call vfs_exjail_clone() for mounts created under .zfs/snapshot
to fill in the mnt_exjail field for the mount.  If this is not
done, the snapshots under .zfs/snapshot with not be accessible
over NFS.

This version has the argument name in vfs.h fixed to match that
of the name in spl_vfs.c, although it really does not matter.

External-issue: https://reviews.freebsd.org/D42672
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca>
Closes #15563
2023-11-29 14:08:46 -08:00
Alexander Motin a8c256046b ZIL: Call brt_pending_add() replaying TX_CLONE_RANGE
zil_claim_clone_range() takes references on cloned blocks before ZIL
replay.  Later zil_free_clone_range() drops them after replay or on
dataset destroy.  The total balance is neutral.  It means on actual
replay we must take additional references, which would stay in BRT.

Without this blocks could be freed prematurely when either original
file or its clone are destroyed.  I've observed BRT being emptied
and the feature being deactivated after ZIL replay completion, which
should not have happened.  With the patch I see expected stats.

Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15603
2023-11-29 13:08:25 -08:00
Martin Matuška eb34de04d7 zdb: fix printf() length for uint64_t devid
Bug introduced in 213d682967.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Warner Losh <imp@FreeBSD.org>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #15606
2023-11-29 13:08:25 -08:00
Jaron Kent-Dobias d813aa8530 Linux 6.6 compat: fix configure error with clang (#15558)
With Linux v6.6.x and clang 16, a configure step fails on a warning that
later results in an error while building, due to 'ts' being
uninitialized. Add a trivial initialization to silence the warning.

Signed-off-by: Jaron Kent-Dobias <jaron@kent-dobias.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2023-11-28 15:19:07 -08:00
AllKind 3b267e72de zfs-dkms: fix shell-init error message
If all zfs dkms modules have been removed, a shell-init error message
may appear, because /var/lib/dkms/zfs does no longer exist.
Resolve this by leaving the directory earlier on.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mart Frauenlob <AllKind@fastest.cc>
Closes #15576
2023-11-28 15:19:07 -08:00
Alan Somers 349fb77f11 FreeBSD: Fix the build on FreeBSD 12
It was broken for several reasons:
* VOP_UNLOCK lost an argument in 13.0.  So OpenZFS should be using
  VOP_UNLOCK1, but a few direct calls to VOP_UNLOCK snuck in.
* The location of the zlib header moved in 13.0 and 12.1.  We can drop
  support for building on 12.0, which is EoL.
* knlist_init lost an argument in 13.0.  OpenZFS change 9d0887402b
  assumed 13.0 or later.
* FreeBSD 13.0 added copy_file_range, and OpenZFS change 67a1b03791
  assumed 13.0 or later.

Sponsored-by: Axcient
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #15551
2023-11-28 15:19:07 -08:00
Rob N 2a953e0ac9 dmu_buf_will_clone: fix race in transition back to NOFILL
Previously, dmu_buf_will_clone() would roll back any dirty record, but
would not clean out the modified data nor reset the state before
releasing the lock. That leaves the last-written data in db_data, but
the dbuf in the wrong state.

This is eventually corrected when the dbuf state is made NOFILL, and
dbuf_noread() called (which clears out the old data), but at this point
its too late, because the lock was already dropped with that invalid
state.

Any caller acquiring the lock before the call into
dmu_buf_will_not_fill() can find what appears to be a clean, readable
buffer, and would take the wrong state from it: it should be getting the
data from the cloned block, not from earlier (unwritten) dirty data.

Even after the state was switched to NOFILL, the old data was still not
cleaned out until dbuf_noread(), which is another gap for a caller to
take the lock and read the wrong data.

This commit fixes all this by properly cleaning up the previous state
and then setting the new state before dropping the lock. The
DBUF_VERIFY() calls confirm that the dbuf is in a valid state when the
lock is down.

Sponsored-by: Klara, Inc.
Sponsored-By: OpenDrives Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15566
Closes #15526
2023-11-28 12:59:00 -08:00
Akash B e4985bf5a1 zdb: Fix zdb '-O|-r' options with -e/exported zpool
zdb with '-e' or exported zpool doesn't work along with
'-O' and '-r' options as we process them before '-e' has
been processed.

Below errors are seen:

~> zdb -e pool-mds65/mdt65 -O oi.9/0x200000009:0x0:0x0
failed to hold dataset 'pool-mds65/mdt65': No such file or directory

~> zdb -e pool-oss0/ost0 -r file1 /tmp/filecopy1 -p.
failed to hold dataset 'pool-oss0/ost0': No such file or directory
zdb: internal error: No such file or directory

We need to make sure to process '-O|-r' options after the
'-e' option has been processed, which imports the pool to
the namespace if it's not in the cachefile.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Akash B <akash-b@hpe.com>
Closes #15532
2023-11-28 12:56:43 -08:00
Rob Norris e96675a7b1 zdb: show BRT statistics and dump its contents
Same idea as the dedup stats, but for block cloning.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15541
2023-11-28 12:56:43 -08:00
Rob Norris d702f86eaf brt: lift internal definitions into _impl header
So that zdb (and others!) can get at the BRT on-disk structures.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15541
2023-11-28 12:56:43 -08:00
Tony Hutter 41c4599cba ZTS: Fix zfs_load-key failures on F39
The zfs_load-key tests were failing on F39 due to their use of the
deprecated ssl.wrap_socket function.  This commit updates the test to
instead use ssl.SSLContext() as described in
https://stackoverflow.com/a/65194957.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15534
Closes #15550
2023-11-28 12:56:09 -08:00
Alexander Motin 56a2a0981e ZIL: Do not encrypt block pointers in lr_clone_range_t
In case of crash cloned blocks need to be claimed on pool import.
It is only possible if they (lr_bps) and their count (lr_nbps) are
not encrypted but only authenticated, similar to block pointer in
lr_write_t.  Few other fields can be and are still encrypted.

This should fix panic on ZIL claim after crash when block cloning
is actively used.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Tom Caputi <caputit1@tcnj.edu>
Reviewed-by: Sean Eric Fagan <sef@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Edmund Nadolski <edmund.nadolski@ixsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15543
Closes #15513
2023-11-28 11:17:52 -08:00
Rob N 9b9b09f452 dnode_is_dirty: check dnode and its data for dirtiness
Over its history this the dirty dnode test has been changed between
checking for a dnodes being on `os_dirty_dnodes` (`dn_dirty_link`) and
`dn_dirty_record`.

  de198f2d9 Fix lseek(SEEK_DATA/SEEK_HOLE) mmap consistency
  2531ce372 Revert "Report holes when there are only metadata changes"
  ec4f9b8f3 Report holes when there are only metadata changes
  454365bba Fix dirty check in dmu_offset_next()
  66aca2473 SEEK_HOLE should not block on txg_wait_synced()

Also illumos/illumos-gate@c543ec060d illumos/illumos-gate@2bcf0248e9

It turns out both are actually required.

In the case of appending data to a newly created file, the dnode proper
is dirtied (at least to change the blocksize) and dirty records are
added.  Thus, a single logical operation is represented by separate
dirty indicators, and must not be separated.

The incorrect dirty check becomes a problem when the first block of a
file is being appended to while another process is calling lseek to skip
holes. There is a small window where the dnode part is undirtied while
there are still dirty records. In this case, `lseek(fd, 0, SEEK_DATA)`
would not know that the file is dirty, and would go to
`dnode_next_offset()`. Since the object has no data blocks yet, it
returns `ESRCH`, indicating no data found, which results in `ENXIO`
being returned to `lseek()`'s caller.

Since coreutils 9.2, `cp` performs sparse copies by default, that is, it
uses `SEEK_DATA` and `SEEK_HOLE` against the source file and attempts to
replicate the holes in the target. When it hits the bug, its initial
search for data fails, and it goes on to call `fallocate()` to create a
hole over the entire destination file.

This has come up more recently as users upgrade their systems, getting
OpenZFS 2.2 as well as a newer coreutils. However, this problem has been
reproduced against 2.1, as well as on FreeBSD 13 and 14.

This change simply updates the dirty check to check both types of dirty.
If there's anything dirty at all, we immediately go to the "wait for
sync" stage, It doesn't really matter after that; both changes are on
disk, so the dirty fields should be correct.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15571
Closes #15526
2023-11-28 09:15:48 -08:00
Brian Behlendorf 89fcb8c6f9 Revert "Tune zio buffer caches and their alignments"
This reverts commit bd7a02c251 which
can trigger an unlikely existing bio alignment issue on Linux.
This change is good, but the underlying issue it exposes needs to
be resolved before this can be re-applied.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #15533
2023-11-28 09:03:58 -08:00
Tony Hutter 55dd24c4cc Tag zfs-2.2.1
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2023-11-20 13:20:56 -08:00
Tony Hutter 78287023ce ZTS: Fix 'could not unmount datasets' on Alma 9
Many tests are failing on AlmaLinux 9 because ZTS could not destroy the
pool in cleanup.  This was due to $PWD being set to '.' instead of the
expected full path.  This patch sets $PWD to the full path.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2023-11-20 13:20:56 -08:00
Tony Hutter 479dca51c6 zfs-2.2.1: Disable block cloning by default
Disable block cloning by default to mitigate possible data corruption
(see #15529 and #15526).

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2023-11-16 14:23:03 -08:00
Rich Ercolani 87e9e82865 Add a tunable to disable BRT support.
Copy the disable parameter that FreeBSD implemented, and extend it to
work on Linux as well, until we're sure this is stable.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #15529
2023-11-16 14:23:03 -08:00
Umer Saleem 0733fe2aa5 Packaging: Auto-generate changelog during configure (#15528)
Auto-generate changelog based off on @VERSION@ during configure,
so that it is not needed to be update with new releases / version
updates.

Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2023-11-16 14:23:03 -08:00
Tony Hutter fd836dfe24 Linux 6.6 compat: META
Update the META file to reflect compatibility with the 6.6 kernel.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15520
2023-11-16 14:23:03 -08:00
Tony Hutter e92a680c70 Workaround UBSAN errors for variable arrays
This gets around UBSAN errors when using arrays at the end of
structs.  It converts some zero-length arrays to variable length
arrays and disables UBSAN checking on certain modules.

It is based off of the patch from #15460.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Tested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Co-authored-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Issue #15145
Closes #15510
2023-11-16 14:23:03 -08:00
Umer Saleem f1659cc782 ZTS: Test for all known zpool feature sets
zpool_create_features_007_pos only tested for compat-2020 feature
set. It would be useful to test for all known features sets. If
any additional feature is found enabled that is not present in
compatibility list or feature set, it should be caught and
reported earlier.

This commit also removes encryption from openzfsonosx-1.8.1
compatibility list. Encryption enables bookmark_v2, since it is
a dependency of encryption, but not listed in openzfsonoxx-1.8.1
compatibility list.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15505
2023-11-16 14:23:03 -08:00
Umer Saleem f863ac3d0f Update zpool-features.7 for grub2 compatibility list updates
This commit updates zpool-features.7 man page to add newly added
zpool features to grub2 compatibility list.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15505
2023-11-16 14:23:03 -08:00
AllKind f6d2e5c075 Workaround to allow openzfs-zfs-dkms install on Ubuntu
As shown in #15404#issuecomment-1765002181, Ubuntu kernel has
'Provides: zfs-dkms', which will cause uninstall of the kernel, when
attempting to install openzfs-zfs-dkms.
As a workaround remove the 'Conflicts: zfs-dkms' definition from
the debian control file.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mart Frauenlob <AllKind@fastest.cc>
Closes #15503
2023-11-16 14:23:03 -08:00
Low-power f2fe4d51a8 Linux: reject read/write mapping to immutable file only on VM_SHARED
Private read/write mapping can't be used to modify the mapped files, so
they will remain be immutable. Private read/write mappings are usually
used to load the data segment of executable files, rejecting them will
rendering immutable executable files to stop working.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: WHR <msl0000023508@gmail.com>
Closes #15344
2023-11-16 14:23:03 -08:00
MigeljanImeri 76663fe372 Fix accounting error for pending sync IO ops in zpool iostat
Currently vdev_queue_class_length is responsible for checking how long
the queue length is, however, it doesn't check the length when a list
is used, rather it just returns whether it is empty or not. To fix this
I added a counter variable to vdev_queue_class to keep track of the sync
IO ops, and changed vdev_queue_class_length to reference this variable
instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: MigeljanImeri <ImeriMigel@gmail.com>
Closes #15478
2023-11-16 14:23:03 -08:00
Umer Saleem 44c8ff9b0c Linux 6.6 compat: fix implicit conversion error with debug build
With Linux v6.6.0 and GCC 12, when debug build is configured,
implicit conversion error is raised while converting
'enum <anonymous>' to 'boolean_t'. Use 'B_TRUE' instead of
'true' to fix the issue.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15489
2023-11-16 14:23:03 -08:00
Umer Saleem f0ffcc3adc Remove obsolete_counts from grub2 compatibility list
PR#15459 add all read-only compatible zpool features to grub2
compatibility list. 'obsolete_counts' is a read-only features that
depends on 'device_removal' feature which is not read-only and
is marked as ZFEATURE_FLAG_MOS. Creating a pool with grub2
compatibility enables 'device_removal' feature as well, which is
not desired.

This commit removes the 'obsolete_counts' feature from
grub2 compatibility list, as GRUB only supports read-only
compatible features.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15499
2023-11-16 14:23:03 -08:00
AllKind e534ba5ce7 Fix dkms installation of deb packages created with Alien.
Alien does not honour the %posttrans hook.
So move the dkms uninstall/install scripts to the
 %pre/%post hooks in case of package install/upgrade.
In case of package removal, handle that in %preun.
Add removal of all old dkms modules.
Add checking for broken 'dkms status'. Handle that as
good as possible and warn the user about it.
Also add more verbose messages about what we are doing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mart Frauenlob <AllKind@fastest.cc>
Closes #15415
2023-11-16 14:23:03 -08:00
Umer Saleem 1c7048357d Add all read-only compatible zpool features to grub2 compatibility
GRUB opens the boot pool in read-only mode. All read-only
compatible features for zpool can be enabled and added to
grub2 compatibility, as GRUB does not open the boot-pool
for write.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15459
2023-11-16 14:23:03 -08:00
Alexander Motin 3ec4ea68d4 Unify arc_prune_async() code
There is no sense to have separate implementations for FreeBSD and
Linux.  Make Linux code shared as more functional and just register
FreeBSD-specific prune callback with arc_add_prune_callback() API.

Aside of code cleanup this should fix excessive pruning on FreeBSD:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=274698

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Johnston <markj@FreeBSD.org>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15456
2023-11-08 12:15:41 -08:00
Alexander Motin bd7a02c251 Tune zio buffer caches and their alignments
We should not always use PAGESIZE alignment for caches bigger than
it and SPA_MINBLOCKSIZE otherwise.  Doing that caches for 5, 6, 7,
10 and 14KB rounded up to 8, 12 and 16KB respectively make no sense.
Instead specify as alignment the biggest power-of-2 divisor.  This
way 2KB and 6KB caches are both aligned to 2KB, while 4KB and 8KB
are aligned to 4KB.

Reduce number of caches to half-power of 2 instead of quarter-power
of 2.  This removes caches difficult for underlying allocators to
fit into page-granular slabs, such as: 2.5, 3.5, 5, 7, 10KB, etc.
Since these caches are mostly used for transient allocations like
ZIOs and small DBUF cache it does not worth being too aggressive.
Due to the above alignment issue some of those caches were not
working properly any way.  6KB cache now finally has a chance to
work right, placing 2 buffers into 3 pages, that makes sense.

Remove explicit alignment in Linux user-space case.  I don't think
it should be needed any more with the above fixes.

As result on FreeBSD instead of such numbers of pages per slab:

vm.uma.zio_buf_comb_16384.keg.ppera: 4
vm.uma.zio_buf_comb_14336.keg.ppera: 4
vm.uma.zio_buf_comb_12288.keg.ppera: 3
vm.uma.zio_buf_comb_10240.keg.ppera: 3
vm.uma.zio_buf_comb_8192.keg.ppera: 2
vm.uma.zio_buf_comb_7168.keg.ppera: 2
vm.uma.zio_buf_comb_6144.keg.ppera: 2   <= Broken
vm.uma.zio_buf_comb_5120.keg.ppera: 2
vm.uma.zio_buf_comb_4096.keg.ppera: 1
vm.uma.zio_buf_comb_3584.keg.ppera: 7   <= Hard to free
vm.uma.zio_buf_comb_3072.keg.ppera: 3
vm.uma.zio_buf_comb_2560.keg.ppera: 2
vm.uma.zio_buf_comb_2048.keg.ppera: 1
vm.uma.zio_buf_comb_1536.keg.ppera: 2
vm.uma.zio_buf_comb_1024.keg.ppera: 1
vm.uma.zio_buf_comb_512.keg.ppera: 1

I am now getting such:

vm.uma.zio_buf_comb_16384.keg.ppera: 4
vm.uma.zio_buf_comb_12288.keg.ppera: 3
vm.uma.zio_buf_comb_8192.keg.ppera: 2
vm.uma.zio_buf_comb_6144.keg.ppera: 3   <= Fixed, 2 in 3 pages
vm.uma.zio_buf_comb_4096.keg.ppera: 1
vm.uma.zio_buf_comb_3072.keg.ppera: 3
vm.uma.zio_buf_comb_2048.keg.ppera: 1
vm.uma.zio_buf_comb_1536.keg.ppera: 2
vm.uma.zio_buf_comb_1024.keg.ppera: 1
vm.uma.zio_buf_comb_512.keg.ppera: 1

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15452
2023-11-08 12:15:41 -08:00
Alexander Motin e82e68400a DMU: Do not pre-read holes during write
dmu_tx_check_ioerr() pre-reads blocks that are going to be dirtied
as part of transaction to both prefetch them and check for errors.
But it makes no sense to do it for holes, since there are no disk
reads to prefetch and there can be no errors.  On the other side
those blocks are anonymous, and they are freed immediately by the
dbuf_rele() without even being put into dbuf cache, so we just
burn CPU time on decompression and overheads and get absolutely
no result at the end.

Use of dbuf_hold_impl() with fail_sparse parameter allows to skip
the extra work, and on my tests with sequential 8KB writes to empty
ZVOL with 32KB blocks shows throughput increase from 1.7 to 2GB/s.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15371
2023-11-08 12:15:41 -08:00
Coleman Kane 3f67e012e4 Linux 6.6 compat: fsync_bdev() has been removed in favor of sync_blockdev()
In Linux commit 560e20e4bf6484a0c12f9f3c7a1aa55056948e1e, the
fsync_bdev() function was removed in favor of sync_blockdev() to do
(roughly) the same thing, given the same input. This change
conditionally attempts to call sync_blockdev() if fsync_bdev() isn't
discovered during configure.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15263
2023-11-08 12:15:41 -08:00
Coleman Kane 21875dd090 Linux 6.6 compat: generic_fillattr has a new u32 request_mask added at arg2
In commit 0d72b92883c651a11059d93335f33d65c6eb653b, a new u32 argument
for the request_mask was added to generic_fillattr. This is the same
request_mask for statx that's present in the most recent API implemented
by zpl_getattr_impl. This commit conditionally adds it to the
zpl_generic_fillattr(...) macro, as well as the zfs_getattr_fast(...)
implementation, when configure determines it's present in the kernel's
generic_fillattr(...).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15263
2023-11-08 12:15:41 -08:00
Coleman Kane fe9d409e90 Linux 6.6 compat: use inode_get/set_ctime*(...)
In Linux commit 13bc24457850583a2e7203ded05b7209ab4bc5ef, direct access
to the i_ctime member of struct inode was removed. The new approach is
to use accessor methods that exclusively handle passing the timestamp
around by value. This change adds new tests for each of these functions
and introduces zpl_* equivalents in include/os/linux/zfs/sys/zpl.h. In
where the inode_get/set_ctime*() functions exist, these zpl_* calls will
be mapped to the new functions. On older kernels, these macros just wrap
direct-access calls. The code that operated on an address of ip->i_ctime
to call ZFS_TIME_DECODE() now will take a local copy using
zpl_inode_get_ctime(), and then pass the address of the local copy when
performing the ZFS_TIME_DECODE() call, in all cases, rather than
directly accessing the member.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15263
Closes #15257
2023-11-08 12:15:41 -08:00
shodanshok 7aef672b77 Read prefetched buffers from L2ARC
Prefetched buffers are currently read from L2ARC if, and only if,
l2arc_noprefetch is set to non-default value of 0. This means that
a streaming read which can be served from L2ARC will instead engage
the main pool.

For example, consider what happens when a file is sequentially read:
- application requests contiguous data, engaging the prefetcher;
- ARC buffers are initially marked as prefetched but, as the calling
application consumes data, the prefetch tag is cleared;
- these "normal" buffers become eligible for L2ARC and are copied to it;
- re-reading the same file will *not* engage L2ARC even if it contains
the required buffers;
- main pool has to suffer another sequential read load, which (due to
most NCQ-enabled HDDs preferring sequential loads) can dramatically
increase latency for uncached random reads.

In other words, current behavior is to write data to L2ARC (wearing it)
without using the very same cache when reading back the same data. This
was probably useful many years ago to preserve L2ARC read bandwidth but,
with current SSD speed/size/price, it is vastly sub-optimal.

Setting l2arc_noprefetch=1, while enabling L2ARC to serve these reads,
means that even prefetched but unused buffers will be copied into L2ARC,
further increasing wear and load for potentially not-useful data.

This patch enable prefetched buffer to be read from L2ARC even when
l2arc_noprefetch=1 (default), increasing sequential read speed and
reducing load on the main pool without polluting L2ARC with not-useful
(ie: unused) prefetched data. Moreover, it clear users confusion about
L2ARC size increasing but not serving any IO when doing sequential
reads.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #15451
2023-11-06 16:47:51 -08:00
Thomas Bertschinger f9a9aea126 Add mutex_enter_interruptible() for interruptible sleeping IOCTLs
Many long-running ZFS ioctls lock the spa_namespace_lock, forcing
concurrent ioctls to sleep for the mutex. Previously, the only
option is to call mutex_enter() which sleeps uninterruptibly. This
is a usability issue for sysadmins, for example, if the admin runs
`zpool status` while a slow `zpool import` is ongoing, the admin's
shell will be locked in uninterruptible sleep for a long time.

This patch resolves this admin usability issue by introducing
mutex_enter_interruptible() which sleeps interruptibly while waiting
to acquire a lock. It is implemented for both Linux and FreeBSD.

The ZFS_IOC_POOL_CONFIGS ioctl, used by `zpool status`, is changed to
use this new macro so that the command can be interrupted if it is
issued during a concurrent `zpool import` (or other long-running
operation).

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Thomas Bertschinger <bertschinger@lanl.gov>
Closes #15360
2023-11-06 16:47:41 -08:00
Tony Hutter 8ba748d414 Revert "zvol: Temporally disable blk-mq"
This reverts commit aefb6a2bd6.

aefb6a2bd temporally disabled blk-mq until we could fix a fix for

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15439
2023-11-06 16:47:32 -08:00
Tony Hutter e860cb0200 zvol: Remove broken blk-mq optimization
This fix removes a dubious optimization in zfs_uiomove_bvec_rq()
that saved the iterator contents of a rq_for_each_segment().  This
optimization allowed restoring the "saved state" from a previous
rq_for_each_segment() call on the same uio so that you wouldn't
need to iterate though each bvec on every zfs_uiomove_bvec_rq() call.
However, if the kernel is manipulating the requests/bios/bvecs under
the covers between zfs_uiomove_bvec_rq() calls, then it could result
in corruption from using the "saved state".  This optimization
results in an unbootable system after installing an OS on a zvol
with blk-mq enabled.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15351
2023-11-06 16:47:24 -08:00
ofthesun9 86c3ed40e1 "ARC prefetch metadata accesses:" appears twice in the output.
The first occurrence should be "ARC prefetch data accesses:"

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: ofthesun9 <olivier@ofthesun.net>
Closes #15427
2023-11-06 16:47:14 -08:00
Alexander Motin 6e41aca519 Trust ARC_BUF_SHARED() more
In my understanding ARC_BUF_SHARED() and arc_buf_is_shared() should
return identical results, except the second also asserts it deeper.
The first is much cheaper though, saving few pointer dereferences.
Replace production arc_buf_is_shared() calls with ARC_BUF_SHARED(),
and call arc_buf_is_shared() in random assertions, while making it
even more strict.

On my tests this in half reduces arc_buf_destroy_impl() time, that
noticeably reduces hash_lock congestion under heavy dbuf eviction.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15397
2023-11-06 16:47:05 -08:00
Alexander Motin 79f7de5752 Remove lock from dsl_pool_need_dirty_delay()
Torn reads/writes of dp_dirty_total are unlikely: on 64-bit systems
due to register size, while on 32-bit due to memory constraints.
And even if we hit some race, the code implementing the delay takes
the lock any way.

Removal of the poll-wide lock acquisition saves ~1% of CPU time on
8-thread 8KB write workload.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15390
2023-11-06 16:46:55 -08:00
VaibhavB 0ef1964c79 run-zts test procfs/pool_state failed with uncorrectable I/O failure
Once we trigger the zpool scrub, all zpool/zfs command gets stuck for 
180 seconds. Post 180 seconds zpool/zfs commands gets start executing 
however few more seconds(10s) it take to update the status. hence 
sleeping for 200 seconds so that we get the correct status.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: vaibhav.bhanawat <vaibhav.bhanawat@delphix.com>
Closes #15364
2023-11-06 16:46:49 -08:00
Alexander Motin eaa62d9951 Properly pad struct tx_cpu to cache line
We already use ____cacheline_aligned in many places, so add one more
instead of seems arbitrary char tc_pad[8].

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15402
2023-11-06 16:46:44 -08:00
dennisfriedrichsen 8ca95d78c5 Fix typo in tests/zfs-tests/tests/functional/cli_user/misc/misc.cfg
Reviewed-by: Rob N <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Dennis R. Friedrichsen <dennis.r.friedrichsen@gmail.com>
Closes #15417
2023-11-06 16:46:37 -08:00
Olivier Certner edebca5dfc FreeBSD: taskq: Remove unused declaration
Variable 'uma_align_cache' has not been used since commit "FreeBSD: Use
a hash table for taskqid lookups" (3933305ea).  Moreover, it is soon
going to become private to FreeBSD's UMA in 15.0-CURRENT (main),
14.0-STABLE (stable/14) and 13.2-STABLE (stable/13).  Should accessing
this information become necessary again, one will have to use the new
accessors for recent versions.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olivier Certner <olce.freebsd@certner.fr>
Closes #15416
2023-11-06 16:46:32 -08:00
Colin Percival 1cc1bf4fa7 Set spa_ccw_fail_time=0 when expanding a vdev.
When a vdev is to be expanded -- either via `zpool online -e` or via
the autoexpand option -- a SPA_ASYNC_CONFIG_UPDATE request is queued
to be handled via an asynchronous worker thread (spa_async_thread).
This normally happens almost immediately; but will be delayed up to
zfs_ccw_retry_interval seconds (default 5 minutes) if an attempt to
write the zpool configuration cache failed.

When FreeBSD boots ZFS-root VM images generated using `makefs -t zfs`,
the zpoolupgrade rc.d script runs `zpool upgrade`, which modifies the
pool configuration and triggers an attempt to write to the cache file.
This attempted write fails because the filesystem is still mounted
read-only at this point in the boot process, triggering a 5-minute
cooldown before SPA_ASYNC_CONFIG_UPDATE requests will be handled by
the asynchronous worker thread.

When expanding a vdev, reset the "when did a configuration cache
write last fail" value so that the SPA_ASYNC_CONFIG_UPDATE request
will be handled promptly.  A cleaner but more intrusive option would
be to use separate SPA_ASYNC_ flags for "configuration changed" and
"try writing the configuration cache again", but with FreeBSD 14.0
coming very soon I'd prefer to leave such refactoring for a later
date.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Colin Percival <cperciva@FreeBSD.org>
Closes #15405
2023-11-06 16:46:25 -08:00
Don Brady 0bcd1151f0 Fix ZED auto-replace for VDEVs using by-id paths
The change is simple -- restore the original code so that the VDEV 
path is updated when using by-id paths.  The more challenging part 
was to devise a second ZTS test, that would test auto-replace for 
'by-id' and help prevent a future regression.

With that new test, we can now do an A|B test with , and without, 
the fix to confirm that auto-replace for by-id paths works. The 
existing auto-replace test, functional/fault/auto_replace_001_pos, 
will confirm that we didn't break auto-replace for 'by-vdev' paths.

In the original functional/fault/auto_replace_001_pos test, the disk 
wipe (using dd) was not effective in removing the partitioning since 
the kernel was never informed of the wipe.

Added a call to wipefs(8) so that the kernel is informed and ZED will 
re-partition the device.
    
Added a validation step that the re-partitioning occurred by
confirming  that the GPT partition UUID changes.

Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #15363
2023-11-06 16:45:14 -08:00
Tony Hutter 78fd79eacd Add zfs_prepare_disk script for disk firmware install
Have libzfs call a special `zfs_prepare_disk` script before a disk is
included into the pool.  The user can edit this script to add things
like a disk firmware update or a disk health check.  Use of the script
is totally optional. See the zfs_prepare_disk manpage for full details.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15243
2023-11-06 16:45:07 -08:00
John Wren Kennedy 6d693e20a2 Large sync writes perform worse with slog
For synchronous write workloads with large IO sizes, a pool configured
with a slog performs worse than one with an embedded zil:

sequential_writes 1m sync ios, 16 threads
  Write IOPS:              1292          438   -66.10%
  Write Bandwidth:      1323570       448910   -66.08%
  Write Latency:       12128400     36330970      3.0x

sequential_writes 1m sync ios, 32 threads
  Write IOPS:              1293          430   -66.74%
  Write Bandwidth:      1324184       441188   -66.68%
  Write Latency:       24486278     74028536      3.0x

The reason is the `zil_slog_bulk` variable. In `zil_lwb_write_open`,
if a zil block is greater than 768K, the priority of the write is
downgraded from sync to async. Increasing the value allows greater
throughput. To select a value for this PR, I ran an fio workload with
the following values for `zil_slog_bulk`:

    zil_slog_bulk    KiB/s
    1048576         422132
    2097152         478935
    4194304         533645
    8388608         623031
    12582912        827158
    16777216       1038359
    25165824       1142210
    33554432       1211472
    50331648       1292847
    67108864       1308506
    100663296      1306821
    134217728      1304998

At 64M, the results with a slog are now improved to parity with an
embedded zil:

sequential_writes 1m sync ios, 16 threads
  Write IOPS:               438         1288      2.9x
  Write Bandwidth:       448910      1319062      2.9x
  Write Latency:       36330970     12163408   -66.52%

sequential_writes 1m sync ios, 32 threads
  Write IOPS:               430         1290      3.0x
  Write Bandwidth:       441188      1321693      3.0x
  Write Latency:       74028536     24519698   -66.88%

None of the other tests in the performance suite (run with a zil or
slog) had a significant change, including the random_write_zil tests,
which use multiple datasets.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: John Wren Kennedy <john.kennedy@delphix.com>
Closes #14378
2023-11-06 16:33:23 -08:00
Alexander Motin b76724ae47 FreeBSD: Improve taskq wrapper
- Group tqent_task and tqent_timeout_task into a union.  They are
never used same time. This shrinks taskq_ent_t from 192 to 160 bytes.
 - Remove tqent_registered.  Use tqent_id != 0 instead.
 - Remove tqent_cancelled.  Use taskqueue pending counter instead.
 - Change tqent_type into uint_t.  We don't need to pack it any more.
 - Change tqent_rc into uint_t, matching refcount(9).
 - Take shared locks in taskq_lookup().
 - Call proper taskqueue_drain_timeout() for TIMEOUT_TASK in
taskq_cancel_id() and taskq_wait_id().
 - Switch from CK_LIST to regular LIST.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15356
2023-11-06 16:33:18 -08:00
Martin Matuška 459c99ff23 Fix block cloning between unencrypted and encrypted datasets
Block cloning from an encrypted dataset into an unencrypted dataset
and vice versa is not possible. The current code did allow cloning
unencrypted files into an encrypted dataset causing a panic when
these were accessed. Block cloning between encrypted and encrypted
is currently supported on the same filesystem only.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Rob N <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #15464
Closes #15465
2023-11-06 10:40:50 -08:00
Brian Behlendorf 95785196f2 Tag 2.2.0
New Features
- Block cloning (#13392)
- Linux container support (#14070, #14097, #12263)
- Scrub error log (#12812, #12355)
- BLAKE3 checksums (#12918)
- Corrective "zfs receive"
- Vdev and zpool user properties

Performance
- Fully adaptive ARC (#14359)
- SHA2 checksums (#13741)
- Edon-R checksums (#13618)
- Zstd early abort (#13244)
- Prefetch improvements (#14603, #14516, #14402, #14243, #13452)
- General optimization (#14121, #14123, #14039, #13680, #13613,
  #13606, #13576, #13553, #12789, #14925, #14948)

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2023-10-12 16:19:17 -07:00
Jason King 2bba9fd479 Zpool can start allocating from metaslab before TRIMs have completed
When doing a manual TRIM on a zpool, the metaslab being TRIMmed is
potentially re-enabled before all queued TRIM zios for that metaslab
have completed. Since TRIM zios have the lowest priority, it is 
possible to get into a situation where allocations occur from the 
just re-enabled metaslab and cut ahead of queued TRIMs to the same 
metaslab.  If the ranges overlap, this will cause corruption.

We were able to trigger this pretty consistently with a small single 
top-level vdev zpool (i.e. small number of metaslabs) with heavy 
parallel write activity while performing a manual TRIM against a 
somewhat 'slow' device (so TRIMs took a bit of time to complete). 
With the patch, we've not been able to recreate it since. It was on 
illumos, but inspection of the OpenZFS trim code looks like the 
relevant pieces are largely unchanged and so it appears it would be 
vulnerable to the same issue.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jason King <jking@racktopsystems.com>
Illumos-issue: https://www.illumos.org/issues/15939
Closes #15395
2023-10-12 11:05:20 -07:00
Brian Behlendorf 30ee2ee8ec spec: define _bashcompletiondir if undefined
Always define _bashcompletiondir in the spec file to a reasonable value
when it is undefined.  Required for `rpmbuild --rebuild <srpm>`.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15396
2023-10-11 16:58:06 -07:00
Brian Behlendorf d7b6e470ff ZTS: Debug zfs_share_concurrent_shares failure
Update zfs_share_concurrent_shares test case to wait a few seconds
and recheck that the filesystem isn't shared.  The intent here is
determine the nature of the error and if it may be a race.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by:  Umer Saleem <usaleem@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15379
2023-10-10 19:19:09 -07:00
Brian Behlendorf 04186d33be CI: Move perl script to dist_noinst_DATA
Everything listed in dist_noinst_SCRIPTS is assumed to be a shell
script, this generates a shellcheck SC1071 error since perl is not
supported.  Move update_authors.pl to dist_noinst_DATA with the
other perl scripts.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob N <robn@despairlabs.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15392
2023-10-10 19:19:09 -07:00
Daniel Berlin 810fc49a3e Ensure we call fput when cloning fails due to different devices.
Right now, zpl_ioctl_ficlone and zpl_ioctl_ficlonerange do not call
put on the src fd if the source and destination are on two different
devices.  This leaves the source file held open in this case.

Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Daniel Berlin <dberlin@dberlin.org>
Closes #15386
2023-10-10 19:19:09 -07:00
Brian Behlendorf 75a7740574 ZTS: Remove zfs_allow_010_pos expection for FreeBSD
This issue should now be address by PR #15376 and the exception
for this test case be removed.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by:  Umer Saleem <usaleem@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15382
2023-10-10 19:19:09 -07:00
Tony Hutter a80e1f1c90 zvol: Temporally disable blk-mq
There was a report of zvol data loss (#15351) after enabling blk-mq on a
zvol backed with 16k physical block sized disks.  Out of an abundance of
caution, do not allow the user to enable blk-mq until we can look into
the issue.

Note that blk-mq was not enabled by default on zvols.  It was always
opt-in via the zvol_use_blk_mq module parameter.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Addresses: #15351
Closes #15378
2023-10-10 19:19:09 -07:00
Rob Norris 111ae3364c AUTHORS: update with missing names
This is generated by scripts/update_authors.pl. I've looked over the
results fairly closely and while I don't think they're bad, they could
be improved somewhat, but also, I don't know if its good form to just
update this without explicit consent from those named.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15374
2023-10-10 19:19:09 -07:00
Rob Norris 3990273ffe update_authors: add missing names from commits to AUTHORS
Full description of what's happening in comments.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15374
2023-10-10 19:19:09 -07:00
Rob Norris da93b72c91 mailmap: initial, trying to tidy up a lot of the commit history
This comes from the observation that a huge number of commit author
fields look quite strange (to my eyes), but quite often the
Signed-off-by: trailer has the correct name. For these I have updated
the name where it was obvious how to do so, however, I have not created
a mapping for the commit email to the Signed-off-by email, as whatever I
choose for email will become the prime candidate for inclusion in the
AUTHORS file, and care needs to be taken when acting without explicit
consent.

There's a small handful of commits that look like they were done on
local machines, or CI hosts, or similar, where the git authorship config
wasn't set up properly. Its obvious what this should look like, so I've
just done them.

The remainder is mapping Github noreply emails to either an
obviously-correct Signed-off-by trailer, or to a an author from another
commit. This was mostly done by hand, so there may be errors, but I
think its close. I do not understand where these come from - I know that
they're what commits made via Github web look like when there's no real
address set on the account, but I find it hard to believe that so many
of these came through the web, especially given the complexity of most
of the changes. I suspect there's some kind of merge helper tool in play
here. Regardless, the history is set now, and this tries to get it back
on track.

Obviously, all of this helps the history look tidy, but this also feeds
into the AUTHORS update script. See next commit.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15374
2023-10-10 19:19:09 -07:00
Umer Saleem 9fa06c5574 ZTS: Fix verify_fs_mount in delegate_common.kshlib
verify_fs_mount expects the dataset to remain unmounted after
updating the mountpoint property in delegate_common.kshlib.

This commit updates verify_fs_mount and uses nomount parameter
for zfs set to update the mountpoint property without mounting
the dataset.

This fixes the zfs_allow_010_pos test case, which was failing on
FreeBSD after the behavior update in setting the mountpoint
property.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15376
2023-10-10 19:19:09 -07:00
Brian Behlendorf 8d47d2d579 ZTS: Move zpool_import_hostid_changed* tests to Linux runfile
Relocate the zpool_import_hostid_changed* test cases to the Linux
runfile until these tests are modified to run cleanly on FreeBSD.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15377
2023-10-10 19:19:09 -07:00
Alexander Motin f6e6e77ed8 FreeBSD: Reduce divergence from in-tree sources
This includes random small tweaks, primarily a build fixes, required
when ZFS is built as part of FreeBSD base.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15368
2023-10-10 19:19:09 -07:00
Sam James 120d1787d7 config/zfs-build.m4: add Gentoo's bash-completion path
Followup e69ade32e1 by adding Gentoo's
bash completion path.

We should probably consider using/honouring the standard --with-bashcompletiondir
autoconf option as well, but that's something to do later.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Signed-off-by: Sam James <sam@gentoo.org>
Closes #15372
2023-10-10 19:19:09 -07:00
Brian Behlendorf 2407f30bda Tag 2.2.0-rc5
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2023-10-07 09:14:37 -07:00
Alexander Motin 9be8ddfb3c ZIL: Reduce maximum size of WR_COPIED to 7.5K
Benchmarks show that at certain write sizes range lock/unlock take
not so much time as extra memory copy.  The exact threshold is not
obvious due to other overheads, but it is definitely lower than
~63KB used before.  Make it configurable, defaulting at 7.5KB,
that is 8KB of nearest malloc() size minus itx and lr structs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15353
2023-10-07 09:08:20 -07:00
siv0 3755cde22a rpm: Fix make rpm on Debian/Ubuntu
The recent patch to change the bash completion install location based
on the Distribution, ignored that it should still be possible to
create RPMs on Debian derived systems. Additionally `make deb` itself
creates RPMs and converts them via `alien`.

This patch adds the bashcompletiondir variable to the rpm defines and
uses this for the location, where to get the bash completion file.

It still changes the location on Debian/Ubuntu systems in the final
packages from /etc/bash_completion.d to
/usr/share/bash-completion/completions

Fixes: e69ade32e1

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
Closes #15355
Closes #15365
2023-10-07 09:08:20 -07:00
Rob Norris 33d7c2d165 import: require force when cachefile hostid doesn't match on-disk
Previously, if a cachefile is passed to zpool import, the cached config
is mostly offered as-is to ZFS_IOC_POOL_TRYIMPORT->spa_tryimport(), and
the results are taken as the canonical pool config and handed back to
ZFS_IOC_POOL_IMPORT.

In the course of its operation, spa_load() will inspect the pool and
build a new config from what it finds on disk. However, it then
regenerates a new config ready to import, and so rightly sets the hostid
and hostname for the local host in the config it returns.

Because of this, the "require force" checks always decide the pool is
exported and last touched by the local host, even if this is not true,
which is possible in a HA environment when MMP is not enabled. The pool
may be imported on another head, but the import checks still pass here,
so the pool ends up imported on both.

(This doesn't happen when a cachefile isn't used, because the pool
config is discovered in userspace in zpool_find_import(), and that does
find the on-disk hostid and hostname correctly).

Since the systemd zfs-import-cache.service unit uses cachefile imports,
this can lead to a system returning after a crash with a "valid"
cachefile on disk and automatically, quietly, importing a pool that has
already been taken up by a secondary head.

This commit causes the on-disk hostid and hostname to be included in the
ZPOOL_CONFIG_LOAD_INFO item in the returned config, and then changes the
"force" checks for zpool import to use them if present.

This method should give no change in behaviour for old userspace on new
kernels (they won't know to look for the new config items) and for new
userspace on old kernels (the won't find the new config items).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15290
2023-10-07 09:08:20 -07:00
Rob Norris 2919784be2 tests: add tests for zpool import behaviour when hostid changes
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15290
2023-10-07 09:08:20 -07:00
Rob N 8495536f7f zfsconcepts: add description of block cloning
Here I'm trying to succinctly introduce the concept, the basics of its
construction, how its different to dedup, how to use it, and where its
limitations lie, in four paragraphs and with enough searchable terms to
help the reader find more information both within OpenZFS and elsewhere.

Phew.

Sponsored-By: Klara, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15362
2023-10-07 09:08:20 -07:00
Alexander Motin bcd010d3a5 Reduce number of metaslab preload taskq threads.
Before this change ZFS created threads for 50% of CPUs for each top-
level vdev.  Plus it created the same number of threads for embedded
log groups (that have only one metaslab and don't need any preload).
As result, on system with 80 CPUs and pool of 60 vdevs this resulted
in 4800 metaslab preload threads, that is absolutely insane.

This patch changes the preload threads to 50% of CPUs in one taskq
per pool, so on the mentioned system it will be only 40 threads.

Among other things this fixes zdb on the mentioned system and pool
on FreeBSD, that failed to create so many threads in one process.

Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15319
2023-10-07 09:08:20 -07:00
Martin Matuška c27277daac CI: add FreeBSD build with Cirrus CI
As a first step for automatic FreeBSD testing add a build and install
for FreeBSD versions 12.4, 13.2 and 14-snapshot using Cirrus CI.

Reviewed-by: Jose Luis Duran 
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #15332
2023-10-07 09:08:20 -07:00
Rob N bf54da84fb tests/block_cloning: sync before write in fallback test
We're still seeing this test fail intermittently (that is, the clone
happens), which must mean the write and the clone can still be happening
on different txgs.

It might be that there's still activity after the pool is created. So
here we force a sync before starting the write.

Sponsored-By: Klara Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15359
2023-10-07 09:08:20 -07:00
Alexander Motin 3158b5d718 ARC: Drop different size headers for crypto
To reduce memory usage ZFS crypto allocated bigger by 56 bytes ARC
headers only when specific block was encrypted on disk.  It was a
nice optimization, except in some cases the code reallocated them
on fly, that invalidated header pointers from the buffers.  Since
the buffers use different locking, it created number of races, that
were originally covered (at least partially) by b_evict_lock, used
also to protection evictions.  But it has gone as part of #14340.
As result, as was found in #15293, arc_hdr_realloc_crypt() ended
up unprotected and causing use-after-free.

Instead of introducing some even more elaborate locking, this patch
just drops the difference between normal and protected headers. It
cost us additional 56 bytes per header, but with couple patches
saving 24 bytes, the net growth is only 32 bytes with total header
size of 232 bytes on FreeBSD, that IMHO is acceptable price for
simplicity.  Additional locking would also end up consuming space,
time or both.

Reviewe-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15293
Closes #15347
2023-10-07 09:08:20 -07:00
Alexander Motin ba7797c8db ARC: Remove b_bufcnt/b_ebufcnt from ARC headers
In most cases we do not care about exact number of buffers linked
to the header, we just need to know if it is zero, non-zero or one.
That can easily be checked just looking on b_buf pointer or in some
cases derefencing it.

b_ebufcnt is read only once, and in that case we already traverse
the list as part of arc_buf_remove(), so second traverse should not
be expensive.

This reduces L1 ARC header size by 8 bytes and full crypto header by
16 bytes, down to 176 and 232 bytes on FreeBSD respectively.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15350
2023-10-07 09:08:20 -07:00
Alexander Motin bc77a0c85e ARC: Remove b_cv from struct l1arc_buf_hdr
Earlier as part of #14123 I've removed one use of b_cv.  This patch
reuses the same approach to remove the other one from much more
rare code path.

This saves 16 bytes of L1 ARC header on FreeBSD (reducing it from
200 to 184 bytes) and seems even more on Linux.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15340
2023-10-07 09:08:20 -07:00
Andrew Turner 1611b8e56e Add BTI landing pads to the AArch64 SHA2 assembly
The Arm Branch Target Identification (BTI) extension guards against
branching to an unintended instruction.

To support BTI add the landing pad instructions to the SHA2 functions.
These are from the hint space so are a nop on hardware that lacks BTI
support or if BTI isn't enabled.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Andrew Turner <andrew.turner4@arm.com>
Closes #14862
Closes #15339
2023-10-04 12:36:21 -07:00
Umer Saleem 8015e2ea66 Add '-u' - nomount flag for zfs set
This commit adds '-u' flag for zfs set operation. With this flag,
mountpoint, sharenfs and sharesmb properties can be updated
without actually mounting or sharing the dataset.

Previously, if dataset was unmounted, and mountpoint property was
updated, dataset was not mounted after the update. This behavior
is changed in #15240. We mount the dataset whenever mountpoint
property is updated, regardless if it's mounted or not.

To provide the user with option to keep the dataset unmounted and
still update the mountpoint without mounting the dataset, '-u'
flag can be used.

If any of mountpoint, sharenfs or sharesmb properties are updated
with '-u' flag, the property is set to desired value but the
operation to (re/un)mount and/or (re/un)share the dataset is not
performed and dataset remains as it was before.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15322
2023-10-03 15:41:46 -07:00
Umer Saleem c53bc3837c Improve the handling of sharesmb,sharenfs properties
For sharesmb and sharenfs properties, the status of setting the
property is tied with whether we succeed to share the dataset or
not. In case sharing the dataset is not successful, this is
treated as overall failure of setting the property. In this case,
if we check the property after the failure, it is set to on.

This commit updates this behavior and the status of setting the
share properties is not returned as failure, when we fail to
share the dataset.

For sharenfs property, if access list is provided, the syntax
errors in access list/host adresses are not validated until after
setting the property during postfix phase while trying to
share the dataset. This is not correct, since the property has
already been set when we reach there.

Syntax errors in access list/host addresses are validated while
validating the property list, before setting the property and
failure is returned to user in this case when there are errors
in access list.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15240
2023-10-03 15:41:46 -07:00
Umer Saleem e9dc31c74e Update the behavior of mountpoint property
There are some inconsistencies in the handling of mountpoint
property. This commit updates the behavior and makes it
consistent.

If mountpoint property is set when dataset is unmounted, this
would update the mountpoint property. The mountpoint could be
valid or invalid in this case. Setting the mountpoint property
would result in success in this case. Dataset would still be
unmounted here.

On the other hand, if dataset is mounted and mountpoint
property is updated to something invalid where mount cannot be
successful, for example, setting the mountpoint inside a readonly
directory. This would unmount the dataset, set the mountpoint
property to requested value and tries to mount the dataset. The
mount operation returns error and this error is treated as
overall failure of setting the property while the property is
actually set.

To make the behavior consistent in case dataset is mounted or
unmounted, we should try to mount the dataset whenever mountpoint
property is updated. This would result in mounting the datasets
if canmount property is set to on, regardless if the dataset was
previously unmounted.

The failure in mount operation while setting the mountpoint
property should not be treated as failure, since the property is
actually set now to user requested value.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15240
2023-10-03 15:41:46 -07:00
Stoiko Ivanov b04b13ae79 contrib: debian: drop bashcompletion mangling after install
tested by running:
```
./configure --with-config=user; cp -a contrib/debian .
dpkg-buildpackage -b -uc -us
```
on a Debian 12 based system.

and checking where the completion file got installed.

Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
Closes #15304
2023-10-03 09:06:07 -07:00
Stoiko Ivanov 7b1d421adf contrib: debian: switch to dh-sequence-dkms
Follows b191f9a13d3005621ead9a727b811892264505ef from Debian's
packaging team at:
https://salsa.debian.org/zfsonlinux-team/zfs/

The previous build-dependency is kept as option, to still be able to
build on older Debian based distros (e.g. Ubuntu 20.04).

Without this building on Debian 12/bookworm does not work, as `dkms`
is a virtual package.

Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
Closes #15304
2023-10-03 09:06:07 -07:00
Stoiko Ivanov db5c3b4c76 contrib: bash_completion.d: make install destination vendor dependent
Certain Linux distributions (Debian/Ubuntu at least) expect
bash-completion snippets to be installed in
/usr/share/bash-completion/completions instead of
/etc/bash_completion.d.

This patch sets the bashcompletiondir variable based on the vendor,
inspired by similar settings for initdir and initconfdir.

It seems that commit 612b8dff5b
caused the file to be installed in the first-place (thus the error
when building debian packages only became apparent when testing a
2.2.0-rc4 build)

The change only sets the variable in Makefile context - the
rpm/zfs.spec.in file has the path hardcoded as
%{_sysconfdir}/bash_completion.d/zfs, but since running
```
./configure --sysconfdir=/myetc  ; make rpm
```
also results in all relevant files to be installed in /etc instead of
/myetc I assume this can remain as is.

Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
Closes #15304
2023-10-03 09:06:07 -07:00
Chunwei Chen 0d870a1775 Fix invalid pointer access in trace_dbuf.h
In dnode_destroy, dn_objset is invalidated. However, it will later call
into dbuf_destroy, in which DTRACE_SET_STATE will try to access spa_name
via dn_objset causing illegal pointer access.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #15333
2023-10-03 09:06:07 -07:00
George Amanakis 608741d062 Report ashift of L2ARC devices in zdb
Commit 8af1104f does not actually store the ashift of cache devices in
their label. However, in order to facilitate reporting the ashift
through zdb, we enable this in the present commit. We also document
how the retrieval of the ashift is done.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #15331
2023-10-03 09:06:07 -07:00
Alexander Motin 3079bf2e6c Restrict short block cloning requests
If we are copying only one block and it is smaller than recordsize
property, do not allow destination to grow beyond one block if it
is not there yet.  Otherwise the destination will get stuck with
that block size forever, that can be as small as 512 bytes, no
matter how big the destination grow later.

Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15321
2023-10-03 09:06:07 -07:00
Brian Behlendorf b34bf2d5f6 Tweak rebuild in-flight hard limit
Vendor testing shows we should be able to get a little more
performance if we further relax the hard limit which we're hitting.

Authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15324
2023-10-03 09:06:07 -07:00
Akash B 229ca7d738 Fix ENOSPC for extended quota
When unlinking multiple files from a pool at 100% capacity, it
was possible for ENOSPC to be returned after the first few unlinks.
This issue was fixed previously by PR #13172 but then this was
again introduced by PR #13839.

This is resolved using the existing mechanism of returning ERESTART
when over quota as long as we know enough space will shortly be
available after processing the pending deferred frees.

Also, updated the existing testcase which reliably reproduced the
issue without this patch.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Dipak Ghosh <dipak.ghosh@hpe.com>
Signed-off-by: Akash B <akash-b@hpe.com>
Closes #15312
2023-09-28 14:28:21 -07:00
Paul Dagnelie 9e36c5769f Don't allocate from new metaslabs
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #15307
Closes #15308
2023-09-28 14:28:21 -07:00
Paul Dagnelie d38f4664a6 Reduce trim min size even lower for tests to reduce flakiness
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #15315
2023-09-28 14:28:21 -07:00
Paul Dagnelie 99dc1fc340 ZTS: Fix introduced test bug in block_cloning_copyfilerange
Reviewed-by: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #15316
2023-09-28 14:28:21 -07:00
Brian Behlendorf ba4dbbdae7 ZTS: Add additional exceptions
"zfs_share_concurrent_shares" may fail on FreeBSD and some Linux
distributions (fedora).  Move it to the common list.

"zfs_allow_010_pos" has been observed to fail on FreeBSD 13.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15306
2023-09-28 14:28:21 -07:00
Paul Dagnelie 8526b12f3d Set timeout before creating pool in test
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #15309
2023-09-28 14:28:21 -07:00
Paul Dagnelie 0ce1b2ca19 Invoke zdb by guid to avoid import errors
The problem that was occurring is basically that a device was removed 
by ztest and replaced with another device. It was then reguided. The 
import then failed because there were two possible imports with the 
same name; one with the new guid, and one with the old. This can 
happen because the label writes from the device removal/replacement 
can be subject to ztest's error injection. 

The other ways to fix this would be to change the error injection to 
not trigger on removals (which may not be technically feasible), or 
to change the import code to not report configurations that are so 
short on devices (which would potentially have unpleasant end-user 
effects when trying to recover from data losses/device configuration 
issues).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #15298
2023-09-28 14:28:21 -07:00
Alexander Motin 0aabd6b482 ZIL: Avoid dbuf_read() in ztest_get_data()
While working on similar patches for zfs and zvol in #15153 I've
forgot about ztest.  Update it also so that we test the same code
paths as use in production.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15301
2023-09-28 14:28:21 -07:00
Rob N 5f30698670 tests/block_cloning: try harder to stay on same txg in fallback test
We've observed this test failing intermittently. When it does, the
"same block" check shows that both files have the same content, that is,
the file was cloned.

The only way this could have happened is if the open txg moved between
the dd and clonefile calls. That's possible because although we set
zfs_txg_timeout to be large, that only affects the wait time in the sync
thread at the start of a new txg; it doesn't change anything if its
currently waiting or working.

So here we just force the txgs to move immediately before, which should
get both operations onto the same txg as intented.

Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris Rob Norris <rob.norris@klarasystems.com>
Closes #15303
2023-09-22 16:13:20 -07:00
Rob N a199cac6cd status: report pool suspension state under failmode=continue
When failmode=continue is set and the pool suspends, both 'zpool status'
and the 'zfs/pool/state' kstat ignore it and report the normal vdev tree
state. There's no clear indicator that the pool is suspended. This is
unlike suspend in failmode=wait, or suspend due to MMP check failure,
which both report "SUSPENDED" explicitly.

This commit changes it so SUSPENDED is reported for failmode=continue
the same as for other modes.

Rationale:

The historical behaviour of failmode=continue is roughly, "press on as
though all is well". To this end, the fact that the pool had suspended
was not shown, to maintain the façade that all is well.

Its unclear why hiding this information was considered appropriate. One
possibility is that it was expected that a true pool fault would always
be reported as DEGRADED or FAULTED, and that the pool could not suspend
without these happening.

That is not necessarily true, as vdev health and suspend state are only
loosely connected, such that a pool in (apparent) good health can be
suspended for good reasons, and of course a degraded pool does not lead
to suspension. Even if that expectation were true, there's still a
difference in urgency - a degraded pool may not need to be attended to
for hours, while a suspended pool is most often unusable until an
operator intervenes.

An operator that has set failmode=continue has presumably done so
because their workload is one that can continue to operate in a useful
way when the pool suspends. In this case the operator still needs a
clear indicator that there is a problem that needs attending to.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15297
2023-09-22 16:13:20 -07:00
Paul Dagnelie 729507d309 Fix occasional rsend test crashes
We have occasional crashes in the rsend tests. Debugging revealed 
that this is because the send_worker thread is getting EINTR from 
splice(). This happens when a non-fatal signal is received during 
the syscall. We should retry the syscall, rather than exiting failure.
Tweak the loop to only break if the splice is finished or we receive 
a non-EINTR error.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #15273
2023-09-22 16:13:20 -07:00
Rob N 3af63683fe cmd: add 'help' subcommand to zpool and zfs
'program help subcommand' is a reasonably common pattern for
multifunction command-line programs. This commit adds support for that
style to the zpool and zfs commands.

When run as 'zpool help [<topic>]' or 'zfs help [<topic>]', executes the
'man' program on the PATH with the most likely manpage name for the
requested topic: "zpool-<topic>" or "zfs-<topic>" for subcommands, or
"zpool<topic>" or "zfs<topic>" for the "concepts" and "props" topics.
If no topic is supplied, uses the top "zpool" or "zfs" pages.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15288
2023-09-22 16:13:20 -07:00
Paul Dagnelie 9aa1a2878e Fix incorrect expected error in ztest
There is an occasional ztest failure that looks like ztest: attach 
(/var/tmp/zloop-run/ztest.13a 570425344, draid1-1-0 532152320, 1) 
returned 22, expected 95. This is because the value that we return 
is EINVAL, but expected_error is set incorrectly.

Change the expected_error value to match both the comment and the 
actual error value.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #15295
2023-09-22 16:13:20 -07:00
Paul Dagnelie cc75c816c5 Fix l2arc_apply_transforms ztest crash
In #13375 we modified the allocation size of the buffer that we use 
to apply l2arc transforms to be the size of the arc hdr we're using, 
rather than the allocation size that will be in place on the disk, 
because sometimes the hdr size is larger. Unfortunately, sometimes 
the allocation size is larger, which means that we overflow the buffer 
in that case. This change modifies the allocation to be the max of 
the two values

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #15177
Closes #15248
2023-09-22 16:13:20 -07:00
Rob N 1c2aee7a52 tests: install missing PAM tests
'pam_change_unmounted' and 'pam_recursive' both exist and are referenced
by the test run config, but weren't being installed and so are excluded.
This gets them installed so they will run as expected.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15291
2023-09-22 16:13:20 -07:00
Alexander Motin 62677576a7 ZIL: Fix potential race on flush deferring.
zil_lwb_set_zio_dependency() can not set write ZIO dependency on
previous LWB's write ZIO if one is already in done handler and set
state to LWB_STATE_WRITE_DONE.  So theoretically done handler of
next LWB's write ZIO may run before done handler of previous LWB
write ZIO completes.  In such case we can not defer flushes, since
the flush issue process is not locked.

This may fix some reported assertions of lwb_vdev_tree not being
empty inside zil_free_lwb().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15278
2023-09-20 16:41:23 -07:00
Mateusz Guzik f7a07d76ee Retire z_nr_znodes
Added in ab26409db7 ("Linux 3.1 compat, super_block->s_shrink"), with
the only consumer which needed the count getting retired in 066e825221
("Linux compat: Minimum kernel version 3.10").

The counter gets in the way of not maintaining the list to begin with.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #15274
2023-09-19 08:52:06 -07:00
Tony Hutter 54c6fbd378 zed: Allow autoreplace and fault LEDs for removed vdevs
Allow zed to autoreplace vdevs marked as REMOVED.  Also update
statechange-led zedlet to toggle fault LEDs for REMOVED vdevs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15281
2023-09-19 08:52:06 -07:00
наб 0ce7a068e9 check-zstd-symbols: also ignore __pfx_ symbols
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b341b20d648bb7e9a3307c33163e7399f0913e66

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #15282 
Closes #15284
2023-09-19 08:52:06 -07:00
Laura Hild 228b064d1b Remove implication that child disks aren't vdevs in zpoolconcepts(7)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Laura Hild <lsh@jlab.org>
Closes #15247
2023-09-19 08:52:06 -07:00
ednadolski-ix b9b9cdcdb1 update max_variance limit in zdb_block_size_histogram test for CI
Commit 2d7843401a had previously
updated this hardcoded limit to allow for CI testing. As there
is no deterministic pass/fail value, the need has arisen for
one more small increase.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Edmund Nadolski <edmund.nadolski@ixsystems.com>
Closes #15252
2023-09-19 08:52:06 -07:00
George Amanakis 11943656f9 Update the MOS directory on spa_upgrade_errlog()
spa_upgrade_errlog() does not update the MOS directory when the
head_errlog feature is enabled. In this case if spa_errlog_sync() is not
called, the MOS dir references the old errlog_last and errlog_sync
objects. Thus when doing a scrub a panic will occur:

Call Trace:
 dump_stack+0x6d/0x8b
 panic+0x101/0x2e3
 spl_panic+0xcf/0x102 [spl]
 delete_errlog+0x124/0x130 [zfs]
 spa_errlog_sync+0x256/0x260 [zfs]
 spa_sync_iterate_to_convergence+0xe5/0x250 [zfs]
 spa_sync+0x2f7/0x670 [zfs]
 txg_sync_thread+0x22d/0x2d0 [zfs]
 thread_generic_wrapper+0x83/0xa0 [spl]
 kthread+0x104/0x140
 ret_from_fork+0x1f/0x40

Fix this by updating the related MOS directory objects in
spa_upgrade_errlog().

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #15279 
Closes #15277
2023-09-19 08:51:00 -07:00
Tony Hutter c011ef8c91 Linux 6.5 compat: META (#15265)
Update the META file to reflect compatibility with the 6.5
kernel.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2023-09-19 08:50:01 -07:00
Andrea Righi cacc599aa2 Linux 6.5 compat: spl: properly unregister sysctl entries
When register_sysctl_table() is unavailable we fail to properly
unregister sysctl entries under "kernel/spl".

This leads to errors like the following when spl is unloaded/reloaded,
making impossible to properly reload the spl module:

[  746.995704] sysctl duplicate entry: /kernel/spl/kmem/slab_kvmem_total

Fix by cleaning up all the sub-entries inside "kernel/spl" when the
spl module is unloaded.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Closes #15239
2023-09-19 08:50:01 -07:00
Andrea Righi c7ee59a160 Linux 6.5 compat: safe cleanup in spl_proc_fini()
If we fail to create a proc entry in spl_proc_init() we may end up
calling unregister_sysctl_table() twice: one in the failure path of
spl_proc_init() and another time during spl_proc_fini().

Avoid the double call to unregister_sysctl_table() and while at it
refactor the code a bit to reduce code duplication.

This was accidentally introduced when the spl code was
updated for Linux 6.5 compatibility.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Closes #15234 
Closes #15235
2023-09-19 08:50:01 -07:00
Coleman Kane 58a707375f Linux 6.5 compat: Use copy_splice_read instead of filemap_splice_read
Using the filemap_splice_read function for the splice_read handler was
leading to occasional data corruption under certain circumstances. Favor
using copy_splice_read instead, which does not demonstrate the same
erroneous behavior under the tested failure cases.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15164
2023-09-19 08:50:01 -07:00
Coleman Kane 5a22de144a Linux 6.5 compat: replace generic_file_splice_read with filemap_splice_read
The generic_file_splice_read function was removed in Linux 6.5 in favor
of filemap_splice_read. Add an autoconf test for filemap_splice_read and
use it if it is found as the handler for .splice_read in the
file_operations struct. Additionally, ITER_PIPE was removed in 6.5. This
change removes the ITER_* macros that OpenZFS doesn't use from being
tested in config/kernel-vfs-iov_iter.m4. The removal of ITER_PIPE was
causing the test to fail, which also affected the code responsible for
setting the .splice_read handler, above. That behavior caused run-time
panics on Linux 6.5.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15155
2023-09-19 08:50:01 -07:00
Coleman Kane 31a4673c05 Linux 6.5 compat: register_sysctl_table removed
Additionally, the .child element of ctl_table has been removed in 6.5.
This change adds a new test for the pre-6.5 register_sysctl_table()
function, and uses the old code in that case. If it isn't found, then
the parentage entries in the tables are removed, and the register_sysctl
call is provided the paths of "kernel/spl", "kernel/spl/kmem", and
"kernel/spl/kstat" directly, to populate each subdirectory over three
calls, as is the new API.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15138
2023-09-19 08:50:01 -07:00
Brian Atkinson 3a68f3c50f Revert "Linux 6.5 compat: register_sysctl_table removed"
This reverts commit b35374fd64 as there
are error messages when loading the SPL module. Errors seemed to be tied
to duplicate a duplicate entry.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #15134
2023-09-19 08:50:01 -07:00
Coleman Kane 8be6308e85 Linux 4.20 compat: wrapper function for iov_iter type access
An iov_iter_type() function to access the "type" member of the struct
iov_iter was added at one point. Move the conditional logic to decide
which method to use for accessing it into a macro and simplify the
zpl_uio_init code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15100
2023-09-19 08:50:01 -07:00
Coleman Kane 0bf2c5365e Linux 6.4 compat: iter_iov() function now used to get old iov member
The iov_iter->iov member is now iov_iter->__iov and must be accessed via
the accessor function iter_iov(). Create a wrapper that is conditionally
compiled to use the access method appropriate for the target kernel
version.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15100
2023-09-19 08:50:01 -07:00
Coleman Kane d76de9fb17 Linux 6.5 compat: blkdev changes
Multiple changes to the blkdev API were introduced in Linux 6.5. This
includes passing (void* holder) to blkdev_put, adding a new
blk_holder_ops* arg to blkdev_get_by_path, adding a new blk_mode_t type
that replaces uses of fmode_t, and removing an argument from the release
handler on block_device_operations that we weren't using. The open
function definition has also changed to take gendisk* and blk_mode_t, so
update it accordingly, too.

Implement local wrappers for blkdev_get_by_path() and
vdev_blkdev_put() so that the in-line calls are cleaner, and place the
conditionally-compiled implementation details inside of both of these
local wrappers. Both calls are exclusively used within vdev_disk.c, at
this time.

Add blk_mode_is_open_write() to test FMODE_WRITE / BLK_OPEN_WRITE
The wrapper function is now used for testing using the appropriate
method for the kernel, whether the open mode is writable or not.

Emphasize fmode_t arg in zvol_release is not used

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15099
2023-09-19 08:50:01 -07:00
Coleman Kane c0f075c06b Linux 6.5 compat: use disk_check_media_change when it exists
When disk_check_media_change() exists, then define
zfs_check_media_change() to simply call disk_check_media_change() on
the bd_disk member of its argument. Since disk_check_media_change()
is newer than when revalidate_disk was present in bops, we should
be able to safely do this via a macro, instead of recreating a new
implementation of the inline function that forces revalidation.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15101
2023-09-19 08:50:01 -07:00
Coleman Kane 6c2fc56916 Linux 6.5 compat: register_sysctl_table removed
Additionally, the .child element of ctl_table has been removed in 6.5.
This change adds a new test for the pre-6.5 register_sysctl_table()
function, and uses the old code in that case. If it isn't found, then
the parentage entries in the tables are removed, and the register_sysctl
call is provided the paths of "kernel/spl", "kernel/spl/kmem", and
"kernel/spl/kstat" directly, to populate each subdirectory over three
calls, as is the new API.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15098
2023-09-19 08:50:01 -07:00
Alexander Motin e96fbdba34 Add more constraints for block cloning.
- We cannot clone into files with smaller block size if there is
more than one block, since we can not grow the block size.
 - Block size must be power-of-2 if destination offset != 0, since
there can be no multiple blocks of non-power-of-2 size.

The first should handle the case when destination file has several
blocks but still is not bigger than one block of the source file.
The second fixes panic in dmu_buf_hold_array_by_dnode() on attempt
to concatenate files with equal but non-power-of-2 block sizes.

While there, assert that error is reported if we made no progress.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
2023-09-10 14:02:52 -07:00
Brian Behlendorf 739db06ce7 Tag 2.2.0-rc4
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2023-09-07 16:11:40 -07:00
Volker Mauel 4da8c7d11e Intel QAT 1.7 compatibility
Based on the intel QAT samples which are bundled in the 1.x drivers, 
this is the preferred approach since api version 1.6.  See:

https://www.intel.de/content/www/de/de/download/19734/intel-quickassist-technology-driver-for-linux-hw-version-1-x.html?

Reviewed-by: Weigang Li <weigang.li@intel.com>
Signed-off-by: Volker Mauel <volkermauel@gmail.com>
Closes #15190
2023-09-07 16:10:52 -07:00
Umer Saleem 32949f2560 Relax error reporting in zpool import and zpool split
For zpool import and zpool split, zpool_enable_datasets is called
to mount and share all datasets in a pool. If there is an error
while mounting or sharing any dataset in the pool, the status of
import or split is reported as failure. However, the changes do
show up in zpool list.

This commit updates the error reporting in zpool import and zpool
split path. More descriptive messages are shown to user in case
there is an error during mount or share. Errors in mount or share
do not effect the overall status of zpool import and zpool split.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15216
2023-09-02 10:30:38 -07:00
Alexander Motin 79ac1b29d5 ZIL: Change ZIOs issue order.
In zil_lwb_write_issue(), after issuing lwb_root_zio/lwb_write_zio,
we have no right to access lwb->lwb_child_zio. If it was not there,
the first two ZIOs may have already completed and freed the lwb.
ZIOs issue in opposite order from children to parent should keep
the lwb valid till the end, since the lwb can be freed only after
lwb_root_zio completion callback.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15233
2023-09-02 10:30:38 -07:00
Alexander Motin 7dc2baaa1f ZIL: Revert zl_lock scope reduction.
While I have no reports of it, I suspect possible use-after-free
scenario when zil_commit_waiter() tries to dereference zcw_lwb
for lwb already freed by zil_sync(), while zcw_done is not set.
Extension of zl_lock scope as it was originally should block
zil_sync() from freeing the lwb, closing this race.

This reverts #14959 and couple chunks of #14841.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15228
2023-09-02 10:30:38 -07:00
Alexander Motin 5a7cb0b065 ZIL: Tune some assertions.
In zil_free_lwb() we should first assert lwb_state or the rest of
assertions can be misleading if it is false.

Add lwb_state assertions in zil_lwb_add_block() to make sure we are
not trying to add elements to lwb_vdev_tree after it was processed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15227
2023-09-02 10:30:38 -07:00
Dimitry Andric 400f56e3f8 dmu_buf_will_clone: change assertion to fix 32-bit compiler warning
Building module/zfs/dbuf.c for 32-bit targets can result in a warning:

In file included from
/usr/src/sys/contrib/openzfs/include/sys/zfs_context.h:97,
                 from /usr/src/sys/contrib/openzfs/module/zfs/dbuf.c:32:
/usr/src/sys/contrib/openzfs/module/zfs/dbuf.c: In function
'dmu_buf_will_clone':
/usr/src/sys/contrib/openzfs/lib/libspl/include/assert.h:116:33: error:
cast from pointer to integer of different size
[-Werror=pointer-to-int-cast]
  116 |         const uint64_t __left = (uint64_t)(LEFT);
  \
      |                                 ^
/usr/src/sys/contrib/openzfs/lib/libspl/include/assert.h:148:25: note:
in expansion of macro 'VERIFY0'
  148 | #define ASSERT0         VERIFY0
      |                         ^~~~~~~
/usr/src/sys/contrib/openzfs/module/zfs/dbuf.c:2704:9: note: in
expansion of macro 'ASSERT0'
 2704 |         ASSERT0(dbuf_find_dirty_eq(db, tx->tx_txg));
      |         ^~~~~~~

This is because dbuf_find_dirty_eq() returns a pointer, which if
pointers are 32-bit results in a warning about the cast to uint64_t.

Instead, use the ASSERT3P() macro, with == and NULL as second and third
arguments, which should work regardless of the target's bitness.

Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Dimitry Andric <dimitry@andric.com>
Closes #15224
2023-09-01 09:33:33 -07:00
Serapheim Dimitropoulos 63159e5bda checkstyle: fix action failures
Reviewed-by: Don Brady <dev.fs.zfs@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #15220
2023-09-01 09:33:33 -07:00
Paul Dagnelie 7eabb0af37 Try to clarify wording to reduce zpool add incidents
Try to clarify wording to reduce zpool add incidents.
Add an attach example.

Reviewed-by: Rich Ercolani <Rincebrain@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #15179
2023-08-27 08:25:42 -07:00
Rich Ercolani c65aaa8387 Avoid save/restoring AMX registers to avoid a SPR erratum
Intel SPR erratum SPR4 says that if you trip into a vmexit while
doing FPU save/restore, your AMX register state might misbehave...
and by misbehave, I mean save all zeroes incorrectly, leading to
explosions if you restore it.

Since we're not using AMX for anything, the simple way to avoid
this is to just not save/restore those when we do anything, since
we're killing preemption of any sort across our save/restores.

If we ever decide to use AMX, it's not clear that we have any
way to mitigate this, on Linux...but I am not an expert.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #14989
Closes #15168
2023-08-27 08:25:42 -07:00
Brian Behlendorf e99e684b33 zed: update zed.d/statechange-slot_off.sh
The statechange-slot_off.sh zedlet which was added in #15200
needed to be installed so it's included by the packages.

Additional testing has also shown that multiple retries are
often needed for the script to operate reliably.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15210
2023-08-27 08:25:42 -07:00
наб 1b696429c1 Make zoned/jailed zfsprops(7) make more sense.
- Distribute zfs-[un]jail.8 on FreeBSD and zfs-[un]zone.8 on Linux
- zfsprops.7: mirror zoned/jailed, only available on respective platforms

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #15161
2023-08-27 08:25:42 -07:00
Rob N 084ff4abd2 tests/block_cloning: rename and document get_same_blocks helper
`get_same_blocks` is a helper to compare two files and return a list of
the blocks that are clones of each other. Its very necessary for block
cloning tests.

Previously it was incorrectly called `unique_blocks`, which is the
_inverse_ of what it does (an early version did list unique blocks; it
was changed but the name was not). So if nothing else, it should be
called `duplicate_blocks`.

But, keeping the details of a clone operation in your head is actually
quite difficult, without the additional overhead of wondering how the
tools work. So I've renamed it to better describe what it does, added a
usage note, and changed it to return block indexes from 0 instead of 1,
to match how L0 blocks are normally counted.

Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Reviewed-by:  Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15181
2023-08-26 11:18:11 -07:00
Serapheim Dimitropoulos ab999406fe Update outdated assertion from zio_write_compress
As part of some internal gang block testing within Delphix
we hit the assertion removed by this patch. The assertion
was triggered by a ZIO that had two copies and was a gang
block making the following expression equal to 3:
```
MIN(zp->zp_copies + BP_IS_GANG(bp), spa_max_replication(spa))
```
and failing when we expected the above to be equal to
`BP_GET_NDVAS(bp)`.

The assertion is no longer valid since the following commit:
```
commit 14872aaa4f
Author: Matthew Ahrens <matthew.ahrens@delphix.com>
Date:   Mon Feb 6 09:37:06 2023 -0800

  EIO caused by encryption + recursive gang
```

The above commit changed gang block headers so they can't
have more than 2 copies but the assertion in question from
this PR was never updated.

Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #15180
2023-08-26 11:18:11 -07:00
Tony Hutter d19304ffee zed: Add zedlet to power off slot when drive is faulted
If ZED_POWER_OFF_ENCLOUSRE_SLOT_ON_FAULT is enabled in zed.rc, then
power off the drive's slot in the enclosure if it becomes FAULTED.
This can help silence misbehaving drives.  This assumes your drive
enclosure fully supports slot power control via sysfs.

Reviewed-by: @AllKind
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15200
2023-08-25 13:33:40 -07:00
Rob N 92f095a903 copy_file_range: fix fallback when source create on same txg
In 019dea0a5 we removed the conversion from EAGAIN->EXDEV inside
zfs_clone_range(), but forgot to add a test for EAGAIN to the
copy_file_range() entry points to trigger fallback to a content copy.

This commit fixes that.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15170
Closes #15172
2023-08-25 13:33:40 -07:00
Umer Saleem 645a7e4d95 Move zinject from openzfs-zfs-test to openzfs-zfsutils
For Native Debian packaging, zinject binary and man page is
packaged in ZFS test package. zinject is not not directly related
to ZTS and should be packaged with other utilities, like it is
present in zfs_<ver>.rpm/deb packages.

This commit moves zinject binary and man page from openzfs-zfs-test
to openzfs-zfsutils package.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15160
2023-08-25 13:33:40 -07:00
Rafael Kitover 95649854ba dracut: support mountpoint=legacy for root dataset
Support mountpoint=legacy for the root dataset in the dracut zfs support
scripts.

mountpoint=/ or mountpoint=/sysroot also works.

Change zfs-env-bootfs.service to add zfsutil to BOOTFSFLAGS only for
root datasets with mountpoint != legacy.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Rafael Kitover <rkitover@gmail.com>
Closes #15149
2023-08-25 13:33:40 -07:00
oromenahar 895cb689d3 zfs_clone_range should return a descriptive error codes
Return the more descriptive error codes instead of `EXDEV` when
the parameters don't match the requirements of the clone function.
Updated the comments in `brt.c` accordingly.
The first three errors are just invalid parameters, which zfs can
not handle.
The fourth error indicates that the block which should be cloned
is created and cloned or modified in the same transaction
group (`txg`).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Kay Pedersen <mail@mkwg.de>
Closes #15148
2023-08-25 13:33:40 -07:00
наб 6bdc7259d1 libzfs: sendrecv: send_progress_thread: handle SIGINFO/SIGUSR1
POSIX timers target the process, not the thread (as does SIGINFO),
so we need to block it in the main thread which will die if interrupted.

Ref: https://101010.pl/@ed1conf@bsd.network/110731819189629373
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #15113
2023-08-25 13:33:40 -07:00
Ryan Lahfa 1e488eec60 linux/spl/kmem_cache: undefine kmem_cache_alloc before defining it
When compiling a kernel with bcachefs and zfs,
the two macros will collide, making it impossible
to have both filesystems.

It is sufficient to just undefine the macro before calling it.

On why this should be in ZFS rather than bcachefs, currently,
bcachefs is not a in-tree filesystem, but,
it has a reasonably high chance of getting included soon.

This avoids the breakage in ZFS early,
this patch may be distributed downstream in NixOS
and is already used there.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Lahfa <ryan@lahfa.xyz>
Closes #15144
2023-08-25 13:33:40 -07:00
Mateusz Piotrowski c418edf1d3 Fix some typos
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Piotrowski <0mp@FreeBSD.org>
Closes #15141
2023-08-25 13:33:40 -07:00
Alexander Motin df8c9f351d ZIL: Second attempt to reduce scope of zl_issuer_lock.
The previous patch #14841 appeared to have significant flaw, causing
deadlocks if zl_get_data callback got blocked waiting for TXG sync.  I
already handled some of such cases in the original patch, but issue
 #14982 shown cases that were impossible to solve in that design.

This patch fixes the problem by postponing log blocks allocation till
the very end, just before the zios issue, leaving nothing blocking after
that point to cause deadlocks.  Before that point though any sleeps are
now allowed, not causing sync thread blockage.  This require slightly
more complicated lwb state machine to allocate blocks and issue zios
in proper order.  But with removal of special early issue workarounds
the new code is much cleaner now, and should even be more efficient.

Since this patch uses null zios between write, I've found that null
zios do not wait for logical children ready status in zio_ready(),
that makes parent write to proceed prematurely, producing incorrect
log blocks.  Added ZIO_CHILD_LOGICAL_BIT to zio_wait_for_children()
fixes it.

Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15122
2023-08-25 11:58:44 -07:00
Alexander Motin bb31ded68b ZIL: Replay blocks without next block pointer.
If we get next block allocation error during log write, we trigger
transaction commit.  But the block we have just completed is still
written and transactions it covers will be acknowledged normally.
If after that we ignore the block during replay just because it is
the last in the chain, we may not replay some transactions that we
have acknowledged as synced, that is not right.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15132
2023-08-25 11:58:44 -07:00
Alexander Motin c1801cbe59 ZIL: Avoid dbuf_read() before dmu_sync().
In most cases dmu_sync() works with dirty records directly and does
not need actual data. The only exception is dmu_sync_late_arrival().
To save some CPU time use dmu_buf_hold_noread*() in z*_get_data()
and explicitly call dbuf_read() in dmu_sync_late_arrival(). There
is also a chance that by that time TXG will already be synced and
we won't have to do it at all.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15153
2023-08-25 11:58:44 -07:00
Alexander Motin ffaedf0a44 Remove fastwrite mechanism.
Fastwrite was introduced many years ago to improve ZIL writes spread
between multiple top-level vdevs by tracking number of allocated but
not written blocks and choosing vdev with smaller count.  It suposed
to reduce ZIL knowledge about allocation, but actually made ZIL to
even more actively report allocation code about the allocations,
complicating both ZIL and metaslabs code.

On top of that, it seems ZIO_FLAG_FASTWRITE setting in dmu_sync()
was lost many years ago, that was one of the declared benefits. Plus
introduction of embedded log metaslab class solved another problem
with allocation rotor accounting both normal and log allocations,
since in most cases those are now in different metaslab classes.

After all that, I'd prefer to simplify already too complicated ZIL,
ZIO and metaslab code if the benefit of complexity is not obvious.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15107
2023-08-25 11:58:44 -07:00
Alexander Motin 02ce9030e6 Avoid waiting in dmu_sync_late_arrival().
The transaction there does not produce any dirty data or log blocks,
so it should not be throttled. All other cases wait for TXG sync, by
which time the log block we are writing will be obsolete, so we can
skip waiting and just return error here instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15096
2023-08-25 11:58:44 -07:00
Serapheim Dimitropoulos 0ae7bfc0a4 zpool_vdev_remove() should handle EALREADY error return
When the vdev properties features was merged an extra check
was added in `spa_vdev_remove_top_check()` which checked
whether the vdev that we want to remove is already being
removed and if so return an EALREADY error.

```
static int
spa_vdev_remove_top_check(vdev_t *vd)
{
	... <snip> ...
	/*
	 * This device is already being removed
	 */
	if (vd->vdev_removing)
		return (SET_ERROR(EALREADY));
```

Before that change we'd still fail with an error but it
was a more generic one - here is the check that failed
later in the same function:
```
	/*
	 * There can not be a removal in progress.
	 */
	if (spa->spa_removing_phys.sr_state == DSS_SCANNING)
		return (SET_ERROR(EBUSY));
```

Changing the error code returned from that function changed
the behavior of the removal's library interface exposed to
the userland - `spa_vdev_remove()` now returns `EZFS_UNKNOWN`
instead of `EZFS_EBUSY` that was returning before.

This patch adds logic to make `spa_vdev_remove()` mindful
of the new EALREADY code and propagating `EZFS_EBUSY`
reverting to the previously established semantics of that
function.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #15013
Closes #15129
2023-08-02 08:54:09 -07:00
наб bd1eab16eb linux: zfs: ctldir: set [amc]time to snapshot's creation property
If looking up a snapdir inode failed, hold pool config – hold the 
snapshot – get its creation property – release it – release it, 
then use that as the [amc]time in the allocated inode. If that 
fails then fall back to current time. No performance impact since 
this is only done when allocating a new snapdir inode.
                                                       
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #15110
Closes #15117
2023-08-02 08:53:45 -07:00
Zach Dykstra b3c1807d77 readmmap.c: fix building with MUSL libc
glibc includes sys/types.h from stdlib.h. This is not the case for MUSL,
so explicitly include it. Fixes usage of uint_t.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Zach Dykstra <dykstra.zachary@gmail.com>
Closes #15130
2023-08-02 08:53:06 -07:00
oromenahar b5e2456333 Check the return value in clonefile test
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Kay Pedersen <mail@mkwg.de>
Closes #15128
2023-08-02 08:52:40 -07:00
Rob N c47f0f4417 linux/copy_file_range: properly request a fallback copy on Linux <5.3
Before Linux 5.3, the filesystem's copy_file_range handler had to signal
back to the kernel that we can't fulfill the request and it should
fallback to a content copy. This is done by returning -EOPNOTSUPP.

This commit converts the EXDEV return from zfs_clone_range to
EOPNOTSUPP, to force the kernel to fallback for all the valid reasons it
might be unable to clone. Without it the copy_file_range() syscall will
return EXDEV to userspace, breaking its semantics.

Add test for copy_file_range fallbacks.  copy_file_range should always
fallback to a content copy whenever ZFS can't service the request with
cloning.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15131
2023-08-02 08:52:40 -07:00
Rob N 12f2b1f65e zdb: include cloned blocks in block statistics
This gives `zdb -b` support for clone blocks.

Previously, it didn't know what clones were, so would count their space
allocation multiple times and then report leaked space (or, in debug,
would assert trying to claim blocks a second time).

This commit fixes those bugs, and reports the number of clones and the
space "used" (saved) by them.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes #15123
2023-08-02 08:52:40 -07:00
Brian Behlendorf 4a104ac047 Tag 2.2.0-rc3
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2023-07-27 16:15:44 -07:00
oromenahar c24a480631 BRT should return EOPNOTSUPP
Return the more descriptive EOPNOTSUPP instead of EXDEV when the
storage pool doesn't support block cloning.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Kay Pedersen <mail@mkwg.de>
Closes #15097
2023-07-27 16:11:54 -07:00
Rob Norris 36d1a3ef4e zts: block cloning tests
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes #15050
Closes #405
Closes #13349
2023-07-26 08:46:58 -07:00
Rob Norris 2768dc04cc linux: implement filesystem-side copy/clone functions for EL7
Redhat have backported copy_file_range and clone_file_range to the EL7
kernel using an "extended file operations" wrapper structure. This
connects all that up to let cloning work there too.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes #15050
2023-07-26 08:46:58 -07:00
Rob Norris 3366ceaf3a linux: implement filesystem-side clone ioctls
Prior to Linux 4.5, the FICLONE etc ioctls were specific to BTRFS, and
were implemented as regular filesystem-specific ioctls. This implements
those ioctls directly in OpenZFS, allowing cloning to work on older
kernels.

There's no need to gate these behind version checks; on later kernels
Linux will simply never deliver these ioctls, instead calling the
approprate VFS op.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes #15050
2023-07-26 08:46:58 -07:00
Rob Norris 5d12545da8 linux: implement filesystem-side copy/clone functions
This implements the Linux VFS ops required to service the file
copy/clone APIs:

  .copy_file_range    (4.5+)
  .clone_file_range   (4.5-4.19)
  .dedupe_file_range  (4.5-4.19)
  .remap_file_range   (4.20+)

Note that dedupe_file_range() and remap_file_range(REMAP_FILE_DEDUP) are
hooked up here, but are not implemented yet.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes #15050
2023-07-26 08:46:58 -07:00
Rob Norris a3ea8c8ee6 dbuf_sync_leaf: check DB_READ in state assertions
Block cloning introduced a new state transition from DB_NOFILL to
DB_READ. This occurs when a block is cloned and then read on the
current txg.

In this case, the clone will move the dbuf to DB_NOFILL, and then the
read will be issued for the overidden block pointer. If that read is
still outstanding when it comes time to write, the dbuf will be in
DB_READ, which is not handled by the checks in dbuf_sync_leaf, thus
tripping the assertions.

This updates those checks to allow DB_READ as a valid state iff the
dirty record is for a BRT write and there is a override block pointer.
This is a safe situation because the block already exists, so there's
nothing that could change from underneath the read.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Original-patch-by: Kay Pedersen <mail@mkwg.de>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes #15050
2023-07-26 08:46:58 -07:00
Rob Norris 0426e13271 dmu_buf_will_clone: only check that current txg is clean
dbuf_undirty() will (correctly) only removed dirty records for the given
(open) txg. If there is a dirty record for an earlier closed txg that
has not been synced out yet, then db_dirty_records will still have
entries on it, tripping the assertion.

Instead, change the assertion to only consider the current txg. To some
extent this is redundant, as its really just saying "did dbuf_undirty()
work?", but it it doesn't hurt and accurately expresses our
expectations.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Original-patch-by: Kay Pedersen <mail@mkwg.de>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes #15050
2023-07-26 08:46:58 -07:00
Rob Norris 8aa4f0f0fc brt_vdev_realloc: use vmem_alloc for large allocation
bv_entcount can be a relatively large allocation (see comment for
BRT_RANGESIZE), so get it from the big allocator.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes #15050
2023-07-26 08:46:58 -07:00
Rob Norris 7698503dca zfs_clone_range: use vmem_malloc for large allocation
Just silencing the warning about large allocations.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Closes #15050
2023-07-26 08:46:58 -07:00
Brian Behlendorf b9aa32ff39 zed: Reduce log noise for large JBODs
For large JBODs the log message "zfs_iter_vdev: no match" can
account for the bulk of the log messages (over 70%).  Since this
message is purely informational and not that useful we remove it.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15086
Closes #15094
2023-07-26 08:46:58 -07:00
Brian Behlendorf 571762b290 Linux 6.4 compat: META
Update the META file to reflect compatibility with the 6.4 kernel.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15095
2023-07-26 08:46:58 -07:00
Alexander Motin 991834f5dc Remove zl_issuer_lock from zil_suspend().
This locking was recently added as part of #14979. But appears it
is illegal to take zl_issuer_lock while holding dp_config_rwlock,
taken by dsl_pool_hold().  It causes deadlock with sync thread in
spa_sync_upgrades().  On a second thought, we should not
need this locking, since zil_commit_impl() we call below takes
zl_issuer_lock, that should sufficiently protect zl_suspend reads,
combined with other logic from #14979.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15103
2023-07-25 13:54:02 -07:00
Alexander Motin 41a0f66279 ZIL: Fix config lock deadlock.
When we have some LWBs closed and their ZIOs ready to be issued, we
can not afford sleeping on config lock if somebody else try to lock
it as writer, or it will cause a deadlock.

To solve it, move spa_config_enter() from zil_lwb_write_issue() to
zil_lwb_write_close() under zl_issuer_lock to enforce lock ordering
with other threads.  Now if we can't immediately lock config, issue
all previously closed LWBs so that they could drop their config
locks after completion, and only then allow sleeping on our lock.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15078
Closes #15080
2023-07-25 13:54:02 -07:00
Umer Saleem c79d1bae75 Update changelog for OpenZFS 2.2.0 release
This commit updates changelog for native Debian packages for
OpenZFS 2.2.0 release.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15104
2023-07-25 09:01:27 -07:00
Brian Behlendorf 70232483b4 Tag 2.2.0-rc2
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2023-07-21 16:36:34 -07:00
Rob N c5273e0c31 shellcheck: disable "unreachable command" check [SC2317]
This new check in 0.9.0 appears to have some issues with various forms
of "early return", like trap, exit and return. This is tripping up (at
least):

  cmd/zed/zed.d/history_event-zfs-list-cacher.sh
  /etc/zfs/zfs-functions

Its not obvious what its complaining about or what the remedy is, so it
seems sensible to disable this check for now.

See also:

  https://www.shellcheck.net/wiki/SC2317
  https://github.com/koalaman/shellcheck/issues/2542
  https://github.com/koalaman/shellcheck/issues/2613

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15089
2023-07-21 16:35:12 -07:00
Rob N 685ae4429f metaslab: tuneable to better control force ganging
metaslab_force_ganging isn't enough to actually force ganging, because
it still only forces 3% of the time. This adds
metaslab_force_ganging_pct so we can configure how often to force
ganging.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15088
2023-07-21 16:35:12 -07:00
Alexander Motin 81be809a25 Adjust prefetch parameters.
- Reduce maximum prefetch distance for 32bit platforms to 8MB as it
was previously.  Those systems didn't grow much probably, so better
stay conservative there.
 - Retire array_rd_sz tunable, blocking prefetch for large requests.
We should not penalize applications trying to be more efficient. The
speculative prefetcher by itself has reasonable distance limits, and
1MB is not much at all these days.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15072
2023-07-21 16:35:12 -07:00
Alexander Motin 8a6fde8213 Add explicit prefetches to bpobj_iterate().
To simplify error handling bpobj_iterate_blkptrs() iterates through
the list of block pointers backwards.  Unfortunately speculative
prefetcher is currently unable to detect such patterns, that makes
each block read there synchronous and very slow on HDD pools.

According to my tests, added explicit prefetch reduces time needed
to asynchronously delete 8 snapshots of 4 million blocks each from
20 seconds to less than one, that should free sync thread for other
useful work, such as async writes, scrub, etc.

While there, plug one memory leak in case of bpobj_open() error and
harmonize some variable names.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15071
2023-07-21 16:35:12 -07:00
Alan Somers b6f618f8ff Don't emit cksum_{actual_expected} in ereport.fs.zfs.checksum events
With anything but fletcher-4, even a tiny change in the input will cause
the checksum value to change completely.  So knowing the actual and
expected checksums doesn't provide much more information than "they
don't match".  The harm in sending them is simply that they bloat the
event.  In particular, on FreeBSD the event must fit into a 1016 byte
buffer.

Fixes #14717 for mirrored pools.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Alan Somers <asomers@gmail.com>
Sponsored-by: Axcient
Closes #14717
Closes #15052
2023-07-21 16:35:12 -07:00
Alan Somers 51a2b59767 Don't emit checksum histograms in ereport.fs.zfs.checksum events
The checksum histograms were intended to be used with ATA and parallel
SCSI, which are obsolete.  With modern storage hardware, they will
almost always look like white noise; all bits will be wrong.  They only
serve to bloat the event.  That's a particular problem on FreeBSD, where
events must fit into a 1016 byte buffer.

This fixes issue #14717 for RAIDZ pools, but not for mirror pools.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Alan Somers <asomers@gmail.com>
Sponsored-by: Axcient
Closes #15052
2023-07-21 16:35:12 -07:00
Tony Hutter 8c81c0b05d zed: Fix zed ASSERT on slot power cycle
We would see zed assert on one of our systems if we powered off a
slot.  Further examination showed zfs_retire_recv() was reporting
a GUID of 0, which in turn would return a NULL nvlist.  Add
in a check for a zero GUID.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15084
2023-07-21 16:35:12 -07:00
Chunwei Chen b221f43943 Fix zpl_test_super race with zfs_umount
We cannot call zpl_enter in zpl_test_super, because zpl_test_super is
under spinlock so we can't sleep, and also because zpl_test_super is
called without sb->s_umount taken, so it's possible we would race with
zfs_umount and call zpl_enter on freed zfsvfs.

Here's an stack trace when this happens:
[ 2379.114837] VERIFY(cvp->cv_magic == CV_MAGIC) failed
[ 2379.114845] PANIC at spl-condvar.c:497:__cv_broadcast()
[ 2379.114854] Kernel panic - not syncing: VERIFY(cvp->cv_magic == CV_MAGIC) failed
[ 2379.115012] Call Trace:
[ 2379.115019]  dump_stack+0x74/0x96
[ 2379.115024]  panic+0x114/0x2f6
[ 2379.115035]  spl_panic+0xcf/0xfc [spl]
[ 2379.115477]  __cv_broadcast+0x68/0xa0 [spl]
[ 2379.115585]  rrw_exit+0xb8/0x310 [zfs]
[ 2379.115696]  rrm_exit+0x4a/0x80 [zfs]
[ 2379.115808]  zpl_test_super+0xa9/0xd0 [zfs]
[ 2379.115920]  sget+0xd1/0x230
[ 2379.116033]  zpl_mount+0xdc/0x230 [zfs]
[ 2379.116037]  legacy_get_tree+0x28/0x50
[ 2379.116039]  vfs_get_tree+0x27/0xc0
[ 2379.116045]  path_mount+0x2fe/0xa70
[ 2379.116048]  do_mount+0x80/0xa0
[ 2379.116050]  __x64_sys_mount+0x8b/0xe0
[ 2379.116052]  do_syscall_64+0x35/0x50
[ 2379.116054]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
[ 2379.116057] RIP: 0033:0x7f9912e8b26a

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #15077
2023-07-21 16:35:12 -07:00
Ameer Hamza e037327bfe spa_min_alloc should be GCD, not min
Since spa_min_alloc may not be a power of 2, unlike ashifts, in the
case of DRAID, we should not select the minimal value among several
vdevs. Rounding to a multiple of it is unlikely to work for other
vdevs. Instead, using the greatest common divisor produces smaller
yet more reasonable results.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15067
2023-07-21 16:35:12 -07:00
Yuri Pankov 1a2e486d25 Don't panic if setting vdev properties is unsupported for this vdev type
Check that vdev has valid zap and bail out early.

While here, move objid selection out of the loop, it's not going to
change.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Yuri Pankov <yuripv@FreeBSD.org>
Closes #15063
2023-07-21 16:35:12 -07:00
Ameer Hamza d8011707cc Ignore pool ashift property during vdev attachment
Ashift can be set for a vdev only during its creation, and the
top-level vdev does not change when a vdev is attached or replaced.
The ashift property should not be used during attachment, as it
does not allow attaching/replacing a vdev if the pool's ashift
property is increased after the existing vdev was created. Instead,
we should be able to attach the vdev if the attached vdev can
satisfy the ashift requirement with its parent.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15061
2023-07-21 16:35:12 -07:00
Wojciech Małota-Wójcik f5f5a2db95 Rollback before zfs root is mounted
On my machines I observe random failures caused by rollback happening 
after zfs root is mounted. I've observed two types of failures:

- zfs-rollback-bootfs.service fails saying that rollback must be
  done just before mounting the dataset
- boot process fails and rescue console is entered.

After making this modification and testing it for couple of days 
none of those problems have been observed anymore.

I don't know if `dracut-mount.service` is still needed in the 
`After` directive. Maybe someone else is able to address this?

Reviewed-by: Gregory Bartholomew <gregory.lee.bartholomew@gmail.com>
Signed-off-by: Wojciech Małota-Wójcik <59281144+outofforest@users.noreply.github.com>
Closes #15025
2023-07-21 16:35:12 -07:00
Alexander Motin 83b0967c1f Do not request data L1 buffers on scan prefetch.
Set ARC_FLAG_NO_BUF when prefetching data L1 buffers for scan.  We
do not prefetch data L0 buffers, so we do not need the L1 buffers,
only want them to be ready in ARC. This saves some CPU time on the
buffers decompression.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15029
2023-07-21 16:35:12 -07:00
Coleman Kane 73ba5df31a Linux 6.5 compat: disk_check_media_change() was added
The disk_check_media_change() function was added which replaces
bdev_check_media_change.  This change was introduced in 6.5rc1
444aa2c58cb3b6cfe3b7cc7db6c294d73393a894 and the new function takes a
gendisk* as its argument, no longer a block_device*. Thus, bdev->bd_disk
is now used to pass the expected data.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15060
2023-07-21 16:35:12 -07:00
Coleman Kane 1bc244ae93 Linux 6.5 compat: BLK_STS_NEXUS renamed to BLK_STS_RESV_CONFLICT
This change was introduced in Linux commit
7ba150834b840f6f5cdd07ca69a4ccf39df59a66

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15059
2023-07-21 16:35:12 -07:00
Coleman Kane 931dc70550 Linux 6.5 compat: intptr_t definition is canonically signed
Make the version here match that elsewhere in the kernel and system
headers.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #15058
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2023-07-21 16:35:12 -07:00
Yuri Pankov 5299f4f289 set autotrim default to 'off' everywhere
As it turns out having autotrim default to 'on' on FreeBSD never really
worked due to mess with defines where userland and kernel module were
getting different default values (userland was defaulting to 'off',
module was thinking it's 'on').

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Yuri Pankov <yuripv@FreeBSD.org>
Closes #15079
2023-07-21 16:35:12 -07:00
Alan Somers f917cf1c03 Fix the ZFS checksum error histograms with larger record sizes
My analysis in PR #14716 was incorrect.  Each histogram bucket contains
the number of incorrect bits, by position in a 64-bit word, over the
entire record.  8-bit buckets can overflow for record sizes above 2k.
To forestall that, saturate each bucket at 255.  That should still get
the point across: either all bits are equally wrong, or just a couple
are.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Sponsored-by: Axcient
Closes #15049
2023-07-21 16:35:12 -07:00
Alexander Motin 56ed389a57 Fix raw receive with different indirect block size.
Unlike regular receive, raw receive require destination to have the
same block structure as the source.  In case of dnode reclaim this
triggers two special cases, requiring special handling:
 - If dn_nlevels == 1, we can change the ibs, but dnode_set_blksz()
should not dirty the data buffer if block size does not change, or
durign receive dbuf_dirty_lightweight() will trigger assertion.
 - If dn_nlevels > 1, we just can't change the ibs, dnode_set_blksz()
would fail and receive_object would trigger assertion, so we should
destroy and recreate the dnode from scratch.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15039

(cherry picked from commit c4e8742149)
2023-07-20 08:58:29 -07:00
Alexander Motin e613e4bbe3 Avoid extra snprintf() in dsl_deadlist_merge().
Since we are already iterating the ZAP, we have exact string key to
remove, we do not need to call zap_remove_int() with the int key we
just converted, we can call zap_remove() for the original string.

This should make no functional change, only a micro-optimization.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15056

(cherry picked from commit fdba8cbb79)
2023-07-20 08:58:29 -07:00
Alexander Motin b4e630b00c Add missed DMU_PROJECTUSED_OBJECT prefetch.
It seems 9c5167d19f "Project Quota on ZFS" missed to add prefetch
for DMU_PROJECTUSED_OBJECT during scan (scrub/resilver).  It should
not cause visible problems, but may affect scub/resilver performance.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15024
2023-07-20 08:58:29 -07:00
Mateusz Guzik bf6cd30796 FreeBSD: catch up to __FreeBSD_version 1400093
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #15036
2023-07-20 08:58:29 -07:00
Alexander Motin 1266cebf87 FreeBSD: Fix build on stable/13 after 1302506.
Starting approximately from version 1302506 vn_lock_pair() grown two
additional arguments following head.  There is a one week hole, but
that is closet reference point we have.

Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Sponsored by:   iXsystems, Inc.
Closes #15047
2023-07-20 08:58:29 -07:00
Brian Behlendorf 009d3288de Tag 2.2.0-rc1
New features:
- Fully adaptive ARC eviction (#14359)
- Block cloning (#13392)
- Scrub error log (#12812, #12355)
- Linux container support (#14070, #14097, #12263)
- BLAKE3 Checksums (#12918)
- Corrective "zfs receive" (#9372)

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2023-06-30 12:07:09 -07:00
Prakash Surya 945e39fc3a Enable tuning of ZVOL open timeout value
The default timeout for ZVOL opens may not be sufficient for all cases,
so we should enable the value to be more easily tuned to account for
systems where the default value is insufficient.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Prakash Surya <prakash.surya@delphix.com>
Closes #15023
2023-06-30 11:34:05 -07:00
Brian Behlendorf ac8ae18d22 Revert "spa.h: use IN_BASE instead of IN_FREEBSD_BASE"
This reverts commit 77a3bb1f47.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2023-06-30 10:03:46 -07:00
Rich Ercolani 2b10e32561 Pack our DDT ZAPs a bit denser.
The DDT is really inefficient on 4k and up vdevs, because it always
allocates 4k blocks, and while compression could save us somewhat
at ashift 9, that stops being true.

So let's change the default to 32 KiB, which seems like a reasonable
compromise between improved space savings and inflated write sizes
for DDT updates.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #14654
2023-06-30 09:42:02 -07:00
Rob N 61ab05cac7 ddt_addref: remove unnecessary phys fill when refcount is 0
The previous comment wondered if this case could happen; it turns out
that it really can't.

This block can only be entered if dde_type and dde_class are "real";
that only happens when a ddt entry has been previously synced to a ddt
store, that is, it was created on a previous txg. Since its gone through
that sync, its dde_refcount must be >0.

ddt_addref() is called from brt_pending_apply(), which is called at the
beginning of spa_sync(), before pending DMU writes/frees are issued.
Freeing a dedup block is the only thing that can decrement dde_refcount,
so there's no way for it to drop to zero before applying the clone bumps
it.

Further, even if it _could_ go to zero, it wouldn't be necessary to fill
the entry from the block. The phys content is not cleared until the free
is issued, which happens when the refcount goes to zero, when the last
real free comes through. The cloned block should be identical to what's
in the phys already, so the fill should be a no-op anyway.

I've replaced this with an assertion because this is all very dependent
on the ordering in which BRT and DDT changes are applied, and that might
change in the future.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-By: Klara, Inc.
Closes #15004
2023-06-30 09:01:58 -07:00
Alexander Motin 233425a153 Again fix race between zil_commit() and zil_suspend().
With zl_suspend read in zil_commit() not protected by any locks it
is possible for new ZIL writes to be in progress while zil_destroy()
called by zil_suspend() freeing them.  This patch closes the race
by taking zl_issuer_lock in zil_suspend() and adding the second
zl_suspend check to zil_get_commit_list(), protected by the lock.
It allows all already queued transactions to be logged normally,
while blocks any new ones, calling txg_wait_synced() for the TXGs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14979
2023-06-30 08:59:39 -07:00
Alexander Motin b4a0873092 Some ZIO micro-optimizations.
- Pack struct zio_prop by 4 bytes from 84 to 80.
 - Skip new child ZIO locking while linking to parent.  The newly
allocated ZIO is not externally visible yet, so nobody should care.
 - Skip io_bp_copy writes when not used (write && non-debug).

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14985
2023-06-30 08:54:00 -07:00
Alexander Motin fa7b2390d4 Do not report bytes skipped by scan as issued.
Scan process may skip blocks based on their birth time, DVA, etc.
Traditionally those blocks were accounted as issued, that caused
reporting of hugely over-inflated numbers, having nothing to do
with actual disk I/O.  This change utilizes never used field in
struct dsl_scan_phys to account such skipped bytes, allowing to
report how much data were actually scrubbed/resilvered and what
is the actual I/O speed.  While formally it is an on-disk format
change, it should be compatible both ways, so should not need a
feature flag.

This should partially address the same issue as c85ac731a0, but
from a different perspective, complementing it.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Sponsored by:   iXsystems, Inc.
Closes #15007
2023-06-30 08:47:13 -07:00
Arshad Hussain 6052060c13 Don't use hard-coded 'size' value in snprintf()
This patch changes the passing of "size" to snprintf
from hard-coded (openended) to sizeof(errbuf). This
is bringing to standard with rest of the code where-
ever 'errbuf' is used.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Closes #15003
2023-06-30 08:37:26 -07:00
Alexander Motin eda32dca92 Fix remount when setting multiple properties.
The previous code was checking zfs_is_namespace_prop() only for the
last property on the list.  If one was not "namespace", then remount
wasn't called.  To fix that move zfs_is_namespace_prop() inside the
loop and remount if at least one of properties was "namespace".

Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15000
2023-06-30 08:36:43 -07:00
vimproved 24554082bd contrib: dracut: Conditionalize copying of libgcc_s.so.1 to glibc only
The issue that this is designed to work around is only applicable to
glibc, since it's caused by glibc's pthread_cancel() implementation
using dlopen on libgcc_s.so.1 (and therefor not triggering dracut to
include it in the initramfs). This commit adds an extra condition to the
workaround that tests for glibc via "ldconfig -p | grep -qF 'libc.so.6'"
(which should only be present on glibc systems).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Violet Purcell <vimproved@inventati.org>
Closes #14992
2023-06-29 12:54:37 -07:00
Yuri Pankov 77a3bb1f47 spa.h: use IN_BASE instead of IN_FREEBSD_BASE
Consistently get the proper default value for autotrim.

Currently, only the kernel module is built with IN_FREEBSD_BASE,
and libzfs get the wrong default value, leading to confusion and
incorrect output when autotrim value was not set explicitly.

Reviewed-by: Warner Losh <imp@bsdimp.com>
Signed-off-by: Yuri Pankov <yuripv@FreeBSD.org>
Closes #15016
2023-06-29 11:50:52 -07:00
Mateusz Piotrowski 62ace21a14 zdb: Add missing poolname to -C synopsis
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Mateusz Piotrowski <0mp@FreeBSD.org>
Sponsored-by: Klara Inc.
Closes #15014
2023-06-29 10:54:43 -07:00
Alexander Motin a9d6b0690b ZIL: Fix another use-after-free.
lwb->lwb_issued_txg can not be accessed after lwb_state is set to
LWB_STATE_FLUSH_DONE and zl_lock is dropped, since the lwb may be
freed by zil_sync().  We must save the txg number before that.

This is similar to the 55b1842f92, but as I see the bug is not new.
It existed for quite a while, just was not triggered due to smaller
race window.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14988
Closes #14999
2023-06-27 17:03:37 -07:00
Alexander Motin b0cbc1aa9a Use big transactions for small recordsize writes.
When ZFS appends files in chunks bigger than recordsize, it borrows
buffer from ARC and fills it before opening transaction.  This
supposed to help in case of page faults to not hold transaction open
indefinitely.  The problem appears when recordsize is set lower than
default 128KB. Since each block is committed in separate transaction,
per-transaction overhead becomes significant, and what is even worse,
active use of of per-dataset and per-pool locks to protect space use
accounting for each transaction badly hurts the code SMP scalability.
The same transaction size limitation applies in case of file rewrite,
but without even excuse of buffer borrowing.

To address the issue, disable the borrowing mechanism if recordsize
is smaller than default and the write request is 4x bigger than it.
In such case writes up to 32MB are executed in single transaction,
that dramatically reduces overhead and lock contention.  Since the
borrowing mechanism is not used for file rewrites, and it was never
used by zvols, which seem to work fine, I don't think this change
should create significant problems, partially because in addition to
the borrowing mechanism there are also used pre-faults.

My tests with 4/8 threads writing several files same time on datasets
with 32KB recordsize in 1MB requests show reduction of CPU usage by
the user threads by 25-35%.  I would measure it in GB/s, but at that
block size we are now limited by the lock contention of single write
issue taskqueue, which is a separate problem we are going to work on.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14964
2023-06-27 17:00:30 -07:00
Laevos bc9d0084ea Remove unnecessary commas in zpool-create.8
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Laevos <5572812+Laevos@users.noreply.github.com>
Closes #15011
2023-06-27 16:58:32 -07:00
Alexander Motin 8469b5aac0 Another set of vdev queue optimizations.
Switch FIFO queues (SYNC/TRIM) and active queue of vdev queue from
time-sorted AVL-trees to simple lists.  AVL-trees are too expensive
for such a simple task.  To change I/O priority without searching
through the trees, add io_queue_state field to struct zio.

To not check number of queued I/Os for each priority add vq_cqueued
bitmap to struct vdev_queue.  Update it when adding/removing I/Os.
Make vq_cactive a separate array instead of struct vdev_queue_class
member.  Together those allow to avoid lots of cache misses when
looking for work in vdev_queue_class_to_issue().

Introduce deadline of ~0.5s for LBA-sorted queues.  Before this I
saw some I/Os waiting in a queue for up to 8 seconds and possibly
more due to starvation.  With this change I no longer see it.  I
had to slightly more complicate the comparison function, but since
it uses all the same cache lines the difference is minimal.  For a
sequential I/Os the new code in vdev_queue_io_to_issue() actually
often uses more simple avl_first(), falling back to avl_find() and
avl_nearest() only when needed.

Arrange members in struct zio to access only one cache line when
searching through vdev queues.  While there, remove io_alloc_node,
reusing the io_queue_node instead.  Those two are never used same
time.

Remove zfs_vdev_aggregate_trim parameter.  It was disabled for 4
years since implemented, while still wasted time maintaining the
offset-sorted tree of TRIM requests.  Just remove the tree.

Remove locking from txg_all_lists_empty().  It is racy by design,
while 2 pair of locks/unlocks take noticeable time under the vdev
queue lock.

With these changes in my tests with volblocksize=4KB I measure vdev
queue lock spin time reduction by 50% on read and 75% on write.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14925
2023-06-27 09:09:48 -07:00
Rich Ercolani 35a6247c5f Add a delay to tearing down threads.
It's been observed that in certain workloads (zvol-related being a
big one), ZFS will end up spending a large amount of time spinning
up taskqs only to tear them down again almost immediately, then
spin them up again...

I noticed this when I looked at what my mostly-idle system was doing
and wondered how on earth taskq creation/destroy was a bunch of time...

So I added a configurable delay to avoid it tearing down tasks the
first time it notices them idle, and the total number of threads at
steady state went up, but the amount of time being burned just
tearing down/turning up new ones almost vanished.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #14938
2023-06-26 13:57:12 -07:00
Alexander Motin 8e8acabdca Fix memory leak in zil_parse().
482da24e2 missed arc_buf_destroy() calls on log parse errors, possibly
leaking up to 128KB of memory per dataset during ZIL replay.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14987
2023-06-17 19:51:37 -07:00
George Amanakis 10e36e1761 Shorten arcstat_quiescence sleep time
With the latest L2ARC fixes, 2 seconds is too long to wait for
quiescence of arcstats like l2_size. Shorten this interval to avoid
having the persistent L2ARC tests in ZTS prematurely terminated.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14981
2023-06-15 12:45:36 -07:00
Alexander Motin ccec7fbe1c Remove ARC/ZIO physdone callbacks.
Those callbacks were introduced many years ago as part of a bigger
patch to smoothen the write throttling within a txg. They allow to
account completion of individual physical writes within a logical
one, improving cases when some of physical writes complete much
sooner than others, gradually opening the write throttle.

Few years after that ZFS got allocation throttling, working on a
level of logical writes and limiting number of writes queued to
vdevs at any point, and so limiting latency distribution between
the physical writes and especially writes of multiple copies.
The addition of scheduling deadline I proposed in #14925 should
further reduce the latency distribution.  Grown memory sizes over
the past 10 years should also reduce importance of the smoothing.

While the use of physdone callback may still in theory provide
some smoother throttling, there are cases where we simply can not
afford it.  Since dirty data accounting is protected by pool-wide
lock, in case of 6-wide RAIDZ, for example, it requires us to take
it 8 times per logical block write, creating huge lock contention.

My tests of this patch show radical reduction of the lock spinning
time on workloads when smaller blocks are written to RAIDZ pools,
when each of the disks receives 8-16KB chunks, but the total rate
reaching 100K+ blocks per second.  Same time attempts to measure
any write time fluctuations didn't show anything noticeable.

While there, remove also io_child_count/io_parent_count counters.
They are used only for couple assertions that can be avoided.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14948
2023-06-15 10:49:03 -07:00
Brian Behlendorf e32e326c5b ZTS: Skip send_raw_ashift on FreeBSD
On FreeBSD 14 this test runs slowly in the CI environment
and is killed by the 10 minute timeout.  Skip the test on
FreeBSD until the slow down is resolved.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #14961
2023-06-14 08:04:05 -07:00
Alexander Motin d057807ede Switch refcount tracking from lists to AVL-trees.
With large number of tracked references list searches under the lock
become too expensive, creating enormous lock contention.

On my tests with ZFS_DEBUG enabled this increases write throughput
with 32KB blocks from ~1.2GB/s to ~7.5GB/s.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14970
2023-06-14 08:02:27 -07:00
George Amanakis 8af1104f83 Store the L2ARC device ashift in the vdev label
If this is not done, and the pool has an ashift other than the default
(at the moment 9) then the following happens:

1) vdev_alloc() assigns the ashift of the pool to L2ARC device, but
   upon export it is not stored anywhere
2) at the first import, vdev_open() sees an vdev_ashift() of 0 and
   assigns the logical_ashift, which is 9
3) reading the contents of L2ARC, including the header fails
4) L2ARC buffers are not restored in ARC.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14313 
Closes #14963
2023-06-14 08:01:17 -07:00
George Amanakis feff9dfed3 Fix the L2ARC write size calculating logic (2)
While commit bcd5321 adjusts the write size based on the size of the log
block, this happens after comparing the unadjusted write size to the
evicted (target) size.

In this case l2ad_hand will exceed l2ad_evict and violate an assertion
at the end of l2arc_write_buffers().

Fix this by adding the max log block size to the allocated size of the
buffer to be committed before comparing the result to the target
size.

Also reset the l2arc_trim_ahead ZFS module variable when the adjusted
write size exceeds the size of the L2ARC device.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14936
Closes #14954
2023-06-09 17:05:47 -07:00
Alexander Motin 70ea484e3e Finally drop long disabled vdev cache.
It was a vdev level read cache, designed to aggregate many small
reads by speculatively issuing bigger reads instead and caching
the result.  But since it has almost no idea about what is going
on with exception of ZIO_FLAG_DONT_CACHE flag set by higher layers,
it was found to make more harm than good, for which reason it was
disabled for the past 12 years.  These days we have much better
instruments to enlarge the I/Os, such as speculative and prescient
prefetches, I/O scheduler, I/O aggregation etc.

Besides just the dead code removal this removes one extra mutex
lock/unlock per write inside vdev_cache_write(), not otherwise
disabled and trying to do some work.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14953
2023-06-09 12:40:55 -07:00
Brian Behlendorf 6db4ed51d6 ZTS: Skip checkpoint_discard_busy
Until the ASSERT which is occasionally hit while running
checkpoint_discard_busy is resolved skip this test case.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #12053
Closes #14952
2023-06-09 11:10:01 -07:00
Alexander Motin 90ccfd426d Improve l2arc reporting in arc_summary.
- Do not report L2ARC as FAULTED in presence of in-flight writes.
- Report read and write I/Os, bytes and errors.
- Remove few numbers not important to average user.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #12304 
Closes #14946
2023-06-09 10:14:05 -07:00
Alexander Motin b3ad3f48d9 Use list_remove_head() where possible.
... instead of list_head() + list_remove().  On FreeBSD the list
functions are not inlined, so in addition to more compact code
this also saves another function call.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14955
2023-06-09 10:12:52 -07:00
Alexander Motin 55b1842f92 ZIL: Fix race introduced by f63811f072.
We are not allowed to access lwb after setting LWB_STATE_FLUSH_DONE
state and dropping zl_lock, since it may be freed by zil_sync().
To free itxs and waiters after dropping the lock we need to move
lwb_itxs and lwb_waiters lists elements to local storage.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14957
Closes #14959
2023-06-09 10:08:05 -07:00
Rich Ercolani 6c96269024 Revert "systemd: Use non-absolute paths in Exec* lines"
This reverts commit 79b20949b2 since it
doesn't work with the systemd version shipped with RHEL7-based systems.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #14943
Closes #14945
2023-06-07 11:14:05 -07:00
Brian Behlendorf 93f8abeff0 Linux: Never sleep in kmem_cache_alloc(..., KM_NOSLEEP) (#14926)
When a kmem cache is exhausted and needs to be expanded a new
slab is allocated.  KM_SLEEP callers can block and wait for the
allocation, but KM_NOSLEEP callers were incorrectly allowed to
block as well.

Resolve this by attempting an emergency allocation as a best
effort.  This may fail but that's fine since any KM_NOSLEEP
consumer is required to handle an allocation failure.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2023-06-07 10:43:43 -07:00
George Amanakis bcd5321039 Fix the L2ARC write size calculating logic
l2arc_write_size() should return the write size after adjusting for trim
and overhead of the L2ARC log blocks. Also take into account the
allocated size of log blocks when deciding when to stop writing buffers
to L2ARC.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14939
2023-06-06 12:32:37 -07:00
Rob Norris 8653f1de48 zdb: add -B option to generate backup stream
This is more-or-less like `zfs send`, but specifying the snapshot by its
objset id for situations where it can't be referenced any other way.

Sponsored-By: Klara, Inc.
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: WHR <msl0000023508@gmail.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #14642
2023-06-05 11:54:42 -07:00
Rob Norris 2b9f8ba673 znode: expose zfs_get_zplprop to libzpool
There's no particular reason this function should be kernel-only, and I
want to use it (indirectly) from zdb. I've moved it to zfs_znode.c
because libzpool does not compile in zfs_vfsops.c, and this at least
matches the header its imported from.

Sponsored-By: Klara, Inc.
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: WHR <msl0000023508@gmail.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #14642
2023-06-05 11:53:44 -07:00
Alexander Motin 5ba4025a8d Introduce zfs_refcount_(add|remove)_few().
There are two places where we need to add/remove several references
with semantics of zfs_refcount_(add|remove). But when debug/tracing
is disabled, it is a crime to run multiple atomic_inc() in a loop,
especially under congested pool-wide allocator lock.

Introduced new functions implement the same semantics as the loop,
but without overhead in production builds.

Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14934
2023-06-05 11:51:44 -07:00
Brian Behlendorf dae3c549f5 Linux 6.3 compat: META (#14930)
Update the META file to reflect compatibility with the 6.3 kernel.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2023-06-05 11:08:24 -07:00
Graham Perrin 6c29422e90 zfs-create(8): ZFS for swap: caution, clarity
Make the section heading more generic (the section relates to ZFS files
as well as ZFS volumes).

Swapping to a ZFS volume is prone to deadlock. Remove the related
instruction, direct readers to OpenZFS FAQ. Related, but not linked
from within the manual page:

<https://openzfs.github.io/openzfs-docs/Project%20and%20Community/FAQ.html#using-a-zvol-for-a-swap-device-on-linux>
(Using a zvol for a swap device on Linux).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Graham Perrin <grahamperrin@freebsd.org>
Issue #7734 
Closes #14756
2023-06-02 11:25:13 -07:00
Alexander Motin 482da24e20 ZIL: Allow to replay blocks of any size.
There seems to be no reason for ZIL blocks to be limited by 128KB
other than replay code is written in such a way.  This change does
not increase the limit yet, just removes the artificial limitation.

Avoided extra memcpy() may save us a second during replay.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14910
2023-06-02 11:01:58 -07:00
Val Packett 4f583a827e PAM: enable testing on FreeBSD
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Felix Dörre <felix@dogcraft.de>
Signed-off-by: Val Packett <val@packett.cool>
Closes #14834
2023-05-31 17:01:16 -07:00
Val Packett db994458bb PAM: support password changes even when not mounted
There's usually no requirement that a user be logged in for changing
their password, so let's not be surprising here.

We need to use the fetch_lazy mechanism for the old password to avoid
a double prompt for it, so that mechanism is now generalized a bit.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Felix Dörre <felix@dogcraft.de>
Signed-off-by: Val Packett <val@packett.cool>
Closes #14834
2023-05-31 17:01:11 -07:00
Val Packett e3ba6b93de PAM: add 'uid_min' and 'uid_max' options for changing the uid range
Instead of a fixed >=1000 check, allow the configuration to override
the minimum UID and add a maximum one as well. While here, add the
uid range check to the authenticate method as well, and fix the return
in the chauthtok method (seems very wrong to report success when we've
done absolutely nothing).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Felix Dörre <felix@dogcraft.de>
Signed-off-by: Val Packett <val@packett.cool>
Closes #14834
2023-05-31 17:01:07 -07:00
Val Packett f2f3ec17ed PAM: add 'forceunmount' flag
Probably not always a good idea, but it's nice to have the option.
It is a workaround for FreeBSD calling the PAM session end earier than
the last process is actually done touching the mount, for example.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Felix Dörre <felix@dogcraft.de>
Signed-off-by: Val Packett <val@packett.cool>
Closes #14834
2023-05-31 17:01:02 -07:00
Val Packett 850bccd3bc PAM: add 'recursive_homes' flag to use with 'prop_mountpoint'
It's not always desirable to have a fixed flat homes directory.
With the 'recursive_homes' flag, 'prop_mountpoint' search would
traverse the whole tree starting at 'homes' (which can now be '*'
to mean all pools) to find a dataset with a mountpoint matching
the home directory.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Felix Dörre <felix@dogcraft.de>
Signed-off-by: Val Packett <val@packett.cool>
Closes #14834
2023-05-31 17:00:58 -07:00
Val Packett bd4962b5ac PAM: use boolean_t for config flags
Since we already use boolean_t in the file, we can use it here.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Felix Dörre <felix@dogcraft.de>
Signed-off-by: Val Packett <val@packett.cool>
Closes #14834
2023-05-31 17:00:53 -07:00
Val Packett c47b708647 PAM: do not fail to mount if the key's already loaded
If we're expecting a working home directory on login, it would be
rather frustrating to not have it mounted just because it e.g. failed to
unmount once on logout.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Felix Dörre <felix@dogcraft.de>
Signed-off-by: Val Packett <val@packett.cool>
Closes #14834
2023-05-31 17:00:15 -07:00
Rich Ercolani 2810dda80b Revert "initramfs: use mount.zfs instead of mount"
This broke mounting of snapshots on / for users.

See https://github.com/openzfs/zfs/issues/9461#issuecomment-1376162949 for more context.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #14908
2023-05-31 16:58:41 -07:00
Luís Henriques 928c81f4df Fix NULL pointer dereference when doing concurrent 'send' operations
A NULL pointer will occur when doing a 'zfs send -S' on a dataset that
is still being received.  The problem is that the new 'send' will
rightfully fail to own the datasets (i.e. dsl_dataset_own_force() will
fail), but then dmu_send() will still do the dsl_dataset_disown().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Luís Henriques <henrix@camandro.org>
Closes #14903 
Closes #14890
2023-05-30 15:15:24 -07:00
Brian Behlendorf e085e98d54 ZTS: zvol_misc_trim disable blk mq
Disable the zvol_misc_fua.ksh and zvol_misc_trim.ksh test cases on impacted
kernels.  This issue is being actively worked in #14872 and as part of that
fix this commit will be reverted.

    VERIFY(zh->zh_claim_txg == 0) failed
    PANIC at zil.c:904:zil_create()

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #14872
Closes #14870
2023-05-29 12:55:35 -07:00
Richard Yao 0f03a41161 Use __attribute__((malloc)) on memory allocation functions
This informs the C compiler that pointers returned from these functions
do not alias other functions, which allows it to do better code
optimization and should make the compiled code smaller.

References:
https://stackoverflow.com/a/53654773
https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-malloc-function-attribute
https://clang.llvm.org/docs/AttributeReference.html#malloc

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14827
2023-05-26 15:47:52 -07:00
Brian Behlendorf 20494d47d2 ZTS: Add zpool_resilver_concurrent exception
The zpool_resilver_concurrent test case requires the ZED which is not used
on FreeBSD.  Add this test to the known list of skipped tested for FreeBSD.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14904
2023-05-26 15:39:23 -07:00
Mike Swanson 365bae0eab Add compatibility symlinks for FreeBSD 12.{3,4} and 13.{0,1,2}
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mike Swanson <mikeonthecomputer@gmail.com>
Closes #14902
2023-05-26 15:37:15 -07:00
Colm d3e0138a3d Adding new read-only compatible zpool features to compatibility.d/grub2
GRUB2 is compatible with all "read-only compatible" features,
so it is safe to add new features of this type to the grub2
compatibility list. We generally want to include all compatible
features, to minimize the differences between grub2-compatible
pools and no-compatibility pools.

Adding new properties `livelist` and `zpool_checkpoint` accordingly.

Also adding them to the man page which references this file as an
example, for consistency.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Colm Buckley <colm@tuatha.org>
Closes #14893
2023-05-26 10:04:19 -07:00
Richard Yao 677c6f8457 btree: Implement faster binary search algorithm
This implements a binary search algorithm for B-Trees that reduces
branching to the absolute minimum necessary for a binary search
algorithm. It also enables the compiler to inline the comparator to
ensure that the only slowdown when doing binary search is from waiting
for memory accesses. Additionally, it instructs the compiler to unroll
the loop, which gives an additional 40% improve with Clang and 8%
improvement with GCC.

Consumers must opt into using the faster algorithm. At present, only
B-Trees used inside kernel code have been modified to use the faster
algorithm.

Micro-benchmarks suggest that this can improve binary search performance
by up to 3.5 times when compiling with Clang 16 and up to 1.9 times when
compiling with GCC 12.2.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14866
2023-05-26 10:03:12 -07:00
George Amanakis bb736d98d1 Fix inconsistent definition of zfs_scrub_error_blocks_per_txg
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14894
2023-05-26 09:53:00 -07:00
Damiano Albani ff03dfd4d8 Add missing files to Debian DKMS package
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Damiano Albani <damiano.albani@gmail.com>
Closes #14887
Closes #14889
2023-05-25 16:10:54 -07:00
Brian Behlendorf 91a2325c4a Update compatibility.d files
Add an openzfs-2.2 compatibility file for the next release.

Edon-R support has been enabled for FreeBSD removing the need
for different FreeBSD and Linux files.  Symlinks for the -linux
and -freebsd names are created for any scripts expecting that
convention.

Additionally, a symlink for ubunutu-22.04 was added.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14833
2023-05-25 13:53:08 -07:00
Alexander Motin b6fbe61fa6 zil: Add some more statistics.
In addition to a number of actual log bytes written, account also a
total written bytes including padding and total allocated bytes (bytes
<= write <= alloc).  It should allow to monitor zil traffic and space
efficiency.

Add dtrace probe for zil block size selection.

Make zilstat report more information and fit it into less width.

Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Sponsored by:   iXsystems, Inc.
Closes #14863
2023-05-25 13:51:53 -07:00
Alexander Motin f63811f072 ZIL: Reduce scope of per-dataset zl_issuer_lock.
Before this change ZIL copied all log data while holding the lock.
It caused huge lock contention on workloads with many big parallel
writes.  This change splits the process into two parts: first,
zil_lwb_assign() estimates the log space needed for all transactions,
and zil_lwb_write_close() allocates blocks and zios while holding the
lock, then, after the lock in dropped, zil_lwb_commit() copies the
data, and zil_lwb_write_issue() issues the I/Os.

Also while there slightly reduce scope of zl_lock.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Sponsored by:   iXsystems, Inc.
Closes #14841
2023-05-25 09:48:43 -07:00
Dimitri John Ledkov 79b20949b2 systemd: Use non-absolute paths in Exec* lines
Since systemd v239, Exec* binaries are resolved from PATH when they
are not-absolute. Switch to this by default for ease of downstream
maintenance. Many downstream distributions move individual binaries
to locations that existing compile-time configurations cannot
accommodate.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com>
Closes #14880
2023-05-24 12:31:28 -07:00
Akash B 9d618615d1 Fix concurrent resilvers initiated at same time
For draid vdevs it was possible to initiate both the
sequential and healing resilver at same time.

This fixes the following two scenarios.
     1) There's a window where a sequential rebuild can
be started via ZED even if a healing resilver has been
scheduled.
	- This is fixed by adding additional check in
spa_vdev_attach() for any scheduled resilver and return
appropriate error code when a resilver is already in
progress.

     2) It was possible for zpool clear to start a healing
resilver when it wasn't needed at all. This occurs because
during a vdev_open() the device is presumed to be healthy not
until the device is validated by vdev_validate() and it's set
unavailable. However, by this point an async resilver will
have already been requested if the DTL isn't empty.
	- This is fixed by cancelling the SPA_ASYNC_RESILVER
request immediately at the end of vdev_reopen() when a resilver
is unneeded.

Finally, added a testcase in ZTS for verification.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Dipak Ghosh <dipak.ghosh@hpe.com>
Signed-off-by: Akash B <akash-b@hpe.com>
Closes #14881
Closes #14892
2023-05-24 12:28:09 -07:00
youzhongyang f8447cf22e Linux 6.4 compat: reclaimed_slab renamed to reclaimed
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #14891
2023-05-24 12:23:42 -07:00
Brian Atkinson ad0a554614 Hold db_mtx when updating db_state
Commit 555ef90 did some general code refactoring for
dmu_buf_will_not_fill() and dmu_buf_will_fill(). However, the db_mtx was
not held when update db->db_state in those code block. The rest of the
dbuf code always holds the db_mtx when updating db_state. This is
important because cv_wait() db_changed is used to check for db_state
changes.

Updating dmu_buf_will_not_fill() and dmu_buf_will_fill() to hold the
db_mtx when updating db_state.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #14875
2023-05-19 13:05:53 -07:00
Brian Behlendorf 577e835f30 Probe vdevs before marking removed
Before allowing the ZED to mark a vdev as REMOVED due to a
hotplug event confirm that it is non-responsive with probe.
Any device which can be successfully probed should be left
ONLINE to prevent a healthy pool from being incorrectly
SUSPENDED.  This may occur for at least the following two
scenarios.

1) Drive expansion (zpool online -e) in VMware environments.
   If, during the partition resize operation, a partition is
   removed and re-created then udev will send a removed event.

2) Re-scanning the namespaces of an NVMe device (nvme ns-rescan)
   may result in a udev remove and add event being delivered.

Finally, update the ZED to only kick in a spare when the
removal was successful.

Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #14859
Closes #14861
2023-05-19 13:05:09 -07:00
George Amanakis 482eeef804 Teach zpool scrub to scrub only blocks in error log
Added a flag '-e' in zpool scrub to scrub only blocks in error log. A
user can pause, resume and cancel the error scrub by passing additional
command line arguments -p -s just like a regular scrub. This involves
adding a new flag, creating new libzfs interfaces, a new ioctl, and the
actual iteration and read-issuing logic. Error scrubbing is executed in
multiple txg to make sure pool performance is not affected.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Co-authored-by: TulsiJain tulsi.jain@delphix.com
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #8995
Closes #12355
2023-05-18 11:59:42 -07:00
Brian Behlendorf e34e15ed6d Add the ability to uninitialize
zpool initialize functions well for touching every free byte...once.
But if we want to do it again, we're currently out of luck.

So let's add zpool initialize -u to clear it.

Co-authored-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12451 
Closes #14873
2023-05-18 10:02:20 -07:00
Antonio Russo e0d5007bcf test-runner: pass kmemleak and kmsg to Cmd.run
test-runner.py orchestrates all of the ZTS executions. The `Cmd` object
manages these process, and its `run` method specifically invokes these
possibly long-running processes, possibly retrying in the event of a
timeout. Since its inception, memory leak detection using the kmemleak
infrastructure [1], and kernel logging [2] have been added to this run
mechanism.

However, the callback to cull a process beyond its timeout threshold,
`kill_cmd`, has evaded modernization by both of these changes. As a
result, this function fails to properly invoke `run`, leading to an
untrapped exception and unreported test failure.

This patch extends `kill_cmd` to receive these kernel devices through
the `options` parameter, and regularizes all the `.run` calls from
`Cmd`, and its subclasses, to accept that parameter.

[1] Commit a69765ea5b
[2] Commit fc2c0256c5

Reviewed-by: John Wren Kennedy <john.kennedy@delphix.com>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #14849
2023-05-15 16:11:33 -07:00
Richard Yao ee7b71dbc9 Fix undefined behavior in spa_sync_props()
8eae2d214c caused Coverity to begin
complaining about "Improper use of negative value" in two places in
spa_sync_props() because Coverity correctly inferred from `prop ==
ZPOOL_PROP_INVAL` that prop could be -1 while both zpool_prop_to_name()
and zpool_prop_get_type() use it an array index, which is undefined
behavior.

Assuming that the system does not panic from an attempt to read invalid
memory, the case statement for ZPOOL_PROP_INVAL will ensure that only
user properties will reach this code when prop is ZPOOL_PROP_INVAL, such
that execution will continue safely. However, if we are unlucky enough
to read invalid memory, then the system will panic.

This issue predates the patch that caused coverity to begin complaining.
Thankfully, our userland tools do not pass nonsense to us, so this bug
should not be triggered unless a future userland tool attempts to set a
property that we do not understand.

Reported-by: Coverity (CID-1561129)
Reported-by: Coverity (CID-1561130)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14860
2023-05-15 10:29:05 -07:00
Richard Yao c87798d8ff Fix use after free regression in spa_remove_healed_errors()
6839ec6f10 placed code in
spa_remove_healed_errors() that uses a pointer after the kmem_free()
call that frees it.

Reported-by: Coverity (CID-1562375)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14860
2023-05-15 10:29:01 -07:00
Alexander Motin 7381ddf1ab zil: Free lwb_buf after write completion.
There is no sense to keep that memory allocated during the flush.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Sponsored by:   iXsystems, Inc.
Closes #14855
2023-05-12 09:49:26 -07:00
Alexander Motin 895e03135e zil: Some micro-optimizations.
Should not cause functional changes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Sponsored by:   iXsystems, Inc.
Closes #14854
2023-05-12 09:14:29 -07:00
Don Brady da211a4a33 Refine special_small_blocks property validation
When the special_small_blocks property is being set during a pool 
create it enforces a limit of 128KiB even if the pool's record size 
is larger.

If the recordsize property is being set during a pool create, then 
use that value instead of the default SPA_OLD_MAXBLOCKSIZE value.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <dev.fs.zfs@gmail.com>
Closes #13815
Closes #14811
2023-05-12 09:12:28 -07:00
Brian Behlendorf 5b3b6e95c0 ZTS: Add auto_replace_001_pos to exceptions
The auto_replace_001_pos test case does not reliably pass on
Fedora 37 and newer.  Until the test case can be updated to make
it reliable add it to the list of "maybe" exceptions on Linux.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #14851
Closes #14852
2023-05-12 09:07:58 -07:00
Pawel Jakub Dawidek e610766838 Make sure we are not trying to clone a spill block.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #14825
2023-05-11 16:07:15 -07:00
Pawel Jakub Dawidek fbbe5e96ef Correct comment.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #14825
2023-05-11 16:07:11 -07:00
Pawel Jakub Dawidek 9879930f7a Remove badly placed comment.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #14825
2023-05-11 16:07:07 -07:00
Pawel Jakub Dawidek b6d7370b9d Don't call zfs_exit_two() before zfs_enter_two().
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #14825
2023-05-11 16:07:02 -07:00
Pawel Jakub Dawidek d0d91f185e Don't use dmu_buf_is_dirty() for unassigned transaction.
The dmu_buf_is_dirty() call doesn't make sense here for two reasons:
1. txg is 0 for unassigned tx, so it was a no-op.
2. It is equivalent of checking if we have dirty records and we are doing
   this few lines earlier.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #14825
2023-05-11 16:06:57 -07:00
Pawel Jakub Dawidek bd8c6bd66f Deny block cloning is dbuf size doesn't match BP size.
I don't know an easy way to shrink down dbuf size, so just deny block cloning
into dbufs that don't match our BP's size.

This fixes the following situation:
1. Create a small file, eg. 1kB of random bytes. Its dbuf will be 1kB.
2. Create a larger file, eg. 2kB of random bytes. Its dbuf will be 2kB.
3. Truncate the large file to 0. Its dbuf will remain 2kB.
4. Clone the small file into the large file. Small file's BP lsize is
   1kB, but the large file's dbuf is 2kB.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #14825
2023-05-11 16:06:52 -07:00
Pawel Jakub Dawidek 555ef90c5c Additional block cloning fixes.
Reimplement some of the block cloning vs dbuf logic, mostly to fix
situation where we clone a block and in the same transaction group
we want to partially overwrite the clone.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #14825
2023-05-11 16:06:36 -07:00
Alexander Motin 469019fb0b zil: Don't expect zio_shrink() to succeed.
At least for RAIDZ zio_shrink() does not reduce zio size, but reduced
wsz in that case likely results in writing uninitialized memory.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Sponsored by:   iXsystems, Inc.
Closes #14853
2023-05-11 14:27:12 -07:00
Ameer Hamza 14ba8ab97d Prevent panic during concurrent snapshot rollback and zvol read
Protect zvol_cdev_read with zv_suspend_lock to prevent concurrent
release of the dnode, avoiding panic when a snapshot is rolled back
in parallel during ongoing zvol read operation.

Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #14839
2023-05-09 17:56:35 -07:00
Tony Hutter d3db900a4e pam: Fix "buffer overflow" in pam ZTS tests on F38
The pam ZTS tests were reporting a buffer overflow on F38, possibly
due to F38 now setting _FORTIFY_SOURCE=3 by default.  gdb and
valgrind narrowed this down to a snprintf() buffer overflow in
zfs_key_config_modify_session_counter().  I'm not clear why this
particular snprintf() was being flagged as an overflow, but when
I replaced it with an asprintf(), the test passed reliably.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #14802 
Closes #14842
2023-05-09 17:55:19 -07:00
Brian Behlendorf 903c3613d4 Add dmu_tx_hold_append() interface
Provides an interface which callers can use to declare a write when
the exact starting offset in not yet known.  Since the full range
being updated is not available only the first L0 block at the
provided offset will be prefetched.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14819
2023-05-09 09:03:10 -07:00
Brian Behlendorf c8b3dda186 Debug auto_replace_001_pos failures
Reduced the timeout to 60 seconds which should be more than
sufficient and allow the test to be marked as FAILED rather
than KILLED.  Also dump the pool status on cleanup.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14829
2023-05-09 08:57:02 -07:00
George Amanakis d38c815fe2 Remove duplicate code in l2arc_evict()
l2arc_evict() performs the adjustment of the size of buffers to be
written on L2ARC unnecessarily. l2arc_write_size() is called right
before l2arc_evict() and performs those adjustments.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14828
2023-05-09 08:54:41 -07:00
Alexander Motin b035f2b2cb Remove single parent assertion from zio_nowait().
We only need to know if ZIO has any parent there.  We do not care if
it has more than one, but use of zio_unique_parent() == NULL asserts
that.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14823
2023-05-09 08:54:01 -07:00
George Amanakis 6839ec6f10 Enable the head_errlog feature to remove errors
In case check_filesystem() does not error out and does not report
an error, remove that error block from error lists and logs
without requiring a scrub. This can happen when the original file and
all snapshots/clones referencing it have been removed.

Otherwise zpool status will still report that "Permanent errors have
been detected..." without actually reporting any of them.

To implement this change the functions introduced in corrective
receive were modified to take into account the head_errlog feature.

Before this change:
=============================
pool: test
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
config:

        NAME                   STATE     READ WRITE CKSUM
        test                   ONLINE       0     0     0
          /home/user/vdev_a    ONLINE       0     0     2

errors: Permanent errors have been detected in the following files:

=============================

After this change:
=============================
  pool: test
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are
unaffected.
action: Determine if the device needs to be replaced, and clear the
errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
config:

        NAME                   STATE     READ WRITE CKSUM
        test                   ONLINE       0     0     0
          /home/user/vdev_a    ONLINE       0     0     2

errors: No known data errors
=============================

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14813
2023-05-09 08:53:27 -07:00
George Amanakis 4eca03faaf Fixes in head_errlog feature with encryption
For the head_errlog feature use dsl_dataset_hold_obj_flags() instead of
dsl_dataset_hold_obj() in order to enable access to the encryption keys
(if loaded). This enables reporting of errors in encrypted filesystems
which are not mounted but have their keys loaded.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14837
2023-05-08 13:35:03 -07:00
Matthew Ahrens 3095ca91c2 Verify block pointers before writing them out
If a block pointer is corrupted (but the block containing it checksums
correctly, e.g. due to a bug that overwrites random memory), we can
often detect it before the block is read, with the `zfs_blkptr_verify()`
function, which is used in `arc_read()`, `zio_free()`, etc.

However, such corruption is not typically recoverable.  To recover from
it we would need to detect the memory error before the block pointer is
written to disk.

This PR verifies BP's that are contained in indirect blocks and dnodes
before they are written to disk, in `dbuf_write_ready()`. This way,
we'll get a panic before the on-disk data is corrupted. This will help
us to diagnose what's causing the corruption, as well as being much
easier to recover from.

To minimize performance impact, only checks that can be done without
holding the spa_config_lock are performed.

Additionally, when corruption is detected, the raw words of the block
pointer are logged.  (Note that `dprintf_bp()` is a no-op by default,
but if enabled it is not safe to use with invalid block pointers.)

Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #14817
2023-05-08 11:20:23 -07:00
Brian Behlendorf dd19821149 zdb: consistent xattr output
When using zdb to output the value of an xattr only interpret it
as printable characters if the entire byte array is printable.
Additionally, if the --parseable option is set always output the
buffer contents as octal for easy parsing.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14830
2023-05-08 11:17:41 -07:00
Brian Behlendorf 245f4a3467 ZTS: add snapshot/snapshot_002_pos exception
Add snapshot_002_pos to the known list of occasional failures
for FreeBSD until it can be made entirely reliable.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #14831
Closes #14832
2023-05-08 10:09:30 -07:00
Alexander Motin 190290a9ac Fix two abd_gang_add_gang() issues.
- There is no reason to assert that added gang is not empty.  It
may be weird to add an empty gang, but it is legal.
 - When moving chain list from the added gang clear its size, or it
will trigger assertion in abd_verify() when that gang is freed.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14816
2023-05-05 09:17:55 -07:00
Pawel Jakub Dawidek 6fa6bb051c Simplify and optimize random_int_between().
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #14805
2023-05-05 09:09:12 -07:00
Pawel Jakub Dawidek 599df82049 Plug memory leak in zfsdev_state.
On kernel module unload, free all zfsdev state structures, except for
zfsdev_state_listhead, which is statically allocated.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #14824
2023-05-05 08:51:41 -07:00
Ameer Hamza 82ac409acc zpool import -m also removing spare and cache when log device is missing
spa_import() relies on a pool config fetched by spa_try_import() for
spare/cache devices. Import flags are not passed to spa_tryimport(),
which makes it return early due to a missing log device and missing
retrieving the cache device and spare eventually. Passing
ZFS_IMPORT_MISSING_LOG to spa_tryimport() makes it fetch the correct
configuration regardless of the missing log device.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #14794
2023-05-03 15:10:32 -07:00
buzzingwires a46001adb9 Allow zhack label repair to restore detached devices.
This commit expands on the zhack label repair command in d04b5c9 by
adding the -u option to undetach a device by regenerating uberblocks,
in addition to the existing functionality of fixing checksums, now
represented by -c. Previous behavior is retained in the case of no
options.

The changes are heavily inspired by Jeff Bonwick's labelfix
utility, as archived at:

https://gist.github.com/jjwhitney/baaa63144da89726e482

Additionally, it is now capable of properly determining the size of
block devices and other media, as well as handling sizes which are
not divisible by 2^18. This should make it viable for use on physical
devices and partitions, in addition to files.

These changes should make it possible to import zpools that have had
their uberblocks erased, such as in the case of pools rendered
inaccessible by erroneous detach commands.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: buzzingwires <buzzingwires@outlook.com>
Closes #14773
2023-05-03 09:03:57 -07:00
George Amanakis 9de5300c7f Optimize check_filesystem() and process_error_log()
Integrate check_clones() into check_filesystem() and implement a list
instead of iterating recursively over the clones, thus eliminating the
risk of a stack overflow.

Also use kmem_zalloc() to allocate large structures in
process_error_log() reducing its stack size from ~700 to ~128 bytes.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14744
2023-05-03 09:00:14 -07:00
Pawel Jakub Dawidek d96e29576c Use correct block pointer in block cloning case.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #14806
2023-05-02 09:24:26 -07:00
Brian Behlendorf 012829df0c Wrap clang specific pragma
Clang specific pragmas need to be wrapped to prevent a build
warning when compiling with gcc.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14814
2023-05-02 09:21:47 -07:00
Mateusz Guzik e2a92d726e blake3: fix up bogus checksums in face of cpu migration
This is a temporary measure until a better fix is sorted out.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Closes #14785
Closes #14808
2023-05-01 17:21:27 -07:00
Serapheim Dimitropoulos 0c93d86f01 Correct ABD size for split block ZIOs
Currently when layering the ABD buffer of each split block on top of
an indirect vdev's ZIO ABD we don't specify the split block's ABD.
This results in those ABDs being incorrectly sized by inheriting
the size of their parent ABD which is larger than what each split
block needs.

The above behavior isn't causing any bugs currently but can lead
to unexpected ABD sizes for people analyzing and/or working on
the ZIO codepath. This patch fixes this behavior by properly setting
the ABD size for split block ZIOs.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #14804
2023-05-01 17:18:42 -07:00
Justin Hibbits 5a83f761c7 powerpc64: Support ELFv2 asm on Big Endian
FreeBSD/powerpc64 is all ELFv2 since FreeBSD 13, even big endian.  The
existing sha256 and sha512 asm code assumes that BE is all ELFv1, and LE
is ELFv2.  Minor changes to add ELFv2 in the BE side gets this working
correctly on FreeBSD with latest OpenZFS import.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Justin Hibbits <chmeeedalf@gmail.com>
Closes #14779
2023-04-27 12:49:21 -07:00
Alexander Motin 2fd1c30423 Mark TX_COMMIT transaction with TXG_NOTHROTTLE.
TX_COMMIT has no on-disk representation and does not produce any more
dirty data.  It should not wait for anything, and even just skipping
the checks if not waiting gives improvement noticeable in profiler.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14798
2023-04-27 12:32:58 -07:00
Val Packett ae0d0f0e04 PAM: support the authentication facility
Implement the pam_sm_authenticate method, using the noop argument of
lzc_load_key to do a passphrase check without actually loading the key.

This allows using ZFS as the source of truth for user passwords,
without storing any password hashes in /etc or using other PAM modules.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Felix Dörre <felix@dogcraft.de>
Signed-off-by: Val Packett <val@packett.cool>
Closes #14789
2023-04-27 09:49:03 -07:00
Tino Reichardt ee728008a4 Fix BLAKE3 aarch64 assembly for FreeBSD and macOS
The x18 register isn't useable within FreeBSD kernel space, so we
have to fix the BLAKE3 aarch64 assembly for not using it.

The source files are here: https://github.com/mcmilk/BLAKE3-tests

Reviewed-by: Kyle Evans <kevans@FreeBSD.org>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #14728
2023-04-26 12:40:26 -07:00
Brian Behlendorf b5411618f7 Fix checkstyle warning
Resolve a missed checkstyle warning.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14799
2023-04-26 11:49:16 -07:00
Alexander Motin bba7cbf0a4 Fix positive ABD size assertion in abd_verify().
Gang ABDs without childred are legal, and they do have zero size.
For other ABD types zero size doesn't have much sense and likely
not working correctly now.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14795
2023-04-26 09:20:43 -07:00
Mateusz Guzik e37a89d5d0 FreeBSD: fix up EINVAL from getdirentries on .zfs
Without the change:
/.zfs
/.zfs/snapshot
find: /.zfs: Invalid argument

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #14774
2023-04-26 09:16:37 -07:00
Mateusz Guzik 88b8777159 FreeBSD: add missing vn state transition for .zfs
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #14774
2023-04-26 09:16:09 -07:00
Rob N b69cb06664 tests/zdb_encrypted: parse numbers a little more robustly
On FreeBSD, `wc` prints some leading spaces, while on Linux it does not.
So we tell ksh to expect an integer, and it does the rest.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #14791
Closes #14797
2023-04-26 08:50:44 -07:00
Brian Behlendorf d960beca61 zdb: Fix minor memory leak
Commit 6b6aaf6dc2 introduced a small
memory leak in zdb.  This was detected by the LeakSanitizer and was
causing all ztest runs to fail.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14796
2023-04-26 08:43:39 -07:00
Brian Behlendorf 0e8a42bbee Revert "Fix data race between zil_commit() and zil_suspend()"
This reverts commit 4c856fb333 to
resolve a newly introduced deadlock which in practice in more
disruptive that the issue this commit intended to address.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #14775
Closes #14790
2023-04-25 16:40:55 -07:00
Han Gao 6d59d5df98 Add loongarch64 support
Add loongarch64 definitions & lua module setjmp asm

LoongArch is a new RISC ISA, which is a bit like MIPS or RISC-V.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Han Gao <gaohan@uniontech.com>
Signed-off-by: WANG Xuerui <xen0n@gentoo.org>
Closes #13422
2023-04-25 16:05:45 -07:00
Rich Ercolani 6b6aaf6dc2 Taught zdb -bb to print metadata totals
People often want estimates of how much of their pool is occupied
by metadata, but they end up using lots of text processing on zdb's
output to get it.

So let's just...provide it for them.

Now, zdb -bbbs will output something like:

Blocks  LSIZE   PSIZE   ASIZE     avg    comp   %Total  Type
[...]
    68  1.06M    272K    544K      8K    4.00     0.00      L6 Total
 1.71K   212M   6.85M   13.7M      8K   30.91     0.00      L5 Total
 1.71K   212M   6.85M   13.7M      8K   30.91     0.00      L4 Total
 1.73K   214M   6.92M   13.8M      8K   30.89     0.00      L3 Total
 18.7K  2.29G    111M    221M   11.8K   21.19     0.00      L2 Total
 3.56M   454G   28.4G   56.9G   16.0K   15.97     0.19      L1 Total
  308M  36.8T   28.2T   28.6T   95.1K    1.30    99.80      L0 Total
  311M  37.3T   28.3T   28.6T   94.2K    1.32   100.00  Total
 50.4M   774G    113G    291G   5.77K    6.85     0.99  Metadata Total

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #14746
2023-04-24 16:55:07 -07:00
Mateusz Guzik 81a2b2e6a6 FreeBSD: add missing vop_fplookup assignments
It became illegal to not have them as of
5f6df177758b9dff88e4b6069aeb2359e8b0c493 ("vfs: validate that vop
vectors provide all or none fplookup vops") upstream.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #14788
2023-04-24 16:15:42 -07:00
Mateusz Guzik ff0e135e25 FreeBSD: try to fallback early if can't do optimized copy
Not complete, but already shaves on some locking.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Closes #14723
2023-04-24 16:13:52 -07:00
Mateusz Guzik a7982d5d30 FreeBSD: fix up EXDEV handling for clone_range
API contract requires VOPs to handle EXDEV internally, worst case by
falling back to the generic copy routine. This broke with the recent
changes.

While here whack custom loop to lock 2 vnodes with vn_lock_pair, which
provides the same functionality internally. write start/finish around
it plays no role so got eliminated.

One difference is that vn_lock_pair always takes an exclusive lock on
both vnodes. I did not patch around it because current code takes an
exclusive lock on the target vnode. zfs supports shared-locking for
writes, so this serializes different calls to the routine as is, despite
range locking inside. At the same time you may notice the source vnode
can get some traffic if only shared-locked, thus once more this goes
the safer route of exclusive-locking. Note this should be patched to
use shared-locking for both once the feature is considered stable.

Technically the switch to vn_lock_pair should be a separate change, but
it would only introduce churn immediately whacked by the rest of the
patch.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Closes #14723
2023-04-24 16:13:09 -07:00
Dimitry Andric 62cc9d4f6b FreeBSD: make zfs_vfs_held() definition consistent with declaration
Noticed while attempting to change FreeBSD's boolean_t into an actual
bool: in include/sys/zfs_ioctl_impl.h, zfs_vfs_held() is declared to
return a boolean_t, but in module/os/freebsd/zfs/zfs_ioctl_os.c it is
defined to return an int. Make the definition match the declaration.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Dimitry Andric <dimitry@andric.com>
Closes #14776
2023-04-21 10:22:52 -07:00
Allan Jude 8eae2d214c Add support for zpool user properties
Usage:

    zpool set org.freebsd:comment="this is my pool" poolname

Tests are based on zfs_set's user property tests.

Also stop truncating property values at MAXNAMELEN, use ZFS_MAXPROPLEN.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>
Sponsored-by: Beckhoff Automation GmbH & Co. KG.
Sponsored-by: Klara Inc.
Closes #11680
2023-04-21 10:20:36 -07:00
Richard Yao 135d9a9048 Linux: Suppress -Wordered-compare-function-pointers in tracepoint code
Clang points out that there is a comparison against -1, but we cannot
fix it because that is from the kernel headers, which we must support.
We can workaround this by using a pragma.

Sponsored-By: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Youzhong Yang <yyang@mathworks.com>
Signed-off-by: Richard Yao <richard.yao@klarasystems.com>
Closes #14738
2023-04-20 10:31:01 -07:00
Richard Yao ab71b24d20 Linux: zfs_zaccess_trivial() should always call generic_permission()
Building with Clang on Linux generates a warning that err could be
uninitialized if mnt_ns is a NULL pointer. However, mnt_ns should never
be NULL, so there is no need to put this behind an if statement.  Taking
it outside of the if statement means that the possibility of err being
uninitialized goes from being always zero in a way that the compiler
could not realize to a way that is always zero in a way that the
compiler can realize.

Sponsored-By: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Youzhong Yang <yyang@mathworks.com>
Signed-off-by: Richard Yao <richard.yao@klarasystems.com>
Closes #14738
2023-04-20 10:29:44 -07:00
Brian Behlendorf a10c645616 ZTS: zvol_misc_trim retry busy export
Retry the export if the pool is busy due to an open zvol.
Observed in the CI on Fedora 37.

  cannot export 'testpool': pool is busy
  ERROR: zpool export testpool exited 1

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14769
2023-04-20 10:25:16 -07:00
rob-wing 3e4ed4213d Create zap for root vdev
And add it to the AVZ, this is not backwards compatible with older pools
due to an assertion in spa_sync() that verifies the number of ZAPs of
all vdevs matches the number of ZAPs in the AVZ.

Granted, the assertion only applies to #DEBUG builds - still, a feature
flag is introduced to avoid the assertion, com.klarasystems:vdev_zaps_v2

Notably, this allows to get/set properties on the root vdev:

    % zpool set user:prop=value <pool> root-0

Before this commit, it was already possible to get/set properties on
top-level vdevs with the syntax <type>-<vdev_id> (e.g. mirror-0):

    % zpool set user:prop=value <pool> mirror-0

This syntax also applies to the root vdev as it is is of type 'root'
with a vdev_id of 0, root-0. The keyword 'root' as an alias for
'root-0'.

The following tests have been added:

    - zpool get all properties from root vdev
    - zpool set a property on root vdev
    - verify root vdev ZAP is created

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Wing <rob.wing@klarasystems.com>
Sponsored-by: Seagate Technology
Submitted-by: Klara, Inc.
Closes #14405
2023-04-20 10:07:56 -07:00
Herb Wartens 71d191ef25 Allow MMP to bypass waiting for other threads
At our site we have seen cases when multi-modifier protection is enabled
(multihost=on) on our pool and the pool gets suspended due to a single
disk that is failing and responding very slowly. Our pools have 90 disks
in them and we expect disks to fail. The current version of MMP requires
that we wait for other writers before moving on. When a disk is
responding very slowly, we observed that waiting here was bad enough to
cause the pool to suspend. This change allows the MMP thread to bypass
waiting for other threads and reduces the chances the pool gets
suspended.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Herb Wartens <hawartens@gmail.com>
Closes #14659
2023-04-19 13:22:59 -07:00
Paul Dagnelie d4657835c8 ZTS: send-c_volume is flaky
We use block_device_wait to wait for the zvol block device to 
actually appear, and we log the result of the dd calls by using 
an intermediate file.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #14767
2023-04-19 13:20:02 -07:00
Ameer Hamza 719534ca8e Fix "Detach spare vdev in case if resilvering does not happen"
Spare vdev should detach from the pool when a disk is reinserted.
However, spare detachment depends on the completion of resilvering,
and if resilver does not schedule, the spare vdev keeps attached to
the pool until the next resilvering. When a zfs pool contains
several disks (25+ mirror), resilvering does not always happen when
a disk is reinserted. In this patch, spare vdev is manually detached
from the pool when resilvering does not occur and it has been tested
on both Linux and FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #14722
2023-04-19 09:04:32 -07:00
наб 3d37e7e5f5 zfsprops.7: update mandlock
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=f7e33bdbd6d1bdf9c3df8bba5abcf3399f957ac3
https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=7e59106e9c34458540f7d382d5b49071d1b7104f

Fixes: commit fb9baa9b20 ("zfsprops.8:
 remove nbmand-not-used-on-Linux and pointer to mount(8)")

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #14765
2023-04-19 09:03:42 -07:00
youzhongyang 23f84d161e Silence clang warning of flexible array not at end
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #14764
2023-04-18 18:10:40 -07:00
Low-power f9e1c63f8c Values printed by zpool-iostat(8) should be right-aligned
This inappropriate left-alignment was introduced in 7bb7b1f.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: WHR <msl0000023508@gmail.com>
Closes #14751
2023-04-18 11:34:41 -07:00
Tony Hutter accfdeb948 Revert "ZFS_IOC_COUNT_FILLED does unnecessary txg_wait_synced()"
This reverts commit 4b3133e671.

Users identified this commit as a possible source of data
corruption:
https://github.com/openzfs/zfs/issues/14753

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Issue #14753 
Closes #14761
2023-04-18 08:41:52 -07:00
Rich Ercolani 8ed62440ef Work around Raspberry Pi kernel packaging oddities
On Debian and Ubuntu and friends, you get something like
"linux-image-$(uname -r)" and "linux-headers-$(uname -r)" you
can put a Depends on.

On Raspberry Pi OS, you get "raspberrypi-kernel" and
"raspberrypi-kernel-headers", with version numbers like 20230411.

There is not, as far as I can tell, a reasonable way to map that
to a kernel version short of reaching out and digging around in
the changelogs or Makefile, so just special-case it so the packages
don't fail to install at install time. They still might not build
if the versions don't match, but I don't see a way to do anything
about that...

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #14745
Closes #14747
2023-04-17 17:38:09 -07:00
Pawel Jakub Dawidek 3b5af20139 Fix VERIFY(!zil_replaying(zilog, tx)) panic
The zfs_log_clone_range() function is never called from the
zfs_clone_range_replay() function, so I assumed it is safe to assert
that zil_replaying() is never TRUE here. It turns out zil_replaying()
also returns TRUE when the sync property is set to disabled.

Fix the problem by just returning if zil_replaying() returns TRUE.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reported by: Florian Smeets
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #14758
2023-04-17 16:42:09 -07:00
dodexahedron ac18dc77f3 Minor improvements to zpoolconcepts.7
* Fixed one typo (effects -> affects)
 * Re-worded raidz description to make it clearer that it is not
   quite the same as RAID5, though similar
 * Clarified that data is not necessarily written in a static
   stripe width
 * Minor grammar consistency improvement
 * Noted that "volumes" means zvols
 * Fixed a couple of split infinitives
 * Clarified that hot spares come from the same pool they were
   assigned to
* "we" -> ZFS
* Fixed warnings thrown by mandoc, and removed unnecessary
  wordiness in one fixed line.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brandon Thetford <brandon@dodecatec.com>
Closes #14726
2023-04-13 09:15:34 -07:00
youzhongyang 27a82cbb3e Linux 6.3 compat: Fix memcpy "detected field-spanning write" error
Add a new union member of flexible array to dnode_phys_t and use
it in the macro so we can silence the memcpy() fortify error.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #14737
2023-04-13 09:12:03 -07:00
Pawel Jakub Dawidek c71fe71640 Fix data corruption when cloning embedded blocks
Don't overwrite blk_phys_birth, as for embedded blocks it is part of
the payload.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Issue #13392 
Closes #14739
2023-04-12 16:15:05 -07:00
наб 6e015933f8 initramfs: source user scripts from /e/z/initramfs-tools-load-key{,.d/*}
By dropping in a file in a directory (for packages) or by making a file
(for local administrators), custom key loading methods may be provided
for the rootfs and necessities.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Nicholas Morris <security@niwamo.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Co-authored-by: Nicholas Morris <security@niwamo.com>
Supersedes: #14704
Closes: #13757
Closes #14733
2023-04-12 10:08:49 -07:00
George Amanakis 574e09d8c6 Fix in check_filesystem()
Fix the code in case of missing snapshots. Previously the check was in
a conditional that would be executed if the filesystem had snapshots.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14735
2023-04-12 08:53:53 -07:00
Alan Somers 678a3b8f99 Trim needless zeroes from checksum events
The ereport.fs.zfs.checksum event contains histograms of the bits that
were wrongly set or cleared according to their bit position in a 64-bit
word.  So the maximum value that any histogram bucket could have would
be 64.  But ZFS currently uses a uint32_t to hold each bucket.  As a
result, the event report is full of needless zeroes.

Change the bucket size to uint8_t, stripping 768 needless zeros from
each event.

Original event format:
```
 class=ereport.fs.zfs.checksum ena=639460469834258433 pool=testpool.1933 pool_guid=4979719877084416563 pool_state=0 pool_context=0 pool_failmode=wait vdev_guid=4136721804819128578 vdev_type=file vdev_path=/tmp/kyua.1TxP3A/2/work/file1.1933 vdev_ashift=9 vdev_complete_ts=609837019678 vdev_delta_ts=33450 vdev_read_errors=0 vdev_write_errors=0 vdev_cksum_errors=20 vdev_delays=0 parent_guid=2751977006639883417 parent_type=raidz vdev_spare_guids= zio_err=0 zio_flags=1048752 zio_stage=4194304 zio_pipeline=65011712 zio_delay=0 zio_timestamp=0 zio_delta=0 zio_priority=4 zio_offset=702976 zio_size=1024 zio_objset=24 zio_object=0 zio_level=3 zio_blkid=0 bad_ranges=0000000000000400 bad_ranges_min_gap=8 bad_range_sets=0000079e bad_range_clears=00000854 bad_set_histogram=000000210000001a000000150000001d000000240000001b000000220000001b000000210000002100000018000000260000002300000025000000210000001e000000250000001b0000001d0000001e0000001600000025000000180000001b000000240000001b000000240000001b0000001c000000210000001b0000001e000000210000001a0000001e000000220000001d0000001b000000200000001f0000001a000000250000001f0000001d0000001b0000001d000000240000001d0000001b0000001b0000001f00000024000000190000001a0000001f0000001e000000240000001e0000002400000021000000200000001d0000001d00000021 bad_cleared_histogram=000000220000002700000021000000210000001b0000001a000000250000001f0000001c0000001e0000002400000022000000220000002400000022000000240000002200000021000000220000001b0000002100000021000000190000001b000000240000002400000020000000290000002a00000028000000250000002400000020000000270000002500000016000000270000001c000000210000001f000000240000001c0000002100000022000000240000002100000023000000210000002700000022000000240000001b00000022000000210000001c00000023000000150000002600000020000000270000001e0000001d0000002400000026 time=00000016806457270000000323406839 eid=458
```

New format:
```
 class=ereport.fs.zfs.checksum ena=96599319807790081 pool=testpool.1933 pool_guid=1236902063710799041 pool_state=0 pool_context=0 pool_failmode=wait vdev_guid=2774253874431514999 vdev_type=file vdev_path=/tmp/kyua.6Temlq/2/work/file1.1933 vdev_ashift=9 vdev_complete_ts=92124283803 vdev_delta_ts=46670 vdev_read_errors=0 vdev_write_errors=0 vdev_cksum_errors=20 vdev_delays=0 parent_guid=8090931855087882905 parent_type=raidz vdev_spare_guids= zio_err=0 zio_flags=1048752 zio_stage=4194304 zio_pipeline=65011712 zio_delay=0 zio_timestamp=0 zio_delta=0 zio_priority=4 zio_offset=1028608 zio_size=512 zio_objset=0 zio_object=0 zio_level=0 zio_blkid=4 bad_ranges=0000000000000200 bad_ranges_min_gap=8 bad_range_sets=0000061f bad_range_clears=000001f4 bad_set_histogram=1719161c1c1c101618171a151a1a19161e1c171d1816161c191f1a18192117191c131d171b1613151a171419161a1b1319101b14171b18151e191a1b141a1c17 bad_cleared_histogram=06090a0808070a0b020609060506090a01090a050a0a0509070609080d050d0607080d060507080c04070807070a0608020c080c080908040808090a05090a07 time=00000016806477050000000604157480 eid=62
```

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Alan Somers <asomers@FreeBSD.org>
Sponsored-by: Axcient
Closes #14716
2023-04-10 14:24:27 -07:00
youzhongyang d4dc53dad2 Linux 6.3 compat: idmapped mount API changes
Linux kernel 6.3 changed a bunch of APIs to use the dedicated idmap 
type for mounts (struct mnt_idmap), we need to detect these changes 
and make zfs work with the new APIs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #14682
2023-04-10 14:15:36 -07:00
Kyle Evans d0cbd9feaf module: freebsd: fix aarch64 fpu handling
Just like x86, aarch64 needs to use the fpu_kern(9) API around FPU
usage, otherwise we panic promptly at boot as soon as ZFS attempts to
do checksum benchmarking.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Kyle Evans <kevans@FreeBSD.org>
Closes #14715
2023-04-10 12:40:18 -07:00
Kyle Evans dee77f45d0 module: resync part of Makefile.bsd
sha256-armv8.S and sha512-armv8.S need the same treatment as the sse
bits; removal of -mgeneral-regs-only from flags.

This fixes errors about requiring NEON, which is a difference in clang
vs. gcc treatment of -mgeneral-regs-only being specified on asm files.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Kyle Evans <kevans@FreeBSD.org>
Closes #14715
2023-04-10 12:39:43 -07:00
Rob N baca06c258 libzfs: add v2 iterator interfaces
f6a0dac84 modified the zfs_iter_* functions to take a new "flags"
parameter, and introduced a variety of flags to ask the kernel to limit
the results in various ways, reducing the amount of work the caller
needed to do to filter out things they didn't need.

Unfortunately this change broke the ABI for existing clients (read:
older versions of the `zfs` program), and was reverted 399b98198.

dc95911d2 reintroduced the original patch, with the understanding that a
backwards-compatible fix would be made before the 2.2 release branch was
tagged. This commit is that fix.

This introduces zfs_iter_*_v2 functions that have the new flags
argument, and reverts the existing functions to not have the flags
parameter, as they were before. The old functions are now reimplemented
in terms of the new, with flags set to 0.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Original-patch-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Closes #14597
2023-04-10 11:53:02 -07:00
Rob N ff73574cd8 vdev: expose zfs_vdev_max_ms_shift as a module parameter
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Seagate Technology LLC
Closes #14719
2023-04-06 10:52:50 -07:00
George Amanakis a8a127e2c9 Fix typo in check_clones()
Run kmem_free() after zap_cursor_fini().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Adam Moss <c@yotes.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14702
2023-04-06 10:46:18 -07:00
Damian Szuberski 8ab674ab9c ZTS: add existing tests to runfiles
Some test cases were committed to the repository but never added to
runfiles.
Move `zfs_unshare_008_pos` to the Linux-only runfile.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #14701
2023-04-06 10:43:24 -07:00
Andrew Innes 0b8fdb8ade ZTS: Use inbuilt monotonic time
Make the test runner try to use the included python monotonic time
function instead of calling librt.

This makes the test runner work on macos where librt wasn't available.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Andrew Innes <andrew.c12@gmail.com>
Closes #14700
2023-04-06 10:40:23 -07:00
Martin Matuška a3f82aec93 Miscellaneous FreBSD compilation bugfixes
Add missing machine/md_var.h to spl/sys/simd_aarch64.h and
spl/sys/simd_arm.h

In spl/sys/simd_x86.h, PCB_FPUNOSAVE exists only on amd64, use PCB_NPXNOSAVE
on i386

In FreeBSD sys/elf_common.h redefines AT_UID and AT_GID on FreeBSD, we need
a hack in vnode.h similar to Linux. sys/simd.h needs to be included early.

In zfs_freebsd_copy_file_range() we pass a (size_t *)lenp to
zfs_clone_range() that expects a (uint64_t *)

Allow compiling armv6 world by limiting ARM macros in sha256_impl.c and
sha512_impl.c to __ARM_ARCH > 6

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Reviewed-by: Signed-off-by: WHR <msl0000023508@gmail.com>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #14674
2023-04-06 10:35:02 -07:00
Rob N ece7ab7e7d vdev: expose zfs_vdev_def_queue_depth as a module parameter
It was previously available only to FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Seagate Technology LLC
Closes #14718
2023-04-06 10:31:19 -07:00
Paul Dagnelie b66c2a0899 Storage device expansion "silently" fails on degraded vdev
When a vdev is degraded or faulted, we refuse to expand it when doing
online -e. However, we also don't actually cause the online command
to fail, even though the disk didn't expand. This is confusing and
misleading, and can result in violated expectations.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes 14145
2023-04-06 10:29:27 -07:00
Alexander Motin 1038f87c4e Fix some signedness issues in arc_evict()
It may happen that "wanted total ARC size" (wt) is negative, that was
expected.  But multiplication product of it and unsigned fractions
result in unsigned value, incorrectly shifted right with a sing loss.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Sponsored by:   iXsystems, Inc.
Closes #14692
2023-04-05 10:42:22 -07:00
youzhongyang 8eb2f26057 Linux 6.3 compat: writepage_t first arg struct folio*
The type def of writepage_t in kernel 6.3 is changed to take
struct folio* as the first argument. We need to detect this
change and pass correct function to write_cache_pages().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #14699
2023-04-05 10:01:38 -07:00
Tino Reichardt 6ecdd35bdb Fix "Add colored output to zfs list"
Running `zfs list -o avail rpool` resulted in a core dump.
This commit will fix this.

Run the needed overhead only, when `use_color()` is true.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #14712
2023-04-05 09:57:01 -07:00
наб 3399a30ee0 contrib: dracut: fix race with root=zfs:dset when necessities required
This had always worked in my testing, but a user on hardware reported
this to happen 100%, and I reproduced it once with cold VM host caches.

dracut-zfs-generator runs as a systemd generator, i.e. at Some
Relatively Early Time; if root= is a fixed dataset, it tries to
"solve [necessities] statically at generation time".

If by that point zfs-import.target hasn't popped (because the import is
taking a non-negligible amount of time for whatever reason), it'll see
no children for the root datase, and as such generate no mounts.

This has never had any right to work. No-one caught this earlier because
it's just that much more convenient to have root=zfs:AUTO, which orders
itself properly.

To fix this, always run zfs-nonroot-necessities.service;
this additionally simplifies the implementation by:
  * making BOOTFS from zfs-env-bootfs.service be the real, canonical,
    root dataset name, not just "whatever the first bootfs is",
    and only set it if we're ZFS-booting
  * zfs-{rollback,snapshot}-bootfs.service can use this instead of
    re-implementing it
  * having zfs-env-bootfs.service also set BOOTFSFLAGS
  * this means the sysroot.mount drop-in can be fixed text
  * zfs-nonroot-necessities.service can also be constant and always
    enabled, because it's conditioned on BOOTFS being set

There is no longer any code generated at run-time
(the sysroot.mount drop-in is an unavoidable gratuitous cp).

The flow of BOOTFS{,FLAGS} from zfs-env-bootfs.service to sysroot.mount
is not noted explicitly in dracut.zfs(7), because (a) at some point it's
just visual noise and (b) it's already ordered via d-p-m.s from z-i.t.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #14690
2023-03-31 09:47:48 -07:00
youzhongyang c5431f1465 linux 6.3 compat: needs REQ_PREFLUSH | REQ_OP_WRITE
Modify bio_set_flush() so if kernel version is >= 4.10, flags 
REQ_PREFLUSH and REQ_OP_WRITE are set together.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #14695
2023-03-31 09:46:22 -07:00
Brian Behlendorf 1142362ff6 Use vmem_zalloc to silence allocation warning
The kmem allocation in zfs_prune_aliases() will trigger a large
allocation warning on systems with 64K pages.  Resolve this by
switching to vmem_alloc() which internally uses kvmalloc() so the
right allocator will be used based on the allocation size.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8491
Closes #14694
2023-03-31 09:43:54 -07:00
Tony Hutter 21c4b2a944 Linux 6.2 compat: META
Update the META file to reflect compatibility with the 6.2 kernel.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #14689
2023-03-29 15:39:36 -07:00
George Amanakis 431083f75b Fixes in persistent error log
Address the following bugs in persistent error log:

1) Check nested clones, eg "fs->snap->clone->snap2->clone2".

2) When deleting files containing error blocks in those clones (from
   "clone" the example above), do not break the check chain.

3) When deleting files in the originating fs before syncing the errlog
   to disk, do not break the check chain. This happens because at the
   time of introducing the error block in the error list, we do not have
   its birth txg and the head filesystem. If the original file is
   deleted before the error list is synced to the error log (which is
   when we actually lookup the birth txg and the head filesystem), then
   we do not have access to this info anymore and break the check chain.

The most prominent change is related to achieving (3). We expand the
spa_error_entry_t structure to accommodate the newly introduced
zbookmark_err_phys_t structure (containing the birth txg of the error
block).Due to compatibility reasons we cannot remove the
zbookmark_phys_t structure and we also need to place the new structure
after se_avl, so it is not accounted for in avl_find(). Then we modify
spa_log_error() to also provide the birth txg of the error block. With
these changes in place we simplify the previously introduced function
get_head_and_birth_txg() (now named get_head_ds()).

We chose not to follow the same approach for the head filesystem (thus
completely removing get_head_ds()) to avoid introducing new lock
contentions.

The stack sizes of nested functions (as measured by checkstack.pl in the
linux kernel) are:
check_filesystem [zfs]: 272 (was 912)
check_clones [zfs]: 64

We also introduced two new tests covering the above changes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14633
2023-03-28 16:51:58 -07:00
Kevin Jin 65d10bd87c Fix short-lived txg caused by autotrim
Current autotrim causes short-lived txg through:

1. calling txg_wait_synced() in metaslab_enable()
2. calling txg_wait_open() with should_quiesce = true

This patch addresses all the issues mentioned above.

A new cv, vdev_autotrim_kick_cv is added to kick autotrim activity.
It will be signaled once a txg is synced so that it does not change 
the original autotrim pace. Also because it is a cv, the wait is 
interruptible which speeds up the vdev_autotrim_stop_wait() call.

Finally, combining big zfs_txg_timeout, txg_wait_open() also causes
delay when exporting a pool.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Issue #8993
Closes #12194
2023-03-28 08:43:41 -07:00
Brian Behlendorf 64bfa6bae3 Additional limits on hole reporting
Holding the zp->z_rangelock as a RL_READER over the range
0-UINT64_MAX is sufficient to prevent the dnode from being
re-dirtied by concurrent writers.  To avoid potentially
looping multiple times for external caller which do not
take the rangelock holes are not reported after the first
sync.  While not optimal this is always functionally correct.

This change adds the missing rangelock calls on FreeBSD to
zvol_cdev_ioctl().

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14512
Closes #14641
2023-03-28 08:19:03 -07:00
George Wilson a604d3243b Revert "Do not hold spa_config in ZIL while blocked on IO"
This reverts commit 7d638df09b.

Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes #14678
2023-03-28 08:13:32 -07:00
Rob N aebd94cc85 config: don't link libudev on FreeBSD
FreeBSD has a libudev shim in libudev-devd. If present, configure would
detect it and produce binaries linked against it, even though nothing
used it. That is surprising and unnecessary, so lets remove it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #14669
2023-03-27 11:55:54 -07:00
Rich Ercolani ae0b1f66c7 linux 6.3 compat: add another bdev_io_acct case
Linux 6.3+, and backports from it (6.2.8+), changed the
signatures on bdev_io_{start,end}_acct.  Add a case for it.  

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #14658
Closes #14668
2023-03-27 11:29:19 -07:00
Ameer Hamza a05263b7aa Update vdev state for spare vdev
zfsd fetches new pool configuration through ZFS_IOC_POOL_STATS but
it does not get updated nvlist configuration for spare vdev since
the configuration is read by spa_spares->sav_config. In this commit,
updating the vdev state for spare vdev that is consumed by zfsd on
spare disk hotplug.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #14653
2023-03-24 10:30:38 -07:00
Rich Ercolani 0ad5f43442 Drop lying to the compiler in the fletcher4 code
This is probably the uncontroversial part of #13631, which fixes
a real problem people are having.

There's still things to improve in our code after this is merged,
but it should stop the breakage that people have reported, where
we lie about a type always being aligned and then pass in stack
objects with no alignment requirement and hope for the best.

Of course, our SIMD code was written with unaligned accesses, so it
doesn't care if we drop this...but some auto-vectorized code that
gcc emits sure does, since we told it it can assume they're aligned.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #14649
2023-03-24 10:29:19 -07:00
George Wilson 460d887c43 panic loop when removing slog device
There is a window in the slog removal code where a panic loop could
ensue if the system crashes during that operation. The original design
of slog removal did not persisted any state because the removal happened
synchronously. This was changed by a later commit which persisted the
vdev_removing flag and exposed this bug. If a slog removal is in
progress and happens to crash after persisting the vdev_removing flag to
the label but before the vdev is removed from the spa config, then the
pool will continue to panic on import. Here's a sample of the panic:

[  134.387411] VERIFY0(0 == dmu_buf_hold_array(os, object, offset, size,
FALSE, FTAG, &numbufs, &dbp)) failed (0 == 22)
[  134.393865] PANIC at dmu.c:1135:dmu_write()
[  134.396035] Kernel panic - not syncing: VERIFY0(0 ==
dmu_buf_hold_array(os, object, offset, size, FALSE, FTAG, &numbufs,
&dbp)) failed (0 == 22)
[  134.397857] CPU: 2 PID: 5914 Comm: txg_sync Kdump: loaded Tainted:
P           OE     5.4.0-1100-dx2023020205-b3751f8c2-azure #106
[  134.407938] Hardware name: Microsoft Corporation Virtual
Machine/Virtual Machine, BIOS 090008  12/07/2018
[  134.407938] Call Trace:
[  134.407938]  dump_stack+0x57/0x6d
[  134.407938]  panic+0xfb/0x2d7
[  134.407938]  spl_panic+0xcf/0x102 [spl]
[  134.407938]  ? traverse_impl+0x1ca/0x420 [zfs]
[  134.407938]  ? dmu_object_alloc_impl+0x3b4/0x3c0 [zfs]
[  134.407938]  ? dnode_hold+0x1b/0x20 [zfs]
[  134.407938]  dmu_write+0xc3/0xd0 [zfs]
[  134.407938]  ? space_map_alloc+0x55/0x80 [zfs]
[  134.407938]  metaslab_sync+0x61a/0x830 [zfs]
[  134.407938]  ? queued_spin_unlock+0x9/0x10 [zfs]
[  134.407938]  vdev_sync+0x72/0x190 [zfs]
[  134.407938]  spa_sync_iterate_to_convergence+0x160/0x250 [zfs]
[  134.407938]  spa_sync+0x2f7/0x670 [zfs]
[  134.407938]  txg_sync_thread+0x22d/0x2d0 [zfs]
[  134.407938]  ? txg_dispatch_callbacks+0xf0/0xf0 [zfs]
[  134.407938]  thread_generic_wrapper+0x83/0xa0 [spl]
[  134.407938]  kthread+0x104/0x140
[  134.407938]  ? kasan_check_write.constprop.0+0x10/0x10 [spl]
[  134.407938]  ? kthread_park+0x90/0x90
[  134.457802]  ret_from_fork+0x1f/0x40

This change no longer persists the vdev_removing flag when removing slog
devices and also cleans up some code that was added which is not used.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes #14652
2023-03-24 10:27:07 -07:00
Tino Reichardt 2bd0490faf Add colored output to zfs list
Use a bold header row and colorize the AVAIL column based on
the used space percentage of volume.

We define these colors:
- when > 80%, use yellow
- when > 90%, use red

Reviewed-by: WHR <msl0000023508@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ethan Coe-Renner <coerenner1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #14621
Closes #14350
2023-03-24 10:24:11 -07:00
Tino Reichardt 7bde396aa2 Colorize zpool iostat output
Use a bold header and colorize the space suffixes in iostat
by order of magnitude like this:
- K is green
- M is yellow
- G is red
- T is lightblue
- P is magenta
- E is cyan
- 0 space is colored gray

Reviewed-by: WHR <msl0000023508@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ethan Coe-Renner <coerenner1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #14621
Closes #14459
2023-03-24 10:23:52 -07:00
Tino Reichardt 80f2cdcd67 Add more ANSI colors to libzfs
Reviewed-by: WHR <msl0000023508@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ethan Coe-Renner <coerenner1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #14621
2023-03-24 10:21:19 -07:00
Matthew Ahrens d2d4f8554f Fix prefetching of indirect blocks while destroying
When traversing a tree of block pointers (e.g. for `zfs destroy <fs>` or
`zfs send`), we prefetch the indirect blocks that will be needed, in
`traverse_prefetch_metadata()`.  In the case of `zfs destroy <fs>`, we
do a little traversing each txg, and resume the traversal the next txg.
So the indirect blocks that will be needed, and thus are candidates for
prefetching, does not include blocks that are before the resume point.

The problem is that the logic for determining if the indirect blocks are
before the resume point is incorrect, causing the (up to 1024) L1
indirect blocks that are inside the first L2 to not be prefetched.  In
practice, if we are able to read many more than 1024 blocks per txg,
then this will be inconsequential.  But if i/o latency is more than a
few milliseconds, almost no L1's will be prefetched, so they will be
read serially, and thus the destroying will be very slow.  This can be
observed as `zpool get freeing` decreasing very slowly.

Specifically: When we first examine the L2 that contains the block we'll
be resuming from, we have not yet resumed, so `td_resume` is nonzero.
At this point, all calls to `traverse_prefetch_metadata()` will fail,
even if the L1 in question is after the resume point.  It isn't until
the callback is issued for the resume point that we zero out
`td_resume`, but by this point we've already attempted and failed to
prefetch everything under this L2 indirect block.

This commit addresses the issue by reusing the existing
`resume_skip_check()` to determine if the L1's bookmark is before or
after the resume point.  To do so, this function is made non-mutating
(the caller now zeros `td_resume`).

Note, this bug likely predates (was not introduced by) #11803.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #14603
2023-03-24 10:20:07 -07:00
Pawel Jakub Dawidek ce0e1cc402 Fix cloning into already dirty dbufs.
Undirty the dbuf and destroy its buffer when cloning into it.

Coverity ID: CID-1535375
Reported-by: Richard Yao
Reported-by: Benjamin Coddington
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #14655
2023-03-24 10:18:35 -07:00
Rob N 5b5f518687 man: add ZIO_STAGE_BRT_FREE to zpool-events
And bump all the values after it, matching the header update in
67a1b037.

Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #14665
2023-03-24 10:14:39 -07:00
Pawel Jakub Dawidek 9fa007d35d Fix build on FreeBSD
Constify some variables after d1807f168e.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #14656
2023-03-22 09:24:41 -07:00
Timothy Day 1eca40f3ad Fix kmodtool for packaging mainline Linux
kmodtool currently incorrectly identifies official
RHEL kernels, as opposed to custom kernels. This
can cause the openZFS kmod RPM build to break.

The issue can be reproduced by building a set of
mainline Linux RPMs, installing them, and then
attempting to build the openZFS kmod package
against them.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Timothy Day <timday@amazon.com>
Closes #14617
2023-03-22 09:22:52 -07:00
Tino Reichardt 0f9e735414 Remove unused constant EdonR256_BLOCK_BITSIZE
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #14650
2023-03-22 08:39:48 -07:00
Alexander Motin d520f64342 FreeBSD: Remove extra arc_reduce_target_size() call
Remove arc_reduce_target_size() call from arc_prune_task().  The idea
of arc_prune_task() is to remove external references on ARC metadata,
such as vnodes. Since arc_prune_async() is called only from ARC itself,
it makes no sense to create a parasitic loop between ARC eviction and
the pruning, treatening to drop ARC to its minimum.  I can't guess why
it was added as part of FreeBSD to OpenZFS integration.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #14639
2023-03-17 17:31:08 -07:00
Richard Yao fa46802585 Fix possible bad bit shift in dnode_next_offset_level()
031d7c2fe6 did not handle reverse
iteration, such that the original issue theoretically could still occur.

Note that contrary to the claim in the ZFS disk format specification
that a maximum of 6 levels are possible, 9 levels are possible with
recordsize=512 and and indirect block size of 16KB. In this unusual
configuration, span will be 65. The maximum size of span at 70 can be
reached at recordsize=16K and an indirect blocksize of 16KB.

When we are at this indirection level and are traversing backward, the
minimum value is start, but we cannot calculate that with 64-bit
arithmetic, so we avoid the calculation and instead rely on the earlier
statement that did `*offset = start;`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reported-by: Coverity (CID-1466214)
Closes #14618
2023-03-16 14:27:49 -07:00
naivekun 60cfd3bbc2 QAT: Fix uninitialized seed in QAT compression
CpaDcRqResults have to be initialized with checksum=1 for adler32.
Otherwise when error CPA_DC_OVERFLOW occurred, the next compress
operation will continue on previously part-compressed data, and write
invalid checksum data. When zfs decompress the compressed data, a
invalid checksum will occurred and lead to #14463

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Weigang Li <weigang.li@intel.com>
Reviewed-by: Chengfei Zhu <chengfeix.zhu@intel.com>
Signed-off-by: naivekun <naivekun0817@gmail.com>
Closes #14632
Closes #14463
2023-03-16 11:54:10 -07:00
Tino Reichardt 480d809703 Refine some details for the github actions update
Set the retention-days variable to 14 days for these artifacts:
- the zloop error logs
- the zloop vdev files
- the compiled modules

Add the abality to re-run some part of the functional testings.
Fix some comments and remove the deleting of the modules artifact.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #14637
2023-03-16 10:00:14 -07:00
Tino Reichardt c9c463a508 Add git repo checkout to testing workflow
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #14634
2023-03-15 15:37:56 -07:00
Attila Fülöp 5f3611121d spl: cmn_err_once() should be usable in brace-less if else statements
Commit 11913870 (#14567) added cmn_err_once() by #define'ing a
compound statement but failed to consider usage in a single
statement brace-less if else.

Fix the problem by using the common "do {} while (0)" construct.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #14629
2023-03-15 11:13:25 -07:00
Low-power c31bb934cd Fix building for powerpc* targets with some compilers
Under some configurations, GCC didn't predefined macro 'powerpc' for
such a target. Use the guaranteed macro '__powerpc__' instead.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: WHR <msl0000023508@gmail.com>
Closes #14631
2023-03-15 10:44:28 -07:00
Tino Reichardt b7bc334d13 Split functional testings via github action matrix
This commit changes the workflow of the github actions.

We split the workflow into different parts:

1) build zfs modules for Ubuntu 20.04 and 22.04 (~25m)
2) 2x zloop test (~10m) + 2x sanity test (~25m)
3) functional testings in parts 1..5 (each ~1h)
   - these could be triggered, when sanity tests are ok
   - currently I just start them all in the same time
4) cleanup and create summary

When everything is fine, the full run with all testings
should be done in around 2 hours.

The codeql.yml and checkstyle.yml are not part in this circle.

The testings are also modified a bit:
- report info about CPU and checksum benchmarks
- reset the debugging logs for each test
  - when some error occurred, we call dmesg with -c to get
    only the log output for the last failed test
  - we empty also the dbgsys

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #14078
2023-03-15 10:41:05 -07:00
Alek P f55d6ee818 Improve tests and update man page for healing recv
Fix the manpage. The "SYNOPSIS" section is incorrectly formatted for 
receive -c.  I also took this opportunity to reword some parts and 
fix a run-on sentence in the manpage.

Add large block testing for corrective recv. This adds a new test 
that makes sure blocks generated using zfs send -L/--large-block 
large-block send flag are able to be used for healing.

Since with unloaded key and errlog feature enabled corruption is not 
shown in zpool status #13675 is fixed the zfs_receive_corrective.ksh 
test no longer sets -o feature@head_errlog=disabled on pool creation 
so that it can also test for regressions related to head_errlog feature.

Note that the zfs_receive_compressed_corrective.ksh and
zfs_receive_large_block_corrective.ksh tests are still creating pools 
with -o feature@head_errlog=disabled.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alek Pinchuk <apinchuk@axcient.com>
Closes #14615
2023-03-15 10:34:10 -07:00
WHR d74f123045 Refactor CONFIG_SPE check on Linux/powerpc
Commit 5401472 adds a check to call enable_kernel_spe and
disable_kernel_spe only if CONFIG_SPE is defined. Refactor this check
in a way similar to what CONFIG_ALTIVEC and CONFIG_VSX are checked, in
order to remove redundant kfpu_begin() and kfpu_end() implementations.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: WHR <msl0000023508@gmail.com>
Closes #14623
2023-03-15 10:30:42 -07:00
WHR f31b0d4e88 Fix missing semicolons in commit 1f196e3
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: WHR <msl0000023508@gmail.com>
Closes #14623
2023-03-15 10:30:02 -07:00
ofthesun9 8b7342d290 Fix for mountpoint=legacy
We need to clear mountpoint only after checking it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: ofthesun9 <olivier@ofthesun.net>
Closes #14599
Closes #14604
2023-03-14 16:40:55 -07:00
Tino Reichardt fe6a7b787f Remove unused Edon-R variants
This commit removes the edonr_byteorder.h file and all unused
variants of Edon-R.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13618
2023-03-14 15:59:58 -07:00
Richard Yao dbfc622345 nvpair: Use flexible array member for nvpair name strings
Coverity reported possible out-of-bounds reads from doing `((char
*)(nvp) + sizeof (nvpair_t))` to get the nvpair name string. These were
initially marked as false positives, but since we are now using C99
flexible array members elsewhere, we could use them here too as cleanup
to make the code easier to understand.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reported-by: Coverity (CID-977165)
Reported-by: Coverity (CID-1524109)
Reported-by: Coverity (CID-1524642)
Closes #14612
2023-03-14 15:25:55 -07:00
Richard Yao d1807f168e nvpair: Constify string functions
After addressing coverity complaints involving `nvpair_name()`, the
compiler started complaining about dropping const. This lead to a rabbit
hole where not only `nvpair_name()` needed to be constified, but also
`nvpair_value_string()`, `fnvpair_value_string()` and a few other static
functions, plus variable pointers throughout the code. The result became
a fairly big change, so it has been split out into its own patch.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14612
2023-03-14 15:25:50 -07:00
Richard Yao 50f6934b9c discover_cached_paths() should not corrupt nvlist string value
discover_cached_paths() will write a NULL into a string from a nvlist to
use it as a substring, but does not restore it before return. This
corrupts the nvlist. It should be harmless unless the string is needed
again later, but we should not do this, so let us fix it.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14612
2023-03-14 15:25:45 -07:00
Richard Yao 47a7062772 zpool_valid_proplist() should not corrupt nvpair name string on error
The strings returned from parsing nvlists should be immutable, but to
simplify the code when we want a substring from it, we sometimes will
write a NULL into it and then restore the value afterward. Provided
there is no concurrent access, this is okay, unless we forget to restore
the value afterward. This was caught when constifying string functions
related to nvlists.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14612
2023-03-14 15:25:40 -07:00
Richard Yao 27ff18cd43 Fix possible NULL pointer dereference in nvlist_lookup_nvpair_ei_sep()
Clang's static analyzer complains about a possible NULL pointer
dereference in nvlist_lookup_nvpair_ei_sep() because it unconditionally
dereferences a pointer initialized by `nvpair_value_nvlist_array()`
under the assumption that `nvpair_value_nvlist_array()` will always
initialize the pointer without checking to see if an error was returned
to indicate otherwise. This itself is improper error handling, so we fix
it. However, fixing it to properly respond to errors is not enough to
avoid a NULL pointer dereference, since we can receive NULL when the
array is empty, so we also add a NULL check.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14612
2023-03-14 15:25:35 -07:00
Richard Yao 47b994049f Silence clang static analyzer warnings about stored stack addresses
Clang's static analyzer complains that nvs_xdr() and nvs_native()
functions return pointers to stack memory. That is technically true, but
the pointers are stored in stack memory from the caller's stack frame,
are not read by the caller and are deallocated when the caller returns,
so this is harmless. We set the pointers to NULL to silence the
warnings.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14612
2023-03-14 15:25:01 -07:00
Richard Yao 3cb293a6f8 Fix possible NULL pointer dereference in dbuf_verify()
Coverity reported a dereference after a NULL check in dbuf_verify(). If
`dn` is `NULL`, we can just assume that !dn->dn_free_txg, so we change
`!dn->dn_free_txg` to `(dn == NULL || !dn->dn_free_txg)`.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reported-by: Coverity (CID-992298)
Closes #14619
2023-03-14 15:00:54 -07:00
Tino Reichardt 3a03c96381 Replace dead opensolaris.org license links
The commit replaces all findings of the link:
http://www.opensolaris.org/os/licensing with this one:
https://opensource.org/licenses/CDDL-1.0

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: WHR <msl0000023508@gmail.com>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #14625
2023-03-14 14:44:01 -07:00
Richard Yao 24e61911f0 Zero zio_prop_t in flush_write_batch_impl()
After 67a1b03791 was merged, coverity
started complaining about an uninitialized scalar variable in
flush_write_batch_impl() due to the new field zp.zp_brtwrite. Upon
inspection, it appears that uninitialized memory was being copied for
non-raw streams, so this is a pre-existing issue. The addition of
zp_brtwrite by the block cloning commit caused Coverity to begin to
notice it.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reported-by: Coverity (CID-1535378)
Closes #14607
2023-03-14 14:41:45 -07:00
Richard Yao 1c212d1b7c Fix uninitialized scalar value read regression in dmu_recv_begin()
da19d919a8 changed this in a way that
permits execution to reach `if (err == 0)` without initializing err.
This could randomly cause the sync task to not execute. We fix that by
initializing err to zero.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reported-by: Coverity (CID-1535377)
Closes #14607
2023-03-14 14:40:49 -07:00
Matthew Ahrens 519851122b ZFS_IOC_COUNT_FILLED does unnecessary txg_wait_synced()
`lseek(SEEK_DATA | SEEK_HOLE)` are only accurate when the on-disk blocks
reflect all writes, i.e. when there are no dirty data blocks.  To ensure
this, if the target dnode is dirty, they wait for the open txg to be
synced, so we can call them "stabilizing operations".  If they cause
txg_wait_synced often, it can be detrimental to performance.

Typically, a group of files are all modified, and then SEEK_DATA/HOLE
are performed on them.  In this case, the first SEEK does a
txg_wait_synced(), and subsequent SEEKs don't need to wait, so
performance is good.

However, if a workload involves an interleaved metadata modification,
the subsequent SEEK may do a txg_wait_synced() unnecessarily.  For
example, if we do a `read()` syscall to each file before we do its SEEK.
This applies even with `relatime=on`, when the `read()` is the first
read after the last write.  The txg_wait_synced() is unnecessary because
the SEEK operations only care that the structure of the tree of indirect
and data blocks is up to date on disk.  They don't care about metadata
like the contents of the bonus or spill blocks.  (They also don't care
if an existing data block is modified, but this would be more involved
to filter out.)

This commit changes the behavior of SEEK_DATA/HOLE operations such that
they do not call txg_wait_synced() if there is only a pending change to
the bonus or spill block.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #13368 
Issue #14594 
Issue #14512 
Issue #14009
2023-03-14 14:30:29 -07:00
Attila Fülöp 78289b8458 zcommon: Refactor FPU state handling in fletcher4
Currently calls to kfpu_begin() and kfpu_end() are split between
the init() and fini() functions of the particular SIMD
implementation. This was done in #14247 as an optimization measure
for the ABD adapter. Unfortunately the split complicates FPU
handling on platforms that use a local FPU state buffer, like
Windows and macOS.

To ease porting, we introduce a boolean struct member in
fletcher_4_ops_t, indicating use of the FPU, and move the FPU state
handling from the SIMD implementations to the call sites.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #14600
2023-03-14 09:45:28 -07:00
George Melikov b15ab50c4d ABI files: sync with new Ubuntu release in CI
We may try to build ZFS inside container too,
but let's just sync them for now.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #14605
2023-03-10 16:23:01 -08:00
Pawel Jakub Dawidek 67a1b03791 Implementation of block cloning for ZFS
Block Cloning allows to manually clone a file (or a subset of its
blocks) into another (or the same) file by just creating additional
references to the data blocks without copying the data itself.
Those references are kept in the Block Reference Tables (BRTs).

The whole design of block cloning is documented in module/zfs/brt.c.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Christian Schwarz <christian.schwarz@nutanix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #13392
2023-03-10 11:59:53 -08:00
Paul Dagnelie da19d919a8 Fix incremental receive silently failing for recursive sends
The problem occurs because dmu_recv_begin pulls in the payload and 
next header from the input stream in order to use the contents of 
the begin record's nvlist. However, the change to do that before the 
other checks in dmu_recv_begin occur caused a regression where an 
empty send stream in a recursive send could have its END record 
consumed by this, which broke the logic of recv_skip. A test is 
also included to protect against this case in the future.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #12661
Closes #14568
2023-03-10 09:52:44 -08:00
Low-power 589f59b52a Workaround for Linux PowerPC GPL-only cpu_has_feature()
Linux since 4.7 makes interface 'cpu_has_feature' to use jump labels on
powerpc if CONFIG_JUMP_LABEL_FEATURE_CHECKS is enabled, in this case
however the inline function references GPL-only symbol
'cpu_feature_keys'.

ZFS currently uses 'cpu_has_feature' either directly or indirectly from
several places; while it is unknown how this issue didn't break ZFS on
64-bit little-endian powerpc, it is known to break ZFS with many Linux
versions on both 32-bit and 64-bit big-endian powerpc.

Until this issue is fixed in Linux, we have to workaround it by
overriding affected inline functions without depending on
'cpu_feature_keys'.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: WHR <msl0000023508@gmail.com>
Closes #14590
2023-03-10 09:35:00 -08:00
Richard Yao 7316fdd1c0 txg_sync should handle write errors in ZIL
The txg_sync thread will see certain buffers in a DR_IN_DMU_SYNC state
when ZIL is writing them out. Then it waits until the state changes, but
has an assertion to check that they were not DR_NOT_OVERRIDDEN. If the
data write failed with an error, ZIL will put it into the
DR_NOT_OVERRIDDEN state. It looks like the code will handle that state
without an issue, so we can just delete the assertion.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@klarasystems.com>
Sponsored-By: Wasabi Technology, Inc.
Closes #14283
2023-03-10 09:34:00 -08:00
Richard Yao 950980b4c4 Suppress clang static analyzer warning in vdev_stat_update()
63652e1546 added unnecessary branches in
`vdev_stat_update()` to suppress an ASAN false positive the breaks
ztest. This had the downside of causing false positive reports in both
Coverity and Clang's static analyzer. vd is never NULL, so we add a
preprocessor check to only apply the workaround when compiling with ASAN
support.

Reported-by: Coverity (CID-1524583)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14575
2023-03-08 13:52:24 -08:00
Richard Yao 37edc7ea98 Refactor loop in dump_histogram()
The current loop triggers a complaint that we are using an array offset
prior to a range check from cpp/offset-use-before-range-check when we
are actually calculating maximum and minimum values. I was about to file
a false positive report with CodeQL, but after looking at how the code
is structured, I really cannot blame CodeQL for mistaking this for a
range check.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14575
2023-03-08 13:52:20 -08:00
Richard Yao 08641d9007 Suppress static analyzer warning in dmu_objset_create_impl_dnstats()
ae7e700650 added an assertion to suppress
a complaint from Clang's static analyzer. Unfortunately, it missed
another way for Clang to complain about this function. This adds another
assertion to handle that.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14575
2023-03-08 13:52:15 -08:00
Richard Yao 703283fabd Linux: Fix octal detection in define_ddi_strtox()
Clang Tidy reported this as a misc-redundant-expression because writing
`8` instead of `'8'` meant that the condition could never be true.

The only place where we have a chance of this being a bug would be in
nvlist_lookup_nvpair_ei_sep(). I am not sure if we ever pass an octal to
that, but if we ever do, it should work properly now instead of failing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14575
2023-03-08 13:52:09 -08:00
Richard Yao 66a38fd10a Linux: Suppress clang static analyzer warning in zfs_remove()
Clang's static analyzer points out that if we fail to find an extended
attribute directory, but somehow find it when calculating delete_now and
delete_now is true, we will have a NULL pointer dereference when we try
to unlink the extended attribute directory.

I am not sure if this is possible, but if it is, I do not see a sane way
of handling this other than rolling back the transaction and retrying.
For now, let us do an VERIFY_IMPLY(). If this trips, it will stop the
transaction from committing, which will prevent an attribute directory
leak.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14575
2023-03-08 13:52:04 -08:00
Richard Yao c2550a136e Linux: Silence static analyzer warning in crypto_create_ctx_template()
A CodeChecker report from Clang's CTU analysis indicated that we were
assigning uninitialized values in crypto_create_ctx_template() when we
call it from zio_crypt_key_init(). This occurs because the ->cm_param
and ->cm_param_len fields are uninitialized. Thankfully, the
uninitialized values are only used in the skein via
KCF_PROV_CREATE_CTX_TEMPLATE() -> skein_create_ctx_template() ->
skein_mac_ctx_build() -> skein_get_digest_bitlen(), but that should not
be called from here. We fix this to avoid a possible trap should this
code change in the future.

The FreeBSD version of zio_crypt_key_init() is unaffected.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14575
2023-03-08 13:51:59 -08:00
Richard Yao 51f55742f6 Suppress Clang Static Analyzer warning in bpobj_enqueue()
scan-build does not do cross translation unit analysis to realize that
`dmu_buf_hold()` will always set `bpo->bpo_cached_dbuf` to a non-NULL
pointer, so we add an assertion to make it realize this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14575
2023-03-08 13:51:55 -08:00
Richard Yao 45c446308a Suppress Clang Static Analyzer warning in dsl_dir_rename_sync()
Clang's static analyzer reports that if we try to rename a root dataset
in `dsl_dir_rename_sync()`, we will have a NULL pointer passed to
strlcpy(). This is impossible because `dsl_dir_rename_check()` will
prevent us from doing this. We add an assertion to silence this warning.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14575
2023-03-08 13:51:50 -08:00
Richard Yao 17443e0b20 Cleanup: Remove constant comparisons reported by CodeQL
CodeQL's cpp/constant-comparison query from its security-and-extended
query set reported 4 instances where we have comparions that always
evaluate the same way.

In `draid_config_by_type()`, we have an early `if (nparity == 0)` check
that returns `EINVAL`, making a later `if (nparity == 0 || nparity >
VDEV_DRAID_MAXPARITY)` partially redundant. The later check prints an
error message when parity is 0, but the early check does not. This is
not useful feedback, so we move the later check to the place where the
early check runs to replace the early check.

In `perform_thread_merge()`, we return when `num_threads == 0`. After
that block, we do `if (num_threads > 0) {`, which will always be true.
We remove the `if` statement.

In `sa_modify_attrs()`, we have a loop condition that is `k != 2`, but
at the end of the loop, we have `if (k == 0 && hdl->sa_spill)` followed
by an else that does a break. The result is that k != 2 will never be
evaluated when it is false. We drop the comparison.

In `zap_leaf_array_read()`, we have a for loop condition that is `i <
ZAP_LEAF_ARRAY_BYTES && len > 0`. However, that loop itself is in a loop
that is `while (len > 0)` and while the value of len is decremented
inside the loop, when `len == 0`, it will return, such that `len > 0`
inside the loop condition will always be true. We drop that part of the
condition.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14575
2023-03-08 13:51:46 -08:00
Richard Yao 5dd0f019cd Linux cleanup: zvol_discard() should only call blk_queue_io_stat() once
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14575
2023-03-08 13:51:40 -08:00
Richard Yao f9e109223b Suppress Clang Static Analyzer warning in dbuf_dnode_findbp()
Clang's static analyzer reports that if a `blkid == DMU_SPILL_BLKID` is
passed, then we can have a NULL pointer dereference when either
->dn_have_spill or `DNODE_FLAG_SPILL_BLKPTR` is not set. This should not
happen. We add an `ASSERT()` to suppress reports about NULL pointer
dereferences.

Originally, I wanted to use one or two IMPLY statements on
pre-conditions before the call to `dbuf_findbp()`, but Clang's static
analyzer did not understand it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14575
2023-03-08 13:51:36 -08:00
Richard Yao 399bb81607 Suppress Clang Static Analyzer warning in vdev_split()
Clang's static analyzer pointed out that we can have a NULL pointer
dereference if we ever attempt to split a vdev that has only 1 child. If
that happens, we are left with zero children, but then try to access a
non-existent child. Calling vdev_split() on a vdev with only 1 child
should be impossible due to how the code is structured. If this ever
happens, it would be best to stop execution immediately even in a
production environment to allow for the best possible chance of recovery
by an expert, so we use `VERIFY3U()` instead of `ASSERT3U()`.

Unfortunately, while that defensive assertion will prevent execution
from ever reaching the NULL pointer dereference, Clang's static analyzer
does not realize that, so we add an `ASSERT()` to inform it of this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14575
2023-03-08 13:51:31 -08:00
Richard Yao 0b831cabc6 Suppress Clang Static Analyzer warning about SNPRINTF_BLKPTR()
Clang's static analyzer pointed out that if we can pass a -1 array index
to copyname[copies] if there are no valid DVAs. This is an absurd
situation, but it suggests that we are missing an assertion, so we add
it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14575
2023-03-08 13:51:26 -08:00
Richard Yao a4240a8ac7 Suppress Clang Static Analyzer false positive in the AVL tree code.
This has been filed as llvm/llvm-project#60694. Switching from a copy
through a C pointer dereference to an explicit memcpy() is a workaround
that prevents a false positive.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14575
2023-03-08 13:51:21 -08:00
Richard Yao 8b72dfed11 Suppress Clang Static Analyzer defect report in abd_get_size()
Clang's static analyzer reports a possible NULL pointer dereference in
abd_get_size() when called from vdev_draid_map_alloc_write() called from
vdev_draid_map_alloc_row() and vdc->vdc_nparity == 0. This should be
impossible, so we add an assertion to silence the defect report.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14575
2023-03-08 13:51:09 -08:00
Richard Yao 9368b3877c Fix TOCTOU race in zpool_do_labelclear()
Coverity reported a TOCTOU race in `zpool_do_labelclear()`. This is not
believed to be a real security issue, but fixing it reduces the number
of syscalls we do and will prevent other static analyzers from
complaining about this.

The code is expected to be equivalent. However, under rare
circumstances, such as ELOOP, ENAMETOOLONG, ENOMEM, ENOTDIR and
EOVERFLOW, we will display the error message that we currently display
for the `open()` syscall rather than the one that we currently display
for the `stat()` syscall. This is considered to be an improvement.

Reported-by: Coverity (CID-1524188)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14575
2023-03-08 13:50:51 -08:00
Alexander Motin a8d83e2a24 More adaptive ARC eviction
Traditionally ARC adaptation was limited to MRU/MFU distribution.  But
for years people with metadata-centric workload demanded mechanisms to
also manage data/metadata distribution, that in original ZFS was just
a FIFO.  As result ZFS effectively got separate states for data and
metadata, minimum and maximum metadata limits etc, but it all required
manual tuning, was not adaptive and in its heart remained a bad FIFO.

This change removes most of existing eviction logic, rewriting it from
scratch.  This makes MRU/MFU adaptation individual for data and meta-
data, same as the distribution between data and metadata themselves.
Since most of required states separation was already done, it only
required to make arcs_size state field specific per data/metadata.

The adaptation logic is still based on previous concept of ghost hits,
just now it balances ARC capacity between 4 states: MRU data, MRU
metadata, MFU data and MFU metadata.  To simplify arc_c changes instead
of arc_p measured in bytes, this code uses 3 variable arc_meta, arc_pd
and arc_pm, representing ARC balance between metadata and data, MRU and
MFU for data, and MRU and MFU for metadata respectively as 32-bit fixed
point fractions.  Since we care about the math result only when need to
evict, this moves all the logic from arc_adapt() to arc_evict(), that
reduces per-block overhead, since per-block operations are limited to
stats collection, now moved from arc_adapt() to arc_access() and using
cheaper wmsums.  This also allows to remove ugly ARC_HDR_DO_ADAPT flag
from many places.

This change also removes number of metadata specific tunables, part of
which were actually not functioning correctly, since not all metadata
are equal and some (like L2ARC headers) are not really evictable.
Instead it introduced single opaque knob zfs_arc_meta_balance, tuning
ARC's reaction on ghost hits, allowing administrator give more or less
preference to metadata without setting strict limits.

Some of old code parts like arc_evict_meta() are just removed, because
since introduction of ABD ARC they really make no sense: only headers
referenced by small number of buffers are not evictable, and they are
really not evictable no matter what this code do.  Instead just call
arc_prune_async() if too much metadata appear not evictable.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #14359
2023-03-08 11:17:23 -08:00
Attila Fülöp 8d9752569b ICP: AES-GCM: Unify gcm_init_ctx() and gmac_init_ctx()
gmac_init_ctx() duplicates most of the code in gcm_int_ctx() while
it just needs to set its own IV length and AAD tag length.

Introduce gcm_init_ctx_impl() which handles the GCM and GMAC
differences while reusing the duplicated code.

While here, fix a flaw where the AVX implementation would accept a
context using a byte swapped key schedule which it could not
handle. Also constify the IV and AAD pointers passed to
gcm_init{,_avx}().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #14529
2023-03-08 11:12:15 -08:00
Richard Yao 7d638df09b Do not hold spa_config in ZIL while blocked on IO
Otherwise, we can get a deadlock that looks like this:

1. fsync() grabs spa_config_enter(zilog->zl_spa, SCL_STATE, lwb,
RW_READER) as part of zil_lwb_write_issue() . It then blocks on the
txg_sync when a flush fails from a drive power cycling.

2. The txg_sync then blocks on the pool suspending due to the loss of
too many disks.

3. zpool clear then blocks on spa_config_enter(spa, SCL_STATE |
SCL_L2ARC | SCL_ZIO, spa, RW_WRITER)  because it is a writer.

The disks cannot be brought online due to fsync() holding that lock and
the user gets upset since fsync() is uninterruptibly blocked inside the
kernel.

We need to grab the lock for vdev_lookup_top(), but we do not need to
hold it while there is outstanding IO.

This fixes a regression introduced by
1ce23dcaff.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@klarasystems.com>
Sponsored-By: Wasabi Technology, Inc.
Closes #14519
2023-03-07 16:12:28 -08:00
Low-power 1f196e3107 Fix build for Linux/powerpc without CONFIG_ALTIVEC or CONFIG_VSX
This fixes building ZFS for Linux 4.7+ powerpc* architecture, where 
Linux was configured without CONFIG_ALTIVEC or CONFIG_VSX.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: WHR <msl0000023508@gmail.com>
Closes #14591
2023-03-07 14:06:52 -08:00
Rob N b988f32c70 Better handling for future crypto parameters
The intent is that this is like ENOTSUP, but specifically for when
something can't be done because we have no support for the requested
crypto parameters; eg unlocking a dataset or receiving a stream
encrypted with a suite we don't support.

Its not intended to be recoverable without upgrading ZFS itself.
If the request could be made to work by enabling a feature or modifying
some other configuration item, then some other code should be used.

load-key: In the future we might have more crypto suites (ie new values
for the `encryption` property. Right now trying to load a key on such
a future crypto suite will look up suite parameters off the end of the
crypto table, resulting in misbehaviour and/or crashes (or, with debug
enabled, trip the assertion in `zio_crypt_key_unwrap`).

Instead, lets check the value we got from the dataset, and if we can't
handle it, abort early.

recv: When receiving a raw stream encrypted with an unknown crypto
suite, `zfs recv` would report a generic `invalid backup stream`
(EINVAL). While technically correct, its not super helpful, so lets
ship a more specific error code and message.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #14577
2023-03-07 14:05:14 -08:00
George Amanakis 12a240ac0b Fix a typo in ac2038a
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: George Amanakis <gamanakis@gmail.com> 
Closes #14585
Closes #14592
2023-03-07 13:50:44 -08:00
Andriy Gapon a55254be7a [FreeBSD] fix false assert in cache_vop_rmdir when replaying ZIL
The assert is enabled when DEBUG_VFS_LOCKS kernel option is set.
The exact panic is:
    panic: condition seqc_in_modify(_vp->v_seqc) not met
It happens because seqc protocol is not followed for ZIL replay.

But we actually do not need to make any namecache calls at that stage,
because the namecache use is not enabled until after the replay is
completed.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
Closes #14566
2023-03-07 13:48:43 -08:00
Attila Fülöp 1191387012 spl: Add cmn_err_once() to log a message only on the first call
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #14567
2023-03-07 13:44:11 -08:00
q66 4628bb9c60 initramfs: fix zpool get argument order
When using the zfs initramfs scripts on my system, I get various
errors at initramfs stage, such as:

cannot open '-o': name must begin with a letter

My zfs binaries are compiled with musl libc, which may be why
this happens. In any case, fix the argument order to make the
zpool binary happy, and to match its --help output.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Daniel Kolesa <daniel@octaforge.org>
Closes #14572
2023-03-06 17:07:01 -08:00
Tino Reichardt 84a1c48c86 Fix detection of IBM Power8 machines (ISA 2.07)
An IBM POWER7 system with Power ISA 2.06 tried to execute
zfs_sha256_power8() - which should only be run on ISA 2.07
machines.

The detection is implemented via the zfs_isa207_available() call,
but this check was not used.

This pull request will fix this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Low-power <msl0000023508@gmail.com>
Closes #14576
2023-03-06 17:01:01 -08:00
Andriy Gapon 28bf26acb6 [FreeBSD] zfs_znode_alloc: lock the vnode earlier
This is needed because of a possible error path where zfs_vnode_forget()
is called.  That function calls vgone() and vput(), the former requires
the vnode to be exclusively locked and the latter expects it to be
locked.

It should be safe to lock the vnode as early as possible because it is
not yet visible, so there is no interaction with other locks.

While here, remove a tautological assignment to 'vp'.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
Closes #14565
2023-03-06 16:30:54 -08:00
George Amanakis ca9e32d3a7 Optimize the is_l2cacheable functions
by placing the most common use case (no special vdevs) first and avoid
allocating new variables.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14494 
Closes #14563
2023-03-06 16:13:05 -08:00
Richard Yao bc4d210783 Fix memory leak in ztest
This is tripping LeakSanitizer, which causes zloop test failures on pull
requests.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14583
2023-03-06 15:30:29 -08:00
Richard Yao b79e7114bb Add missing increment to dsl_deadlist_move_bpobj()
dc5c8006f6 was recently merged to prefetch
up to 128 deadlists. Unfortunately, a loop was missing an increment,
such that it will prefetch all deadlists. The performance properties of
that patch probably should be re-evaluated.

This was caught by CodeQL's cpp/constant-comparison check in an
experimental branch where I am testing the security-and-extended
queries. It complained about the `i < 128` part of the loop condition
always evaluating to the same thing. The standard CodeQL configuration
we use missed this because it does not include that check.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14573
2023-03-06 15:28:26 -08:00
Richard Yao 8846139b45 SHA2Init() should use signed assertions when checking an enum
The recent 4c5fec01a4 commit caused
Coverity to report that ASSERT3U(algotype, >=, SHA256_MECH_INFO_TYPE);
is always true. That is because the signed algotype and signed
SHA256_MECH_INFO_TYPE values were cast to unsigned types. To fix this,
we switch the assertions to use ASSERT3S(), which retains the signedness
of the original values for the comparison.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reported-by: Coverity (CID-1535300)
Closes #14573
2023-03-06 15:26:43 -08:00
Jorgen Lundman 47119d60ef Restore ASMABI and other Unify work
Make sure all SHA2 transform function has wrappers

For ASMABI to work, it is required the calling convention
is consistent.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Joergen Lundman <lundman@lundman.net>
Closes #14569
2023-03-06 15:24:05 -08:00
Tino Reichardt 620a977f22 Use SECTION_STATIC macro for sha2 x86_64 assembly
- instead of ".section .rodata" we should use SECTION_STATIC

Tested-by: Rich Ercolani <rincebrain@gmail.com>
Tested-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13741
2023-03-02 13:52:33 -08:00
Tino Reichardt f9f9bef22f Update BLAKE3 for using the new impl handling
This commit changes the BLAKE3 implementation handling and
also the calls to it from the ztest command.

Tested-by: Rich Ercolani <rincebrain@gmail.com>
Tested-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13741
2023-03-02 13:52:27 -08:00
Tino Reichardt 4c5fec01a4 Add generic implementation handling and SHA2 impl
The skeleton file module/icp/include/generic_impl.c can be used for
iterating over different implementations of algorithms.

It is used by SHA256, SHA512 and BLAKE3 currently.

The Solaris SHA2 implementation got replaced with a version which is
based on public domain code of cppcrypto v0.10.

These assembly files are taken from current openssl master:
- sha256-x86_64.S: x64, SSSE3, AVX, AVX2, SHA-NI (x86_64)
- sha512-x86_64.S: x64, AVX, AVX2 (x86_64)
- sha256-armv7.S: ARMv7, NEON, ARMv8-CE (arm)
- sha512-armv7.S: ARMv7, NEON (arm)
- sha256-armv8.S: ARMv7, NEON, ARMv8-CE (aarch64)
- sha512-armv8.S: ARMv7, ARMv8-CE (aarch64)
- sha256-ppc.S: Generic PPC64 LE/BE (ppc64)
- sha512-ppc.S: Generic PPC64 LE/BE (ppc64)
- sha256-p8.S: Power8 ISA Version 2.07 LE/BE (ppc64)
- sha512-p8.S: Power8 ISA Version 2.07 LE/BE (ppc64)

Tested-by: Rich Ercolani <rincebrain@gmail.com>
Tested-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13741
2023-03-02 13:52:21 -08:00
Tino Reichardt ac678c8eee Add SHA2 SIMD feature tests for libspl
These are added via HWCAP interface:
- zfs_neon_available() for arm and aarch64
- zfs_sha256_available() for arm and aarch64
- zfs_sha512_available() for aarch64

This one via cpuid() call:
- zfs_shani_available() for x86_64

Tested-by: Rich Ercolani <rincebrain@gmail.com>
Tested-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13741
2023-03-02 13:52:15 -08:00
Tino Reichardt cef1253135 Add SHA2 SIMD feature tests for Linux
These are added:
- zfs_neon_available() for arm and aarch64
- zfs_sha256_available() for arm and aarch64
- zfs_sha512_available() for aarch64
- zfs_shani_available() for x86_64

Tested-by: Rich Ercolani <rincebrain@gmail.com>
Tested-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Co-Authored-By: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Closes #13741
2023-03-02 13:52:04 -08:00
Tino Reichardt 589143c225 Add SHA2 SIMD feature tests for FreeBSD
These are added:
- zfs_neon_available() for arm and aarch64
- zfs_sha256_available() for arm and aarch64
- zfs_sha512_available() for aarch64
- zfs_shani_available() for x86_64

Changes:
- simd_powerpc.h: change license from CDDL to BSD

Tested-by: Rich Ercolani <rincebrain@gmail.com>
Tested-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13741
2023-03-02 13:51:56 -08:00
Tino Reichardt 6723d1110f Add ARM architecture to OpenZFS buildsystem
Tested-by: Rich Ercolani <rincebrain@gmail.com>
Tested-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13741
2023-03-02 13:51:50 -08:00
Tino Reichardt 3e254aaad0 Remove old or redundant SHA2 files
We had three sha2.h headers in different places.
The FreeBSD version, the Linux version and the generic solaris version.

The only assembly used for acceleration was some old x86-64 openssl
implementation for sha256 within the icp module.

For FreeBSD the whole SHA2 files of FreeBSD were copied into OpenZFS,
these files got removed also.

Tested-by: Rich Ercolani <rincebrain@gmail.com>
Tested-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13741
2023-03-02 13:50:21 -08:00
Rob N 163f3d3a1f zdb: add decryption support
The approach is straightforward: for dataset ops, if a key was offered,
find the encryption root and the various encryption parameters, derive a
wrapping key if necessary, and then unlock the encryption root. After
that all the regular dataset ops will return unencrypted data, and
that's kinda the whole thing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #11551
Closes #12707
Closes #14503
2023-03-02 13:39:09 -08:00
Alexander Motin 5f42d1dbf2 System-wide speculative prefetch limit.
With some pathological access patterns it is possible to make ZFS
accumulate almost unlimited amount of speculative prefetch ZIOs.
Combined with linear ABD allocations in RAIDZ code, it appears to
be possible to exhaust system KVA, triggering kernel panic.

Address this by introducing a system-wide counter of active prefetch
requests and blocking prefetch distance doubling per stream hits if
the number of active requests is higher that ~6% of ARC size.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14516
2023-03-01 15:27:40 -08:00
Brian Behlendorf cd560c4474 Accommodate debug-kernel stack frame size
The blk_queue_discard() and blk_queue_sector_erase() functions
slightly exceed the allowed 4096 maximum stack frame size when
building with the RedHat debug kernel which causes their
configure checks to fail.

Add an exception for these two tests so the interfaces are
correctly detected.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14540
2023-03-01 14:40:45 -08:00
Richard Yao 4c856fb333 Fix data race between zil_commit() and zil_suspend()
openzfsonwindows/openzfs#206 found that it is possible to trip
`VERIFY(list_is_empty(&lwb->lwb_itxs))` when a `zil_commit()` is delayed
by the scheduler long enough for a parallel `zil_suspend()` operation to
exit `zil_commit_impl()`. This is a data race. To prevent this, we
introduce a `zilog->zl_suspend_lock` rwlock to ensure that all
outstanding `zil_commit()` operations finish before `zil_suspend()`
begins and that subsequent operations fallback to `txg_wait_synced()`
after `zil_suspend()` has begun.

On `PREEMPT_RT` Linux kernels, the `rw_enter()` implementation suffers
from writer starvation. This means that a ZIL intensive system can delay
`zil_suspend()` indefinitely. This is a pre-existing problem that
affects everything that uses rw locks, so it needs to be addressed in
the SPL.  However, builds against `PREEMPT_RT` Linux kernels are
currently broken due to a GPL symbol issue (#11097), so we can safely
disregard that issue for now.

Reported-by: Arun KV <arun.kv@datacore.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14514
2023-03-01 13:23:09 -08:00
Richard Yao 9fd5fedc67 Remove bad kmem_free() oversight from previous zfsdev_state_list patch
I forgot to remove the corresponding kmem_free() from zfs_kmod_fini() in
9a14ce43c3. Clang's static analyzer did
not complain, but the Coverity scan that was run after the patch was
merged did.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reported-by: Coverity (CID-1535275)
Closes #14556
2023-03-01 13:20:53 -08:00
Richard Yao dd108f5d73 Linux: zfs_fillpage() should handle partial pages from end of file
After 89cd2197b9 was merged, Clang's
static analyzer began complaining about a dead assignment in
`zfs_fillpage()`. Upon inspection, I noticed that the dead assignment
was because we are not using the calculated io_len that we should use to
avoid asking the DMU to read past the end of a file. This should result
in `dmu_buf_hold_array_by_dnode()` calling `zfs_panic_recover()`.

This issue predates 89cd2197b9, but its
simplification of zfs_fillpage() eliminated the only use of the
assignment to io_len, which made Clang's static analyzer complain about
the issue.

Also, as a precaution, we add an assertion that io_offset < i_size. If
this ever fails, bad things will happen. Otherwise, we are blindly
trusting the kernel not to give us invalid offsets. We continue to
blindly trust it on non-debug kernels.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14534
2023-03-01 13:19:47 -08:00
Tino Reichardt 0f9ed58f43 Update reclaim disk space script
The script uses systemd-run, which does the job in background.
We should take the the time and wait for the job to finish.

Maybe some functional tests suffer from not really freed disk
space and fail because of this.

We also add some trimming in the end of the script.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #14554
2023-03-01 12:25:09 -08:00
Richard Yao 3a7c35119e Handle unexpected errors in zil_lwb_commit() without ASSERT()
We tripped `ASSERT(error == ENOENT || error == EEXIST || error ==
EALREADY)` in `zil_lwb_commit()` at Klara when doing robustness testing
of ZIL against drive power cycles.

That assertion presumably exists because when this code was written, the
only errors expected from here were EIO, ENOENT, EEXIST and EALREADY,
with EIO having its own handling before the assertion. However, upon
doing a manual depth first search traversal of the source tree, it turns
out that a large number of unexpected errors are possible here. In
theory, EINVAL and ENOSPC can come from dnode_hold_impl(). However, most
unexpected errors originate in the block layer and come to us from
zio_wait() in various ways. One way is ->zl_get_data() -> dmu_buf_hold()
-> dbuf_read() -> zio_wait().

From vdev_disk.c on Linux alone, zio_wait() can return the unexpected
errors ENXIO, ENOTSUP, EOPNOTSUPP, ETIMEDOUT, ENOSPC, ENOLINK,
EREMOTEIO, EBADE, ENODATA, EILSEQ and ENOMEM

This was only observed after what have been likely over 1000 test
iterations, so we do not expect to reproduce this again to find out what
the error code was. However, circumstantial evidence suggests that the
error was ENXIO.

When ENXIO or any other unexpected error occurs, the `fsync()` or
equivalent operation that called zil_commit() will return success, when
in fact, dirty data has not been committed to stable storage. This is a
violation of the Single UNIX Specification.

The code should be able to handle this and any other unknown error by
calling `txg_wait_synced()`. In addition to changing the code to call
txg_wait_synced() on unexpected errors instead of returning, we modify
it to print information about unexpected errors to dmesg.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@klarasystems.com>
Sponsored-By: Wasabi Technology, Inc.
Closes #14532
2023-03-01 09:39:41 -08:00
Rob N 25c4d1f032 mancheck: exclude dotfiles in man dir
Its not uncommon for an editor to drop a hidden swap file in the dir
while editing a file there. mancheck would find it and run mandoc on it,
which would complain about its distinctly not-manpage format.

A more correct solution might be to reconfigure the editor to not put
swap files in the same dir, but its the default a lot of the time, and
this is a very small change that gives a very nice quality-of-life
improvement.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #14549
2023-03-01 09:38:31 -08:00
Rob N 4fe9cc5437 man: note that zdb operates directly on pool storage
A frequent misunderstanding is that zdb accesses the pool through the
kernel or filesystem, leading to confusion particularly when it can't
access something that it seems like it should be able to.

I've seen this confusion recently when zdb couldn't access a pool because
the user didn't have permission to read directly from the block devices,
and when it couldn't show attributes of encrypted files even though the
dataset was unlocked.

The manpage already speaks to another symptom of this, namely that zdb
may "behave erratically" on an active pool; here I'm trying to make that
a little more explicit.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #14539
2023-02-28 17:35:33 -08:00
Allan Jude cbd76b4032 Makefile.bsd: cleanup and sync with FreeBSD
xxhash.c was not being compiled, so when FreeBSD's kernel
switched to a newer version of ZSTD a few weeks ago, out-of-tree ZFS
failed to build

Sync module/Makefile.bsd with FreeBSD's sys/modules/zfs/Makefile

And restore the alphabetical sort in a number of places

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@klarasystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Sponsored-by: Klara, Inc.
Closes #14508
2023-02-28 17:33:59 -08:00
Richard Yao ae7e700650 Suppress static analyzer warning in dmu_objset_create_impl_dnstats()
Clang's static analyzer claims that dereferencing ds in
dmu_objset_create_impl_dnstats() could cause a NULL pointer dereference
when a previous NULL check confirms that it is NULL. It is only NULL on
the MOS, for which dmu_objset_userused_enabled(os) should always return
false, so ds will never be dereferenced when it is NULL. We add an
assertion to suppress this warning.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14470
2023-02-28 17:31:58 -08:00
Richard Yao 5cc4950901 Suppress static analyzer warning in dbuf_hold_copy()
Clang's static analyzer claims that dbuf_hold_copy() will have a NULL
pointer dereference in data->b_data when called by dbuf_hold_impl().
This is impossible because data is dr->dt.dl.dr_data, which is non-NULL
whenever db->db_level == 0, which is always the case whenever
dbuf_hold_impl() calls dbuf_hold_copy(). We add an assertion to suppress
the complaint.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14470
2023-02-28 17:31:50 -08:00
Richard Yao 9a14ce43c3 Statically allocate first node of zfsdev_state_list
This avoids a call to kmem_alloc() during module load. It also
suppresses a defect report from Clang's static analyzer that claims that
we will have a NULL pointer dereference in zfsdev_state_init() because
it does not understand that this has already been allocated in
zfs_kmod_init().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14470
2023-02-28 17:31:41 -08:00
Richard Yao 7fc48f8378 Suppress static analyzer warning in sa_attr_iter()
Clang's static analyzer points out that when IS_SA_BONUSTYPE(type) is
true and .sa_length is 0 for an attribute, we have a NULL pointer
dereference. We suppress this with an IMPLY() statement.

This was also identified by Coverity.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reported-by: Coverity (CID-1017954)
Closes #14470
2023-02-28 17:31:30 -08:00
Richard Yao 4d9bb5514c Suppress static analyzer warnings in zio_checksum_error_impl()
Clang's static analyzer informs us of multiple NULL pointer dereferences
involving zio_checksum_error_impl().

The first is a NULL pointer dereference if bp is NULL and ci->ci_flags &
ZCHECKSUM_FLAG_EMBEDDED is false, but bp is NULL implies that
ci->ci_flags & ZCHECKSUM_FLAG_EMBEDDED is true, so we add an IMPLY()
statement to suppress the report.

The second and third are identical, and are duplicated because while the
NULL pointer dereference occurs in zio_checksum_gang_verifier(), it is
called by zio_checksum_error_impl() and there is a report for each of
the two functions. The reports state that when bp is NULL, ci->ci_flags
& ZCHECKSUM_FLAG_EMBEDDED is true and checksum is not
ZIO_CHECKSUM_LABEL, we also have a NULL pointer dereference. bp is NULL
should imply that checksum == ZIO_CHECKSUM_LABEL, so we add an IMPLY()
statement to suppress the second report. The two reports are
functionally identical.

A fourth variation of this was also reported by Coverity. It occurs when
checksum == ZIO_CHECKSUM_ZILOG2.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reported-by: Coverity (CID-1524672)
Closes #14470
2023-02-28 17:31:08 -08:00
Richard Yao d634d20d1b icp: Prevent compilers from optimizing away memset() in gcm_clear_ctx()
The recently merged f58e513f74 was
intended to zero sensitive data before exit from encryption
functions to harden the code against theoretical information
leaks. Unfortunately, the method by which it did that is
optimized away by the compiler, so some information still leaks. This
was confirmed by counting function calls in disassembly.

After studying how the OpenBSD, FreeBSD and Linux kernels handle this,
and looking at our disassembly, I decided on a two-factor approach to
protect us from compiler dead store elimination passes.

The first factor is to stop trying to inline gcm_clear_ctx(). GCC does
not actually inline it in the first place, and testing suggests that
dead store elimination passes appear to become more powerful in a bad
way when inlining is forced, so we recognize that and move
gcm_clear_ctx() to a C file.

The second factor is to implement an explicit_memset() function based on
the technique used by `secure_zero_memory()` in FreeBSD's blake2
implementation, which coincidentally is functionally identical to the
one used by Linux. The source for this appears to be a LLVM bug:

https://llvm.org/bugs/show_bug.cgi?id=15495

Unlike both FreeBSD and Linux, we explicitly avoid the inline keyword,
based on my observations that GCC's dead store elimination pass becomes
more powerful when inlining is forced, under the assumption that it will
be equally powerful when the compiler does decide to inline function
calls.

Disassembly of GCC's output confirms that all 6 memset() calls are
executed with this patch applied.

Reviewed-by: Attila Fülöp <attila@fueloep.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14544
2023-02-28 17:28:50 -08:00
Richard Yao 2f76797ad9 Linux: Assert mutex is held in mutex_exit()
A spurious mutex_exit() in a development branch caused weird issues
until I identified it. An assertion prior to mutex_exit() would have
caught it. Rather than adding assertions before invocations of
mutex_exit() in the code, let us simply add an assertion to
mutex_exit(). It is cheap and will likely improve developer
productivity.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Richard Yao <richard.yao@klarasystems.com>
Sponsored-By: Wasabi Technology, Inc.
Closes #14541
2023-02-28 17:27:20 -08:00
George Amanakis 13ff72ba0a Revert zfeature_active() to static
Commit 34ce4c4 made zfeature_active() non-static. This is not required.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14546
2023-02-28 14:03:52 -08:00
Tino Reichardt 727339b118 Fix github build failures because of vdevprops.7
This small fix adds the manpage vdevprops.7 to the file
contrib/debian/openzfs-zfsutils.install and the github
actions will work again.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #14553
2023-02-28 14:02:48 -08:00
John Poduska 73c383f541 Prevent incorrect datasets being mounted
During a mount, zpl_mount_impl(), uses sget() with the callback
zpl_test_super() to find a super_block with a matching objset,
stored in z_os. It does so without taking the teardown lock on
the zfsvfs.

The problem is that operations like rollback will replace the
z_os. And, there is a window where the objset in the rollback
is freed, but z_os still points to it. Then, a mount like
operation, for instance a clone, can reallocate that exact same
pointer and zpl_test_super() will then match the super_block
associated with the rollback as opposed to the clone.

This fix tests for a match and if so, takes the teardown lock
before doing the final match test.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: John Poduska <jpoduska@datto.com>
Closes #14518
2023-02-27 16:49:34 -08:00
D. Ebdrup 4b9bc6345e Add vdevprops.7 to the Makefile
Adding vdevprops.7 to the Makefile ensures that it gets installed
properly on FreeBSD.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Daniel Ebdrup Jensen <debdrup@FreeBSD.org>
Closes #14527
2023-02-27 16:38:09 -08:00
Richard Yao bff26b0220 Skip memory allocation when compressing holes
Hole detection in the zio compression code allows us to
opportunistically skip compression on holes. We can go a step further
by not doing memory allocations on holes either.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Richard Yao <richard.yao@klarasystems.com>
Sponsored-by: Wasabi Technology, Inc.
Closes #14500
2023-02-27 14:41:02 -08:00
Attila Fülöp f58e513f74 ICP: AES-GCM: Refactor gcm_clear_ctx()
Currently the temporary buffer in which decryption takes place
isn't cleared on context destruction. Further in some routines we
fail to call gcm_clear_ctx() on error exit. Both flaws may result
in leaking sensitive data.

We follow best practices and zero out the plaintext buffer before
freeing the memory holding it. Also move all cleanup into
gcm_clear_ctx() and call it on any context destruction.

The performance impact should be negligible.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #14528
2023-02-27 14:38:12 -08:00
Mariusz Zaborski 3b9309aabe Move zap_attribute_t to the heap in dsl_deadlist_merge
In the case of a regular compilation, the compiler
raises a warning for a dsl_deadlist_merge function, that
the stack size is to large. In debug build this can
generate an error.

Move large structures to heap.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #14524
2023-02-27 14:27:58 -08:00
Brian Behlendorf 000985fc15 Workaround GitHub Action failure
Ubuntu 20.04 and 22.04 workflows are failing due to an error
which is hit when running `apt-get update`.  Until the
problematic package is fixed apply the suggested workaround
described here:

  https://github.com/orgs/community/discussions/47863

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14530
2023-02-27 09:19:25 -08:00
Dimitry Andric bf1bec394e Use .section .rodata instead of .rodata on FreeBSD
In commit 0a5b942d4 the FreeBSD SECTION_STATIC macro was set to
".rodata". This assembler directive is supported by LLVM (as a
convenience alias for ".section .rodata") by not by GNU as.

This caused the FreeBSD builds that are done with gcc to fail.
Therefore, use ".section .rodata" instead, similar to the other
asm_linkage.h headers.

Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Dimitry Andric <dimitry@andric.com>
Closes #14526
2023-02-24 16:45:48 -08:00
George Amanakis d816bc5ec7 Move dmu_buf_rele() after dsl_dataset_sync_done()
Otherwise the dataset may be freed after the last dmu_buf_rele() leading
to a panic.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14522
Closes #14523
2023-02-23 18:14:52 -07:00
Brian Behlendorf 6109d83df8 ZTS: Minor fixes
- The migration_012_pos.ksh test case was failing because of a
  missing space after `log_must`.

- None of the tests listed in the runfiles should include the .ksh
  suffix.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14515
2023-02-23 17:10:46 -08:00
Brian Behlendorf 89cd2197b9 Fix buffered/direct/mmap I/O race
When a page is faulted in for memory mapped I/O the page lock
may be dropped before it has been read and marked up to date.
If a buffered read encounters such a page in mappedread() it
must wait until the page has been updated. Failure to do so
will result in a panic on debug builds and incorrect data on
production builds.

The critical part of this change is in mappedread() where pages
which are not up to date are now handled. Additionally, it
includes the following simplifications.

- zfs_getpage() and zfs_fillpage() could be passed an array of
  pages. This could be more efficient if it was used but in
  practice only a single page was ever provided. These
  interfaces were simplified to acknowledge that.

- update_pages() was modified to correctly set the PG_error bit
  on a page when it cannot be read by dmu_read().

- Setting PG_error and PG_uptodate was moved to zfs_fillpage()
  from zpl_readpage_common(). This is consistent with the
  handling in update_pages() and mappedread().

- Minor additional refactoring to comments and variable
  declarations to improve readability.

- Add a test case to exercise concurrent buffered, direct,
  and mmap IO to the same file.

- Reduce the mmap_sync test case default run time.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13608 
Closes #14498
2023-02-23 10:57:24 -08:00
Richard Yao 7cb67d627c Fix NULL pointer dereference in zio_ready()
Clang's static analyzer correctly identified a NULL pointer dereference
in zio_ready() when ZIO_FLAG_NODATA has been set on a zio that is
missing a block pointer. The NULL pointer dereference occurs because we
have logic intended to disable ZIO_FLAG_NODATA when it has been set on a
gang block.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14469
2023-02-23 10:19:08 -08:00
Richard Yao c9e39da9a4 Use rw_tryupgrade() in dmu_bonus_hold_by_dnode()
When dn->dn_bonus == NULL, dmu_bonus_hold_by_dnode() will unlock its
read lock on dn->dn_struct_rwlock and grab a write lock. This can be
micro-optimized by calling rw_tryupgrade().

Linux will not benefit from this since it does not support rwlock
upgrades, but FreeBSD will.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14517
2023-02-22 16:33:23 -08:00
Paul Dagnelie d9e64a4030 Improve error message of zfs redact
We improve the error message of zfs redact by checking if the target 
snapshot exists, and if all the redaction snapshots exist. As a
future improvement we could iterate over every snapshot provided and 
use that to determine which one specifically doesn't exist.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #11426 
Closes #14496
2023-02-21 17:30:05 -08:00
rob-wing 28251d81d7 FreeBSD: don't verify recycled vnode for zfs control directory
Under certain loads, the following panic is hit:

    panic: page fault
    KDB: stack backtrace:
    #0 0xffffffff805db025 at kdb_backtrace+0x65
    #1 0xffffffff8058e86f at vpanic+0x17f
    #2 0xffffffff8058e6e3 at panic+0x43
    #3 0xffffffff808adc15 at trap_fatal+0x385
    #4 0xffffffff808adc6f at trap_pfault+0x4f
    #5 0xffffffff80886da8 at calltrap+0x8
    #6 0xffffffff80669186 at vgonel+0x186
    #7 0xffffffff80669841 at vgone+0x31
    #8 0xffffffff8065806d at vfs_hash_insert+0x26d
    #9 0xffffffff81a39069 at sfs_vgetx+0x149
    #10 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4
    #11 0xffffffff8065a28c at lookup+0x45c
    #12 0xffffffff806594b9 at namei+0x259
    #13 0xffffffff80676a33 at kern_statat+0xf3
    #14 0xffffffff8067712f at sys_fstatat+0x2f
    #15 0xffffffff808ae50c at amd64_syscall+0x10c
    #16 0xffffffff808876bb at fast_syscall_common+0xf8

The page fault occurs because vgonel() will call VOP_CLOSE() for active
vnodes. For this reason, define vop_close for zfsctl_ops_snapshot. While
here, define vop_open for consistency.

After adding the necessary vop, the bug progresses to the following
panic:

    panic: VERIFY3(vrecycle(vp) == 1) failed (0 == 1)
    cpuid = 17
    KDB: stack backtrace:
    #0 0xffffffff805e29c5 at kdb_backtrace+0x65
    #1 0xffffffff8059620f at vpanic+0x17f
    #2 0xffffffff81a27f4a at spl_panic+0x3a
    #3 0xffffffff81a3a4d0 at zfsctl_snapshot_inactive+0x40
    #4 0xffffffff8066fdee at vinactivef+0xde
    #5 0xffffffff80670b8a at vgonel+0x1ea
    #6 0xffffffff806711e1 at vgone+0x31
    #7 0xffffffff8065fa0d at vfs_hash_insert+0x26d
    #8 0xffffffff81a39069 at sfs_vgetx+0x149
    #9 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4
    #10 0xffffffff80661c2c at lookup+0x45c
    #11 0xffffffff80660e59 at namei+0x259
    #12 0xffffffff8067e3d3 at kern_statat+0xf3
    #13 0xffffffff8067eacf at sys_fstatat+0x2f
    #14 0xffffffff808b5ecc at amd64_syscall+0x10c
    #15 0xffffffff8088f07b at fast_syscall_common+0xf8

This is caused by a race condition that can occur when allocating a new
vnode and adding that vnode to the vfs hash. If the newly created vnode
loses the race when being inserted into the vfs hash, it will not be
recycled as its usecount is greater than zero, hitting the above
assertion.

Fix this by dropping the assertion.

FreeBSD-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252700
Reviewed-by: Andriy Gapon <avg@FreeBSD.org>
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Rob Wing <rob.wing@klarasystems.com>
Co-authored-by: Rob Wing <rob.wing@klarasystems.com>
Submitted-by: Klara, Inc.
Sponsored-by: rsync.net
Closes #14501
2023-02-21 17:26:33 -08:00
Allan Jude 1d56c6d017 Fix per-jail zfs.mount_snapshot setting
When jail.conf set the nopersist flag during startup, it was
incorrectly destroying the per-jail ZFS settings.

Reported-by: Martin Matuska <mm@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Sponsored-by: Modirum MDPay
Sponsored-by: Klara, Inc.
Closes #14509
2023-02-21 17:23:01 -08:00
George Amanakis 0f32b1f728 Partially revert eee9362a7
With commit 34ce4c42f applied, there is no need for eee9362a7.
Revert that aside from the test. All tests introduced in those commits
pass.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14502
2023-02-21 09:36:22 -08:00
Richard Yao 9f08b6e31f Sync thread should avoid holding the spa config write lock when possible
spa_sync() currently grabs the write lock due to an old hack that is
documented by a comment:

    We need the write lock here because, for aux vdevs,
    calling vdev_config_dirty() modifies sav_config.
    This is ugly and will become unnecessary when we
    eliminate the aux vdev wart by integrating all vdevs
    into the root vdev tree.
 
This has lead to deadlocks in rare edge cases from holding the write 
lock. We can reduce incidence of these deadlocks by not grabbing the 
write lock on pools without auxillary vdevs.

Sponsored-By: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@klarasystems.com>
Closes #14282
2023-02-16 14:10:52 -08:00
Paul Dagnelie dc72c60ec1 zfs redact fails when dnodesize=auto
Add handling to dmu_object_next for the case where *objectp == 0.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #14479
2023-02-16 09:23:39 -08:00
Brian Behlendorf 57cfae4a2f zdb: zero-pad checksum output follow up
Apply zero padding for checksums consistently.  The SNPRINTF_BLKPTR
macro was not updated in commit ac7648179c which results in the
`cli_root/zdb/zdb_checksum.ksh` test case reliably failing.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Akash B <akash-b@hpe.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14497
2023-02-15 09:06:29 -08:00
Richard Yao f04cb31e7c Suppress Clang static analyzer complaint in zfs_replay_create()
Clang's static analyzer incorrectly complains about an undefined value
here when lr->lr_common.lrc_txtype == TX_SYMLINK and txtype ==
TX_CREATE. This is impossible, because of this line:

txtype = (lr->lr_common.lrc_txtype & ~TX_CI((uint64_t)0x1 << 63));

Changing the code to compare against txtype suppresses the report.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14472
2023-02-14 11:05:41 -08:00
Brian Behlendorf 3fc92adc40 Linux: use filemap_range_has_page()
As of the 4.13 kernel filemap_range_has_page() can be used to
check if there is a page mapped in a given file range.  When
available this interface should be used which eliminates the
need for the zp->z_is_mapped boolean.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14493
2023-02-14 11:04:34 -08:00
Richard Yao ab672133a9 Give strlcat() full buffer lengths rather than smaller buffer lengths
strlcat() is supposed to be given the length of the destination buffer,
including the existing contents. Unfortunately, I had been overzealous
when I wrote a51288aabb, since I gave it
the length of the destination buffer, minus the existing contents. This
likely caused a regression on large strings.

On the topic of being overzealous, the use of strlcat() in
dmu_send_estimate_fast() was unnecessary because recv_clone_name is a
fixed length string. We continue using strlcat() mostly as defensive
programming, in case the string length is ever changed, even though it
is unnecessary.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14476
2023-02-14 11:03:42 -08:00
Rich Ercolani cfd57573ff quick fix for lingering snapdir unmount problems
Unfortunately, even after e79b6807, I still, much more rarely,
tripped asserts when playing with many ctldir mounts at once.

Since this appears to happen if we dispatched twice too fast, just
ignore it. We don't actually need to do anything if someone already
started doing it for us.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #14462
2023-02-13 16:40:13 -08:00
George Amanakis 34ce4c42ff Fix a race condition in dsl_dataset_sync() when activating features
The zio returned from arc_write() in dmu_objset_sync() uses
zio_nowait(). However we may reach the end of dsl_dataset_sync()
which checks if we need to activate features in the filesystem
without knowing if that zio has even run through the ZIO pipeline yet.
In that case we will flag features to be activated in
dsl_dataset_block_born() but dsl_dataset_sync() has already
completed its run and those features will not actually be activated.
Mitigate this by moving the feature activation code in
dsl_dataset_sync_done(). Also add new ASSERTs in
dsl_scan_visitbp() checking if a block contradicts any filesystem
flags.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #13816
2023-02-13 16:37:46 -08:00
Prakash Surya 13312e2fa1 Reduce need for contiguous memory for ioctls
We've had cases where we trigger an OOM despite having memory freely
available on the system. For example, here, we had about 21GB free:

    kernel: Node 0 Normal: 2418758*4kB (UME) 1549533*8kB (UE) 0*16kB
    0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB =
    22071296kB

The problem being, all the memory is in 4K and 8K contiguous regions,
but the allocation request was for a 16K contiguous region:

    kernel: SafeExecutors-4 invoked oom-killer:
    gfp_mask=0x42dc0(GFP_KERNEL|__GFP_NOWARN|__GFP_COMP|__GFP_ZERO),
    order=2, oom_score_adj=0

The offending allocation came from this call trace:

    kernel: Call Trace:
    kernel:  dump_stack+0x57/0x7a
    kernel:  dump_header+0x4f/0x1e1
    kernel:  oom_kill_process.cold.33+0xb/0x10
    kernel:  out_of_memory+0x1ad/0x490
    kernel:  __alloc_pages_slowpath+0xd55/0xe40
    kernel:  __alloc_pages_nodemask+0x2df/0x330
    kernel:  kmalloc_large_node+0x42/0x90
    kernel:  __kmalloc_node+0x25a/0x320
    kernel:  ? spl_kmem_free_impl+0x21/0x30 [spl]
    kernel:  spl_kmem_alloc_impl+0xa5/0x100 [spl]
    kernel:  spl_kmem_zalloc+0x19/0x20 [spl]
    kernel:  zfsdev_ioctl+0x2b/0xe0 [zfs]
    kernel:  do_vfs_ioctl+0xa9/0x640
    kernel:  ? __audit_syscall_entry+0xdd/0x130
    kernel:  ksys_ioctl+0x67/0x90
    kernel:  __x64_sys_ioctl+0x1a/0x20
    kernel:  do_syscall_64+0x5e/0x200
    kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
    kernel: RIP: 0033:0x7fdca3674317

The problem is, for each ioctl that ZFS makes, it has to allocate a
zfs_cmd_t structure, which is 13744 bytes in size (on my system):

    sdb> sizeof zfs_cmd
    (size_t)13744

This size, coupled with the fact that we currently allocate it with
kmem_zalloc, means we need a 16K contiguous region of memory to satisfy
the request.

The solution taken by this change, is to use "vmem" instead of "kmem" to
do the allocation, such that we don't necessarily need a contiguous 16K
memory region to satisfy the allocation.

Arguably, a better solution would be not to require such a large
allocation to begin with (e.g. reduce the size of the zfs_cmd_t
structure), but that'd be a much larger change than this "one liner".
Thus, I've opted for this approach for now; we can always circle back
and attempt to reduce the size of the structure in the future.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Prakash Surya <prakash.surya@delphix.com>
Closes #14474
2023-02-13 16:35:59 -08:00
Alexander Motin 87a4dfa561 Improve arc_read() error reporting
Debugging reported NULL de-reference panic in dnode_hold_impl() I found
that for certain types of errors arc_read() may only return error code,
but not properly report it via done and pio arguments.  Lack of done
calls may result in reference and/or memory leaks in higher level code.
Lack of error reporting via pio may result in unnoticed errors there.
For example, dbuf_read(), where dbuf_read_impl() ignores arc_read()
return, relies completely on the pio mechanism and missed the errors.

This patch makes arc_read() to always call done callback and always
propagate errors to parent zio, if either is provided.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14454
2023-02-13 13:21:53 -08:00
Andreas Vögele 7883ea2234 rpm: Use libtirpc-devel and /usr/lib on SUSE
SUSE Linux distributions require libtirpc-devel.  The dracut and udev
directories are /usr/lib/dracut and /usr/lib/udev.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Andreas Vögele <andreas@andreasvoegele.com>
Closes #14467
Closes #14468
2023-02-09 11:57:50 -08:00
Rob N ★ ac7648179c zdb: zero-pad checksum output
The leading zeroes are part of the checksum so we should show them.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #14464
2023-02-07 13:48:22 -08:00
Ryan Moeller eb823cbc76 initramfs: Make mountpoint=none work
In initramfs, mount.zfs fails to mount a dataset with mountpoint=none,
but mount.zfs -o zfsutil works.  Use -o zfsutil when mountpoint=none.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #14455
2023-02-06 11:16:01 -08:00
Richard Yao 66953686c0 Add assertion and make variables unsigned in abd_alloc_chunks()
Clang's static analyzer pointed out that if alloc_pages >= nr_pages
before the loop, the value of page will be undefined and will be used
anyway. This should not be possible, but as cleanup, we add an
assertion. We also recognize that the local variables should be unsigned
in the first place, so we make them unsigned. This is not enough to
avoid the need for the assertion, since there is still the case that
alloc_pages == nr_pages and nr_pages == 0, which the assertion
implicitly checks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14456
2023-02-06 11:10:50 -08:00
Richard Yao cfb49616cd Cleanup: spa vdev processing should check NULL pointers
The PVS Studio 2016 FreeBSD kernel report stated:

\contrib\opensolaris\uts\common\fs\zfs\spa.c (1341): error V595: The 'spa->spa_spares.sav_vdevs' pointer was utilized before it was verified against nullptr. Check lines: 1341, 1342.
\sys\cddl\contrib\opensolaris\uts\common\fs\zfs\spa.c (1355): error V595: The 'spa->spa_l2cache.sav_vdevs' pointer was utilized before it was verified against nullptr. Check lines: 1355, 1357.
\sys\cddl\contrib\opensolaris\uts\common\fs\zfs\spa.c (1398): error V595: The 'spa->spa_spares.sav_vdevs' pointer was utilized before it was verified against nullptr. Check lines: 1398, 1408.
\sys\cddl\contrib\opensolaris\uts\common\fs\zfs\spa.c (1583): error V595: The 'oldvdevs' pointer was utilized before it was verified against nullptr. Check lines: 1583, 1595.

In practice, all of these uses were safe because a NULL pointer
implied a 0 vdev count, which kept us from iterating over vdevs.
However, rearranging the code to check the pointer first is not a
terrible micro-optimization and makes it more readable, so let us
do that.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14456
2023-02-06 11:09:28 -08:00
Richard Yao 3a7d2a0ce0 zfs_get_temporary_prop() should not pass NULL to strcpy()
`dsl_dir_activity_in_progress()` can call `zfs_get_temporary_prop()` with
the forth value set to NULL, which will pass NULL to `strcpy()` when
there is a match

Clang's static analyzer caught this with the help of CodeChecker for
Cross Translation Unit analysis.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14456
2023-02-06 11:08:57 -08:00
Matthew Ahrens 14872aaa4f EIO caused by encryption + recursive gang
Encrypted blocks can not have 3 DVAs, because they use the space of the
3rd DVA for the IV+salt.  zio_write_gang_block() takes this into
account, setting `gbh_copies` to no more than 2 in this case.  Gang
members BP's do not have the X (encrypted) bit set (nor do they have the
DMU level and type fields set), because encryption is not handled at
this level.  The gang block is reassembled, and then encryption (and
compression) are handled.

To check if this gang block is encrypted, the code in
zio_write_gang_block() checks `pio->io_bp`.  This is normally fine,
because the block that's being ganged is typically the encrypted BP.

The problem is that if there is "recursive ganging", where a gang member
is itself a gang block, then when zio_write_gang_block() is called to
create a gang block for a gang member, `pio->io_bp` is the gang member's
BP, which doesn't have the X bit set, so the number of DVA's is not
restricted to 2.  It should instead be looking at the the "gang leader",
i.e. the top-level gang block, to determine how many DVA's can be used,
to avoid a "NDVA's inversion" (where a child has more DVA's than its
parent).

gang leader BP: X (encrypted) bit set, 2 DVA's, IV+salt in 3rd DVA's
space:
```
DVA[0]=<1:...:100400> DVA[1]=<0:...:100400> salt=... iv=...
[L0 ZFS plain file] fletcher4 uncompressed encrypted LE
gang unique double size=100000L/100000P birth=... fill=1 cksum=...
```

leader's GBH contains a BP with gang bit set and 3 DVA's:
```
DVA[0]=<1:...:55600> DVA[1]=<0:...:55600>
[L0 unallocated] fletcher4 uncompressed unencrypted LE
contiguous unique double size=55600L/55600P birth=... fill=0 cksum=...

DVA[0]=<1:...:55600> DVA[1]=<0:...:55600>
[L0 unallocated] fletcher4 uncompressed unencrypted LE
contiguous unique double size=55600L/55600P birth=... fill=0 cksum=...

DVA[0]=<1:...:55600> DVA[1]=<0:...:55600> DVA[2]=<1:...:200>
[L0 unallocated] fletcher4 uncompressed unencrypted LE
gang unique double size=55400L/55400P birth=... fill=0 cksum=...
```

On nondebug bits, having the 3rd DVA in the gang block works for the
most part, because it's true that all 3 DVA's are available in the gang
member BP (in the GBH).  However, for accounting purposes, gang block
DVA's ASIZE include all the space allocated below them, i.e. the
512-byte gang block header (GBH) as well as the gang members below that.
We see that above where the gang leader BP is 1MB logical (and after
compression: 0x`100000P`), but the ASIZE of each DVA is 2 sectors (1KB)
more than 1MB (0x`100400`).

Since thre are 3 copies of a block below it, we increment the ATIME of
the 3rd DVA of the gang leader by the space used by the 3rd DVA of the
child (1 sector, in this case).  But there isn't really a 3rd DVA of the
parent; the salt is stored in place of the 3rd DVA's ASIZE.

So when zio_write_gang_member_ready() increments the parent's BP's
`DVA[2]`'s ASIZE, it's actually incrementing the parent's salt.  When we
later try to read the encrypted recursively-ganged block, the salt
doesn't match what we used to write it, so MAC verification fails and we
get an EIO.

```
zio_encrypt():  encrypted 515/2/0/403 salt: 25 25 bb 9d ad d6 cd 89
zio_decrypt(): decrypting 515/2/0/403 salt: 26 25 bb 9d ad d6 cd 89
```

This commit addresses the problem by not increasing the number of copies
of the GBH beyond 2 (even for non-encrypted blocks).  This simplifies
the logic while maintaining the ability to traverse all metadata
(including gang blocks) even if one copy is lost.  (Note that 3 copies
of the GBH will still be created if requested, e.g. for `copies=3` or
MOS blocks.)  Additionally, the code that increments the parent's DVA's
ASIZE is made to check the parent DVA's NDVAS even on nondebug bits.  So
if there's a similar bug in the future, it will cause a panic when
trying to write, rather than corrupting the parent BP and causing an
error when reading.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Caused-by: #14356
Closes #14440
Closes #14413
2023-02-06 09:37:06 -08:00
Jorgen Lundman 0a5b942d4a Restore FreeBSD to use .rodata
In https://github.com/openzfs/zfs/pull/14228 the FreeBSD
SECTION_STATIC was set to ".data" instead of ".rodata". This
commit just restores it back to .rodata.

Reviewed-by: Attila Fülöp <attila@fueloep.org>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #14460
2023-02-06 09:34:59 -08:00
Jorgen Lundman dffd40b3b6 Unify assembly files with macOS
The remaining changes needed to make the assembly files work
with macOS.

Reviewed-by: Attila Fülöp <attila@fueloep.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #14451
2023-02-06 09:27:55 -08:00
Reno Reckling 6017fd9377 Fix variable shadowing in libzfs_mount
We accidentally reused variable name "i" for inner and outer loops.

Reviewed-by: Rich Ercolani <Rincebrain@gmail.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Reno Reckling <e-github@wthack.de>
Closes #14452 
Closes #14445
2023-02-02 15:22:12 -08:00
Paul Dagnelie 90f4e01f8a Prevent error messages when running tests with no timeout
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #14450
2023-02-02 15:19:26 -08:00
George Amanakis ac2038a19c Teach zdb about DMU_OT_ERROR_LOG objects
With the persistent error log feature we need to account for
spa_errlog_{scrub, last} containing mappings to other error log objects,
which need to be marked as in-use as well.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14442 
Closes #14434
2023-02-02 15:17:37 -08:00
rob-wing 326f1e3d88 zfs_main.c: fix unused variable error with GCC
zfs_setproctitle_init() is stubbed out on FreeBSD.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Rob Wing <rob.fx907@gmail.com>
Closes #14441
2023-02-02 15:16:40 -08:00
Allan Jude c799866b97 Resolve WS-2021-0184 vulnerability in zstd
Pull in d40f55cd950919d7eac951b122668e55e33e5202 from upstream

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #14439
2023-02-02 15:12:51 -08:00
George Wilson f18e083bf8 rootdelay on zfs should be adaptive
The 'rootdelay' boot option currently pauses the boot for a specified
amount of time. The original intent was to ensure that slower
configurations would have ample time to enumerate the devices to make
importing the root pool successful. This, however, causes unnecessary
boot delay for environments like Azure which set this parameter by
default.

This commit changes the initramfs logic to pause until it can
successfully load the 'zfs' module. The timeout specified by
'rootdelay' now becomes the maximum amount of time that initramfs will
wait before failing the boot.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes #14430
2023-02-02 15:11:35 -08:00
Ameer Hamza 05b72415d1 Fix console progress reporting for recursive send
After commit 19d3961, progress reporting (-v) with replication flag
enabled does not report the progress on the console. This commit
fixes the issue by updating the logic to check for pa->progress
instead of pa_verbosity in send_progress_thread().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #14448
2023-02-02 15:09:57 -08:00
Brian Behlendorf 973934b965 Increase default zfs_rebuild_vdev_limit to 64MB
When testing distributed rebuild performance with more capable
hardware it was observed than increasing the zfs_rebuild_vdev_limit
to 64M reduced the rebuild time by 17%.  Beyond 64MB there was
some improvement (~2%) but it was not significant when weighed
against the increased memory usage. Memory usage is capped at 1/4
of arc_c_max.

Additionally, vr_bytes_inflight_max has been moved so it's updated
per-metaslab to allow the size to be adjust while a rebuild is
running.

Reviewed-by: Akash B <akash-b@hpe.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14428
2023-01-27 10:02:24 -08:00
Brian Behlendorf c0aea7cf4e Increase default zfs_scan_vdev_limit to 16MB
For HDD based pools the default zfs_scan_vdev_limit of 4M
per-vdev can significantly limit the maximum scrub performance.
Increasing the default to 16M can double the scrub speed from
80 MB/s per disk to 160 MB/s per disk.

This does increase the memory footprint during scrub/resilver
but given the performance win this is a reasonable trade off.
Memory usage is capped at 1/4 of arc_c_max.  Note that number
of outstanding I/Os has not changed and is still limited by
zfs_vdev_scrub_max_active.

Reviewed-by: Akash B <akash-b@hpe.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14428
2023-01-27 10:01:13 -08:00
Alexander Motin dc5c8006f6 Prefetch on deadlists merge
During snapshot deletion ZFS may issue several reads for each deadlist
to merge them into next snapshot's or pool's bpobj.  Number of the dead
lists increases with number of snapshots.  On HDD pools it may take
significant time during which sync thread is blocked.

This patch introduces prescient prefetch of required blocks for up to
128 deadlists ahead.  Tests show reduction of time required to delete
dataset with 720 snapshots with randomly overwritten file on wide HDD
pool from 75-85 to 22-28 seconds.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Issue #14276 
Closes #14402
2023-01-25 11:30:24 -08:00
Brian Behlendorf c85ac731a0 Improve resilver ETAs
When resilvering the estimated time remaining is calculated using
the average issue rate over the current pass.  Where the current
pass starts when a scan was started, or restarted, if the pool
was exported/imported.

For dRAID pools in particular this can result in wildly optimistic
estimates since the issue rate will be very high while scanning
when non-degraded regions of the pool are scanned.  Once repair
I/O starts being issued performance drops to a realistic number
but the estimated performance is still significantly skewed.

To address this we redefine a pass such that it starts after a
scanning phase completes so the issue rate is more reflective of
recent performance.  Additionally, the zfs_scan_report_txgs
module option can be set to reset the pass statistics more often.

Reviewed-by: Akash B <akash-b@hpe.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14410
2023-01-25 11:28:54 -08:00
Coleman Kane 9cd71c8604 linux 6.2 compat: zpl_set_acl arg2 is now struct dentry
Linux 6.2 changes the second argument of the set_acl operation to be a
"struct dentry *" rather than a "struct inode *". The inode* parameter
is still available as dentry->d_inode, so adjust the call to the _impl
function call to dereference and pass that pointer to it.

Also document that the get_acl -> get_inode_acl member name change from
commit 884a693 was an API change also introduced in Linux 6.2.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #14415
2023-01-24 11:20:50 -08:00
Alexander Motin 0f740a4f1d Introduce minimal ZIL block commit delay
Despite all optimizations, tests on actual hardware show that FreeBSD
kernel can't sleep for less then ~2us.  Similar tests on Linux show
~50us delay at least from nanosleep() (haven't tested inside kernel).
It means that on very fast log device ZIL may not be able to satisfy
zfs_commit_timeout_pct block commit timeout, increasing log latency
more than desired.

Handle that by introduction of zil_min_commit_timeout parameter,
specifying minimal timeout value where additional delays to aggregate
writes may be skipped.  Also skip delays if the LWB is more than 7/8
full, that often happens if I/O sizes are constant and match one of
LWB sizes.  Both things are applied only if there were no already
outstanding log blocks, that may indicate single-threaded workload,
that by definition can not benefit from the commit delays.

While there, add short time moving average to zl_last_lwb_latency to
make it more stable.

Tests of single-threaded 4KB writes to NVDIMM SLOG on FreeBSD show IOPS
increase by 9% instead of expected 5%.  For zfs_commit_timeout_pct of
1 there IOPS increase by 5.5% instead of expected 1%.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14418
2023-01-24 09:20:32 -08:00
Attila Fülöp 037e4f2536 x86 asm: Replace .align with .balign
The .align directive used to align storage locations is
ambiguous. On some platforms and assemblers it takes a byte count,
on others the argument is interpreted as a shift value. The current
usage expects the first interpretation.

Replace it with the unambiguous .balign directive which always
expects a byte count, regardless of platform and assembler.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #14422
2023-01-24 09:04:39 -08:00
Attila Fülöp 58ca7b1011 IPC: blake3 x86 asm: fix placement of .size directives
The .size directive used by the SET_SIZE C macro uses the special
dot symbol to calculate the size of a function. The dot symbol
refers to the current address, so for the calculation to be
meaningful the SET_SIZE macro must be placed immediately after the
end of the function the size is calculated for.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #14422
2023-01-24 09:03:31 -08:00
David Hedberg 37a27b4306 Wait for txg sync if the last DRR_FREEOBJECTS might result in a hole
If we receive a DRR_FREEOBJECTS as the first entry in an object range,
this might end up producing a hole if the freed objects were the
only existing objects in the block.

If the txg starts syncing before we've processed any following
DRR_OBJECT records, this leads to a possible race where the backing
arc_buf_t gets its psize set to 0 in the arc_write_ready() callback
while still being referenced from a dirty record in the open txg.

To prevent this, we insert a txg_wait_synced call if the first
record in the range was a DRR_FREEOBJECTS that actually
resulted in one or more freed objects.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: David Hedberg <david.hedberg@findity.com>
Sponsored by: Findity AB
Closes #11893
Closes #14358
2023-01-23 13:19:43 -08:00
Richard Yao 73968defdd Reject streams that set ->drr_payloadlen to unreasonably large values
In the zstream code, Coverity reported:

"The argument could be controlled by an attacker, who could invoke the
function with arbitrary values (for example, a very high or negative
buffer size)."

It did not report this in the kernel. This is likely because the
userspace code stored this in an int before passing it into the
allocator, while the kernel code stored it in a uint32_t.

However, this did reveal a potentially real problem. On 32-bit systems
and systems with only 4GB of physical memory or less in general, it is
possible to pass a large enough value that the system will hang. Even
worse, on Linux systems, the kernel memory allocator is not able to
support allocations up to the maximum 4GB allocation size that this
allows.

This had already been limited in userspace to 64MB by
`ZFS_SENDRECV_MAX_NVLIST`, but we need a hard limit in the kernel to
protect systems. After some discussion, we settle on 256MB as a hard
upper limit. Attempting to receive a stream that requires more memory
than that will result in E2BIG being returned to user space.

Reported-by: Coverity (CID-1529836)
Reported-by: Coverity (CID-1529837)
Reported-by: Coverity (CID-1529838)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14285
2023-01-23 13:16:22 -08:00
rob-wing 69f024a56e Configure zed's diagnosis engine with vdev properties
Introduce four new vdev properties:
    checksum_n
    checksum_t
    io_n
    io_t

These properties can be used for configuring the thresholds of zed's
diagnosis engine and are interpeted as <N> events in T <seconds>.

When this property is set to a non-default value on a top-level vdev,
those thresholds will also apply to its leaf vdevs. This behavior can be
overridden by explicitly setting the property on the leaf vdev.

Note that, these properties do not persist across vdev replacement. For
this reason, it is advisable to set the property on the top-level vdev
instead of the leaf vdev.

The default values for zed's diagnosis engine (10 events, 600 seconds)
remains unchanged.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Rob Wing <rob.wing@klarasystems.com>
Sponsored-by: Seagate Technology LLC
Closes #13805
2023-01-23 13:14:25 -08:00
Richard Yao f091db9248 free_blocks(): Fix reports from 2016 PVS Studio FreeBSD report
In 2016, the authors of PVS Studio ran it on the FreeBSD kernel, which
identified a number of bugs / cleanup opportunities in the FreeBSD ZFS kernel
code. A few of them persist to the present day:

https://reviews.freebsd.org/D5245

Note that the scan was done against
freebsd/freebsd-src@46763fd4ca.

In particular, we have the following in free_blocks():

\sys\cddl\contrib\opensolaris\uts\common\fs\zfs\dnode_sync.c (174): error V547: Expression '__left >= __right' is always true. Unsigned type value is always >= 0.
\sys\cddl\contrib\opensolaris\uts\common\fs\zfs\dnode_sync.c (171): error V634: The priority of the '*' operation is higher than that of the '<<' operation. It's possible that parentheses should be used in the expression.
\sys\cddl\contrib\opensolaris\uts\common\fs\zfs\dnode_sync.c (175): error V547: Expression '__left >= __right' is always true. Unsigned type value is always >= 0.

A couple of assertions accidentally typecast the arguments they check to
unsigned in such a way that the result is always true. Also, parentheses
are missing around `1<<epbs` in `(db->db_blkid * 1<<epbs)`. This works
out to be okay due to multiplication not caring what order of operations
we use, but it is better to fix it to be `(db->db_blkid << epbs)`.

A few of the function local variables probably never should have been
32-bit in the first place, so we make them 64-bit. We also replace the
existing assertions with additional assertions to ensure that 64-bit
unsigned arithmetic is safe.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14407
2023-01-23 13:12:37 -08:00
Chunwei Chen 71974946be Fix reading uninitialized variable in receive_read
When zfs_file_read returns error, resid may be uninitialized.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #14404
2023-01-20 11:49:56 -08:00
Allan Jude 63267f7f77 zfs_receive_one: Check for the more likely error first
If zfs_receive_one() gets back EINVAL, check for the more likely case,
embedded block pointers + encryption and return that error, before
falling back to the less likely case, a resumable stream when the
kernel has not been upgraded to support resume.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Sponsored-by: rsync.net
Sponsored-by: Klara Inc.
Closes #14379
2023-01-20 11:11:54 -08:00
Richard Yao 856cefcd1c Cleanup ->dd_space_towrite should be unsigned
This is only ever used with unsigned data, so the type itself should be
unsigned. Also, PVS Studio's 2016 FreeBSD kernel report correctly
identified the following assertion as always being true, so we can drop
it:

ASSERT3U(dd->dd_space_towrite[i & TXG_MASK], >=, 0);

The reason it was always true is because it would do casts to give us
unsigned comparisons. This could have been fixed by switching to
`ASSERT3S()`, but upon inspection, it turned out that this variable
never should have been allowed to be signed in the first place.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14408
2023-01-20 11:10:15 -08:00
Mark Johnston ebabb93e6c Micro-optimize dsl_prop_get_dd()
Use the saved property index instead of looking it up once per DSL
directory when traversing up towards the root.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Sponsored-by: The FreeBSD Foundation
Closes #14397
2023-01-20 11:01:41 -08:00
Mark Johnston 7c30100c00 Avoid passing an uninitialized index to dsl_prop_known_index
Reported-by: KMSAN
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Sponsored-by: The FreeBSD Foundation
Closes #14397
2023-01-20 11:00:38 -08:00
Chunwei Chen c6dab6dd39 Fix unprotected zfs_znode_dmu_fini
In original code, zfs_znode_dmu_fini is called in zfs_rmnode without
zfs_znode_hold_enter. It seems to assume it's ok to do so when the znode
is unlinked. However this assumption is not correct, as zfs_zget can be
called by NFS through zpl_fh_to_dentry as pointed out by Christian in
https://github.com/openzfs/zfs/pull/12767, which could result in a
use-after-free bug.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12767 
Closes #14364
2023-01-19 16:59:05 -08:00
George Melikov a379083d9f Man: fix defaults for zfs_dirty_data_max_max
It was changed in e99932f7de,
but without docs update.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #14400
2023-01-17 13:12:29 -08:00
Jorgen Lundman 68c0771cc9 Unify Assembler files between Linux and Windows
Add new macro ASMABI used by Windows to change
calling API to "sysv_abi".

Reviewed-by: Attila Fülöp <attila@fueloep.org>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #14228
2023-01-17 11:09:19 -08:00
Ameer Hamza 19d3961589 Use setproctitle to report progress of zfs send
This allows parsing of zfs send progress by checking the process
title.
Doing so requires some changes to the send code in libzfs_sendrecv.c;
primarily these changes move some of the accounting around, to allow
for the code to be verbose as normal, or set the process title. Unlike
BSD, setproctitle() isn't standard in Linux; thus, borrowed it from
libbsd with slight modifications.

Authored-by: Sean Eric Fagan <sef@FreeBSD.org>
Co-authored-by: Ryan Moeller <ryan@iXsystems.com>
Co-authored-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #14376
2023-01-17 10:17:35 -08:00
Richard Yao 2e7f664f04 Cleanup of dead code suggested by Clang Static Analyzer (#14380)
I recently gained the ability to run Clang's static analyzer on the
linux kernel modules via a few hacks. This extended coverage to code
that was previously missed since Clang's static analyzer only looked at
code that we built in userspace. Running it against the Linux kernel
modules built from my local branch produced a total of 72 reports
against my local branch. Of those, 50 were reports of logic errors and
22 were reports of dead code. Since we already had cleaned up all of
the previous dead code reports, I felt it would be a good next step to
clean up these dead code reports. Clang did a further breakdown of the
dead code reports into:

Dead assignment	15

Dead increment	2

Dead nested assignment	5

The benefit of cleaning these up, especially in the case of dead nested
assignment, is that they can expose places where our error handling is
incorrect. A number of them were fairly straight forward. However
several were not:

In vdev_disk_physio_completion(), not only were we not using the return
value from the static function vdev_disk_dio_put(), but nothing used it,
so I changed it to return void and removed the existing (void) cast in
the other area where we call it in addition to no longer storing it to a
stack value.

In FSE_createDTable(), the function is dead code. Its helper function
FSE_freeDTable() is also dead code, as are the CPP definitions in
`module/zstd/include/zstd_compat_wrapper.h`. We just delete it all.

In zfs_zevent_wait(), we have an optimization opportunity. cv_wait_sig()
returns 0 if there are waiting signals and 1 if there are none. The
Linux SPL version literally returns `signal_pending(current) ? 0 : 1)`
and FreeBSD implements the same semantics, we can just do
`!cv_wait_sig()` in place of `signal_pending(current)` to avoid
unnecessarily calling it again.

zfs_setattr() on FreeBSD version did not have error handling issue
because the code was removed entirely from FreeBSD version. The error is
from updating the attribute directory's files. After some thought, I
decided to propapage errors on it to userspace.

In zfs_secpolicy_tmp_snapshot(), we ignore a lack of permission from the
first check in favor of checking three other permissions. I assume this
is intentional.

In zfs_create_fs(), the return value of zap_update() was not checked
despite setting an important version number. I see no backward
compatibility reason to permit failures, so we add an assertion to catch
failures. Interestingly, Linux is still using ASSERT(error == 0) from
OpenSolaris while FreeBSD has switched to the improved ASSERT0(error)
from illumos, although illumos has yet to adopt it here. ASSERT(error ==
0) was used on Linux while ASSERT0(error) was used on FreeBSD since the
entire file needs conversion and that should be the subject of
another patch.

dnode_move()'s issue was caused by us not having implemented
POINTER_IS_VALID() on Linux. We have a stub in
`include/os/linux/spl/sys/kmem_cache.h` for it, when it really should be
in `include/os/linux/spl/sys/kmem.h` to be consistent with
Illumos/OpenSolaris. FreeBSD put both `POINTER_IS_VALID()` and
`POINTER_INVALIDATE()` in `include/os/freebsd/spl/sys/kmem.h`, so we
copy what it did.

Whenever a report was in platform-specific code, I checked the FreeBSD
version to see if it also applied to FreeBSD, but it was only relevant a
few times.

Lastly, the patch that enabled Clang's static analyzer to be run on the
Linux kernel modules needs more work before it can be put into a PR. I
plan to do that in the future as part of the on-going static analysis
work that I am doing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14380
2023-01-17 09:57:12 -08:00
Brian Behlendorf 60f86a2bfa ZTS: Annotate additonal flaky test cases
Update several flaky test cases in zts-report.py.in until they
can be made entirely reliable.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14392
2023-01-17 09:53:53 -08:00
Brian Behlendorf 07281f7d13 CI: Reclaim space after package operations
Rather than reclaiming space before updating the packages do
it afterwards.  This avoids issues with apt returning an
error due to missing files on the system.

This commit includes a revert for 6320b9e6.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14387
2023-01-17 09:52:27 -08:00
Rob Wing 7a85f58db6 zpool-set: print error message when pool or vdev is not valid
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Wing <rob.wing@klarasystems.com>
Sponsored-by: Seagate Technology
Submitted-by: Klara, Inc.
Closes #14310
2023-01-17 09:47:24 -08:00
Rob Wing a0276f7048 zpool-set: update usage text
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Wing <rob.wing@klarasystems.com>
Sponsored-by: Seagate Technology
Submitted-by: Klara, Inc.
Closes #14310
2023-01-17 09:46:05 -08:00
Richard Yao d27c7ba62f Linux ppc64le ieee128 compat: Do not redefine __asm on external headers
There is an external assembly declaration extension in GNU C that glibc
uses when building with ieee128 floating point support on ppc64le.
Marking that as volatile makes no sense, so the build breaks.

It does not make sense to only mark this as volatile on Linux, since if
do not want the compiler reordering things on Linux, we do not want the
compiler reordering things on any other platform, so we stop treating
Linux specially and just manually inline the CPP macro so that we can
eliminate it. This should fix the build on ppc64le.

Tested-by: @gyakovlev 
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14308
Closes #14384
2023-01-13 10:58:58 -08:00
Richard Yao 4ef69de384 Cleanup: Use NULL when doing NULL pointer comparisons
The Linux 5.16.14 kernel's coccicheck caught this. The semantic
patch that caught it was:

./scripts/coccinelle/null/badzero.cocci

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14372
2023-01-12 16:00:37 -08:00
Richard Yao 64195fc89f Cleanup: Remove unneeded semicolons
The Linux 5.16.14 kernel's coccicheck caught this. The semantic
patch that caught it was:

./scripts/coccinelle/misc/semicolon.cocci

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14372
2023-01-12 16:00:30 -08:00
Richard Yao 3b2f9c1ec8 Cleanup: Use MIN() macro
The Linux 5.16.14 kernel's coccicheck caught this. The semantic
patch that caught it was:

./scripts/coccinelle/misc/minmax.cocci

There was a third opportunity to use `MIN()`, but that was in
`FSE_minTableLog()` in `module/zstd/lib/compress/fse_compress.c`.
Upstream zstd has yet to make this change and I did not want to change
header includes just for MIN, or do a one off, so I left it alone.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14372
2023-01-12 16:00:23 -08:00
Richard Yao e6328fda2e Cleanup: !A || A && B is equivalent to !A || B
In zfs_zaccess_dataset_check(), we have the following subexpression:

(!IS_DEVVP(ZTOV(zp)) ||
    (IS_DEVVP(ZTOV(zp)) && (v4_mode & WRITE_MASK_ATTRS)))

When !IS_DEVVP(ZTOV(zp)) is false, IS_DEVVP(ZTOV(zp)) is true under the
law of the excluded middle since we are not doing pseudoboolean alegbra.
Therefore doing:

(IS_DEVVP(ZTOV(zp)) && (v4_mode & WRITE_MASK_ATTRS))

Is unnecessary and we can just do:

(v4_mode & WRITE_MASK_ATTRS)

The Linux 5.16.14 kernel's coccicheck caught this. The semantic
patch that caught it was:

./scripts/coccinelle/misc/excluded_middle.cocci

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14372
2023-01-12 16:00:15 -08:00
Richard Yao 9c8fabffa2 Cleanup: Replace oldstyle struct hack with C99 flexible array members
The Linux 5.16.14 kernel's coccicheck caught this. The semantic
patch that caught it was:

./scripts/coccinelle/misc/flexible_array.cocci

However, unlike the cases where the GNU zero length array extension had
been used, coccicheck would not suggest patches for the older style
single member arrays. That was good because blindly changing them would
break size calculations in most cases.

Therefore, this required care to make sure that we did not break size
calculations. In the case of `indirect_split_t`, we use
`offsetof(indirect_split_t, is_child[is->is_children])` to calculate
size. This might be subtly wrong according to an old mailing list
thread:

https://inbox.sourceware.org/gcc-prs/20021226123454.27019.qmail@sources.redhat.com/T/

That is because the C99 specification should consider the flexible array
members to start at the end of a structure, but compilers prefer to put
padding at the end. A suggestion was made to allow compilers to allocate
padding after the VLA like compilers already did:

http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/n983.htm

However, upon thinking about it, whether or not we allocate end of
structure padding does not matter, so using offsetof() to calculate the
size of the structure is fine, so long as we do not mix it with sizeof()
on structures with no array members.

In the case that we mix them and padding causes offsetof(struct_t,
vla_member[0]) to differ from sizeof(struct_t), we would be doing unsafe
operations if we underallocate via `offsetof()` and then overcopy via
sizeof().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14372
2023-01-12 16:00:03 -08:00
Richard Yao d35ccc1f59 Cleanup: Fix indentation in zfs_dbgmsg_t
fdc2d30371 accidentally broke the
indentation.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14372
2023-01-12 15:59:56 -08:00
Richard Yao 8e7ebf4e2d Cleanup: Use C99 flexible array members instead of zero length arrays
The Linux 5.16.14 kernel's coccicheck caught this. The semantic
patch that caught it was:

./scripts/coccinelle/misc/flexible_array.cocci

The Linux kernel's documentation makes a good case for why we should not
use these:

https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14372
2023-01-12 15:59:41 -08:00
Richard Yao c9c3ce7976 Cleanup: Use kmem_zalloc() instead of memset() to zero memory
The Linux 5.16.14 kernel's coccicheck caught this. The semantic patch
that caught it was:

./scripts/coccinelle/api/alloc/zalloc-simple.cocci

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14372
2023-01-12 15:59:28 -08:00
Richard Yao 7384ec65cd Cleanup: Remove unnecessary explicit casts of pointers from allocators
The Linux 5.16.14 kernel's coccicheck caught these. The semantic patch
that caught them was:

./scripts/coccinelle/api/alloc/alloc_cast.cocci

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14372
2023-01-12 15:59:12 -08:00
Gian-Carlo DeFazio 80d64bb85f change how d_alias is replaced by du.d_alias
d_alias may need to be converted to du.d_alias
depending on the kernel version.
d_alias is currently in only one place in the code which
changes
"hlist_for_each_entry(dentry, &inode->i_dentry, d_alias)"
to
"hlist_for_each_entry(dentry, &inode->i_dentry, d_u.d_alias)"
as neccesary.

This effectively results in a double macro expansion
for code that uses the zfs headers but already has its
own macro for just d_alias (lustre in this case).

Remove the conditional code for hlist_for_each_entry
and have a macro for "d_alias -> du.d_alias" instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gian-Carlo DeFazio <defazio1@llnl.gov>
Closes #14377
2023-01-12 10:14:04 -08:00
George Amanakis eee9362a72 Activate filesystem features only in syncing context
When activating filesystem features after receiving a snapshot, do 
so only in syncing context.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #14304 
Closes #14252
2023-01-11 18:00:39 -08:00
Brian Behlendorf 6320b9e68e CI: remove unused packages/snaps
Removing portions of packages/snaps directly with rm can result in
unexpected errors when running `apt update`.  Free up the additional
space by removing (some) packages with the proper tools.

This change frees up slightly less space than before, but it is
expected to still be sufficient.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14374
2023-01-11 15:18:51 -08:00
rob-wing 6f2ffd272c zpool: do guid-based comparison in is_vdev_cb()
is_vdev_cb() uses string comparison to find a matching vdev and 
will fallback to comparing the guid via a string.  These changes 
drop the string comparison and compare the guids instead.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Rob Wing <rob.wing@klarasystems.com>
Co-authored-by: Rob Wing <rob.wing@klarasystems.com>
Sponsored-by: Seagate Technology
Submitted-by: Klara, Inc.
Closes #14311
2023-01-11 15:14:35 -08:00
Mateusz Piotrowski 926715b9fc Turn default_bs and default_ibs into ZFS_MODULE_PARAMs
The default_bs and default_ibs tunables control the default block size
and indirect block size.

So far, default_bs and default_ibs were tunable only on FreeBSD, e.g.,

    sysctl vfs.zfs.default_ibs

Remove the FreeBSD-specific sysctl code and expose default_bs and
default_ibs as tunables on both Linux and FreeBSD using
ZFS_MODULE_PARAM.

One of the use cases for changing the values of those tunables is to
lower the indirect block size, which may improve performance of large
directories (as discussed during the OpenZFS Leadership Meeting
on 2022-08-16).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>
Sponsored-by: Wasabi Technology, Inc.
Closes #14293
2023-01-11 09:38:20 -08:00
Tony Hutter 4ba3eff2a6 Update META to 6.1 kernel
ZFS successfully builds against the 6.1.4 kernel.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #14371
2023-01-10 15:53:33 -08:00
Mateusz Piotrowski a4b21eadec Add tunable to allow changing micro ZAP's max size
This change turns `MZAP_MAX_BLKSZ` into a `ZFS_MODULE_PARAM()` called
`zap_micro_max_size`. As a result, we can experiment with different
micro ZAP sizes to improve directory size scaling.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Mateusz Piotrowski <mateuszpiotrowski@klarasystems.com>
Co-authored-by: Toomas Soome <toomas.soome@klarasystems.com>
Signed-off-by: Mateusz Piotrowski <mateuszpiotrowski@klarasystems.com>
Sponsored-by: Wasabi Technology, Inc.
Closes #14292
2023-01-10 13:41:54 -08:00
Alyssa Ross 1f19826c9a etc/systemd/zfs-mount-generator: avoid strndupa
The non-standard strndupa function is not implemented by musl libc,
and can be dangerous due to its potential to blow the stack.  (musl
_does_ implement strdupa, used elsewhere in this function.)

With a similar amount of code, we can use a heap allocation to
construct the pool name, which is musl-friendly and doesn't have
potential stack problems.

(Why care about musl when systemd only supports glibc?  Some distros
patch systemd with portability fixes, and it would be nice to be able
to use ZFS on those distros.)

 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alyssa Ross <alyssa.ross@unikie.com>
Closes #14327
2023-01-10 13:40:31 -08:00
Matthew Ahrens fc45975ec8 Batch enqueue/dequeue for bqueue
The Blocking Queue (bqueue) code is used by zfs send/receive to send
messages between the various threads.  It uses a shared linked list,
which is locked whenever we enqueue or dequeue.  For workloads which
process many blocks per second, the locking on the shared list can be
quite expensive.

This commit changes the bqueue logic to have 3 linked lists:
1. An enquing list, which is used only by the (single) enquing thread,
   and thus needs no locks.
2. A shared list, with an associated lock.
3. A dequing list, which is used only by the (single) dequing thread,
   and thus needs no locks.

The entire enquing list can be moved to the shared list in constant
time, and the entire shared list can be moved to the dequing list in
constant time.  These operations only happen when the `fill_fraction` is
reached, or on an explicit flush request.  Therefore, the lock only
needs to be acquired infrequently.

The API already allows for dequing to block until an explicit flush, so
callers don't need to be changed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #14121
2023-01-10 13:39:22 -08:00
Brian Behlendorf 0c8fbe5b6a ztest: update ztest_dmu_snapshot_create_destroy()
ECHRNG is returned when the channel program encounters a runtime
error.  For example, this can happen when a snapshot doesn't exist.
We handle this error the same way as the existing EEXIST and ENOENT
error checks.

Additionally, improve the internal debug message to include the
error describing why a pool couldn't be opened.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14351
2023-01-10 13:27:48 -08:00
Brian Behlendorf 549aafb7c8 ztest: ztest_dsl_prop_set_uint64() ENOSPC consistency
It is possible for ztest_dsl_prop_set_uint64() to fail with ENOSPC
and this needs to be handled consistently.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14351
2023-01-10 13:27:48 -08:00
Brian Behlendorf f7788883ab ztest: reduce zpool split frequency
There's no need to so aggressively test splitting a pool.  Reduce
the occurence of this test to once every 10 seconds.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14351
2023-01-10 13:27:48 -08:00
Brian Behlendorf 4208a052c2 ztest: update expectation for sparing a special device
Commit c23738c70e modified the expected
behavior of attach to prevent hot spares from being used as special
vdev replacements.  We update ztest's expectations accordingly to
prevent it from failing when testing the updated behavior.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14351
2023-01-10 13:26:44 -08:00
Matthew Ahrens 40d7e971ff ztest fails assertion in zio_write_gang_member_ready()
Encrypted blocks can have up to 2 DVA's, as the third DVA is reserved
for the salt+IV.  However, dmu_write_policy() allows non-encrypted
blocks (e.g. DMU_OT_OBJSET) inside encrypted datasets to request and
allocate 3 DVA's, since they don't need a salt+IV (they are merely
authenicated).

However, if such a block becomes a gang block, the gang code incorrectly
limits the gang block header to 2 DVA's.  This leads to a "NDVAs
inversion", where a parent block (the gang block header) has less DVA's
than its children (the gang members), causing an assertion failure in
zio_write_gang_member_ready().

This commit addresses the problem by only restricting the gang block
header to 2 DVA's if the block is actually encrypted (and thus its gang
block members can have at most 2 DVA's).

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #14250
Closes #14356
2023-01-09 16:43:45 -08:00
Charles Suh 44a78c05b3 libzpool: fix ddi_strtoull to update nptr
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Charles Suh <charles.suh@gmail.com>
Closes #14360
2023-01-09 12:49:35 -08:00
Ameer Hamza 5091867ee6 zed: add hotplug support for spare vdevs
This commit supports for spare vdev hotplug. The
spare vdev associated with all the pools will be
marked as "Removed" when the drive is physically
detached and will become "Available" when the
drive is reattached. Currently, the spare vdev
status does not change on the drive removal and
the same is the case with reattachment.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #14295
2023-01-09 12:43:03 -08:00
Alexander Motin 289f7e6adb Remove some dead ARC code. (#14340)
Every ARC buffer holds a reference on the header. It means headers with
buffers are never evictable.  When we are evicting a header, there can
be no more buffers to free.  Just assert that.

b_evict_lock seems not protecting anything now.  Remove it.

Buffers checksum should also be freed with the last uncompressed buffer,
so it should not be there also when we are evicting the header.

Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Sponsored by:   iXsystems, Inc.
2023-01-09 10:45:17 -08:00
Coleman Kane a0105f6cd4 linux 6.2 compat: bio->bi_rw was renamed bio->bi_opf
The bi_rw member of struct bio was renamed to bi_opf in Linux 6.2.
As well, Linux's implementation of bio_set_op_attrs(...) has been
removed.

The HAVE_BIO_BI_OPF macro already appears to be defined, but the
removal of the bio_set_op_attrs(...) implementation makes the build
fall back on the locally-defined implementation, which isn't updated
for the bio->bi_opf change. This commit adds that update.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #14324
Closes #14331
2023-01-06 14:43:22 -08:00
Coleman Kane 884a69357f linux 6.2 compat: get_acl() got moved to get_inode_acl() in 6.2
Linux 6.2 renamed the get_acl() operation to get_inode_acl() in
the inode_operations struct. This should fix Issue #14323.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #14323
Closes #14331
2023-01-06 14:40:54 -08:00
Antonio Russo 556ed09537 Introduce ZFS_LINUX_REQUIRE_API autoconf macro
Currently, if API tests fail, we either ignore the failures, or
unconditionally halt the kernel build.  This leads to situations where
incompatibilities with existing APIs may develop, but not trip the
configure compatibility checks.

This introduces a new mechanism to require APIs for kernels above a
particular version.  While not perfect, this at least guarantees
mainline kernels do not break existing APIs without at least providing
some warning.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #14343
2023-01-06 14:33:53 -08:00
Antonio Russo d27c81847b Linux 6.1 compat: open inside tmpfile()
Linux 863f144 modified the .tmpfile interface to pass a struct file,
rather than a struct dentry, and expect the tmpfile implementation to
open inside of tmpfile().

This patch implements a configuration test that checks for this new API
and appropriately sets a HAVE_TMPFILE_DENTRY flag that tracks this old
API.  Contingent on this flag, the appropriate API is implemented.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #14301
Closes #14343
2023-01-06 14:33:00 -08:00
Antonio Russo a7304ab9c1 ZTS: close in mmapwrite.c
mmapwrite is used during the ZTS to identify issues with mmap-ed files.
This helper program exercises this pathway by continuously writing to a
file.  ee6bf97c7 modified the writing threads to terminate after a set
amount of total data is written.  This change allows standard program
execution to reach the end of a writer thread without closing the file
descriptor, introducing a resource "leak."

This patch appeases resource leak analyses by close()-ing the file at
the end of the thread.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #14353
2023-01-06 10:52:08 -08:00
Antonio Russo ee6bf97c77 ZTS: limit mmapwrite file size
mmapwrite spawns several threads, all of which perform writes on a file
for the purpose of testing the behavior of mmap(2)-ed files.  One
thread performs an mmap and a write to the beginning of that region,
while the others perform regular writes after lseek(2)-ing the end of
the file.

Because these regular writes are set in a while (1) loop, they will
write an unbounded amount of data to disk.  The mmap_write_001_pos test
script SIGKILLs them after 30 seconds, but on fast testbeds, this may
be enough time to exhaust the available space in the filesystem,
leading to spurious test failures.

Instead, limit the total file size by checking that the lseek return
value is no greater than 250 * 1024*1024 bytes, which is less than the
default minimum vdev size defined in includes/default.cfg .

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #14277 
Closes #14345
2023-01-05 12:50:30 -08:00
Clemens Lang 8352e9dfae contrib: dracut: Do not timeout waiting for pw
systemd-ask-password has a default timeout of 90 seconds, which means
that dracut will fall back to the rescue shell 4.5 minutes after boot if
no password is entered.

This is undesirable when combined with, for example, unlocking remotely
using dracut-sshd and systemd-tty-ask-password-agent. See also
https://github.com/gsauthof/dracut-sshd#timeout and
https://bugzilla.redhat.com/show_bug.cgi?id=868421.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Clemens Lang <neverpanic@gmail.com>
Closes #14341
2023-01-05 12:07:43 -08:00
Richard Yao 1f3bc5ea80 Illumos #15286: do_composition() needs sign awareness
Authored by: Dan McDonald <danmcd@mnx.io>
Reviewed by: Patrick Mooney <pmooney@pfmooney.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Approved by: Joshua M. Clulow <josh@sysmgr.org>
Ported-by: Richard Yao <richard.yao@alumni.stonybrook.edu>

Illumos-issue: https://www.illumos.org/issues/15286
Illumos-commit: https://github.com/illumos/illumos-gate/commit/f137b22e734e85642da3e56e8b94da3f5f027c73

Porting Notes:

The patch in illumos did not have much of a commit message, and did not
provide attribution to the reporter, while original patch proposed to
OpenZFS did, so I am listing the reporter (myself) and original patch
author (also myself) below while including the original commit message
with some minor corrections as part of the porting notes:

In do_composition(), we have:

size = u8_number_of_bytes[*p];
if (size <= 1 || (p + size) > oslast)
	break;

There, we have type promotion from int8_t to size_t, which is unsigned.
C will sign extend the value as part of the widening before treating the
value as unsigned and the negative values we can counter are error
values from U8_ILLEGAL_CHAR and U8_OUT_OF_RANGE_CHAR, which are -1 and
-2 respectively. The unsigned versions of these under two's complement
are SIZE_MAX and SIZE_MAX-1 respectively.

The bounds check is written under the assumption that `size <= 1` does a
signed comparison. This is followed by a pointer comparison to see if
the string has the correct length, which is fine.

A little further down we have:

for (i = 0; i < size; i++)
	tc[i] = *p++;

When an error condition is encountered, this will attempt to iterate at
least SIZE_MAX-1 times, which will massively overflow the buffer, which
is not fine.

The kernel will kill the loop as soon as it hits the kernel stack guard
on Linux systems built with CONFIG_VMAP_STACK=y, which should be just
about all of them. That prevents arbitrary code execution and just about
any other bad thing that a black hat attacker might attempt with
knowledge of this buffer overflow. Other systems' kernels have
mitigations for unbounded in-kernel buffer overflows that will catch
this too.

Also, the patch in illumos-gate made an effort to fix C style issues
that had been fixed in the OpenZFS/ZFSOnLinux repository. Those issues
had been mentioned in the email that I originally sent them about this
issue. One of the fixes had not been already done, so it is included.
Another to collect_a_seq()'s arguments was handled differently in
OpenZFS. For the sake of avoiding unnecessary differences, it has been
adopted. This has the interesting effect that if you correct the paths
in the illumos-gate patch to match the current OpenZFS repository, you
can reverse apply it cleanly.

Original-patch-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reported-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Co-authored-by: Dan McDonald <danmcd@mnx.io>
Closes #14318
Closes #14342
2023-01-05 11:16:21 -08:00
Matthew Ahrens b72efb7511 removal of LegacyVersion broke ax_python_dev.m4
The 22.0 release of the python `packaging` package removed the
`LegacyVersion` trait, causing ZFS to no longer compile.

This commit replaces the sections of `ax_python_dev.m4` that rely on
`LegacyVersion` with updated implementations from the upstream
`autoconf-archive`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #14297
2023-01-05 11:04:24 -08:00
Mateusz Guzik f25f1f9091 FreeBSD: catch up to 1400077
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #14328
2023-01-05 10:56:40 -08:00
Martin Rüegg f6f215f07f Fix shebang for helper script of deb-utils
Shebang was missing the `!` between `#` and the actual path.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Martin Rüegg <martin.rueegg@metaworx.ch>
Closes #14339
2023-01-05 10:50:00 -08:00
Martin Rüegg fb000f7867 Add quotation marks around $PATH for deb-utils
Fix #14338, failing to build deb-utils if existing `$PATH` variable
would include a whitespace.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Martin Rüegg <martin.rueegg@metaworx.ch>
Closes #14339
2023-01-05 10:49:33 -08:00
Alexander Motin db832c47fe Pack zrlock_t by 8 bytes
On FreeBSD this reduces this structure size from 64 to 56 bytes.
dnode_handle_t respectively reduces from 72 to 64 bytes. It sounds
like a waste to need 72 bytes to be able to relocate 808 bytes of
dnode_t, which relocation on FreeBSD is not even supported.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Sponsored by:   iXsystems, Inc.
Closes #14317
2023-01-05 09:31:55 -08:00
Alexander Motin 792a6ee462 Update arc_summary and arcstat outputs
Recent ARC commits added new statistic counters, such as iohits,
uncached state, etc.  Represent those.  Also some of previously
reported numbers were confusing or even made no sense.  Cleanup
and restructure existing reports.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Sponsored by:   iXsystems, Inc.
Issue #14115 
Issue #14123
Issue #14243 
Closes #14320
2023-01-05 09:29:13 -08:00
Alexander Motin bacf366fe2 Hide b_freeze_* under ZFS_DEBUG
This saves 40 bytes per full ARC header, reducing it on FreeBSD from
240 to 200 bytes on production bits.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #14315
2023-01-05 10:15:31 -07:00
Alexander Motin ed2f7ba08d Implement uncached prefetch
Previously the primarycache property was handled only in the dbuf
layer. Since the speculative prefetcher is implemented in the ARC,
it had to be disabled for uncacheable buffers.

This change gives the ARC knowledge about uncacheable buffers
via  arc_read() and arc_write(). So when remove_reference() drops
the last reference on the ARC header, it can either immediately destroy
it, or if it is marked as prefetch, put it into a new arc_uncached state. 
That state is scanned every second, evicting stale buffers that were
not demand read.

This change also tracks dbufs that were read from the beginning,
but not to the end.  It is assumed that such buffers may receive further
reads, and so they are stored in dbuf cache. If a following
reads reaches the end of the buffer, it is immediately evicted.
Otherwise it will follow regular dbuf cache eviction.  Since the dbuf
layer does not know actual file sizes, this logic is not applied to
the final buffer of a dnode.

Since uncacheable buffers should no longer stay in the ARC for long,
this patch also tries to optimize I/O by allocating ARC physical
buffers as linear to allow buffer sharing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #14243
2023-01-04 17:29:54 -07:00
Alexander Motin c935fe2e92 arc_read()/arc_access() refactoring and cleanup
ARC code was many times significantly modified over the years, that
created significant amount of tangled and potentially broken code.
This should make arc_access()/arc_read() code some more readable.

 - Decouple prefetch status tracking from b_refcnt.  It made sense
originally, but became highly cryptic over the years.  Move all the
logic into arc_access().  While there, clean up and comment state
transitions in arc_access().  Some transitions were weird IMO.
 - Unify arc_access() calls to arc_read() instead of sometimes calling
it from arc_read_done().  To avoid extra state changes and checks add
one more b_refcnt for ARC_FLAG_IO_IN_PROGRESS.
 - Reimplement ARC_FLAG_WAIT in case of ARC_FLAG_IO_IN_PROGRESS with
the same callback mechanism to not falsely account them as hits. Count
those as "iohits", an intermediate between "hits" and "misses". While
there, call read callbacks in original request order, that should be
good for fairness and random speculations/allocations/aggregations.
 - Introduce additional statistic counters for prefetch, accounting
predictive vs prescient and hits vs iohits vs misses.
 - Remove hash_lock argument from functions not needing it.
 - Remove ARC_FLAG_PREDICTIVE_PREFETCH, since it should be opposite
to ARC_FLAG_PRESCIENT_PREFETCH if ARC_FLAG_PREFETCH is set.  We may
wish to add ARC_FLAG_PRESCIENT_PREFETCH to few more places.
 - Fix few false positive tests found in the process.

Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14123
2022-12-22 12:10:24 -08:00
Ryan Moeller dc8c2f6158 FreeBSD: Fix potential boot panic with bad label
vdev_geom_read_pool_label() can leave NULL in configs.  Check for it
and skip consistently when generating rootconf.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #14291
2022-12-22 11:50:09 -08:00
Matthew Ahrens 018f26041d deadlock between spa_errlog_lock and dp_config_rwlock
There is a lock order inversion deadlock between `spa_errlog_lock` and
`dp_config_rwlock`:

A thread in `spa_delete_dataset_errlog()` is running from a sync task.
It is holding the `dp_config_rwlock` for writer (see
`dsl_sync_task_sync()`), and waiting for the `spa_errlog_lock`.

A thread in `dsl_pool_config_enter()` is holding the `spa_errlog_lock`
(see `spa_get_errlog_size()`) and waiting for the `dp_config_rwlock` (as
reader).

Note that this was introduced by #12812.

This commit address this by defining the lock ordering to be
dp_config_rwlock first, then spa_errlog_lock / spa_errlist_lock.
spa_get_errlog() and spa_get_errlog_size() can acquire the locks in this
order, and then process_error_block() and get_head_and_birth_txg() can
verify that the dp_config_rwlock is already held.

Additionally, a buffer overrun in `spa_get_errlog()` is corrected.  Many
code paths didn't check if `*count` got to zero, instead continuing to
overwrite past the beginning of the userspace buffer at `uaddr`.

Tested by having some errors in the pool (via `zinject -t data
/path/to/file`), one thread running `zpool iostat 0.001`, and another
thread runs `zfs destroy` (in a loop, although it hits the first time).
This reproduces the problem easily without the fix, and works with the
fix.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #14239
Closes #14289
2022-12-22 11:48:49 -08:00
Brian Behlendorf 29e1b089c1 Documentation corrections
- Update the link to the OpenZFS Code of Conduct.
- Remove extra "the" from contrib/initramfs/scripts/zfs

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14298
Closes #14307
2022-12-22 11:34:28 -08:00
Brian Behlendorf b4cd4fe1aa Revert "zdb: zdb_ddt_leak_init() reads uninitialized memory..."
This reverts commit d30db519af.  With
this change applied zloop.sh fails reliably with the following ASSERT.

  zio_wait(zio_claim(NULL, zcb->zcb_spa, refcnt ? 0 : spa_min_claim_txg(
    zcb->zcb_spa), bp, NULL, NULL, ZIO_FLAG_CANFAIL)) == 0 (0x2 == 0x0)
  ASSERT at cmd/zdb/zdb.c:5452:zdb_count_block()

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14306
2022-12-21 09:17:00 -08:00
George Melikov bd9dc5a1dc systemd: set restart=always for zfs-zed.service
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Co-authored-by: Attila Fülöp <attila@fueloep.org>
Closes #14294
2022-12-19 16:08:15 -08:00
Ethan Coe-Renner fb11b1570a Add color output to zfs diff.
This adds support to color zfs diff (in the style of git diff)
conditional on the ZFS_COLOR environment variable.

Signed-off-by: Ethan Coe-Renner <coerenner1@llnl.gov>
2022-12-15 10:14:32 -08:00
Doug Rabson 24502bd3a7 FreeBSD: Remove stray debug printf
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Doug Rabson <dfr@rabson.org>
Closes #14286 
Closes #14287
2022-12-13 17:35:07 -08:00
Umer Saleem e6e31dd540 Add native-deb* targets to build native Debian packages
In continuation of previous #13451, this commits adds native-deb*
targets for make to build native debian packages. Github workflows
are updated to build and test native Debian packages.

Native packages only build with pre-configured paths (see the
dh_auto_configure section in contrib/debian/rules.in). While
building native packages, paths should not be configured. Initial
config flags e.g. '--enable-debug' are replaced in
contrib/debian/rules.in.

Additional packages on top of existing zfs packages required to
build native packages include debhelper-compat, dh-python, dkms,
po-debconf, python3-all-dev, python3-sphinx.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #14265
2022-12-13 17:33:05 -08:00
Richard Yao f3f5263f8a Zero end of embedded block buffer in dump_write_embedded()
This fixes a kernel stack leak.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Tested-by: Nicholas Sherlock <n.sherlock@gmail.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13778
Closes #14255
2022-12-13 17:31:47 -08:00
Ameer Hamza 9be34ec99e Allow receiver to override encryption properties in case of replication
Currently, the receiver fails to override the encryption
property for the plain replicated dataset with the error:
"cannot receive incremental stream: encryption property
'encryption' cannot be set for incremental streams.". The
problem is resolved by allowing the receiver to override
the encryption property for plain replicated send.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #14253
Closes #13533
2022-12-13 17:30:46 -08:00
Richard Yao 3236c0b891 Cache dbuf_hash() calculation
We currently compute a 64-bit hash three times, which consumes 0.8% CPU
time on ARC eviction heavy workloads. Caching the 64-bit value in the
dbuf allows us to avoid that overhead.

Sponsored-By: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Richard Yao <richard.yao@klarasystems.com>
Closes #14251
2022-12-13 17:29:21 -08:00
Allan Jude dc95911d21 zfs list: Allow more fields in ZFS_ITER_SIMPLE mode
If the fields to be listed and sorted by are constrained to those
populated by dsl_dataset_fast_stat(), then zfs list is much faster,
as it does not need to open each objset and reads its properties.

A previous optimization by Pawel Dawidek
(0cee24064a) took advantage
of this to make listing snapshot names sorted only by name much faster.

However, it was limited to `-o name -s name`, this work extends this
optimization to work with:
  - name
  - guid
  - createtxg
  - numclones
  - inconsistent
  - redacted
  - origin
and could be further extended to any other properties supported by
dsl_dataset_fast_stat() or similar, that do not require extra locking
or reading from disk.

This was committed before (9a9e2e343dfa2af28bf7910de77ae73aa006de62),
but was reverted due to a regression when used with an older kernel.

If the kernel does not populate zc->zc_objset_stats, we now fallback
to getting the properties via the slower interface, to avoid problems
with newer userland and older kernels.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #14110
2022-12-13 17:27:54 -08:00
Marcel Menzel 70ac2654f5 Change ZEVENT_POOL_GUID to ZEVENT_POOL to display pool names
Outgoing mails for ZFS pool events include the pool GUID,
but not the actual pool name. Let's change this for better
readability, as it is already done in the mails for finished
pool resilvers.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Marcel Menzel <mail@mcl.gg>
Closes #14272
2022-12-13 17:26:10 -08:00
Richard Yao d31a7cb4fa Address theoretical uninitialized variable usage in zstream
Coverity has long complained about the checksum being uninitialized if
an END record is processed before its BEGIN record. This should not
happen, but there was no code to check for it. I had left this unfixed
since it was a low priority issue, but then
9f4ede63d2 added another instance of this.

I am making an effort to "hold the line" to keep new coverity defect
reports from going unaddressed, so I find myself forced to fix this much
earlier than I had originally planned to address it.

The solution is to maintain a counter and a flag. Then use VERIFY
statements to verify the following runtime constraints:

 * Every record either has a corresponding BEGIN record, is a BEGIN
   record or is the end of stream END record for replication streams.
 * BEGIN records cannot be nested. i.e. There must be an END record
   before another BEGIN record may be seen.

Failure to meet these constraints will cause the program to exit.

This is sufficient to ensure that the checksum is never accessed when
uninitialized.

Reported-by: Coverity (CID 1524578)
Reported-by: Coverity (CID 1524633)
Reported-by: Coverity (CID 1527295)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14176
2022-12-12 10:40:05 -08:00
Ryan Moeller 786ff6a6cb initramfs: Fix legacy mountpoint rootfs
Legacy mountpoint datasets should not pass `-o zfsutil` to `mount.zfs`.
Fix the logic in `mount_fs()` to not forget we have a legacy mountpoint
when checking for an `org.zol:mountpoint` userprop.

Reviewed-by: Richard Yao <ryao@gentoo.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #14274
2022-12-12 10:23:06 -08:00
Ameer Hamza e3785718ba Skip permission checks for extended attributes
zfs_zaccess_trivial() calls the generic_permission() to read
xattr attributes. This causes deadlock if called from
zpl_xattr_set_dir() context as xattr and the dent locks are
already held in this scenario. This commit skips the permissions
checks for extended attributes since the Linux VFS stack already
checks it before passing us the control.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Youzhong Yang <yyang@mathworks.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #14220
2022-12-12 10:21:37 -08:00
Allan Jude f900279e6d Restrict visibility of per-dataset kstats inside FreeBSD jails
When inside a jail, visibility on datasets not "jailed" to the
jail is restricted. However, it was possible to enumerate all
datasets in the pool by looking at the kstats sysctl MIB.

Only the kstats corresponding to datasets that the user has
visibility on are accessible now.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #14254
2022-12-09 11:04:29 -08:00
Richard Yao 5401472cd0 Linux PPC: Fix build failures on kernels built without CONFIG_SPE
We do a simple ifdef to avoid calling enable_kernel_spe()/
disable_kernel_spe() on PowerPC.

Reported-by: Rich Ercolani <Rincebrain@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Tested-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14233
Closes #14244
2022-12-09 10:51:23 -08:00
Serapheim Dimitropoulos 7bf4c97a36 Bypass metaslab throttle for removal allocations
Context:
We recently had a scenario where a customer with 2x10TB disks at 95+%
fragmentation and capacity, wanted to migrate their disks to a 2x20TB
setup. So they added the 2 new disks and submitted the removal of the
first 10TB disk.  The removal took a lot more than expected (order of
more than a week to 2 weeks vs a couple of days) and once it was done it
generated a huge indirect mappign table in RAM (~16GB vs expected ~1GB).

Root-Cause:
The removal code calls `metaslab_alloc_dva()` to allocate a new block
for each evacuating block in the removing device and it tries to batch
them into 16MB segments. If it can't find such a segment it tries for
8MBs, 4MBs, all the way down to 512 bytes.

In our scenario what would happen is that `metaslab_alloc_dva()` from
the removal thread pick the new devices initially but wouldn't allocate
from them because of throttling in their metaslab allocation queue's
depth (see `metaslab_group_allocatable()`) as these devices are new and
favored for most types of allocations because of their free space. So
then the removal thread would look at the old fragmented disk for
allocations and wouldn't find any contiguous space and finally retry
with a smaller allocation size until it would to the low KB range. This
caused a lot of small mappings to be generated blowing up the size of
the indirect table. It also wasted a lot of CPU while the removal was
active making everything slow.

This patch:
Make all allocations coming from the device removal thread bypass the
throttle checks. These allocations are not even counted in the metaslab
allocation queues anyway so why check them?

Side-Fix:
Allocations with METASLAB_DONT_THROTTLE in their flags would not be
accounted at the throttle queues but they'd still abide by the
throttling rules which seems wrong. This patch fixes this by checking
for that flag in `metaslab_group_allocatable()`. I did a quick check to
see where else this flag is used and it doesn't seem like this change
would cause issues.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #14159
2022-12-09 10:48:33 -08:00
Richard Yao 5f73bbba43 Do not pass -1 to strerror() from zfs_send_cb_impl()
`zfs_send_cb_impl()` calls `dump_filesystems()`, which calls
`dump_filesystem()`, which will return `-1` as an error when
`zfs_open()` returns `NULL`.

This will be passed to `zfs_standard_error()`, which passes it to
`zfs_standard_error_fmt()`, which passes it to `strerror()`.

To fix this, we modify zfs_open() to set `errno` whenever it returns
NULL. Most of the cases already have `errno` set (since they pass it to
`zfs_standard_error_fmt()`, which makes this easy. Then we modify
`dump_filesystem()` to pass `errno` instead of `-1`.

Reported-by: Coverity (CID-1524598)
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14264
2022-12-08 14:15:27 -08:00
Richard Yao 242a5b748c Fix dereference after null check in enqueue_range
If the bp is NULL, we have a hole. However, when we build with
assertions, we will dereference bp when `blkid == DMU_SPILL_BLKID`. When
this happens on a hole, we will have a NULL pointer dereference.

Reported-by: Coverity (CID-1524670)
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14264
2022-12-08 14:15:21 -08:00
Richard Yao f954ea26a6 zdb: Handle theoretical buffer overflow when printing float
CodeQL pointed out that for extreme floating point values, `sprintf()`
will overwrite a 32 character buffer. It cited 1e304 as an example,
which causes `sprintf()` to print 308 characters.

In practice, the numbers should never exceed 100, so this should not
happen. To silence the warning and also handle unexpected situations, we
change the code to use `snprintf()`.

This was missed during my audit of our use of `sprintf()`, since I did
not think to consider extreme floating point representations. It also
really should not happen, so this change is purely defensive
programming.

This was found by CodeQL's cpp/overrunning-write-with-float check.

Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14264
2022-12-08 14:15:15 -08:00
Richard Yao d30db519af zdb: zdb_ddt_leak_init() reads uninitialized memory when birth == 0
This was written by Jeff Bonick and was committed to OpenSolaris on
November 1, 2009. It appears that Jeff meant to continue the outer loop
iteration when `ddp->ddp_phys_birth == 0`, but put his check inside the
inner loop. This causes a pointer to uninitialized memory to be passed
to ddt_lookup() inside a VERIFY() statement whenever that condition is
true.

Reported-by: Coverity (CID 1524462)
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14264
2022-12-08 14:15:10 -08:00
Richard Yao 2709ace096 ztest: comparisons against errno should not assign to it
888914486e introduced this regression.

I used cscope to verify that there are no other instances of this in the
codebase. This is the one of the few bugs that are extremely easy to
identify using cscope.

Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14264
2022-12-08 14:15:04 -08:00
Richard Yao ba87ed1410 Fix potential buffer overflow in zpool command
The ZPOOL_SCRIPTS_PATH environment variable can be passed here. This
allows for arbitrarily long strings to be passed to sprintf(), which can
overflow the buffer.

I missed this in my earlier audit of the codebase. CodeQL's
cpp/unbounded-write check caught this.

Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14264
2022-12-08 14:14:30 -08:00
Richard Yao ecccaede68 zdb: Fix big parameter passed by value
This is not in performance critical code, but static analyzers will
complain about it, so lets switch to pass by pointer here.

Reported-by: Coverity (CID-1524384)
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14263
2022-12-08 13:52:53 -08:00
Richard Yao f1100863f7 Linux: Cleanup unnecessary NULL check in __vdev_disk_physio()
zio is never NULL when given to the vdev. Coverity complained saying:

"Either the check against null is unnecessary, or there may be a null
pointer dereference."

Reported-by: Coverity (CID-1466174)
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14263
2022-12-08 13:52:47 -08:00
Richard Yao 56c6f293c0 Remove duplicate statically allocated variable
dsl_dataset_snapshot_sync_impl() declares `static zil_header_t zero_zil
__maybe_unused;`, but this is also declared globally. This wastes
memory.

CodeQL's cpp/local-variable-hides-global-variable check caught this.

Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14263
2022-12-08 13:52:42 -08:00
Richard Yao aaa9a6700f Cleanup: zhack should not declare function prototypes in main()
Instead, it should include the proper header.

CodeQL caught this.

Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14263
2022-12-08 13:51:24 -08:00
Brian Behlendorf 0620051224 ZTS: Add missing tests to Makefile.am
The send-c_zstream_recompress.ksh test case was being skipped
because it was not added to the Makefile.am, and was thus left
out of the package.

As for the renameat2 tests these were being skipped because
when the patch was rebased it was not updated to use the new
Makefile layout for the tests directory.  Correct this.

Add missing pre/post sections to sanity.run so the pyzfs tests
will run.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14266
2022-12-07 17:26:33 -08:00
Richard Yao 59493b63c1 Micro-optimize fletcher4 calculations
When processing abds, we execute 1 `kfpu_begin()`/`kfpu_end()` pair on
every page in the abd. This is wasteful and slows down checksum
performance versus what the benchmark claimed. We correct this by moving
those calls to the init and fini functions.

Also, we always check the buffer length against 0 before calling the
non-scalar checksum functions. This means that we do not need to execute
the loop condition for the first loop iteration. That allows us to
micro-optimize the checksum calculations by switching to do-while loops.

Note that we do not apply that micro-optimization to the scalar
implementation because there is no check in
`fletcher_4_incremental_native()`/`fletcher_4_incremental_byteswap()`
against 0 sized buffers being passed.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14247
2022-12-05 11:00:34 -08:00
Richard Yao 7b9a423076 FreeBSD: zfs_register_callbacks() must implement error check correctly
I read the following article and noticed a couple of ZFS bugs mentioned:

https://pvs-studio.com/en/blog/posts/cpp/0377/

I decided to search for them in the modern OpenZFS codebase and then
found one that matched the description of the first one:

V593 Consider reviewing the expression of the 'A = B != C' kind. The
expression is calculated as following: 'A = (B != C)'. zfs_vfsops.c 498

The consequence of this is that the error value is replaced with `1`
when there is an error. When there is no error, 0 is correctly passed.
This is a very minor issue that is unlikely to cause any real problems.

The incorrect error code would either be returned to the mount command
on a failure or any of `zfs receive`, `zfs recv`, `zfs rollback` or `zfs
upgrade`.

The second one has already been fixed.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14261
2022-12-05 10:16:50 -08:00
George Wilson ffd2e15d65 zio can deadlock during device removal
When doing a device removal on a pool with gang blocks, the zio pipeline
can deadlock when trying to free blocks from a device which is being
removed with a stack similar to this:

 0xffff8ab9a13a1740 UNINTERRUPTIBLE       4
                   __schedule+0x2e5
                   __schedule+0x2e5
                   schedule+0x33
                   schedule_preempt_disabled+0xe
                   __mutex_lock.isra.12+0x2a7
                   __mutex_lock.isra.12+0x2a7
                   __mutex_lock_slowpath+0x13
                   mutex_lock+0x2c
                   free_from_removing_vdev+0x61
                   metaslab_free_impl+0xd6
                   metaslab_free_dva+0x5e
                   metaslab_free+0x196
                   zio_free_sync+0xe4
                   zio_free_gang+0x38
                   zio_gang_tree_issue+0x42
                   zio_gang_tree_issue+0xa2
                   zio_gang_issue+0x6d
                   zio_execute+0x94
                   zio_execute+0x94
                   taskq_thread+0x23b
                   kthread+0x120
                   ret_from_fork+0x1f

Since there are gang blocks we have to read the gang members as part of
the free. This can be seen with a zio dependency tree that looks like
this:

sdb> echo 0xffff900c24f8a700 | zio -rc | zio
ADDRESS                       TYPE  STAGE            WAITER
0xffff900c24f8a700            NULL  CHECKSUM_VERIFY  0xffff900ddfd31740
0xffff900c24f8c920            FREE  GANG_ASSEMBLE    -
0xffff900d93d435a0            READ  DONE

In the illustration above we are processing frees but because of gang
block we have to read the constituents blocks. Once we finish the READ
in the zio pipeline we will execute the parent. In this case the parent
is a FREE but the zio taskq is a READ and we continue to process the
pipeline leading to the stack above. In the stack above, we are blocked
waiting for the svr_lock so as a result a READ interrupt taskq thread
is now consumed. Eventually, all of the READ taskq threads end up
blocked and we're unable to complete any read requests.

In zio_notify_parent there is an optimization to continue to use
the taskq thread to exectue the parent's pipeline. To resolve the
deadlock above, we only allow this optimization if the parent's
zio type matches the child which just completed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: George Wilson <gwilson@delphix.com>
External-issue: DLPX-80130
Closes #14236
2022-12-02 17:46:29 -08:00
George Wilson d7cf06a25d nopwrites on dmu_sync-ed blocks can result in a panic
After a device has been removed, any nopwrites for blocks on that
indirect vdev should be ignored and a new block should be allocated. The
original code attempted to handle this but used the wrong block pointer
when checking for indirect vdevs and failed to check all DVAs.

This change corrects both of these issues and modifies the test case
to ensure that it properly tests nopwrites with device removal.

Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes #14235
2022-12-02 17:45:33 -08:00
Rob Wing 2c590bdede ZTS: test reported checksum errors for ZED
Test checksum error reporting to ZED via the call paths
vdev_raidz_io_done_unrecoverable() and zio_checksum_verify().

Sponsored-by: Seagate Technology LLC
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Wing <rob.wing@klarasystems.com>
Closes #14190
2022-12-02 17:43:18 -08:00
Rob Wing 7a75f74cec Bump checksum error counter before reporting to ZED
The checksum error counter is incremented after reporting to ZED. This
leads ZED to receiving a checksum error report with 0 checksum errors.

To avoid this, bump the checksum error counter before reporting to ZED.

Sponsored-by: Seagate Technology LLC
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Wing <rob.wing@klarasystems.com>
Closes #14190
2022-12-02 17:42:22 -08:00
Xinliang Liu db6ba8d744 autoconf: add support for openEuler
Add config support for openEuler, so that it set the right sysconfig
dir for openEuler.

And DEFAULT_INIT_SCRIPT is no longer needed since commit "2a34db1bd
Base init scripts for SYSV systems".

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Closes #14241
2022-12-02 17:39:48 -08:00
szubersk fe975048da Fix Clang 15 compilation errors
- Clang 15 doesn't support `-fno-ipa-sra` anymore. Do a separate
  check for `-fno-ipa-sra` support by $KERNEL_CC.

- Don't enable `-mgeneral-regs-only` for certain module files.
  Fix #13260

- Scope `GCC diagnostic ignored` statements to GCC only. Clang
  doesn't need them to compile the code.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #13260
Closes #14150
2022-11-30 13:46:26 -08:00
szubersk 3c1e1933b6 Fix GCC 12 compilation errors
Squelch false positives reported by GCC 12 with UBSan.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #14150
2022-11-30 13:45:53 -08:00
Brian Behlendorf 3a6d89ae52 Retire Ubuntu 18.04 CI builder
The GitHub-hosted Ubuntu 18.04 has been deprecated and will be
entirely unsupported as of April 2023.  Leading up to this there
will be scheduled "brownouts" to encourage users to update their
workflows.

This commit retires our use of the GitHub-hosted Ubuntu 18.04
runners in advance of their removal.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14238
2022-11-30 10:25:57 -08:00
Yann Collet 776441152e zstd: Refactor prefetching for the decoding loop
Following facebook/zstd#2545, I noticed that one field in `seq_t` is
optional, and only used in combination with prefetching. (This may have
contributed to static analyzer failure to detect correct
initialization).

I then wondered if it would be possible to rewrite the code so that this
optional part is handled directly by the prefetching code rather than
delegated as an option into `ZSTD_decodeSequence()`.

This resulted into this refactoring exercise where the prefetching
responsibility is better isolated into its own function and
`ZSTD_decodeSequence()` is streamlined to contain strictly Sequence
decoding operations.  Incidently, due to better code locality, it
reduces the need to send information around, leading to simplified
interface, and smaller state structures.

Port of facebook/zstd@f5434663ea

Reported-by: Coverity (CID 1462271)
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Ported-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14212
2022-11-29 10:05:30 -08:00
Nick Terrell 466cf54ecf zstd: [superblock] Add defensive assert and bounds check
The bound check condition should always be met because we selected
`set_basic` as our encoding type. But that code is very far away, so
assert it is true so if it is ever false we can catch it, and add a
bounds check.

Port of facebook/zstd@1047097dad

Reported-by: Coverity (CID 1524446)
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Ported-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14212
2022-11-29 10:04:43 -08:00
Richard Yao eb1ed2a66b Coverity Model Update
When reviewing Clang Static Analyzer reports against a branch that had
experimental header changes based on the Coverity model file to inform
it that KM_SLEEP allocations cannot return NULL, I found a report saying
that a KM_PUSHPAGE allocation returned NULL. The actual implementation
does not return NULL unless KM_NOSLEEP has been passed, so we backport
the correction from the experimental header changes to the Coverity
model.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14210
2022-11-29 10:00:56 -08:00
Richard Yao 97fac0fb70 Fix NULL pointer dereference in dbuf_prefetch_indirect_done()
When ZFS is built with assertions, a prefetch is done on a redacted
blkptr and `dpa->dpa_dnode` is NULL, we will have a NULL pointer
dereference in `dbuf_prefetch_indirect_done()`.

Both Coverity and Clang's Static Analyzer caught this.

Reported-by: Coverity (CID 1524671)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14210
2022-11-29 10:00:50 -08:00
Richard Yao 887fb37843 zdb: Silence Coverity complaint about verify_livelist_allocs()
svb is declared on the stack. We then set parts of svb.svb_dva with
DVA_SET_VDEV(), DVA_SET_OFFSET() and DVA_SET_ASIZE(). However, the DVA
contains other fields for pad, GRID and G. When setting the fields we
use, we technically read uninitialized bits  from the fields we do not
use. This makes Coverity and Clang's Static Analyzer complain.
Presumably, other static analyzers might complain too.

There is no real bug here, but we are still technically reading
undefined data and unless we stop doing that, static analyzers will
complain about it in perpetuum and this could obscure real issues. We
silence the static analyzer complaints by using a 0 struct initializer.

Reported by: Coverity (CID 1524627)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14210
2022-11-29 10:00:45 -08:00
Richard Yao 8532da5e20 Cleanup: Delete dead code from send_merge_thread()
range is always deferenced before it reaches this check, such that the
kmem_zalloc() call is never executed.

A previously version of this had erronously also pruned the
`range->eos_marker = B_TRUE` line, but it must be set whenever we
encounter an error or are cancelled early.

Coverity incorrectly complained about a potential NULL pointer
dereference because of this.

Reported-by: Coverity (CID 1524550)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14210
2022-11-29 09:59:53 -08:00
Alexander b5459dd354 Fix the last two CFI callback prototype mismatches
There was the series from me a year ago which fixed most of the
callback vs implementation prototype mismatches. It was based on
running the CFI-enabled kernel (in permissive mode -- warning
instead of panic) and performing a full ZTS cycle, and then fixing
all of the problems caught by CFI.
Now, Clang 16-dev has new warning flag, -Wcast-function-type-strict,
which detect such mismatches at compile-time. It allows to find the
remaining issues missed by the first series.
There are only two of them left: one for the
secpolicy_vnode_setattr() callback and one for taskq_dispatch().
The fix is easy, since they are not used anywhere else.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Closes #14207
2022-11-29 09:56:16 -08:00
Richard Yao 587a39b729 Lua: Fix bad bitshift in lua_strx2number()
The port of lua to OpenZFS modified lua to use int64_t for numbers
instead of double. As part of this, a function for calculating
exponentiation was replaced with a bit shift. Unfortunately, it did not
handle negative values. Also, it only supported exponents numbers with
7 digits before before overflow. This supports exponents up to 15 digits
before overflow.

Clang's static analyzer reported this as "Result of operation is garbage
or undefined" because the exponent was negative.

Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14204
2022-11-29 09:53:33 -08:00
Brooks Davis d6df4441c0 Don't leak packed recieved proprties
When local properties (e.g., from -o and -x) are provided, don't leak
the packed representation of the received properties due to variable
reuse.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14197
2022-11-29 09:51:35 -08:00
Alexander Motin fd61b2eaba Remove few pointer dereferences in dbuf_read()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #14199
2022-11-29 09:49:02 -08:00
Mateusz Guzik 9aea88ba44 FreeBSD: stop using buffer cache-only routines on sync
Both vop_fsync and vfs_stdsync are effectively just costly no-ops
as they only act on ->v_bufobj.bo_dirty et al, which are unused
by zfs.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by:	Mateusz Guzik <mjguzik@gmail.com>
Closes #14157
2022-11-29 09:35:25 -08:00
Alexander Motin 4df415aa86 Switch dnode stats to wmsums
I've noticed that some of those counters are used in hot paths like
dnode_hold_impl(), and results of this change is visible in profiler.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #14198
2022-11-29 09:33:45 -08:00
Xinliang Liu 6712c77140 rpm: add support for openEuler
OpenEuler uses the same package manager DNF as RHEL/Fedora. And
it is similar to RHEL/Fedora.

OpenEuler Linux is becoming the mainstream Linux distro in China.
So adding support for it makes sense for the users. For more
details about it see: https://www.openeuler.org/en/.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Closes #14222
2022-11-29 09:27:22 -08:00
Alexander Motin f0a76fbec1 Micro-optimize zrl_remove()
atomic_dec_32() should be a bit lighter than atomic_dec_32_nv().

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #14200
2022-11-29 09:26:03 -08:00
Ameer Hamza e996c502e4 zed: unclean disk attachment faults the vdev
If the attached disk already contains a vdev GUID, it
means the disk is not clean. In such a scenario, the
physical path would be a match that makes the disk
faulted when trying to online it. So, we would only
want to proceed if either GUID matches with the last
attached disk or the disk is in a clean state.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #14181
2022-11-29 09:24:10 -08:00
Richard Yao 303678350a Convert some sprintf() calls to kmem_scnprintf()
These `sprintf()` calls are used repeatedly to write to a buffer. There
is no protection against overflow other than reviewers explicitly
checking to see if the buffers are big enough. However, such issues are
easily missed during review and when they are missed, we would rather
stop printing rather than have a buffer overflow, so we convert these
functions to use `kmem_scnprintf()`. The Linux kernel provides an entire
page for module parameters, so we are safe to write up to PAGE_SIZE.

Removing `sprintf()` from these functions removes the last instances of
`sprintf()` usage in our platform-independent kernel code. This improves
XNU kernel compatibility because the XNU kernel does not support
(removed support for?) `sprintf()`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14209
2022-11-28 13:49:58 -08:00
Allan Jude d27a00283f Avoid a null pointer dereference in zfs_mount() on FreeBSD
When mounting the root filesystem, vfs_t->mnt_vnodecovered is null

This will cause zfsctl_is_node() to dereference a null pointer when
mounting, or updating the mount flags, on the root filesystem, both
of which happen during the boot process.

Reported-by: Martin Matuska <mm@FreeBSD.org>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #14218
2022-11-28 13:40:49 -08:00
наб 3069872ef5 cmd: zfs: fix missing mention of zfs diff -h
Fixes: 344bbc82e7

Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #14224
2022-11-28 13:37:07 -08:00
Richard Yao 022bdb5b0b Set multiple make jobs on CodeQL github workflows
github supports 2 processors right now, although we use nproc instead of
hard coding -j2, so if that ever increases, we will take advantage of it
right away.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14217
2022-11-28 13:29:55 -08:00
Minsoo Choo adf3b84fc7 Add <limits.h> header
According to the UNIX standard, <pthread.h> does not include some
PTHREAD_* values which are included in <limits.h>. OpenZFS uses
some of these values in its code, and this might cause build failure on
systems that do not have these PTHREAD_* values in <pthread.h>

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: Minsoo Choo <minsoochoo0122@proton.me>
Closes #14225
2022-11-28 13:24:17 -08:00
Allan Jude 077fd55e85 zfs.4: Correct the default for zfs_vdev_async_write_max_active
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #14179
2022-11-28 11:41:14 -08:00
Damian Szuberski 387109364e Python3: replace distutils with sysconfig
- `distutils` module is long time deprecated and already deleted
  from the CPython mainline.

- To remain compatible with Debian/Ubuntu Python3 packaging style,
  try
  `distutils.sysconfig.get_python_path(0,0)`
  first with fallback on
  `sysconfig.get_path('purelib')`

- pyzfs_unittest suite is run unconditionally as a part of ZTS.

- Add pyzfs_unittest suite to sanity tests.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #12833 
Closes #13280 
Closes #14177
2022-11-28 11:39:41 -08:00
Alexander Motin 5f45e3f699 Remove atomics from zh_refcount
It is protected by z_hold_locks, so we do not need more serialization,
simple integer math should be fine.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Closes #14196
2022-11-28 11:36:53 -08:00
John Wren Kennedy b0657a59ab ZTS: zts-report silently ignores perf test results
The regex used to extract test result information from a test run only
matches the functional tests. Update the regex so it matches both.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: John Wren Kennedy <john.kennedy@delphix.com>
Closes #14185
2022-11-18 11:43:18 -08:00
Ameer Hamza 3a74f488fc zed: post a udev change event from spa_vdev_attach()
In order for zed to process the removal event correctly,
udev change event needs to be posted to sync the blkid
information. spa_create() and spa_config_update() posts
the event already through spa_write_cachefile(). Doing
the same for spa_vdev_attach() that handles the case
for vdev attachment and replacement.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #14172
2022-11-18 11:39:59 -08:00
George Amanakis 3226e0dc8e Fix setting the large_block feature after receiving a snapshot
We are not allowed to dirty a filesystem when done receiving
a snapshot. In this case the flag SPA_FEATURE_LARGE_BLOCKS will
not be set on that filesystem since the filesystem is not on
dp_dirty_datasets, and a subsequent encrypted raw send will fail.
Fix this by checking in dsl_dataset_snapshot_sync_impl() if the feature
needs to be activated and do so if appropriate.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #13699
Closes #13782
2022-11-18 11:38:37 -08:00
Laura Hild 99c0479a4e Correct multipathd.target to .service
https://github.com/openzfs/zfs/pull/9863 says it "orders
zfs-import-cache.service and zfs-import-scan.service after
multipathd.service" but the commit (79add96) actually
ordered them after .target.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Laura Hild <lsh@jlab.org>
Closes #12709
Closes #14171
2022-11-18 11:36:19 -08:00
Richard Yao 0a0166c975 FreeBSD: do_mount() passes wrong string length to helper
It should pass `MNT_LINE_MAX`, but passes `sizeof (mntpt)`. This is
harmless because the strlen is not actually used by the helper, but
FreeBSD's Coverity scans complained about it.

This was missed in my audit of various string functions since it is not
actually passed to a string function.

Upon review, it was noticed that the helper function does not need to be
a separate function, so I have inlined it as cleanup.

Reported-by: Coverity (CID 1432079)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: szubersk <szuberskidamian@gmail.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14136
2022-11-18 11:34:25 -08:00
Richard Yao 31247c78b1 FreeBSD: get_zfs_ioctl_version() should be cast to (void)
FreeBSD's Coverity scans complain that we ignore the return value. There
is no need to check the return value so we cast it to (void) to suppress
further complaints by static analyzers.

Reported-by: Coverity (CID 1018175)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: szubersk <szuberskidamian@gmail.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14136
2022-11-18 11:33:12 -08:00
szubersk 9e7fc5da38 Ubuntu 22.04 integration: GitHub workflows
- GitHub workflows are run on Ubuntu 22.04

- Extract the `checkstyle` workflow dependencies to a separate file.

- Refresh the `build-dependencies.txt` list.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #14148
2022-11-18 11:27:03 -08:00
szubersk 32ef14de0f Ubuntu 22.04 integration: ZTS
Add `detect_odr_violation=1` to ASAN_OPTIONS to allow both libzfs
and libzpool expose

```
zfeature_info_t spa_feature_table[SPA_FEATURES]
```

from module/zcommon/zfeature_common.c in public ABI.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #14148
2022-11-18 11:26:55 -08:00
szubersk 28ea4f9b08 Ubuntu 22.04 integration: Cppcheck
Suppress a false positive found by new Cppcheck version.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #14148
2022-11-18 11:26:50 -08:00
szubersk b46be903fb Ubuntu 22.04 integration: mancheck
Correct new mandoc errors.
```
STYLE: input text line longer than 80 bytes
STYLE: no blank before trailing delimiter
```

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #14148
2022-11-18 11:26:41 -08:00
szubersk a5087965fe Ubuntu 22.04 integration: ShellCheck
- Add new SC2312 global exclude.
  ```
  Consider invoking this command separately to avoid masking its return
  value (or use '|| true' to ignore). [SC2312]
  ```

- Correct errors detected by new ShellCheck version.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #14148
2022-11-18 11:24:48 -08:00
Damian Szuberski c3b6fd3d59 Make autodetection disable pyzfs for kernel/srpm configurations
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #13394
Closes #14178
2022-11-16 09:27:53 -08:00
Rich Ercolani 2163cde450 Handle and detect #13709's unlock regression (#14161)
In #13709, as in #11294 before it, it turns out that 63a26454 still had
the same failure mode as when it was first landed as d1d47691, and
fails to unlock certain datasets that formerly worked.

Rather than reverting it again, let's add handling to just throw out
the accounting metadata that failed to unlock when that happens, as
well as a test with a pre-broken pool image to ensure that we never get
bitten by this again.

Fixes: #13709

Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2022-11-15 14:44:12 -08:00
shodanshok b445b25b27 Fix arc_p aggressive increase
The original ARC paper called for an initial 50/50 MRU/MFU split
and this is accounted in various places where arc_p = arc_c >> 1,
with further adjustment based on ghost lists size/hit. However, in
current code both arc_adapt() and arc_get_data_impl() aggressively
grow arc_p until arc_c is reached, causing unneeded pressure on
MFU and greatly reducing its scan-resistance until ghost list
adjustments kick in.

This patch restores the original behavior of initially having arc_p
as 1/2 of total ARC, without preventing MRU to use up to 100% total
ARC when MFU is empty.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #14137 
Closes #14120
2022-11-11 10:41:36 -08:00
Paul Dagnelie 9f4ede63d2 Add ability to recompress send streams with new compression algorithm
As new compression algorithms are added to ZFS, it could be useful for 
people to recompress data with new algorithms. There is currently no 
mechanism to do this aside from copying the data manually into a new 
filesystem with the new algorithm enabled. This tool allows the 
transformation to happen through zfs send, allowing it to be done 
efficiently to remote systems and in an incremental fashion.

A new zstream command is added that decompresses WRITE records and 
then recompresses them with a provided algorithm, and then re-emits 
the modified send stream. It may also be possible to re-compress 
embedded block pointers, but that was not attempted for the initial 
version.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #14106
2022-11-10 15:23:46 -08:00
John Wren Kennedy e9ab9e512c ZTS: random_readwrite test doesn't run correctly
This test uses fio's bssplit mechanism to choose io sizes for the test,
leaving the PERF_IOSIZES variable empty. Because that variable is
empty, the innermost loop in do_fio_run_impl is never executed, and as
a result, this test does the setup but collects no data. Setting the
variable to "bssplit" allows performance data to be gathered.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: John Wren Kennedy <john.kennedy@delphix.com>
Closes #14163
2022-11-10 14:00:04 -08:00
Richard Yao b1eec00904 Cleanup: Suppress Coverity dereference before/after NULL check reports
f224eddf92 began dereferencing a NULL
checked pointer in zpl_vap_init(), which made Coverity complain because
either the dereference is unsafe or the NULL check is unnecessary. Upon
inspection, this pointer is guaranteed to never be NULL because it is
from the Linux kernel VFS. The calls into ZFS simply would not make
sense if this pointer were NULL, so the NULL check is unnecessary.

Reported-by: Coverity (CID 1527260)
Reported-by: Coverity (CID 1527262)
Reviewed-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Reviewed-by: Youzhong Yang <yyang@mathworks.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14170
2022-11-10 13:58:05 -08:00
Richard Yao 9e2be2dfbd Fix potential NULL pointer dereference regression
945b407486 neglected to `NULL` check
`tx->tx_objset`, which is already done in the function. This upset
Coverity, which complained about a "dereference after null check".

Upon inspection, it was found that whenever `dmu_tx_create_dd()` is
called followed by `dmu_tx_assign()`, such as in
`dsl_sync_task_common()`, `tx->tx_objset` will be `NULL`.

Reported-by: Coverity (CID 1527261)
Reviewed-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Reviewed-by: Youzhong Yang <yyang@mathworks.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14170
2022-11-10 13:56:28 -08:00
Mariusz Zaborski 16f0fdaddd Allow to control failfast
Linux defaults to setting "failfast" on BIOs, so that the OS will not
retry IOs that fail, and instead report the error to ZFS.

In some cases, such as errors reported by the HBA driver, not
the device itself, we would wish to retry rather than generating
vdev errors in ZFS. This new property allows that.

This introduces a per vdev option to disable the failfast option.
This also introduces a global module parameter to define the failfast
mask value.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Sponsored-by: Seagate Technology LLC
Submitted-by: Klara, Inc.
Closes #14056
2022-11-10 13:37:12 -08:00
Mariusz Zaborski 945b407486 quota: disable quota check for ZVOL
The quota for ZVOLs is set to the size of the volume. When the quota
reaches the maximum, there isn't an excellent way to check if the new
writers are overwriting the data or if they are inserting a new one.
Because of that, when we reach the maximum quota, we wait till txg is
flushed. This is causing a significant fluctuation in bandwidth.

In the case of ZVOL, the quota is enforced by the volsize, so we
can omit it.

This commit adds a sysctl thats allow to control if the quota mechanism
should be enforced or not.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Sponsored-by: Zededa Inc.
Sponsored-by: Klara Inc.
Closes #13838
2022-11-08 12:40:22 -08:00
Alan Somers e197bb24f1 Optionally skip zil_close during zvol_create_minor_impl
If there were no zil entries to replay, skip zil_close.  zil_close waits
for a transaction to sync.  That can take several seconds, for example
during pool import of a resilvering pool.  Skipping zil_close can cut
the time for "zpool import" from 2 hours to 45 seconds on a resilvering
pool with a thousand zvols.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Sponsored-by: Axcient
Closes #13999 
Closes #14015
2022-11-08 12:38:08 -08:00
youzhongyang f224eddf92 Support idmapped mount in user namespace
Linux 5.17 commit torvalds/linux@5dfbfe71e enables "the idmapping 
infrastructure to support idmapped mounts of filesystems mounted 
with an idmapping". Update the OpenZFS accordingly to improve the 
idmapped mount support. 

This pull request contains the following changes:

- xattr setter functions are fixed to take mnt_ns argument. Without
  this, cp -p would fail for an idmapped mount in a user namespace.
- idmap_util is enhanced/fixed for its use in a user ns context.
- One test case added to test idmapped mount in a user ns.

Reviewed-by: Christian Brauner <christian@brauner.io>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #14097
2022-11-08 10:28:56 -08:00
Damian Szuberski 109731cd73 dsl_prop_known_index(): check for invalid prop
Resolve UBSAN array-index-out-of-bounds error in zprop_desc_t.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #14142
Closes #14147
2022-11-08 10:16:01 -08:00
Mohamed Tawfik 41715771b5 Adds the -p option to zfs holds
This allows for printing a machine-readable, accurate to the second,
hold creation time in the form of a unix epoch timestamp.

Additionally, updates relevant documentation and man pages accordingly.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mohamed Tawfik <m_tawfik@aucegypt.edu>
Closes #13690
Closes #14152
2022-11-08 10:08:21 -08:00
Brooks Davis ecbf02791f freebsd: simplify MD isa_defs.h
Most of this file was a pile of defines, apparently from Solaris that
controlled nothing in the source tree.  A few things controlled the
definition of unused types or macros which I have removed.

Considerable further cleanup is possible including removal of
architectures FreeBSD never supported.  This file should likely converge
with the Linux version to the extent possible.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14127
2022-11-07 15:55:37 -08:00
Brooks Davis e3ba8eb12e freebsd: trim dkio.h to the minimum
Only DKIOCFLUSHWRITECACHE is required.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14127
2022-11-07 15:55:33 -08:00
Brooks Davis 20b867f5f7 freebsd: add ifdefs around legacy ioctl support
Require that ZFS_LEGACY_SUPPORT be defined for legacy ioctl support to
be built.  For now, define it in zfs_ioctl_compat.h so support is always
built.  This will allow systems that need never support pre-openzfs
tools a mechanism to remove support at build time.  This code should
be removed once the need for tool compatability is gone.

No functional change at this time.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14127
2022-11-07 15:55:26 -08:00
Brooks Davis 6c89cffc2c freebsd: remove no-op vn_renamepath()
vn_renamepath() is a Solaris-ism that was defined away in the FreeBSD
port.  Now that the only use is in the FreeBSD zfs_vnops_os.c, drop it
entierly.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14127
2022-11-07 15:55:20 -08:00
Brooks Davis 270b1b5fa7 freebsd: remove unused vn_rename()
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14127
2022-11-07 15:54:43 -08:00
Ameer Hamza c23738c70e zed: Prevent special vdev to be replaced by hot spare
Special vdevs should not be replaced by a hot spare.
Log vdevs already support this, extending the
functionality for special vdevs.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #14129
2022-11-04 11:33:47 -07:00
Alexander Lobakin 73b8f700b6 icp: fix all !ENDBR objtool warnings in x86 Asm code
Currently, only Blake3 x86 Asm code has signs of being ENDBR-aware.
At least, under certain conditions it includes some header file and
uses some custom macro from there.
Linux has its own NOENDBR since several releases ago. It's defined
in the same <asm/linkage.h>, so currently <sys/asm_linkage.h>
already is provided with it.

Let's unify those two into one %ENDBR macro. At first, check if it's
present already. If so -- use Linux kernel version. Otherwise, try
to go that second way and use %_CET_ENDBR from <cet.h> if available.
If no, fall back to just empty definition.
This fixes a couple more 'relocations to !ENDBR' across the module.
And now that we always have the latest/actual ENDBR definition, use
it at the entrance of the few corresponding functions that objtool
still complains about. This matches the way how it's used in the
upstream x86 core Asm code.

Reviewed-by: Attila Fülöp <attila@fueloep.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Closes #14035
2022-11-04 11:25:56 -07:00
Alexander Lobakin 61cca6fa05 icp: fix rodata being marked as text in x86 Asm code
objtool properly complains that it can't decode some of the
instructions from ICP x86 Asm code. As mentioned in the Makefile,
where those object files were excluded from objtool check (but they
can still be visible under IBT and LTO), those are just constants,
not code.
In that case, they must be placed in .rodata, so they won't be
marked as "allocatable, executable" (ax) in EFL headers and this
effectively prevents objtool from trying to decode this data. That
reveals a whole bunch of other issues in ICP Asm code, as previously
objtool was bailing out after that warning message.

Reviewed-by: Attila Fülöp <attila@fueloep.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Closes #14035
2022-11-04 11:25:51 -07:00
Alexander Lobakin b844489ec0 icp: properly fix all RETs in x86_64 Asm code
Commit 43569ee374 ("Fix objtool: missing int3 after ret warning")
addressed replacing all `ret`s in x86 asm code to a macro in the
Linux kernel in order to enable SLS. That was done by copying the
upstream macro definitions and fixed objtool complaints.
Since then, several more mitigations were introduced, including
Rethunk. It requires to have a jump to one of the thunks in order
to work, so the RET macro was changed again. And, as ZFS code
didn't use the mainline defition, but copied it, this is currently
missing.

Objtool reminds about it time to time (Clang 16, CONFIG_RETHUNK=y):

fs/zfs/lua/zlua.o: warning: objtool: setjmp+0x25: 'naked' return
 found in RETHUNK build
fs/zfs/lua/zlua.o: warning: objtool: longjmp+0x27: 'naked' return
 found in RETHUNK build

Do it the following way:
* if we're building under Linux, unconditionally include
  <linux/linkage.h> in the related files. It is available in x86
  sources since even pre-2.6 times, so doesn't need any conftests;
* then, if RET macro is available, it will be used directly, so that
  we will always have the version actual to the kernel we build;
* if there's no such macro, we define it as a simple `ret`, as it
  was on pre-SLS times.

This ensures we always have the up-to-date definition with no need
to update it manually, and at the same time is safe for the whole
variety of kernels ZFS module supports.
Then, there's a couple more "naked" rets left in the code, they're
just defined as:

	.byte 0xf3,0xc3

In fact, this is just:

	rep ret

`rep ret` instead of just `ret` seems to mitigate performance issues
on some old AMD processors and most likely makes no sense as of
today.
Anyways, address those rets, so that they will be protected with
Rethunk and SLS. Include <sys/asm_linkage.h> here which now always
has RET definition and replace those constructs with just RET.
This wipes the last couple of places with unpatched rets objtool's
been complaining about.

Reviewed-by: Attila Fülöp <attila@fueloep.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Closes #14035
2022-11-04 11:24:09 -07:00
Richard Yao 993ee7a006 FreeBSD: Fix out of bounds read in zfs_ioctl_ozfs_to_legacy()
There is an off by 1 error in the check. Fortunately, this function does
not appear to be used in kernel space, despite being compiled as part of
the kernel module. However, it is used in userspace. Callers of
lzc_ioctl_fd() likely will crash if they attempt to use the
unimplemented request number.

This was reported by FreeBSD's coverity scan.

Reported-by: Coverity (CID 1432059)
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14135
2022-11-04 11:06:14 -07:00
Serapheim Dimitropoulos f66ffe6878 Expose zfs_vdev_open_timeout_ms as a tunable
Some of our customers have been occasionally hitting zfs import failures
in Linux because udevd doesn't create the by-id symbolic links in time
for zpool import to use them. The main issue is that the
systemd-udev-settle.service that zfs-import-cache.service and other
services depend on is racy. There is also an openzfs issue filed (see
https://github.com/openzfs/zfs/issues/10891) outlining the problem and
potential solutions.

With the proper solutions being significant in terms of complexity and
the priority of the issue being low for the time being, this patch
exposes `zfs_vdev_open_timeout_ms` as a tunable so people that are
experiencing this issue often can increase it as a workaround.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #14133
2022-11-03 15:02:46 -07:00
Allan Jude 595d3ac2ed Allow mounting snapshots in .zfs/snapshot as a regular user
Rather than doing a terrible credential swapping hack, we just
check that the thing being mounted is a snapshot, and the mountpoint
is the zfsctl directory, then we allow it.

If the mount attempt is from inside a jail, on an unjailed dataset
(mounted from the host, not by the jail), the ability to mount the
snapshot is controlled by a new per-jail parameter: zfs.mount_snapshot

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Sponsored-by: Modirum MDPay
Sponsored-by: Klara Inc.
Closes #13758
2022-11-03 11:53:24 -07:00
Richard Yao 11e3416ae7 Cleanup: Remove branches that always evaluate the same way
Coverity reported that the ASSERT in taskq_create() is always true and
the `*offp > MAXOFFSET_T` check in zfs_file_seek() is always false.

We delete them as cleanup.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14130
2022-11-03 10:47:48 -07:00
Brooks Davis 1e1ce10e55 Remove an unused variable
Clang-16 detects this set-but-unused variable which is assigned and
incremented, but never referenced otherwise.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14125
2022-11-03 10:17:17 -07:00
Brooks Davis abb42dc5e1 Make 1-bit bitfields unsigned
This fixes -Wsingle-bit-bitfield-constant-conversion warning from
clang-16 like:

lib/libzfs/libzfs_dataset.c:4529:19: error: implicit truncation
  from 'int' to a one-bit wide bit-field changes value from
  1 to -1 [-Werror,-Wsingle-bit-bitfield-constant-conversion]
                flags.nounmount = B_TRUE;
				^ ~~~~~~

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14125
2022-11-03 10:16:16 -07:00
Richard Yao f47f6a055d Address warnings about possible division by zero from clangsa
* The complaint in ztest_replay_write() is only possible if something
   went horribly wrong. An assertion will silence this and if it goes
   off, we will know that something is wrong.
 * The complaint in spa_estimate_metaslabs_to_flush() is not impossible,
   but seems very unlikely. We resolve this by passing the value from
   the `MIN()` that does not go to infinity when the variable is zero.

There was a third report from Clang's scan-build, but that was a
definite false positive and disappeared when checked again through
Clang's static analyzer with Z3 refution via CodeChecker.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14124
2022-11-03 09:58:14 -07:00
Brooks Davis 27d29946be libuutil: deobfuscate internal pointers
uu_avl and uu_list stored internal next/prev pointers and parent
pointers (unused) obfuscated (byte swapped) to hide them from a long
forgotten leak checker (No one at the 2022 OpenZFS developers meeting
could recall the history.)  This would break on CHERI systems and adds
no obvious value.  Rename the members, use proper types rather than
uintptr_t, and eliminate the related macros.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14126
2022-11-03 09:57:05 -07:00
Attila Fülöp 211ec1b9fd Deny receiving into encrypted datasets if the keys are not loaded
Commit 68ddc06b61 introduced support
for receiving unencrypted datasets as children of encrypted ones but
unfortunately got the logic upside down. This resulted in failing to
deny receives of incremental sends into encrypted datasets without
their keys loaded. If receiving a filesystem, the receive was done
into a newly created unencrypted child dataset of the target. In
case of volumes the receive made the target volume undeletable since
a dataset was created below it, which we obviously can't handle.
Incremental streams with embedded blocks are affected as well.

We fix the broken logic to properly deny receives in such cases.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #13598
Closes #14055
Closes #14119
2022-11-03 09:55:13 -07:00
Brooks Davis 84477e148d lua: cast through uintptr_t when return a pointer
Don't assume size_t can carry pointer provenance and use uintptr_t
(identialy on all current platforms) instead.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14131
2022-11-03 09:52:28 -07:00
Brooks Davis b9041e1f27 Use intptr_t when storing an integer in a pointer
Cast the integer type to (u)intptr_t before casting to "void *".  In
CHERI C/C++ we warn on bare casts from integers to pointers to catch
attempts to create pointers our of thin air.  We allow the warning to be
supressed with a suitable cast through (u)intptr_t.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14131
2022-11-03 09:52:23 -07:00
Brooks Davis 877790001e recvd_props_mode: use a uintptr_t to stash nvlists
Avoid assuming than a uint64_t can hold a pointer.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14131
2022-11-03 09:52:19 -07:00
Brooks Davis 250b2bac78 zfs_onexit_add_cb: make action_handle point to a uintptr_t
Avoid assuming than a uint64_t can hold a pointer and reduce the
number of casts in the process.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14131
2022-11-03 09:52:12 -07:00
Brooks Davis d96303cb07 acl: use uintptr_t for ace walker cookies
Avoid assuming that a pointer can fit in a uint64_t and use uintptr_t
instead.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14131
2022-11-03 09:51:34 -07:00
Brooks Davis 7309e94239 linux isa_defs.h: Don't define _ALIGNMENT_REQUIRED
Nothing consumes this definition so stop defining it.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14128
2022-11-03 09:39:51 -07:00
Brooks Davis 5229071ba1 Improve RISC-V support
Check __riscv_xlen == 64 rather than _LP64 and define _LP64 if missing.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Brooks Davis <brooks.davis@sri.com>
Closes #14128
2022-11-03 09:39:28 -07:00
Richard Yao da3d266672 FreeBSD: Fix regression from kmem_scnprintf() in libzfs
kmem_scnprintf() is only available in libzpool. Recent buildbot issues
with showing FreeBSD results kept us from seeing this before
97143b9d31 was merged.

The code has been changed to sanitize the output from `kmem_scnprintf()`.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14111
2022-11-01 13:58:17 -07:00
Vince van Oosten fdc59cf563 include overrides for zfs snapshot/rollback bootfs.service
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Vince van Oosten <techhazard@codeforyouand.me>
Closes #14075
Closes #14076
2022-11-01 12:23:58 -07:00
Vince van Oosten 59ca6e2ad0 include overrides for zfs-import.target
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Vince van Oosten <techhazard@codeforyouand.me>
Closes #14075
Closes #14076
2022-11-01 12:23:51 -07:00
Vince van Oosten b10f73f78e include systemd overrides to zfs-dracut module
If a user that uses systemd and dracut wants to overide certain
settings, they typically use `systemctl edit [unit]` or place a file in
`/etc/systemd/system/[unit].d/override.conf` directly.

The zfs-dracut module did not include those overrides however, so this
did not have any effect at boot time.

For zfs-import-scan.service and zfs-import-cache.service, overrides are
now included in the dracut initramfs image.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Vince van Oosten <techhazard@codeforyouand.me>
Closes #14075
Closes #14076
2022-11-01 12:23:44 -07:00
Ryan Moeller 748b9d5bda zil: Relax assertion in zil_parse
Rather than panic debug builds when we fail to parse a whole ZIL, let's
instead improve the logging of errors and continue like in a release
build.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #14116
2022-11-01 12:19:32 -07:00
youzhongyang 95055c2ce2 ZTS: rsend_009_pos.ksh is destructive on zfs-on-root system
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #14113
2022-11-01 12:08:37 -07:00
Richard Yao dcce0dc5f0 Fix oversights from 4170ae4e
4170ae4ea6 was intended to tackle TOCTOU
race conditions reported by CodeQL, but as an oversight, a file
descriptor was not closed and some comments were not updated.
Interestingly, CodeQL did not complain about the file descriptor leak,
so there is room for improvement in how we configure it to try to detect
this issue so that we get early warning about this.

In addition, an optimization opportunity was missed by mistake in
lib/libshare/os/linux/smb.c, which prevented us from truly closing the
TOCTOU race. This was also caught by Coverity.

Reported-by: Coverity (CID 1524424)
Reported-by: Coverity (CID 1526804)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14109
2022-10-31 10:01:04 -07:00
Allan Jude b37d495e04 Avoid null pointer dereference in dsl_fs_ss_limit_check()
Check for cr == NULL before dereferencing it in
dsl_enforce_ds_ss_limits() to lookup the zone/jail ID.

Reported-by: Coverity (CID 1210459)
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #14103
2022-10-29 13:08:54 -07:00
Richard Yao 97143b9d31 Introduce kmem_scnprintf()
`snprintf()` is meant to protect against buffer overflows, but operating
on the buffer using its return value, possibly by calling it again, can
cause a buffer overflow, because it will return how many characters it
would have written if it had enough space even when it did not. In a
number of places, we repeatedly call snprintf() by successively
incrementing a buffer offset and decrementing a buffer length, by its
return value. This is a potentially unsafe usage of `snprintf()`
whenever the buffer length is reached. CodeQL complained about this.

To fix this, we introduce `kmem_scnprintf()`, which will return 0 when
the buffer is zero or the number of written characters, minus 1 to
exclude the NULL character, when the buffer was too small. In all other
cases, it behaves like snprintf(). The name is inspired by the Linux and
XNU kernels' `scnprintf()`. The implementation was written before I
thought to look at `scnprintf()` and had a good name for it, but it
turned out to have identical semantics to the Linux kernel version.
That lead to the name, `kmem_scnprintf()`.

CodeQL only catches this issue in loops, so repeated use of snprintf()
outside of a loop was not caught. As a result, a thorough audit of the
codebase was done to examine all instances of `snprintf()` usage for
potential problems and a few were caught. Fixes for them are included in
this patch.

Unfortunately, ZED is one of the places where `snprintf()` is
potentially used incorrectly. Since using `kmem_scnprintf()` in it would
require changing how it is linked, we modify its usage to make it safe,
no matter what buffer length is used. In addition, there was a bug in
the use of the return value where the NULL format character was not
being written by pwrite(). That has been fixed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14098
2022-10-29 13:05:11 -07:00
Richard Yao 2e08df84d8 Cleanup dump_bookmarks()
Assertions are meant to check assumptions, but the way that this
assertion is written does not check an assumption, since it is provably
always true. Removing the assertion will cause a compiler warning (made
into an error by -Werror) about printing up to 512 bytes to a 256-byte
buffer, so instead, we change the assertion to verify the assumption
that we never do a snprintf() that is truncated to avoid overrunning the
256-byte buffer.

This was caught by an audit of the codebase to look for misuse of
`snprintf()` after CodeQL reported that we had misused `snprintf()`. An
explanation of how snprintf() can be misused is here:

https://www.redhat.com/en/blog/trouble-snprintf

This particular instance did not misuse `snprintf()`, but it was caught
by the audit anyway.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14098
2022-10-29 13:05:02 -07:00
Richard Yao d71d693261 Fix too few arguments to formatting function
CodeQL reported that when the VERIFY3U condition is false, we do not
pass enough arguments to `spl_panic()`. This is because the format
string from `snprintf()` was concatenated into the format string for
`spl_panic()`, which causes us to have an unexpected format specifier.

A CodeQL developer suggested fixing the macro to have a `%s` format
string that takes a stringified RIGHT argument, which would fix this.
However, upon inspection, the VERIFY3U check was never necessary in the
first place, so we remove it in favor of just calling `snprintf()`.

Lastly, it is interesting that every other static analyzer run on the
codebase did not catch this, including some that made an effort to catch
such things. Presumably, all of them relied on header annotations, which
we have not yet done on `spl_panic()`. CodeQL apparently is able to
track the flow of arguments on their way to annotated functions, which
llowed it to catch this when others did not. A future patch that I have
in development should annotate `spl_panic()`, so the others will catch
this too.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14098
2022-10-29 13:04:52 -07:00
Richard Yao 4170ae4ea6 Fix TOCTOU race conditions reported by CodeQL and Coverity
CodeQL and Coverity both complained about:

 * lib/libshare/os/linux/smb.c
 * tests/zfs-tests/cmd/mmapwrite.c
 	* twice
 * tests/zfs-tests/tests/functional/tmpfile/tmpfile_002_pos.c
 * tests/zfs-tests/tests/functional/tmpfile/tmpfile_stat_mode.c
	* coverity had a second complaint that CodeQL did not have
 * tests/zfs-tests/cmd/suid_write_to_file.c
	* Coverity had two complaints and CodeQL had one complaint, both
	  differed. The CodeQL complaint is about the main point of the
	  test, so it is not fixable without a hack involving `fork()`.

The issues reported by CodeQL are fixed, with the exception of the last
one, which is deemed to be a false positive that is too much trouble to
wrokaround. The issues reported by Coverity were only fixed if CodeQL
complained about them.

There were issues reported by Coverity in a number of other files that
were not reported by CodeQL, but fixing the CodeQL complaints is
considered a priority since we want to integrate it into a github
workflow, so the remaining Coverity complaints are left for future work.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14098
2022-10-29 13:04:10 -07:00
Brian Behlendorf 82ad2a06ac Revert "Cleanup: Delete dead code from send_merge_thread()"
This reverts commit fb823de9f due to a regression.  It is in fact possible
for the range->eos_marker to be false on error.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #14042
Closes #14104
2022-10-28 13:25:37 -07:00
Rob N ★ 5f0a48c7c9 debug: fix output from VERIFY0 assertion
The previous version reported all the right info, but the VERIFY3 name
made a little more confusing when looking for the matching location in
the source code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Rob N ★ <robn@despairlabs.com>
Closes #14099
2022-10-28 11:46:44 -07:00
Mariusz Zaborski 8af08a69cd quota: extend quota for dataset
This patch relax the quota limitation for dataset by around 3%.
What this means is that user can write more data then the quota is
set to. However thanks to that we can get more stable bandwidth, in
case when we are overwriting data in-place, and not consuming any
additional space.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org>
Sponsored-by: Zededa Inc.
Sponsored-by: Klara Inc.
Closes #13839
2022-10-28 11:44:18 -07:00
shodanshok dc56c673e3 Fix ARC target collapse when zfs_arc_meta_limit_percent=100
Reclaim metadata when arc_available_memory < 0 even if
meta_used is not bigger than arc_meta_limit.

As described in https://github.com/openzfs/zfs/issues/14054 if
zfs_arc_meta_limit_percent=100 then ARC target can collapse to
arc_min due to arc_purge not freeing any metadata.

This patch lets arc_prune to do its work when arc_available_memory
is negative even if meta_used is not bigger than arc_meta_limit,
avoiding ARC target collapse.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #14054 
Closes #14093
2022-10-28 10:21:54 -07:00
vaclavskala 7822b50f54 Propagate extent_bytes change to autotrim thread
The autotrim thread only reads zfs_trim_extent_bytes_min and
zfs_trim_extent_bytes_max variable only on thread start.  We
should check for parameter changes during thread execution to
allow parameter changes take effect without needing to disable
then restart the autotrim.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Václav Skála <skala@vshosting.cz>
Closes #14077
2022-10-28 10:16:31 -07:00
Aleksa Sarai dbf6108b4d zfs_rename: support RENAME_* flags
Implement support for Linux's RENAME_* flags (for renameat2). Aside from
being quite useful for userspace (providing race-free ways to exchange
paths and implement mv --no-clobber), they are used by overlayfs and are
thus required in order to use overlayfs-on-ZFS.

In order for us to represent the new renameat2(2) flags in the ZIL, we
create two new transaction types for the two flags which need
transactional-level support (RENAME_EXCHANGE and RENAME_WHITEOUT).
RENAME_NOREPLACE does not need any ZIL support because we know that if
the operation succeeded before creating the ZIL entry, there was no file
to be clobbered and thus it can be treated as a regular TX_RENAME.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Closes #12209
Closes #14070
2022-10-28 09:49:20 -07:00
Aleksa Sarai e015d6cc0b zfs_rename: restructure to have cleaner fallbacks
This is in preparation for RENAME_EXCHANGE and RENAME_WHITEOUT support
for ZoL, but the changes here allow for far nicer fallbacks than the
previous implementation (the source and target are re-linked in case of
the final link failing).

In addition, a small cleanup was done for the "target exists but is a
different type" codepath so that it's more understandable.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Closes #12209
Closes #14070
2022-10-28 09:48:58 -07:00
Aleksa Sarai 7b3ba29654 debug: add VERIFY_{IMPLY,EQUIV} variants
This allows for much cleaner VERIFY-level assertions.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Closes #14070
2022-10-28 09:48:43 -07:00
Pavel Snajdr 86db35c447 Remove zpl_revalidate: fix snapshot rollback
Open files, which aren't present in the snapshot, which is being
roll-backed to, need to disappear from the visible VFS image of
the dataset.

Kernel provides d_drop function to drop invalid entry from
the dcache, but inode can be referenced by dentry multiple dentries.

The introduced zpl_d_drop_aliases function walks and invalidates
all aliases of an inode.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #9600
Closes #14070
2022-10-28 09:47:19 -07:00
Andrew Innes e09fdda977 Fix multiplication converted to larger type
This fixes the instances of the "Multiplication result converted to 
larger type" alert that codeQL scanning found.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Andrew Innes <andrew.c12@gmail.com>
Closes #14094
2022-10-28 09:30:37 -07:00
youzhongyang 5d0fd8429b Fix zio_flag_t print format
Follow up for 4938d01d which changed zio_flag from enum to uint64_t.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #14100
2022-10-28 09:08:12 -07:00
Damian Szuberski 1428ede0ba Process script directory for all configs
Even when only building kmods process the scripts directory.  This
way the common.sh script will be generated and the zfs.sh script
can be used to load/unload the in-tree kernel modules.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #14027
Closes #14051
2022-10-27 16:45:14 -07:00
Umer Saleem 4d631a509d Add native Debian Packaging for Linux
Currently, the Debian packages are generated from ALIEN that converts
RPMs to Debian packages. This commit adds native Debian packaging for
Debian based systems.

This packaging is a fork of Debian zfs-linux 2.1.6-2 release.
(source: https://salsa.debian.org/zfsonlinux-team/zfs)

Some updates have been made to keep the footprint minimal that
include removing the tests, translation files, patches directory etc.
All credits go to Debian ZFS on Linux Packaging Team.

For copyright information, please refer to contrib/debian/copyright.

scripts/debian-packaging.sh can be used to invoke the build.

Reviewed-by: Mo Zhou <cdluminate@gmail.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #13451
2022-10-27 15:38:45 -07:00
Richard Yao 4938d01db7 Convert enum zio_flag to uint64_t
We ran out of space in enum zio_flag for additional flags. Rather than
introduce enum zio_flag2 and then modify a bunch of functions to take a
second flags variable, we expand the type to 64 bits via `typedef
uint64_t zio_flag_t`.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@klarasystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Co-authored-by: Richard Yao <richard.yao@klarasystems.com>
Closes #14086
2022-10-27 09:54:54 -07:00
Richard Yao c8ae0ca11a Add CodeQL workflow
CodeQL is a static analyzer from github with a very low false positive
rate. We have long wanted to have static analysis runs done on every
pull request and using CodeQL, we can.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Andrew Innes <andrew.c12@gmail.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14087
2022-10-27 09:36:17 -07:00
Andrew Innes 07de86923b Aligned free for aligned alloc
Windows port frees memory that was alloc'd aligned in a different way
then alloc'd memory.  So changing frees to be specific.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrew Innes <andrew.c12@gmail.com>
Co-Authored-By: Jorgen Lundman <lundman@lundman.net>
Closes #14059
2022-10-26 15:08:31 -07:00
Andriy Gapon 41133c9794 FreeBSD: vn_flush_cached_data: observe vnode locking contract
vm_object_page_clean() expects that the associated vnode is locked
as VOP_PUTPAGES() may get called on the vnode.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
Closes #14079
2022-10-26 15:00:58 -07:00
Richard Yao eeddd80572 Silence objtool warnings from 55d7afa4
The use of __noreturn__ in 55d7afa4ad on
spl_panic() caused objtool warnings on Linux when the kernel is built
with CONFIG_STACK_VALIDATION=y. This patch works around that by
restricting the application of __noreturn__ to builds for static
analyzers.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14068
2022-10-26 14:57:37 -07:00
Brian Behlendorf 871d66dbf2 Linux 6.0 compat: META
Update the META file to reflect compatibility with the 6.0 kernel.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14091
2022-10-26 14:55:12 -07:00
Ameer Hamza 0b2428da20 zed: Avoid core dump if wholedisk property does not exist
zed aborts and dumps core in vdev_whole_disk_from_config() if
wholedisk property does not exist. make_leaf_vdev() adds the
property but there may be already pools that don't have the
wholedisk in the label.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #14062
2022-10-21 10:46:38 -07:00
youzhongyang c5a388a1ef Add delay between zpool add zvol and zpool destroy
As investigated by #14026, the zpool_add_004_pos can reliably hang if 
the timing is not right. This is caused by a race condition between 
zed doing zpool reopen (due to the zvol being added to the zpool), 
and the command zpool destroy.

This change adds a delay between zpool add zvol and zpool destroy to
avoid these issue, but does not address the underlying problem.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Issue #14026
Closes #14052
2022-10-21 10:05:13 -07:00
Richard Yao 72a366f018 Linux: Fix big endian and partial read bugs in get_system_hostid()
Coverity made two complaints about this function. The first is that we
ignore the number of bytes read. The second is that we have a sizeof
mismatch.

On 64-bit systems, long is a 64-bit type. Paradoxically, the standard
says that hostid is 32-bit, yet is also a long type. On 64-bit big
endian systems, reading into the long would cause us to return 0 as our
hostid after the mask. This is wrong.

Also, if a partial read were to happen (it should not), we would return
a partial hostid, which is also wrong.

We introduce a uint32_t system_hostid stack variable and ensure that the
read is done into it and check the read's return value. Then we set the
value based on whether the read was successful. This should fix both of
coverity's complaints.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13968
2022-10-20 14:52:35 -07:00
Richard Yao ab32a14b2e Silence new static analyzer defect reports from idmap_util.c
2a068a1394 introduced 2 new defect
reports from Coverity and 1 from Clang's static analyzer.

Coverity complained about a potential resource leak from only calling
`close(fd)` when `fd > 0` because `fd` might be `0`. This is a false
positive, but rather than dismiss it as such, we can change the
comparison to ensure that this never appears again from any static
analyzer. Upon inspection, 6 more instances of this were found in the
file, so those were changed too. Unfortunately, since the file
descriptor has been put into an unsigned variable in `attr.userns_fd`,
we cannot do a non-negative check on it to see if it has not been
allocated, so we instead restructure the error handling to avoid the
need for a check. This also means that errors had not been handled
correctly here, so the static analyzer found a bug (although practically
by accident).

Coverity also complained about a dereference before a NULL check in
`do_idmap_mount()` on `source`. Upon inspection, it appears that the
pointer is never NULL, so we delete the NULL check as cleanup.

Clang's static analyzer complained that the return value of
`write_pid_idmaps()` can be uninitialized if we have no idmaps to write.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Youzhong Yang <yyang@mathworks.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14061
2022-10-20 14:46:12 -07:00
Richard Yao a06df8d7c1 Linux: Upgrade random_get_pseudo_bytes() to xoshiro256++ 1.0
The motivation for upgrading our PRNG is the recent buildbot failures in
the ZTS' tests/functional/fault/decompress_fault test. The probability
of a failure in that test is 0.8^256, which is ~1.6e-25 out of 1, yet we
have observed multiple test failures in it. This suggests a problem with
our random number generation.

The xorshift128+ generator that we were using has been replaced by newer
generators that have "better statistical properties". After doing some
reading, it turns out that these generators have "low linear complexity
of the lowest bits", which could explain the ZTS test failures.

We do two things to try to fix this:

	1. We upgrade from xorshift128+ to xoshiro256++ 1.0.

	2. We tweak random_get_pseudo_bytes() to copy the higher order
	   bytes first.

It is hoped that this will fix the test failures in
tests/functional/fault/decompress_fault, although I have not done
simulations. I am skeptical that any simulations I do on a PRNG with a
period of 2^256 - 1 would be meaningful.

Since we have raised the minimum kernel version to 3.10 since this was
first implemented, we have the option of using the Linux kernel's
get_random_int(). However, I am not currently prepared to do performance
tests to ensure that this would not be a regression (for the time
being), so we opt for upgrading our PRNG to a newer one from Sebastiano
Vigna.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13983
2022-10-20 14:14:42 -07:00
Alexander Motin 9dcdee7889 Optimize microzaps
Microzap on-disk format does not include a hash tree, expecting one to
be built in RAM during mzap_open().  The built tree is linked to DMU
user buffer, freed when original DMU buffer is dropped from cache. I've
found that workloads accessing many large directories and having active
eviction from DMU cache spend significant amount of time building and
then destroying the trees.  I've also found that for each 64 byte mzap
element additional 64 byte tree element is allocated, that is a waste
of memory and CPU caches.

Improve memory efficiency of the hash tree by switching from AVL-tree
to B-tree.  It allows to save 24 bytes per element just on pointers.
Save 32 bits on mze_hash by storing only upper 32 bits since lower 32
bits are always zero for microzaps.  Save 16 bits on mze_chunkid, since
microzap can never have so many elements.  Respectively with the 16 bits
there can be no more than 16 bits of collision differentiators.  As
result, struct mzap_ent now drops from 48 (rounded to 64) to 8 bytes.

Tune B-trees for small data.  Reduce BTREE_CORE_ELEMS from 128 to 126
to allow struct zfs_btree_core in case of 8 byte elements to pack into
2KB instead of 4KB.  Aside of the microzaps it should also help 32bit
range trees.  Allow custom B-tree leaf size to reduce memmove() time.

Split zap_name_alloc() into zap_name_alloc() and zap_name_init_str().
It allows to not waste time allocating/freeing memory when processing
multiple names in a loop during mzap_open().

Together on a pool with 10K directories of 1800 files each and DMU
cache limited to 128MB this reduces time of `find . -name zzz` by 41%
from 7.63s to 4.47s, and saves additional ~30% of CPU time on the DMU
cache reclamation.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14039
2022-10-20 11:57:15 -07:00
Richard Yao 9650b35e95 Fix multiple definitions of struct mount_attr on recent glibc versions
The ifdef used would never work because the CPP is not aware of C
structure definitions. Rather than use an autotools check, we can just
use a nameless structure that we typedef to mount_attr_t. This is a
Linux kernel interface, which means that it is stable and this is fine
to do.
    
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Youzhong Yang <yyang@mathworks.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14057
Closes #14058
2022-10-20 09:12:21 -07:00
Richard Yao 411d327c67 Add defensive assertion to vdev_queue_aggregate()
a6ccb36b94 had been intended to include
this to silence Coverity reports, but this one was missed by mistake.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14043
2022-10-19 17:11:06 -07:00
Richard Yao d692e6c36e abd_return_buf() should call zfs_refcount_remove_many() early
Calling zfs_refcount_remove_many() after freeing memory means we pass a
reference to freed memory as the holder. This is not believed to be able
to cause a problem, but there is a bit of a tradition of fixing these
issues when they appear so that they do not obscure more serious issues
in static analyzer output, so we fix this one too.

Clang's static analyzer found this with the help of CodeChecker's CTU
analysis.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14043
2022-10-19 17:11:01 -07:00
Richard Yao c77d2d7415 crypto_get_ptrs() should always write to *out_data_2
Callers will check if it has been set to NULL before trying to access
it, but never initialize it themselves. Whenever "one block spans two
iovecs", `crypto_get_ptrs()` will return, without ever setting
`*out_data_2 = NULL`. The caller will then do a NULL check against the
uninitailized pointer and if it is not zero, pass it to `memcpy()`.

The only reason this has not caused horrible runtime issues is because
`memcpy()` should be told to copy zero bytes when this happens. That
said, this is technically undefined behavior, so we should correct it so
that future changes to the code cannot trigger it.

Clang's static analyzer found this with the help of CodeChecker's CTU
analysis.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14043
2022-10-19 17:10:56 -07:00
Richard Yao 44f71818f8 Silence static analyzer warnings about spa_sync_props()
Both Coverity and Clang's static analyzer complain about reading an
uninitialized intval if the property is not passed as DATA_TYPE_UINT64
in the nvlist. This is impossible becuase spa_prop_validate() already
checked this, but they are unlikely to be the last static analyzers to
complain about this, so lets just refactor the code to suppress the
warnings.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14043
2022-10-19 17:10:52 -07:00
Richard Yao 4ecd96371b Fix theoretical use of uninitialized values
Clang's static analyzer complains about this.

In get_configs(), if we have an invalid configuration that has no top
level vdevs, we can read a couple of uninitialized variables. Aborting
upon seeing this would break the userland tools for healthy pools, so we
instead initialize the two variables to 0 to allow the userland tools to
continue functioning for the pools with valid configurations.

In zfs_do_wait(), if no wait activities are enabled, we read an
uninitialized error variable.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14043
2022-10-19 17:10:21 -07:00
Richard Yao 219cf0f928 Fix userland memory leak in zfs_do_send()
Clang 15's static analyzer caught this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14045
2022-10-19 17:08:33 -07:00
Akash B 5405be0365 Add options to zfs redundant_metadata property
Currently, additional/extra copies are created for metadata in
addition to the redundancy provided by the pool(mirror/raidz/draid),
due to this 2 times more space is utilized per inode and this decreases
the total number of inodes that can be created in the filesystem. By
setting redundant_metadata to none, no additional copies of metadata
are created, hence can reduce the space consumed by the additional
metadata copies and increase the total number of inodes that can be
created in the filesystem.  Additionally, this can improve file create
performance due to the reduced amount of metadata which needs
to be written.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Dipak Ghosh <dipak.ghosh@hpe.com>
Signed-off-by: Akash B <akash-b@hpe.com>
Closes #13680
2022-10-19 17:07:51 -07:00
samwyc 2be0a124af Fix sequential resilver drive failure race condition
This patch handles the race condition on simultaneous failure of
2 drives, which misses the vdev_rebuild_reset_wanted signal in
vdev_rebuild_thread. We retry to catch this inside the
vdev_rebuild_complete_sync function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Dipak Ghosh <dipak.ghosh@hpe.com>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Samuel Wycliffe J <samwyc@hpe.com>
Closes #14041
Closes #14050
2022-10-19 15:48:13 -07:00
youzhongyang 2a068a1394 Support idmapped mount
Adds support for idmapped mounts.  Supported as of Linux 5.12 this 
functionality allows user and group IDs to be remapped without changing 
their state on disk.  This can be useful for portable home directories
and a variety of container related use cases.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #12923
Closes #13671
2022-10-19 11:17:09 -07:00
Richard Yao eaaed26ffb Fix memory leaks in dmu_send()/dmu_send_obj()
If we encounter an EXDEV error when using the redacted snapshots
feature, the memory used by dspp.fromredactsnaps is leaked.

Clang's static analyzer caught this during an experiment in which I had
annotated various headers in an attempt to improve the results of static
analysis.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13973
2022-10-18 16:03:33 -07:00
Richard Yao 84243acb91 Cleanup: Remove NULL pointer check from dmu_send_impl()
The pointer is to a structure member, so it is never NULL.

Coverity complained about this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14042
2022-10-18 15:40:04 -07:00
Richard Yao fb823de9fb Cleanup: Delete dead code from send_merge_thread()
range is always deferenced before it reaches this check, such that the
kmem_zalloc() call is never executed.

There is also no need to set `range->eos_marker = B_TRUE` because it is
already set.

Coverity incorrectly complained about a potential NULL pointer
dereference because of this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14042
2022-10-18 15:39:56 -07:00
Richard Yao 717641ac09 Cleanup: zvol_add_clones() should not NULL check dp
It is never NULL because we return early if dsl_pool_hold() fails.

This caused Coverity to complain.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14042
2022-10-18 15:39:48 -07:00
Richard Yao ef55679a75 Cleanup: metaslab_alloc_dva() should not NULL check mg->mg_next
This is a circularly linked list. mg->mg_next can never be NULL.

This caused 3 defect reports in Coverity.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14042
2022-10-18 15:39:40 -07:00
Richard Yao d953bcbf6b Cleanup: Delete unnecessary pointer check from vdev_to_nvlist_iter()
This confused Clang's static analyzer, making it think there was a
possible NULL pointer dereference. There is no NULL pointer dereference.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14042
2022-10-18 15:39:32 -07:00
Richard Yao 9a8039439a Cleanup: Simplify userspace abd_free_chunks()
Clang's static analyzer complained that we could use after free here if
the inner loop ever iterated. That is a false positive, but upon
inspection, the userland abd_alloc_chunks() function never will put
multiple consecutive pages into a `struct scatterlist`, so there is no
need to loop. We delete the inner loop.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14042
2022-10-18 15:39:21 -07:00
Richard Yao 6ae2f90888 Fix possible NULL pointer dereference in sha2_mac_init()
If mechanism->cm_param is NULL, passing mechanism to
PROV_SHA2_GET_DIGEST_LEN() will dereference a NULL pointer.

Coverity reported this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14044
2022-10-18 15:35:23 -07:00
Richard Yao c6b161e390 set_global_var() should not pass NULL pointers to dlclose()
Both Coverity and Clang's static analyzer caught this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14044
2022-10-18 15:35:13 -07:00
Richard Yao 1bd02680c0 Fix NULL pointer dereference in spa_open_common()
Calling spa_open() will pass a NULL pointer to spa_open_common()'s
config parameter. Under the right circumstances, we will dereference the
config parameter without doing a NULL check.

Clang's static analyzer found this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14044
2022-10-18 15:34:54 -07:00
Richard Yao 3146fc7edf Fix NULL pointer passed to strlcpy from zap_lookup_impl()
Clang's static analyzer pointed out that whenever zap_lookup_by_dnode()
is called, we have the following stack where strlcpy() is passed a NULL
pointer for realname from zap_lookup_by_dnode():

strlcpy()
zap_lookup_impl()
zap_lookup_norm_by_dnode()
zap_lookup_by_dnode()

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14044
2022-10-18 15:34:44 -07:00
Richard Yao 711b35dc24 fm_fmri_hc_create() must call va_end() before returning
clang-tidy caught this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14044
2022-10-18 15:34:36 -07:00
Richard Yao aa822e4d9c Fix NULL pointer dereference in zdb
Clang's static analyzer complained that we dereference a NULL pointer in
dump_path() if we return 0 when there is an error.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14044
2022-10-18 15:34:24 -07:00
Richard Yao 09453dea6a ZED: Fix uninitialized value reads
Coverity complained about a couple of uninitialized value reads in ZED.

 * zfs_deliver_dle() can pass an uninitialized string to zed_log_msg()
 * An uninitialized sev.sigev_signo is passed to timer_create()

The former would log garbage while the latter is not a real issue, but
we might as well suppress it by initializing the field to 0 for
consistency's sake.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14047
2022-10-18 12:42:14 -07:00
Coleman Kane ecb6a50819 Linux 6.1 compat: change order of sys/mutex.h includes
After Linux 6.1-rc1 came out, the build started failing to build a
couple of the files in the linux spl code due to the mutex_init
redefinition. Moving the sys/mutex.h include to a lower position within
these two files appears to fix the problem.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #14040
2022-10-18 12:29:44 -07:00
Richard Yao d10bd7d288 Coverity model file update
Upon review, it was found that the model for malloc() was incorrect.

In addition, several general purpose memory allocation functions were
missing models:

 * kmem_vasprintf()
 * kmem_asprintf()
 * kmem_strdup()
 * kmem_strfree()
 * spl_vmem_alloc()
 * spl_vmem_zalloc()
 * spl_vmem_free()
 * calloc()

As an experiment to try to find more bugs, some less than general
purpose memory allocation functions were also given models:

 * zfsvfs_create()
 * zfsvfs_free()
 * nvlist_alloc()
 * nvlist_dup()
 * nvlist_free()
 * nvlist_pack()
 * nvlist_unpack()

Finally, the models were improved using additional coverity primitives:

 * __coverity_negative_sink__()
 * __coverity_writeall0__()
 * __coverity_mark_as_uninitialized_buffer__()
 * __coverity_mark_as_afm_allocated__()

In addition, an attempt to inform coverity that certain modelled
functions read entire buffers was used by adding the following to
certain models:

int first = buf[0];
int last = buf[buflen-1];

It was inspired by the QEMU model file.

No additional false positives were found by this, but it is believed
that the more accurate model file will help to catch false positives in
the future.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14048
2022-10-18 11:06:35 -07:00
Tino Reichardt 27218a32fc Fix declarations of non-global variables
This patch inserts the `static` keyword to non-global variables,
which where found by the analysis tool smatch.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13970
2022-10-18 11:05:32 -07:00
Alexander ab49df487b Linux compat: fix DECLARE_EVENT_CLASS() test when ZFS is built-in
ZFS_LINUX_TRY_COMPILE_HEADER macro doesn't take CONFIG_ZFS=y into
account. As a result, on several latest Linux versions, configure
script marks DECLARE_EVENT_CLASS() available for non-GPL when ZFS
is being built as a module, but marks it unavailable when ZFS is
built-in.
Follow the logic of the neighbor macros and adjust
ZFS_LINUX_TRY_COMPILE_HEADER accordingly, so that it doesn't try
to look for a .ko when ZFS is built-in.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Closes #14006
2022-10-17 11:08:36 -07:00
Richard Yao 0aae8a449b Fix theoretical array overflow in lua_typename()
Out of the 12 defects in lua that coverity reports, 5 of them involve
`lua_typename()` and out of the dozens of defects in ZFS that lua
reports, 3 of them involve `lua_typename()` due to the ZCP code. Given
all of the uses of `lua_typename()` in the ZCP code, I was surprised
that there were not more. It appears that only 2 were reported because
only 3 called `lua_type()`, which does a defective sanity check that
allows invalid types to be passed.

lua/lua@d4fb848be7 addressed this in
upstream lua 5.3. Unfortunately, we did not get that fix since we use
lua 5.2 and we do not have assertions enabled in lua, so the upstream
solution would not do anything.

While we could adopt the upstream solution and enable assertions, a
simpler solution is to fix the issue by making `lua_typename()` return
`internal_type_error` whenever it is called with an invalid type. This
avoids the array overflow and if we ever see it appear somewhere, we
will know there is a problem with the lua interpreter.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13947
2022-10-14 13:41:56 -07:00
Alan Somers a1034ee909 zstream: allow decompress to fix metadata for uncompressed records
If a record is uncompressed on-disk but the block pointer insists
otherwise, reading it will return EIO.  This commit adds an "off" type
to the "zstream decompress" command.  Using it will set the compression
field in a zfs stream to "off" without changing the record's data.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by:	Alan Somers <asomers@FreeBSD.org>
Sponsored by:	Axcient
Closes #13997
2022-10-14 13:40:00 -07:00
Richard Yao 6a42939fcd Cleanup: Address Clang's static analyzer's unused code complaints
These were categorized as the following:

 * Dead assignment		23
 * Dead increment		4
 * Dead initialization		6
 * Dead nested assignment	18

Most of these are harmless, but since actual issues can hide among them,
we correct them.

That said, there were a few return values that were being ignored that
appeared to merit some correction:

 * `destroy_callback()` in `cmd/zfs/zfs_main.c` ignored the error from
   `destroy_batched()`. We handle it by returning -1 if there is an
   error.

 * `zfs_do_upgrade()` in `cmd/zfs/zfs_main.c` ignored the error from
   `zfs_for_each()`. We handle it by doing a binary OR of the error
   value from the subsequent `zfs_for_each()` call to the existing
   value. This is how errors are mostly handled inside `zfs_for_each()`.
   The error value here is passed to exit from the zfs command, so doing
   a binary or on it is better than what we did previously.

 * `get_zap_prop()` in `module/zfs/zcp_get.c` ignored the error from
   `dsl_prop_get_ds()` when the property is not of type string. We
   return an error when it does. There is a small concern that the
   `zfs_get_temporary_prop()` call would handle things, but in the case
   that it does not, we would be pushing an uninitialized numval onto
   the lua stack. It is expected that `dsl_prop_get_ds()` will succeed
   anytime that `zfs_get_temporary_prop()` does, so that not giving it a
   chance to fix things is not a problem.

 * `draid_merge_impl()` in `tests/zfs-tests/cmd/draid.c` used
   `nvlist_add_nvlist()` twice in ways in which errors are expected to
   be impossible, so we switch to `fnvlist_add_nvlist()`.

A few notable ones did not merit use of the return value, so we
suppressed it with `(void)`:

 * `write_free_diffs()` in `lib/libzfs/libzfs_diff.c` ignored the error
   value from `describe_free()`. A look through the commit history
   revealed that this was intentional.

 * `arc_evict_hdr()` in `module/zfs/arc.c` did not need to use the
   returned handle from `arc_hdr_realloc()` because it is already
   referenced in lists.

 * `spa_vdev_detach()` in `module/zfs/spa.c` has a comment explicitly
   saying not to use the error from `vdev_label_init()` because whatever
   causes the error could be the reason why a detach is being done.

Unfortunately, I am not presently able to analyze the kernel modules
with Clang's static analyzer, so I could have missed some cases of this.
In cases where reports were present in code that is duplicated between
Linux and FreeBSD, I made a conscious effort to fix the FreeBSD version
too.

After this commit is merged, regressions like dee8934 should become
extremely obvious with Clang's static analyzer since a regression would
appear in the results as the only instance of unused code. That assumes
that Coverity does not catch the issue first.

My local branch with fixes from all of my outstanding non-draft pull
requests shows 118 reports from Clang's static anlayzer after this
patch. That is down by 51 from 169.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Cedric Berger <cedric@precidata.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13986
2022-10-14 13:37:54 -07:00
Richard Yao 19516b69ee Fix potential NULL pointer dereference in lzc_ioctl()
Users are allowed to pass NULL to resultp, but we unconditionally assume
that they never do. When an external user does pass NULL to resultp, we
dereference a NULL pointer.

Clang's static analyzer complained about this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14008
2022-10-14 13:33:22 -07:00
Christian Schwarz 4d5aef3ba9 zfs_domount: fix double-disown of dataset / double-free of zfsvfs_t
Before this patch, in zfs_domount, if zfs_root or d_make_root fails, we
leave zfsvfs != NULL. This will lead to execution of the error handling
`if` statement at the `out` label, and hence to a call to
dmu_objset_disown and zfsvfs_free.

However, zfs_umount, which we call upon failure of zfs_root and
d_make_root already does dmu_objset_disown and zfsvfs_free.

I suppose this patch rather adds to the brittleness of this part of the
code base, but I don't want to invest more time in this right now.
To add a regression test, we'd need some kind of fault injection
facility for zfs_root or d_make_root, which doesn't exist right now.
And even then, I think that regression test would be too closely tied
to the implementation.

To repro the double-disown / double-free, do the following:
1. patch zfs_root to always return an error
2. mount a ZFS filesystem

Here's the stack trace you would see then:

  VERIFY3(ds->ds_owner == tag) failed (0000000000000000 == ffff9142361e8000)
  PANIC at dsl_dataset.c:1003:dsl_dataset_disown()
  Showing stack for process 28332
  CPU: 2 PID: 28332 Comm: zpool Tainted: G           O      5.10.103-1.nutanix.el7.x86_64 #1
  Call Trace:
   dump_stack+0x74/0x92
   spl_dumpstack+0x29/0x2b [spl]
   spl_panic+0xd4/0xfc [spl]
   dsl_dataset_disown+0xe9/0x150 [zfs]
   dmu_objset_disown+0xd6/0x150 [zfs]
   zfs_domount+0x17b/0x4b0 [zfs]
   zpl_mount+0x174/0x220 [zfs]
   legacy_get_tree+0x2b/0x50
   vfs_get_tree+0x2a/0xc0
   path_mount+0x2fa/0xa70
   do_mount+0x7c/0xa0
   __x64_sys_mount+0x8b/0xe0
   do_syscall_64+0x38/0x50
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Co-authored-by: Christian Schwarz <christian.schwarz@nutanix.com>
Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com>
Closes #14025
2022-10-14 11:46:47 -07:00
ColMelvin 642c2dee44 cstyle: Allow URLs in C++ comments
If a C++ comment contained a URL, the `://` part of the URL would
trigger an error because there was no trailing blank, but trailing
blanks make for an invalid URL.  Modify the check to ignore text
within the C++ comment.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Lindee <chris.lindee+github@gmail.com>
Closes #13987
2022-10-13 11:05:05 -07:00
Richard Yao ab8d9c1783 Cleanup: 64-bit kernel module parameters should use fixed width types
Various module parameters such as `zfs_arc_max` were originally
`uint64_t` on OpenSolaris/Illumos, but were changed to `unsigned long`
for Linux compatibility because Linux's kernel default module parameter
implementation did not support 64-bit types on 32-bit platforms. This
caused problems when porting OpenZFS to Windows because its LLP64 memory
model made `unsigned long` a 32-bit type on 64-bit, which created the
undesireable situation that parameters that should accept 64-bit values
could not on 64-bit Windows.

Upon inspection, it turns out that the Linux kernel module parameter
interface is extensible, such that we are allowed to define our own
types. Rather than maintaining the original type change via hacks to to
continue shrinking module parameters on 32-bit Linux, we implement
support for 64-bit module parameters on Linux.

After doing a review of all 64-bit kernel parameters (found via the man
page and also proposed changes by Andrew Innes), the kernel module
parameters fell into a few groups:

Parameters that were originally 64-bit on Illumos:

 * dbuf_cache_max_bytes
 * dbuf_metadata_cache_max_bytes
 * l2arc_feed_min_ms
 * l2arc_feed_secs
 * l2arc_headroom
 * l2arc_headroom_boost
 * l2arc_write_boost
 * l2arc_write_max
 * metaslab_aliquot
 * metaslab_force_ganging
 * zfetch_array_rd_sz
 * zfs_arc_max
 * zfs_arc_meta_limit
 * zfs_arc_meta_min
 * zfs_arc_min
 * zfs_async_block_max_blocks
 * zfs_condense_max_obsolete_bytes
 * zfs_condense_min_mapping_bytes
 * zfs_deadman_checktime_ms
 * zfs_deadman_synctime_ms
 * zfs_initialize_chunk_size
 * zfs_initialize_value
 * zfs_lua_max_instrlimit
 * zfs_lua_max_memlimit
 * zil_slog_bulk

Parameters that were originally 32-bit on Illumos:

 * zfs_per_txg_dirty_frees_percent

Parameters that were originally `ssize_t` on Illumos:

 * zfs_immediate_write_sz

Note that `ssize_t` is `int32_t` on 32-bit and `int64_t` on 64-bit. It
has been upgraded to 64-bit.

Parameters that were `long`/`unsigned long` because of Linux/FreeBSD
influence:

 * l2arc_rebuild_blocks_min_l2size
 * zfs_key_max_salt_uses
 * zfs_max_log_walking
 * zfs_max_logsm_summary_length
 * zfs_metaslab_max_size_cache_sec
 * zfs_min_metaslabs_to_flush
 * zfs_multihost_interval
 * zfs_unflushed_log_block_max
 * zfs_unflushed_log_block_min
 * zfs_unflushed_log_block_pct
 * zfs_unflushed_max_mem_amt
 * zfs_unflushed_max_mem_ppm

New parameters that do not exist in Illumos:

 * l2arc_trim_ahead
 * vdev_file_logical_ashift
 * vdev_file_physical_ashift
 * zfs_arc_dnode_limit
 * zfs_arc_dnode_limit_percent
 * zfs_arc_dnode_reduce_percent
 * zfs_arc_meta_limit_percent
 * zfs_arc_sys_free
 * zfs_deadman_ziotime_ms
 * zfs_delete_blocks
 * zfs_history_output_max
 * zfs_livelist_max_entries
 * zfs_max_async_dedup_frees
 * zfs_max_nvlist_src_size
 * zfs_rebuild_max_segment
 * zfs_rebuild_vdev_limit
 * zfs_unflushed_log_txg_max
 * zfs_vdev_max_auto_ashift
 * zfs_vdev_min_auto_ashift
 * zfs_vnops_read_chunk_size
 * zvol_max_discard_blocks

Rather than clutter the lists with commentary, the module parameters
that need comments are repeated below.

A few parameters were defined in Linux/FreeBSD specific code, where the
use of ulong/long is not an issue for portability, so we leave them
alone:

 * zfs_delete_blocks
 * zfs_key_max_salt_uses
 * zvol_max_discard_blocks

The documentation for a few parameters was found to be incorrect:

 * zfs_deadman_checktime_ms - incorrectly documented as int
 * zfs_delete_blocks - not documented as Linux only
 * zfs_history_output_max - incorrectly documented as int
 * zfs_vnops_read_chunk_size - incorrectly documented as long
 * zvol_max_discard_blocks - incorrectly documented as ulong

The documentation for these has been fixed, alongside the changes to
document the switch to fixed width types.

In addition, several kernel module parameters were percentages or held
ashift values, so being 64-bit never made sense for them. They have been
downgraded to 32-bit:

 * vdev_file_logical_ashift
 * vdev_file_physical_ashift
 * zfs_arc_dnode_limit_percent
 * zfs_arc_dnode_reduce_percent
 * zfs_arc_meta_limit_percent
 * zfs_per_txg_dirty_frees_percent
 * zfs_unflushed_log_block_pct
 * zfs_vdev_max_auto_ashift
 * zfs_vdev_min_auto_ashift

Of special note are `zfs_vdev_max_auto_ashift` and
`zfs_vdev_min_auto_ashift`, which were already defined as `uint64_t`,
and passed to the kernel as `ulong`. This is inherently buggy on big
endian 32-bit Linux, since the values would not be written to the
correct locations. 32-bit FreeBSD was unaffected because its sysctl code
correctly treated this as a `uint64_t`.

Lastly, a code comment suggests that `zfs_arc_sys_free` is
Linux-specific, but there is nothing to indicate to me that it is
Linux-specific. Nothing was done about that.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Original-patch-by: Andrew Innes <andrew.c12@gmail.com>
Original-patch-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13984
Closes #14004
2022-10-13 10:03:29 -07:00
Richard Yao ff7a0a108f Linux: Remove ZFS_AC_KERNEL_SRC_MODULE_PARAM_CALL_CONST autotools check
On older kernels, the definition for `module_param_call()` typecasts
function pointers to `(void *)`, which triggers -Werror, causing the
check to return false when it should return true.

Fixing this breaks the build process on some older kernels because they
define a `__check_old_set_param()` function in their headers that checks
for a non-constified `->set()`. We workaround that through the c
preprocessor by defining `__check_old_set_param(set)` to `(set)`, which
prevents the build failures.

However, it is now apparent that all kernels that we support have
adopted the GRSecurity change, so there is no need to have an explicit
autotools check for it anymore. We therefore remove the autotools check,
while adding the workaround to our headers for the build time
non-constified `->set()` check done by older kernel headers.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13984
Closes #14004
2022-10-13 10:03:09 -07:00
наб 4e195e8254 etc: mask zfs-load-key.service
Otherwise, systemd-sysv-generator will generate a service equivalent
that breaks the boot: under systemd this is covered by
zfs-mount-generator

We already do this for zfs-import.service, and other init scripts are
suppressed automatically by the "actual" .service files

Fixes: commit f04b976200 ("Add init script
 to load keys")
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #14010
Closes #14019
2022-10-12 15:27:55 -07:00
George Melikov 87d5ff8ecd CI: bump actions/upload-artifact to v3
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #14018
2022-10-12 15:18:39 -07:00
George Melikov 4bbd7a0fbe CI: bump actions/checkout to v3
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #14018
2022-10-12 15:18:19 -07:00
Richard Yao a6ccb36b94 Add defensive assertions
Coverity complains about possible bugs involving referencing NULL return
values and division by zero. The division by zero bugs require that a
block pointer be corrupt, either from in-memory corruption, or on-disk
corruption. The NULL return value complaints are only bugs if
assumptions that we make about the state of data structures are wrong.
Some seem impossible to be wrong and thus are false positives, while
others are hard to analyze.

Rather than dismiss these as false positives by assuming we know better,
we add defensive assertions to let us know when our assumptions are
wrong.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13972
2022-10-12 11:25:18 -07:00
Richard Yao bfaa1d98f4 scripts/enum-extract.pl should not hard code perl path
This is a portability issue. The issue had already been fixed for
scripts/cstyle.pl by 2dbf1bf829.
scripts/enum-extract.pl was added to the repository the following year
without this portability fix.

Michael Bishop informed me that this broke his attempt to build ZFS
2.1.6 on NixOS, since he was building manually outside of their package
manager (that usually rewrites the shebangs to NixOS' unusual paths).
NixOS puts all of the paths into $PATH, so scripts that portably rely
on env to find the interpreter still work.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> 
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14012
2022-10-11 12:32:07 -07:00
Mark Johnston ed566bf1cd FreeBSD: Fix a pair of bugs in zfs_fhtovp()
- Add a zfs_exit() call in an error path, otherwise a lock is leaked.
- Remove the fid_gen > 1 check.  That appears to be Linux-specific:
  zfsctl_snapdir_fid() sets fid_gen to 0 or 1 depending on whether the
  snapshot directory is mounted.  On FreeBSD it fails, making snapshot
  dirs inaccessible via NFS.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Andriy Gapon <avg@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Fixes: 43dbf88178 ("FreeBSD: vfsops: use setgen for error case")
Closes #14001
Closes #13974
2022-10-11 12:29:55 -07:00
Serapheim Dimitropoulos 4dcc2bde9c Stop ganging due to past vdev write errors
= Problem

While examining a customer's system we noticed unreasonable space
usage from a few snapshots due to gang blocks. Under some further
analysis we discovered that the pool would create gang blocks because
all its disks had non-zero write error counts and they'd be skipped
for normal metaslab allocations due to the following if-clause in
`metaslab_alloc_dva()`:
```
	/*
	 * Avoid writing single-copy data to a failing,
	 * non-redundant vdev, unless we've already tried all
	 * other vdevs.
	 */
	if ((vd->vdev_stat.vs_write_errors > 0 ||
	    vd->vdev_state < VDEV_STATE_HEALTHY) &&
	    d == 0 && !try_hard && vd->vdev_children == 0) {
		metaslab_trace_add(zal, mg, NULL, psize, d,
		    TRACE_VDEV_ERROR, allocator);
		goto next;
	}
```

= Proposed Solution

Get rid of the predicate in the if-clause that checks the past
write errors of the selected vdev. We still try to allocate from
HEALTHY vdevs anyway by checking vdev_state so the past write
errors doesn't seem to help us (quite the opposite - it can cause
issues in long-lived pools like the one from our customer).

= Testing

I first created a pool with 3 vdevs:
```
$ zpool list -v volpool
NAME        SIZE  ALLOC   FREE
volpool    22.5G   117M  22.4G
  xvdb     7.99G  40.2M  7.46G
  xvdc     7.99G  39.1M  7.46G
  xvdd     7.99G  37.8M  7.46G
```

And used `zinject` like so with each one of them:
```
$ sudo zinject -d xvdb -e io -T write -f 0.1 volpool
```

And got the vdevs to the following state:
```
$ zpool status volpool
  pool: volpool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.
...<cropped>..
action: Determine if the device needs to be replaced, and clear the
...<cropped>..
config:

	NAME        STATE     READ WRITE CKSUM
	volpool     ONLINE       0     0     0
	  xvdb      ONLINE       0     1     0
	  xvdc      ONLINE       0     1     0
	  xvdd      ONLINE       0     4     0

```

I also double-checked their write error counters with sdb:
```
sdb> spa volpool | vdev | member vdev_stat.vs_write_errors
(uint64_t)0  # <---- this is the root vdev
(uint64_t)2
(uint64_t)1
(uint64_t)1
```

Then I checked that I the problem was reproduced in my VM as I the
gang count was growing in zdb as I was writting more data:
```
$ sudo zdb volpool | grep gang
        ganged count:              1384

$ sudo zdb volpool | grep gang
        ganged count:              1393

$ sudo zdb volpool | grep gang
        ganged count:              1402

$ sudo zdb volpool | grep gang
        ganged count:              1414
```

Then I updated my bits with this patch and the gang count stayed the
same.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #14003
2022-10-11 12:27:41 -07:00
Richard Yao 70248b82e8 Fix uninitialized value read in vdev_prop_set()
If no errors are encountered, we read an uninitialized error value.

Clang's static analyzer complained about this.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14007
2022-10-11 12:24:36 -07:00
Serapheim Dimitropoulos e5646c5e37 zvol_wait logic may terminate prematurely
Setups that have a lot of zvols may see zvol_wait terminate prematurely
even though the script is still making progress.  For example, we have a
customer that called zvol_wait for ~7100 zvols and by the last iteration
of that script it was still waiting on ~2900. Similarly another one
called zvol_wait for 2200 and by the time the script terminated there
were only 50 left.

This patch adjusts the logic to stay within the outer loop of the script
if we are making any progress whatsoever.

Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #13998
2022-10-11 12:12:04 -07:00
Richard Yao 72c99dc959 Handle possible null pointers from malloc/strdup/strndup()
GCC 12.1.1_p20220625's static analyzer caught these.

Of the two in the btree test, one had previously been caught by Coverity
and Smatch, but GCC flagged it as a false positive. Upon examining how
other test cases handle this, the solution was changed from
`ASSERT3P(node, !=, NULL);` to using `perror()` to be consistent with
the fixes to the other fixes done to the ZTS code.

That approach was also used in ZED since I did not see a better way of
handling this there. Also, upon inspection, additional unchecked
pointers from malloc()/calloc()/strdup() were found in ZED, so those
were handled too.

In other parts of the code, the existing methods to avoid issues from
memory allocators returning NULL were used, such as using
`umem_alloc(size, UMEM_NOFAIL)` or returning `ENOMEM`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13979
2022-10-06 17:18:40 -07:00
Richard Yao 2ba240f358 PAM: Fix unchecked return value from zfs_key_config_load()
9a49c6b782 was intended to fix this issue,
but I had missed the case in pam_sm_open_session(). Clang's static
analyzer had not reported it and I forgot to look for other cases.

Interestingly, GCC gcc-12.1.1_p20220625's static analyzer had caught
this as multiple double-free bugs, since another failure after the
failure in zfs_key_config_load() will cause us to attempt to free the
memory that zfs_key_config_load() was supposed to allocate, but had
cleaned up upon failure.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13978
2022-10-05 17:09:24 -07:00
Jorgen Lundman 4b629d04a5 Avoid calling rw_destroy() on uninitialized rwlock
First the function `memset(&key, 0, ...)` but
any call to "goto error;" would call zio_crypt_key_destroy(key) which
calls `rw_destroy()`. The `rw_init()` is moved up to be right after the
memset. This way the rwlock can be released.

The ctx does allocate memory, but that is handled by the memset to 0
and icp skips NULL ptrs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #13976
2022-10-05 17:07:50 -07:00
shodanshok 062d3d056b Remove ambiguity on demand vs prefetch stats reported by arc_summary
arc_summary currently list prefetch stats as "demand prefetch"
However, a hit/miss can be due to demand or prefetch, not both.
To remove any confusion, this patch removes the "Demand" word
from the affected lines.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #13985
2022-10-04 11:00:02 -07:00
Finix1979 6694ca5539 Avoid unnecessary metaslab_check_free calling
The metaslab_check_free() function only needs to be called in the
GANG|DEDUP|etc case because zio_free_sync() will internally call
metaslab_check_free().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Finix1979 <yancw@info2soft.com>
Closes #13977
2022-10-04 10:55:35 -07:00
Umer Saleem 383c3eb33d Add membar_sync abi change
It appears membar_sync was not present in libzfs.abi with other
membar_* functions. This commit updates libzfs.abi for membar_sync.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #13969
2022-10-04 09:54:58 -07:00
Umer Saleem d9ac17a57f Expose libzutil error info in libpc_handle_t
In libzutil, for zpool_search_import and zpool_find_config, we use
libpc_handle_t internally, which does not maintain error code and it is
not exposed in the interface. Due to this, the error information is not
propagated to the caller. Instead, an error message is printed on
stderr.

This commit adds lpc_error field in libpc_handle_t and exposes it in
the interface, which can be used by the users of libzutil to get the
appropriate error information and handle it accordingly.

Users of the API can also control if they want to print the error
message on stderr.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #13969
2022-10-04 09:54:35 -07:00
Richard Yao d62bafee9f Fix memory leak found by GCC static analyzer
GCC 12.1.1_p20220625's -fanalyzer found and reported this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13975
2022-10-03 13:41:58 -07:00
Richard Yao 67395be0c2 Fix userland dereference NULL return value bugs
* `zstream_do_token()` does not handle failures from `libzfs_init()`

 * `ztest_global_vars_to_zdb_args()` does not handle failures from
   `calloc()`.

 * `zfs_snapshot_nvl()` will pass an offset to a NULL pointer as a
   source to `strlcpy()` if the provided nvlist is `NULL`.

We handle these by doing what the existing error handling does for other
errors involving these functions.

Coverity complained about these. It had complained about several more,
but one was fixed by 570ca4441e and
another was a false positive. The remaining complaints labelled
"dereferece null return vaue" involve fetching things stored in
in-kernel data structures via `list_head()/list_next()`,
`AVL_PREV()/AVL_NEXT()` and `zfs_btree_find()`. Most of them occur in
void functions that have no error handling. They are much harder to
analyze than the two fixed in this patch, so they are left for a
follow-up patch.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13971
2022-09-30 17:02:57 -07:00
Richard Yao a36b37d4de Fix potential NULL pointer dereference in dsl_dataset_promote_check()
If the `list_head()` returns NULL, we dereference it, right before we
check to see if it returned NULL.

We have defined two different pointers that both point to the same
thing, which are `origin_head` and `origin_ds`. Almost everything uses
`origin_ds`, so we switch them to use `origin_ds`.

We also promote `origin_ds` to a const pointer so that the compiler
verifies that nothing modifies it.

Coverity complained about this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13967
2022-09-30 16:59:51 -07:00
Tino Reichardt a2d5643f88 Fix double const qualifier declarations
Some header files define structures like this one:

typedef const struct zio_checksum_info {
	/* ... */
	const char	*ci_name;
} zio_abd_checksum_func_t;

So we can use `zio_abd_checksum_func_t` for const declarations now.
It's not needed that we use the `const` qualifier again like this:
`const zio_abd_checksum_func_t *varname;`

This patch solves the double const qualifiers, which were found by
smatch.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13961
2022-09-30 15:34:39 -07:00
Richard Yao 55d7afa4ad Reduce false positives from Static Analyzers
Both Clang's Static Analyzer and Synopsys' Coverity would ignore
assertions. Following Clang's advice, we annotate our assertions:

https://clang-analyzer.llvm.org/annotations.html#custom_assertions

This makes both Clang's Static Analyzer and Coverity properly identify
assertions. This change reduced Clang's reported defects from 246 to
180. It also reduced the false positives reported by Coverityi by 10,
while enabling Coverity to find 9 more defects that previously were
false negatives.

A couple examples of this would be CID-1524417 and CID-1524423. After
submitting a build to coverity with the modified assertions, CID-1524417
disappeared while the report for CID-1524423 no longer claimed that the
assertion tripped.

Coincidentally, it turns out that it is possible to more accurately
annotate our headers than the Coverity modelling file permits in the
case of format strings. Since we can do that and this patch annotates
headers whenever `__coverity_panic__()` would have been used in the
model file, we drop all models that use `__coverity_panic__()` from the
model file.

Upon seeing the success in eliminating false positives involving
assertions, it occurred to me that we could also modify our headers to
eliminate coverity's false positives involving byte swaps. We now have
coverity specific byteswap macros, that do nothing, to disable
Coverity's false positives when we do byte swaps. This allowed us to
also drop the byteswap definitions from the model file.

Lastly, a model file update has been done beyond the mentioned
deletions:

 * The definitions of `umem_alloc_aligned()`, `umem_alloc()` andi
   `umem_zalloc()` were originally implemented in a way that was
   intended to inform coverity that when KM_SLEEP has been passed these
   functions, they do not return NULL. A small error in how this was
   done was found, so we correct it.

 * Definitions for umem_cache_alloc() and umem_cache_free() have been
   added.

In practice, no false positives were avoided by making these changes,
but in the interest of correctness from future coverity builds, we make
them anyway.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13902
2022-09-30 15:30:12 -07:00
Richard Yao dee8934e8f Fix unreachable code in zstreamdump
82226e4f44 was intended to prevent a
warning from being printed in situations where it was inappropriate, but
accidentally disabled it entirely by setting featureflags in the wrong
case statement.

Coverity reported this as dead code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13946
2022-09-29 10:16:37 -07:00
Serapheim Dimitropoulos 4acc36ed7c Fix panic in dsl_process_sub_livelist for EINTR
= Issue

Recently we hit an assertion panic in `dsl_process_sub_livelist` while
exporting the spa and interrupting `bpobj_iterate_nofree`. In that case
`bpobj_iterate_nofree` stops mid-way returning an EINTR without clearing
the intermediate AVL tree that keeps track of the livelist entries it
has encountered so far. At that point the code has a VERIFY for the
number of elements of the AVL expecting it to be zero (which is not the
case for EINTR).

= Fix

Cleanup any intermediate state before destroying the AVL when
encountering EINTR. Also added a comment documenting the scenario where
the EINTR comes up. There is no need to do anything else for the calles
of `dsl_process_sub_livelist` as they already handle the EINTR case.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #13939
2022-09-29 09:39:48 -07:00
Richard Yao 1b87195c3c Fix unchecked return values
2a493a4c71 was intended to fix all
instances of coverity reported unchecked return values, but
unfortunately, two were missed by mistake. This commit fixes the
unchecked return values that had been missed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13945
2022-09-29 09:02:57 -07:00
Richard Yao 570ca4441e Miscellaneous ZTS fixes
Coverity had various complaints about minor issues. They are all fairly
straightforward to understand without reading additional files, with the
exception of the draid.c issue. vdev_draid_rand() takes a 128-bit
starting seed, but we were passing a pointer to a 64-bit value, which
understandably made Coverity complain. This is perhaps the only
significant issue fixed in this patch, since it causes stack corruption.

These are not all of the issues in the ZTS that Coverity caught, but a
number of them are already fixed in other PRs. There is also a class of
TOUTOC complaints that involve very minor things in the ZTS (e.g.
access() before unlink()). I have yet to decide whether they are false
positives (since this is not security sensitive code) or something to
cleanup.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13943
2022-09-29 08:56:42 -07:00
Ameer Hamza 55c12724d3 zed: mark disks as REMOVED when they are removed
ZED does not take any action for disk removal events if there is no
spare VDEV available. Added zpool_vdev_remove_wanted() in libzfs
and vdev_remove_wanted() in vdev.c to remove the VDEV through ZED
on removal event.  This means that if you are running zed and
remove a disk, it will be properly marked as REMOVED.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #13797
2022-09-28 09:48:46 -07:00
Mateusz Guzik eb9bec0a5d Bring per_txg_dirty_frees_percent back to 30
The current value causes significant artificial slowdown during mass
parallel file removal, which can be observed both on FreeBSD and Linux
when running real workloads.

Sample results from Linux doing make -j 96 clean after an allyesconfig
modules build:

before: 4.14s user 6.79s system 48% cpu 22.631 total
after:	4.17s user 6.44s system 153% cpu 6.927 total

FreeBSD results in the ticket.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by:	Mateusz Guzik <mjguzik@gmail.com>
Closes #13932
Closes #13938
2022-09-27 17:38:03 -07:00
Toomas Soome af65073a07 btree_test: smatch did detect few issues
Add missing header.
Properly ignore return values.

Memory leak/unchecked malloc. We do allocate a bit too early (and
fail to validate the result). From this, smatch is angry when we
overwrite the value of 'node' later.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #13941
2022-09-27 17:09:21 -07:00
Christian Schwarz e872ea16f2 DMU_BACKUP_FEATURE: indicate that bit 28 and 29 are reserved
Bit 28 is used by an internal Nutanix feature which might be
upstreamed in the future.

Bit 29 is the last unused bit. It is reserved to indicate a
to-be-designed extension to the stream format which will accomodate
more feature flags.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com>
Issue #13795
Closes #13796
2022-09-27 16:55:32 -07:00
Christian Schwarz 5c9666382a DMU_BACKUP_FEATURE: remove unused BLAKE3 feature
Commit 985c33b132 added DMU_BACKUP_FEATURE_BLAKE3 but it is not used by
the code.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com>
Issue #13795
Closes #13796
2022-09-27 16:53:40 -07:00
Richard Yao 9a49c6b782 PAM: Fix uninitialized value read
Clang's static analyzer found that config.uid is uninitialized when
zfs_key_config_load() returns an error.

Oddly, this was not included in the unchecked return values that
Coverity found.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13957
2022-09-27 16:48:35 -07:00
Richard Yao a51288aabb Fix unsafe string operations
Coverity caught unsafe use of `strcpy()` in `ztest_dmu_objset_own()`,
`nfs_init_tmpfile()` and `dump_snapshot()`. It also caught an unsafe use
of `strlcat()` in `nfs_init_tmpfile()`.

Inspired by this, I did an audit of every single usage of `strcpy()` and
`strcat()` in the code. If I could not prove that the usage was safe, I
changed the code to use either `strlcpy()` or `strlcat()`, depending on
which function was originally used. In some cases, `snprintf()` was used
to replace multiple uses of `strcat` because it was cleaner.

Whenever I changed a function, I preferred to use `sizeof(dst)` when the
compiler is able to provide the string size via that. When it could not
because the string was passed by a caller, I checked the entire call
tree of the function to find out how big the buffer was and hard coded
it. Hardcoding is less than ideal, but it is safe unless someone shrinks
the buffer sizes being passed.

Additionally, Coverity reported three more string related issues:

 * It caught a case where we do an overlapping memory copy in a call to
   `snprintf()`. We fix that via `kmem_strdup()` and `kmem_strfree()`.

 * It caught `sizeof (buf)` being used instead of `buflen` in
   `zdb_nicenum()`'s call to `zfs_nicenum()`, which is passed to
   `snprintf()`. We change that to pass `buflen`.

 * It caught a theoretical unterminated string passed to `strcmp()`.
   This one is likely a false positive, but we have the information
   needed to do this more safely, so we change this to silence the false
   positive not just in coverity, but potentially other static analysis
   tools too. We switch to `strncmp()`.

 * There was a false positive in tests/zfs-tests/cmd/dir_rd_update.c. We
   suppress it by switching to `snprintf()` since other static analysis
   tools might complain about it too. Interestingly, there is a possible
   real bug there too, since it assumes that the passed directory path
   ends with '/'. We add a '/' to fix that potential bug.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13913
2022-09-27 16:47:24 -07:00
Richard Yao 88b199c24e Cleanup spa_export_common()
Coverity complains about a possible NULL pointer dereference. This is
impossible, but it suspects it because we do a NULL check against
`spa->spa_root_vdev`. This NULL check was never necessary and makes the
code harder to understand, so we drop it.

In particular, we dereference `spa->spa_root_vdev` when `new_state !=
POOL_STATE_UNINITIALIZED && !hardforce`. The first is only true when
spa_reset is called, which only occurs under fault injection.  The
second is true unless `zpool export -F $POOLNAME` is used.  Therefore,
we effectively *always* dereference the pointer. In the cases where we
do not, there is no reason to think it is unsafe.  Therefore this change
is safe to make.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13905
2022-09-27 16:45:51 -07:00
Richard Yao 31b4e008f1 LUA: Fix CVE-2014-5461
Apply the fix from upstream.

http://www.lua.org/bugs.html#5.2.2-1
https://www.opencve.io/cve/CVE-2014-5461

It should be noted that exploiting this requires the `SYS_CONFIG`
privilege, and anyone with that privilege likely has other opportunities
to do exploits, so it is unlikely that bad actors could exploit this
unless system administrators are executing untrusted ZFS Channel
Programs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13949
2022-09-27 16:44:13 -07:00
Richard Yao fdc2d30371 Cleanup: Specify unsignedness on things that should not be signed
In #13871, zfs_vdev_aggregation_limit_non_rotating and
zfs_vdev_aggregation_limit being signed was pointed out as a possible
reason not to eliminate an unnecessary MAX(unsigned, 0) since the
unsigned value was assigned from them.

There is no reason for these module parameters to be signed and upon
inspection, it was found that there are a number of other module
parameters that are signed, but should not be, so we make them unsigned.
Making them unsigned made it clear that some other variables in the code
should also be unsigned, so we also make those unsigned. This prevents
users from setting negative values that could potentially cause bad
behaviors. It also makes the code slightly easier to understand.

Mostly module parameters that deal with timeouts, limits, bitshifts and
percentages are made unsigned by this. Any that are boolean are left
signed, since whether booleans should be considered signed or unsigned
does not matter.

Making zfs_arc_lotsfree_percent unsigned caused a
`zfs_arc_lotsfree_percent >= 0` check to become redundant, so it was
removed. Removing the check was also necessary to prevent a compiler
error from -Werror=type-limits.

Several end of line comments had to be moved to their own lines because
replacing int with uint_t caused us to exceed the 80 character limit
enforced by cstyle.pl.

The following were kept signed because they are passed to
taskq_create(), which expects signed values and modifying the
OpenSolaris/Illumos DDI is out of scope of this patch:

	* metaslab_load_pct
	* zfs_sync_taskq_batch_pct
	* zfs_zil_clean_taskq_nthr_pct
	* zfs_zil_clean_taskq_minalloc
	* zfs_zil_clean_taskq_maxalloc
	* zfs_arc_prune_task_threads

Also, negative values in those parameters was found to be harmless.

The following were left signed because either negative values make
sense, or more analysis was needed to determine whether negative values
should be disallowed:

	* zfs_metaslab_switch_threshold
	* zfs_pd_bytes_max
	* zfs_livelist_min_percent_shared

zfs_multihost_history was made static to be consistent with other
parameters.

A number of module parameters were marked as signed, but in reality
referenced unsigned variables. upgrade_errlog_limit is one of the
numerous examples. In the case of zfs_vdev_async_read_max_active, it was
already uint32_t, but zdb had an extern int declaration for it.

Interestingly, the documentation in zfs.4 was right for
upgrade_errlog_limit despite the module parameter being wrongly marked,
while the documentation for zfs_vdev_async_read_max_active (and friends)
was wrong. It was also wrong for zstd_abort_size, which was unsigned,
but was documented as signed.

Also, the documentation in zfs.4 incorrectly described the following
parameters as ulong when they were int:

	* zfs_arc_meta_adjust_restarts
	* zfs_override_estimate_recordsize

They are now uint_t as of this patch and thus the man page has been
updated to describe them as uint.

dbuf_state_index was left alone since it does nothing and perhaps should
be removed in another patch.

If any module parameters were missed, they were not found by `grep -r
'ZFS_MODULE_PARAM' | grep ', INT'`. I did find a few that grep missed,
but only because they were in files that had hits.

This patch intentionally did not attempt to address whether some of
these module parameters should be elevated to 64-bit parameters, because
the length of a long on 32-bit is 32-bit.

Lastly, it was pointed out during review that uint_t is a better match
for these variables than uint32_t because FreeBSD kernel parameter
definitions are designed for uint_t, whose bit width can change in
future memory models.  As a result, we change the existing parameters
that are uint32_t to use uint_t.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13875
2022-09-27 16:42:41 -07:00
Richard Yao 7584fbe846 Cleanup: Switch to strlcpy from strncpy
Coverity found a bug in `zfs_secpolicy_create_clone()` where it is
possible for us to pass an unterminated string when `zfs_get_parent()`
returns an error. Upon inspection, it is clear that using `strlcpy()`
would have avoided this issue.

Looking at the codebase, there are a number of other uses of `strncpy()`
that are unsafe and even when it is used safely, switching to
`strlcpy()` would make the code more readable. Therefore, we switch all
instances where we use `strncpy()` to use `strlcpy()`.

Unfortunately, we do not portably have access to `strlcpy()` in
tests/zfs-tests/cmd/zfs_diff-socket.c because it does not link to
libspl. Modifying the appropriate Makefile.am to try to link to it
resulted in an error from the naming choice used in the file. Trying to
disable the check on the file did not work on FreeBSD because Clang
ignores `#undef` when a definition is provided by `-Dstrncpy(...)=...`.
We workaround that by explictly including the C file from libspl into
the test. This makes things build correctly everywhere.

We add a deprecation warning to `config/Rules.am` and suppress it on the
remaining `strncpy()` usage. `strlcpy()` is not portably avaliable in
tests/zfs-tests/cmd/zfs_diff-socket.c, so we use `snprintf()` there as a
substitute.

This patch does not tackle the related problem of `strcpy()`, which is
even less safe. Thankfully, a quick inspection found that it is used far
more correctly than strncpy() was used. A quick inspection did not find
any problems with `strcpy()` usage outside of zhack, but it should be
said that I only checked around 90% of them.

Lastly, some of the fields in kstat_t varied in size by 1 depending on
whether they were in userspace or in the kernel. The origin of this
discrepancy appears to be 04a479f706 where
it was made for no apparent reason. It conflicts with the comment on
KSTAT_STRLEN, so we shrink the kernel field sizes to match the userspace
field sizes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13876
2022-09-27 16:35:29 -07:00
Jitendra Patidar 3ed9d6883b Enforce "-F" flag on resuming recv of full/newfs on existing dataset
When receiving full/newfs on existing dataset, then it should be done
with "-F" flag. Its enforced for initial receive in checks done in
zfs_receive_one function of libzfs. Similarly, on resuming full/newfs
recv on existing dataset, it should be done with "-F" flag.

When dataset doesn't exist, then full/new recv is done on newly created
dataset and it's marked INCONSISTENT. But when receiving on existing
dataset, recv is first done on %recv and its marked INCONSISTENT.
Existing dataset is not marked INCONSISTENT. Resume of full/newfs
receive with dataset not INCONSISTENT indicates that its resuming newfs
on existing dataset. So, enforce "-F" flag in this case.

Also return an error from dmu_recv_resume_begin_check() in zfs kernel,
when its resuming full/newfs recv without force.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@nutanix.com>
Signed-off-by: Jitendra Patidar <jitendra.patidar@nutanix.com>
Closes #13856
Closes #13857
2022-09-27 16:34:27 -07:00
Richard Yao a2163a96ae Fix bad free in skein code
Clang's static analyzer found a bad free caused by skein_mac_atomic().
It will allocate a context on the stack and then pass it to
skein_final(), which attempts to free it. Upon inspection,
skein_digest_atomic() also has the same problem.

These functions were created to match the OpenSolaris ICP API, so I was
curious how we avoided this in other providers and looked at the SHA2
code. It appears that SHA2 has a SHA2Final() helper function that is
called by the exported sha2_mac_final()/sha2_digest_final() as well as
the sha2_mac_atomic() and sha2_digest_atomic() functions. The real work
is done in SHA2Final() while some checks and the free are done in
sha2_mac_final()/sha2_digest_final().

We fix the use after free in the skein code by taking inspiration from
the SHA2 code. We introduce a skein_final_nofree() that does most of the
work, and make skein_final() into a function that calls it and then
frees the memory.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13954
2022-09-27 12:36:58 -07:00
Richard Yao f7bda2de97 Fix userspace memory leaks found by Clang Static Analzyer
Recently, I have been making a push to fix things that coverity found.
However, I was curious what Clang's static analyzer reported, so I ran
it and found things that coverity had missed.

* contrib/pam_zfs_key/pam_zfs_key.c: If prop_mountpoint is passed more
  than once, we leak memory.
* module/zfs/zcp_get.c: We leak memory on temporary properties in
  userspace.
* tests/zfs-tests/cmd/draid.c: On error from vdev_draid_rand(), we leak
  memory if best_map had been allocated by a prior iteration.
* tests/zfs-tests/cmd/mkfile.c: Memory used by the loop is not freed
  before program termination.

Arguably, these are all minor issues, but if we ignore them, then they
could obscure serious bugs, so we fix them.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13955
2022-09-26 17:18:05 -07:00
Chris Zubrzycki 5e7a2f4665 Update zfs-mount to load before fstab, matches systemd service.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Zubrzycki <github@mid-earth.net>
Closes #13895
2022-09-26 17:11:43 -07:00
Richard Yao 8ef15f9322 Cleanup: Remove ineffective unsigned comparisons against 0
Coverity found a number of places where we either do MAX(unsigned, 0) or
do assertions that a unsigned variable is >= 0. These do nothing, so
let us drop them all.

It also found a spot where we do `if (unsigned >= 0 && ...)`. Let us
also drop the unsigned >= 0 check.

Reviewed-by: Neal Gompa <ngompa@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13871
2022-09-26 17:02:38 -07:00
Richard Yao 52afc3443d Linux: Fix uninitialized variable usage in zio_do_crypt_data()
Coverity complained about this. An error from `hkdf_sha512()` before uio
initialization will cause pointers to uninitialized memory to be passed
to `zio_crypt_destroy_uio()`. This is a regression that was introduced
by cf63739191. Interestingly, this never
affected FreeBSD, since the FreeBSD version never had that patch ported.
Since moving uio initialization to the top of this function would slow
down the qat_crypt() path, we only move the `memset()` calls to the top
of the function. This is sufficient to fix this problem.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13944
2022-09-26 16:44:22 -07:00
Tino Reichardt bf5b42f9c8 Fix double declaration of getauxval() for FreeBSD PPC
The extern declaration is only for Linux, move this line
into the right #ifdef section.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Co-authored-by: Martin Matuska <mm@FreeBSD.org>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13934
Closes #13936
2022-09-26 10:32:22 -07:00
Richard Yao ebe1d03616 Fix userland resource leaks
Coverity caught these. With the exception of the file descriptor leak in
tests/zfs-tests/cmd/draid.c, they are all memory leaks.

Also, there is a piece of dead code in zfs_get_enclosure_sysfs_path().
We delete it as cleanup.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13921
2022-09-23 16:55:26 -07:00
Richard Yao 2a493a4c71 Fix unchecked return values and unused return values
Coverity complained about unchecked return values and unused values that
turned out to be unused return values.

Different approaches were used to handle the different cases of
unchecked return values:

* cmd/zdb/zdb.c: VERIFY0 was used in one place since the existing code
  had no error handling. An error message was printed in another to
  match the rest of the code.

* cmd/zed/agents/zfs_retire.c: We dismiss the return value with `(void)`
  because the value is expected to be potentially unset.

* cmd/zpool_influxdb/zpool_influxdb.c: We dismiss the return value with
  `(void)` because the values are expected to be potentially unset.

* cmd/ztest.c: VERIFY0 was used since we want failures if something goes
  wrong in ztest.

* module/zfs/dsl_dir.c: We dismiss the return value with `(void)`
  because there is no guarantee that the zap entry will always be there.
  For example, old pools imported readonly would not have it and we do
  not want to fail here because of that.

* module/zfs/zfs_fm.c: `fnvlist_add_*()` was used since the
  allocations sleep and thus can never fail.

* module/zfs/zvol.c: We dismiss the return value with `(void)` because
  we do not need it. This matches what is already done in the analogous
  `zfs_replay_write2()`.

* tests/zfs-tests/cmd/draid.c: We suppress one return value with
  `(void)` since the code handles errors already. The other return value
  is handled by switching to `fnvlist_lookup_uint8_array()`.

* tests/zfs-tests/cmd/file/file_fadvise.c: We add error handling.

* tests/zfs-tests/cmd/mmap_sync.c: We add error handling for munmap, but
  ignore failures on remove() with (void) since it is expected to be
  able to fail.

* tests/zfs-tests/cmd/mmapwrite.c: We add error handling.

As for unused return values, they were all in places where there was
error handling, so logic was added to handle the return values.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13920
2022-09-23 16:52:03 -07:00
Richard Yao d25153d555 set_global_var_parse_kv() should pass the pointer from strdup()
A comment says that the caller should free k_out, but the pointer passed
via k_out is not the same pointer we received from strdup(). Instead,
it is a pointer into the region we received from strdup(). The free
function should always be called with the original pointer, so this is
likely a bug.

We solve this by calling `strdup()` a second time and then freeing the
original pointer.

Coverity reported this as a memory leak.

Reviewed-by: Neal Gompa <ngompa@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13867
2022-09-23 10:51:14 -07:00
Tony Hutter e9b12d4196 zpool: Don't print "repairing" on force faulted drives
If you force fault a drive that's resilvering, it's scan stats can get
frozen in time, giving the false impression that it's being resilvered.
This commit checks the vdev state to see if the vdev is healthy before
reporting "resilvering" or "repairing" in zpool status.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #13927
Closes #13930
2022-09-23 10:24:19 -07:00
John Wren Kennedy ce55d6ae46 ZTS: fallocate tests fail with hard coded values
Currently, these two tests pass on disks with 512 byte sectors. In
environments where the backing store is different, the number of
blocks allocated to write the same file may differ. This change
modifies the reported size check to detect an expected change in the
reported number of blocks without specifying a particular number.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes  #13931
2022-09-22 16:42:34 -06:00
Brian Behlendorf 505df8d133 Dynamically size dbuf hash mutex array
Incorrectly sizing the array of hash locks used to protect the
dbuf hash table can lead to contention and reduce performance.
We could unconditionally allocate a larger array for the locks
but it's wasteful, particularly for a low-memory system.
Instead, dynamically allocate the array of locks and scale
it based on total system memory.

Additionally, add a new `dbuf_mutex_cache_shift` module option
which can be used to override the hash lock array size.  This is
disabled by default (dbuf_mutex_hash_shift=0) and can only be
set at module load time.  The minimum target array size is set
to 8192, this matches the current constant value.

Note that the count of the dbuf hash table and count of the
mutex array were added to the /proc/spl/kstat/zfs/dbufstats
kstat.

Finally, this change removes the _KERNEL conditional checks.
These were not required since for the user space build there
is no difference between the kmem and vmem interfaces.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13928
2022-09-22 12:59:56 -07:00
Brian Behlendorf 223b04d23d Revert "Reduce dbuf_find() lock contention"
This reverts commit 34dbc618f5.  While this
change resolved the lock contention observed for certain workloads, it
inadventantly reduced the maximum hash inserts/removes per second.  This
appears to be due to the slightly higher acquisition cost of a rwlock vs
a mutex.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2022-09-22 12:59:41 -07:00
Richard Yao e506a0ce40 Cleanup: Change 1 used in bitshifts to 1ULL
Coverity complains about this. It is not a bug as long as we never shift
by more than 31, but it is not terrible to change the constants from 1
to 1ULL as clean up.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13914
2022-09-22 11:28:33 -07:00
Mateusz Guzik c629f0bf62 Retire ZFS_TEARDOWN_TRY_ENTER_READ
There were never any users and it so happens the operation is not even
supported by rrm locks -- the macros were wrong for Linux and FreeBSD
when not using it's RMS locks.
    
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #13906
2022-09-20 15:34:41 -07:00
Mateusz Guzik 402426c7d8 Add membar_sync
Provides the missing full barrier variant to the membar primitive set.

While not used right now, this is probably going to change down the
road.

Name taken from Solaris, to follow the existing routines.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #13907
2022-09-20 15:32:44 -07:00
youzhongyang 62e2a2881f Fix minor issues in namespace delegation support
get_user_ns() is only done once for each namespace, so put_user_ns() 
should be done once too.
    
Fix two typos in user_namespace/user_namespace_002.ksh and 
user_namespace/user_namespace_003.ksh.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #13918
2022-09-20 15:25:21 -07:00
Mateusz Guzik fbf874a4ac FreeBSD: handle V_PCATCH
See https://cgit.FreeBSD.org/src/commit/?id=a75d1ddd74312f5dd79bc1e965f7077679659f2e

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #13910
2022-09-20 15:22:32 -07:00
Mateusz Guzik 3e5caef4c5 FreeBSD: catch up to 1400068
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #13909
2022-09-20 15:21:30 -07:00
Richard Yao 7c6d94728c Call va_end() before return in zpool_standard_error_fmt()
Commit ecd6cf800b63704be73fb264c3f5b6e0dafc068d by marks in OpenSolaris
at Tue Jun 26 07:44:24 2007 -0700 introduced a bug where we fail to call
`va_end()` before returning.

The man page for va_start() says:

"Each invocation of va_start() must be matched by a corresponding
invocation of va_end() in the same function."

Coverity complained about this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@nutanix.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13904
2022-09-20 15:20:56 -07:00
Richard Yao de6c0d3d8c Fix potential NULL pointer dereference in zfsdle_vdev_online()
Coverity complained about this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@nutanix.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13903
2022-09-20 15:20:04 -07:00
Ameer Hamza c50b3f14d3 Delay ZFS_PROP_SHARESMB property to handle it for encrypted raw receive
For encrypted raw receive, objset creation is delayed until a call to
dmu_recv_stream(). ZFS_PROP_SHARESMB property requires objset to be
populated when calling zpl_earlier_version(). To correctly handle the
ZFS_PROP_SHARESMB property for encrypted raw receive, this change
delays setting the property.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #13878
2022-09-20 15:19:05 -07:00
Richard Yao 3f400b0f58 FreeBSD: Cleanup zfs_readdir()
The FreeBSD project's coverity scans found dead code in `zfs_readdir()`.
Also, the comment above `zfs_readdir()` is out of date.

I fixed the comment and deleted all of the dead code, plus additional
dead code that was found upon review.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13924
2022-09-20 14:50:16 -07:00
Richard Yao 9276e202eb FreeBSD: Fix uninitialized pointer read in spa_import_rootpool()
The FreeBSD project's coverity scans found this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13923
2022-09-20 14:43:03 -07:00
Richard Yao e8bdc74528 Cleanup: Remove unused uu_pname code
Coverity caught a possible NULL pointer dereference in dead code. We can
delete it all.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@nutanix.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13900
2022-09-19 17:33:52 -07:00
Richard Yao f272960d52 Fix usage of zed_log_msg() and zfs_panic_recover()
Coverity complained about the format specifiers not matching variables.
In one case, the variable is a constant, so we fix it. In another, we
were missing an argument (about which coverity also complained).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13888
2022-09-19 17:32:18 -07:00
Richard Yao 891ac937be Linux: Fix use-after-free in zfsvfs_create()
Coverity reported that we pass a pointer to zfsvfs to
`dmu_objset_disown()` after freeing zfsvfs in zfsvfs_create_impl() after
a failure in zfsvfs_init().

We have nearly identical duplicate versions of this code for FreeBSD and
Linux, but interestingly, the FreeBSD version of this code differs in
such a way that it does not suffer from this bug. We remove the
difference from the FreeBSD version to fix this bug.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13883
2022-09-19 17:30:58 -07:00
Martin Matuška 042d43a1dd FreeBSD: fix static module build broken in 7bb707ffa
param_set_arc_free_target(SYSCTL_HANDLER_ARGS) and
param_set_arc_no_grow_shift(SYSCTL_HANDLER_ARGS) defined in
sysctl_os.c must be made available to arc_os.c.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #13915
2022-09-19 17:21:45 -07:00
Mateusz Guzik 9a671fe7ec FreeBSD: stop passing LK_INTERLOCK to VOP_LOCK
There is an ongoing effort to eliminate this feature.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #13908
2022-09-19 17:17:27 -07:00
Tino Reichardt 48cf170d5a Add PPC cpu feature tests for FreeBSD and Linux
Add needed cpu feature tests for powerpc architecture.

Overview:
zfs_altivec_available() - needed by RAID-Z
zfs_vsx_available()     - needed by BLAKE3
zfs_isa207_available()  - needed by SHA2

Part 1 - Userspace
- use getauxval() for Linux and elf_aux_info() for FreeBSD
- direct including <sys/auxv.h> fails with double definitions
- so we self define the needed functions and definitions

Part 2 - Kernel space FreeBSD
- use exported cpu_features of <powerpc/cpu.h>

Part 3 - Kernel space Linux
- use cpu_has_feature() function of <asm/cpufeature.h>

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13725
2022-09-16 14:25:53 -07:00
Tino Reichardt eeca9d27d7 Add zfs_blake3_impl to zfs.4
The zfs module parameter zfs_blake3_impl got no manual page entry while
adding BLAKE3 to OpenZFS. This commit adds the required notes about the
parameter into zfs.4

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Co-authored-by: Ryan Moeller <ryan@freqlabs.com>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13725
2022-09-16 14:25:53 -07:00
Tino Reichardt 75e8b5ad84 Fix BLAKE3 tuneable and module loading on Linux and FreeBSD
Apply similar options to BLAKE3 as it is done for zfs_fletcher_4_impl.

The zfs module parameter on Linux changes from icp_blake3_impl to
zfs_blake3_impl.

You can check and set it on Linux via sysfs like this:
```
[bash]# cat /sys/module/zfs/parameters/zfs_blake3_impl
cycle [fastest] generic sse2 sse41 avx2

[bash]# echo sse2 > /sys/module/zfs/parameters/zfs_blake3_impl
[bash]# cat /sys/module/zfs/parameters/zfs_blake3_impl
cycle fastest generic [sse2] sse41 avx2
```

The modprobe module parameters may also be used now:
```
[bash]# modprobe zfs zfs_blake3_impl=sse41
[bash]# cat /sys/module/zfs/parameters/zfs_blake3_impl
cycle fastest generic sse2 [sse41] avx2
```

On FreeBSD the BLAKE3 implementation can be set via sysctl like this:
```
[bsd]# sysctl vfs.zfs.blake3_impl
vfs.zfs.blake3_impl: cycle [fastest] generic sse2 sse41 avx2
[bsd]# sysctl vfs.zfs.blake3_impl=sse2
vfs.zfs.blake3_impl: cycle [fastest] generic sse2 sse41 avx2 \
  -> cycle fastest generic [sse2] sse41 avx2
```

This commit changes also some Blake3 internals like these:
- blake3_impl_ops_t was renamed to blake3_ops_t
- all functions are named blake3_impl_NAME() now

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Co-authored-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13725
2022-09-16 14:25:53 -07:00
Brian Behlendorf 7dee043af5 zfs_enter rework followup
The zpl_fadvise() function was recently added and was not included
in the initial patch.  Update it accordingly.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13831
2022-09-16 14:25:53 -07:00
Richard Yao 4df8ccc83d Fix null pointer dereferences in PAM
Coverity caught these.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13889
2022-09-16 14:02:54 -07:00
наб 6c8e9f09c2 Handle ECKSUM as new EZFS_CKSUM ‒ "insufficient replicas"
Add a meaningful error message for ECKSUM to common error messages.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #6805 
Closes #13808
Closes #13898
2022-09-16 13:59:25 -07:00
Ameer Hamza 577d41d3b2 zfs recv hangs if max recordsize is less than received recordsize
- Some optimizations for bqueue enqueue/dequeue.
- Added a fix to prevent deadlock when both bqueue_enqueue_impl()
and bqueue_dequeue() waits for signal to be triggered.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #13855
2022-09-16 13:52:25 -07:00
Richard Yao 8da218a7a2 Update coverity model
`uu_panic()` needs to be modelled and the definition of `vpanic()` from
the original coverity model was missing
`__coverity_format_string_sink__()`.

We also model `libspl_assertf()` as part of an attempt to eliminate
false positives.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13901
2022-09-16 13:45:15 -07:00
Chunwei Chen 1b6f3368dd Fix unable to export zpool without nfs-utils
Don't return error in nfs_disable_share when nfs is not available, since
it wouldn't have been able to share in the first place.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #13534
Closes #13800
2022-09-16 13:43:26 -07:00
Chunwei Chen 768eacedef zfs_enter rework
Replace ZFS_ENTER and ZFS_VERIFY_ZP, which have hidden returns, with
functions that return error code. The reason we want to do this is
because hidden returns are not obvious and had caused some missing fail
path unwinding.

This patch changes the common, linux, and freebsd parts. Also fixes
fail path unwinding in zfs_fsync, zpl_fsync, zpl_xattr_{list,get,set}, and
zfs_lookup().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #13831
2022-09-16 13:36:47 -07:00
Richard Yao b24d1c77f7 Add zfs_btree_verify_intensity kernel module parameter
I see a few issues in the issue tracker that might be aided by being
able to turn this on. We have no module parameter for it, so I would
like to add one.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13874
2022-09-15 16:22:33 -07:00
Richard Yao ddb1fd91c0 Fix incorrect size given to bqueue_enqueue() call in dmu_redact.c
We pass sizeof (struct redact_record *) rather than sizeof (struct
redact_record). Passing the pointer size is wrong.

Coverity caught this in two places.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13885
2022-09-15 16:21:21 -07:00
Mateusz Piotrowski fa22ec569c Use correct mdoc macros for arguments
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Mateusz Piotrowski <0mp@FreeBSD.org>
Closes #13890
2022-09-15 14:22:00 -07:00
Richard Yao e949d36040 Fix assertions in crypto reference helpers
The assertions are racy and the use of `membar_exit()` did nothing to
fix that.

The helpers use atomic functions, so we cleverly get values from the
atomics that we can use to ensure that the assertions operate on the
correct values.

We also use `membar_producer()` prior to decrementing reference counts
so that operations that happened prior to a decrement to 0 will be
guaranteed to happen before the decrement on architectures that reorder
atomics.

This also slightly improves performance by eliminating unnecessary
reads, although I doubt it would be measurable in any benchmark.

Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13880
2022-09-15 13:24:00 -07:00
John Wren Kennedy dc2fe24ca2 ZTS: parameter expansion in zfs_unshare_006_pos
zfs_unshare_006 checks to see if a dataset still has an active SMB
share after doing an NFS unshare -a. The test could fail because the
check for the SMB share does not expect dashes in a dataset name to be
converted to underscores as pathname delimiters are.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes #13893
2022-09-15 13:14:35 -07:00
Richard Yao 621a7ebe58 Add coverity model to repository
Other projects such as the python project include their coverity models
in their repositories. This provides transparency, which is beneficial
in open source projects. Therefore, it is a good idea to include the
coverity model in our repository too.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13884
2022-09-15 11:50:19 -07:00
Richard Yao fd8c3012b3 Fix use-after-free bugs in icp code
These were reported by Coverity as "Read from pointer after free" bugs.
Presumably, it did not report it as a use-after-free bug because it does
not understand the inline assembly that implements the atomic
instruction.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13881
2022-09-15 11:46:42 -07:00
George Melikov 6f8602a5ed CI: revert --with-config=dist to hotfix Ubuntu 20.04
Recently Github action runners started to fail on kmod build.  
Revert --with-config=dist from ./configure section of github 
runners to stabilize CI for now.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #13894
2022-09-14 16:26:57 -07:00
Richard Yao ccec88f11a FreeBSD: Fix integer conversion for vnlru_free{,_vfsops}()
When reviewing #13875, I noticed that our FreeBSD code has an issue
where it converts from `int64_t` to `int` when calling
`vnlru_free{,_vfsops}()`. The result is that if the int64_t is `1 <<
36`, the int will be 0, since the low bits are 0. Even when some low
bits are set, a value such as `((1 << 36) + 1)` would truncate to 1,
which is wrong.

There is protection against this on 32-bit platforms, but on 64-bit
platforms, there is no check to protect us, so we add a check.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13882
2022-09-14 12:51:55 -07:00
Richard Yao 4a6e8b99f5 Add assertion to dsl_dataset_set_compression_sync
Coverity pointed out that if we somehow receive SPA_FEATURE_NONE, we
will use a negative number as an array index. A defensive assertion
seems appropriate.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13872
2022-09-14 12:50:03 -07:00
Richard Yao d954ca19ba Fix theoretical "use-after-free" in dbuf_prefetch_indirect_done()
Coverity complains about a "use-after-free" bug in
`dbuf_prefetch_indirect_done()` because we use a pointer value after
freeing its buffer. The pointer is used for refcounting in ARC (as the
reference holder). There is a theoretical situation where the pointer
would be reused in a way that causes the refcounting to collide, so we
change the order in which we call arc_buf_destroy() and
dbuf_prefetch_fini() to match the rest of the function. This prevents
the theoretical situation from being a possibility.

Also, we have a few return statements with a value, despite this being a
void function. We clean those up while we are making changes here.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13869
2022-09-13 17:58:29 -07:00
Richard Yao fcd7293d4e Remove incorrect free() in zfs_get_pci_slots_sys_path()
Coverity found this. We attempted to free tmp, which is a pointer to a
string that should be freed by the caller.

Reviewed-by: Neal Gompa <ngompa@datto.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13864
2022-09-13 17:00:53 -07:00
Richard Yao cf66e7e594 Cleanup: Make memory barrier definitions consistent across kernels
We inherited membar_consumer() and membar_producer() from OpenSolaris,
but we had replaced membar_consumer() with Linux's smp_rmb() in
zfs_ioctl.c. The FreeBSD SPL consequently implemented a shim for the
Linux-only smp_rmb().

We reinstate membar_consumer() in platform independent code and fix the
FreeBSD SPL to implement membar_consumer() in a way analogous to Linux.

Reviewed-by: Konstantin Belousov <kib@FreeBSD.org>
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13843
2022-09-13 16:59:33 -07:00
Richard Yao 8fdc229a9c Fix memory leak in ztest
Coverity found this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13863
2022-09-13 16:53:21 -07:00
Richard Yao d5d10f2aef Cleanup dead spa_boot code
Unused code detected by coverity.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13868
2022-09-13 16:40:10 -07:00
Richard Yao 710fd1ded6 zpool_load_compat() should create strings of length ZFS_MAXPROPLEN
Otherwise, `strlcat()` can overflow them.

Coverity found this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13866
2022-09-12 12:54:43 -07:00
Richard Yao e5327e7f97 vdev_draid_lookup_map() should not iterate outside draid_maps
Coverity reported this as an out-of-bounds read.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13865
2022-09-12 12:51:17 -07:00
Richard Yao 7195c04d98 Fix file descriptor handling in zdb_copy_object()
Coverity found a file descriptor leak. Eyeballing it showed that we had
no handling for the `open()` call failing either. We can address both of
these at once.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13862
2022-09-12 12:34:10 -07:00
Richard Yao 13f2b8fb92 Fix use-after-free in btree code
Coverty static analysis found these.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #10989
Closes #13861
2022-09-12 11:22:15 -07:00
Richard Yao 0e4c830bc1 Cleanup: Use OpenSolaris functions to call scheduler
In our codebase, `cond_resched() and `schedule()` are Linux kernel
functions that have replaced the OpenSolaris `kpreempt()` functions in
the codebase to such an extent that `kpreempt()` in zfs_context.h was
broken. Nobody noticed because we did not actually use it. The header
had defined `kpreempt()` as `yield()`, which works on OpenSolaris and
Illumos where `sched_yield()` is a wrapper for `yield()`, but that does
not work on any other platform.

The FreeBSD platform specific code implemented shims for these, but the
shim for `schedule()` forced us to wait, which is different than merely
rescheduling to another thread as the original Linux code does, while
the shim for `cond_resched()` had the same definition as its kernel
kpreempt() shim.

After studying this, I have concluded that we should reintroduce the
kpreempt() function in platform independent code with the following
definitions:

	- In the Linux kernel:
		kpreempt(unused)	-> cond_resched()

	- In the FreeBSD kernel:
		kpreempt(unused)	-> kern_yield(PRI_USER)

	- In userspace:
		kpreempt(unused)	-> sched_yield()

In userspace, nothing changes from this cleanup. In the kernels, the
function `fm_fini()` will now call `kern_yield(PRI_USER)` on FreeBSD and
`cond_resched()` on Linux.  This is instead of `pause("schedule", 1)` on
FreeBSD and `schedule()` on Linux. This makes our behavior consistent
across platforms.

Note that Linux's SPL continues to use `cond_resched()` and
`schedule()`.  However, those functions have been removed from both the
FreeBSD code and userspace code.

This should have the benefit of making it slightly easier to port the
code to new platforms by making how things should be mapped less
confusing.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13845
2022-09-12 09:55:37 -07:00
Don Brady ede037cda7 Make zfs-share service resilient to stale exports
The are a few cases where stale entries in /etc/exports.d/zfs.exports 
will cause the nfs-server service to fail when starting up.

Since the nfs-server startup consumes /etc/exports.d/zfs.exports, the 
zfs-share service (which rebuilds the list of zfs exports) should run 
before the nfs-server service.

To make the zfs-share service resilient to stale exports, this change 
truncates the zfs config file as part of the zfs share -a operation.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #13775
2022-09-09 10:54:16 -07:00
Ryan Moeller 60d995727a FreeBSD: Replace legacy make_dev() interface usage
The function make_dev_s() was introduced to replace make_dev() in
FreeBSD 11.0.  It allows further specification of properties and flags
and returns an error code on failure.  Using this we can fail loading
the module more gracefully than a panic in situations such as when a
device named zfs already exists.  We already use it for zvols.

Use make_dev_s() for /dev/zfs.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #13854
2022-09-08 10:40:18 -07:00
Tony Hutter e27e692bcc zed: Fix config_sync autoexpand flood
Users were seeing floods of `config_sync` events when autoexpand was
enabled.  This happened because all "disk status change" udev events
invoke the autoexpand codepath, which calls zpool_relabel_disk(),
which in turn cause another "disk status change" event to happen,
in a feedback loop.  Note that "disk status change" happens every time
a user calls close() on a block device.

This commit breaks the feedback loop by only allowing an autoexpand
to happen if the disk actually changed size.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes: #7132
Closes: #7366
Closes #13729
2022-09-08 10:32:30 -07:00
Alexander Motin 37f6845c6f Improve too large physical ashift handling
When iterating through children physical ashifts for vdev, prefer
ones above the maximum logical ashift, that we can actually use,
but within the administrator defined maximum.

When selecting top-level vdev ashift, do not set it to the defined
maximum in case physical ashift is even higher, but just ignore one.
Using the maximum does not prevent misaligned writes, but reduces
space efficiency.  Since ZFS tries to write data sequentially and
aggregates the writes, in many cases large misanigned writes may be
not as bad as the space penalty otherwise.

Allow internal physical ashifts for vdevs higher than SHIFT_MAX.
May be one day allocator or aggregation could benefit from that.

Reduce zfs_vdev_max_auto_ashift default from 16 (64KB) to 14 (16KB),
so that ZFS may still use bigger ashifts up to SHIFT_MAX (64KB),
but only if it really has to or explicitly told to, but not as an
"optimization".

There are some read-intensive NVMe SSDs that report Preferred Write
Alignment of 64KB, and attempt to build RAIDZ2 of those leads to a
space inefficiency that can't be justified.  Instead these changes
make ZFS fall back to logical ashift of 12 (4KB) by default and
only warn user that it may be suboptimal for performance.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #13798
2022-09-08 10:30:53 -07:00
Finix1979 320f0c6022 Add Linux posix_fadvise support
The purpose of this PR is to accepts fadvise ioctl from userland
to do read-ahead by demand.

It could dramatically improve sequential read performance especially
when primarycache is set to metadata or zfs_prefetch_disable is 1.

If the file is mmaped, generic_fadvise is also called for page cache
read-ahead besides dmu_prefetch.

Only POSIX_FADV_WILLNEED and POSIX_FADV_SEQUENTIAL are supported in
this PR currently.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Finix Yan <yancw@info2soft.com>
Closes #13694
2022-09-08 10:29:41 -07:00
Richard Yao 380b08098e Linux SPL module init: Handle memory allocation failures correctly
Upon inspection of our code, I noticed that we assume that
__alloc_percpu() cannot fail, and while it probably never has failed in
practice, technically, it can fail, so we should handle that.

Additionally, we incorrectly assume that `taskq_create()` in
spl_kmem_cache_init() cannot fail. The same remark applies to it.

Lastly, `spl-init()` failures should always return negative error
values, but in some places, we are returning positive 1, which is
incorrect. We change those values to their correct error codes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13847
2022-09-08 10:28:20 -07:00
pkubaj dff541f698 Fix build on FreeBSD/powerpc64*
There's no VSX handler on FreeBSD for now.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Piotr Kubaj <pkubaj@FreeBSD.org>
Closes #13848
2022-09-08 10:27:25 -07:00
Christian Schwarz 5724073517 make DMU_OT_IS_METADATA and DMU_OT_IS_ENCRYPTED return B_TRUE or B_FALSE
Without this patch, the

    ASSERT3U(dbuf_is_metadata(db), ==, arc_is_metadata(buf));

at the beginning of dbuf_assign_arcbuf can panic
if the object type is a DMU_OT_NEWTYPE that has
DMU_OT_METADATA set.

While we're at it, fix DMU_OT_IS_ENCRYPTED as well.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com>
Closes #13842
2022-09-07 17:04:15 -07:00
Walter Huf 238cd4b863 Add xattr_handler support for Android kernels
Some ARM BSPs run the Android kernel, which has
a modified xattr_handler->get() function signature.
This adds support to compile against these kernels.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Walter Huf <hufman@gmail.com>
Closes #13824
2022-09-06 10:02:18 -07:00
Rob Wing 983096a1b4 FreeBSD: add kqfilter support for zvol cdev
The only event hooked up is NOTE_ATTRIB, which is triggered when the
device is resized.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Wing <rew@FreeBSD.org>
Closes #13773
2022-09-06 09:49:33 -07:00
Rob Wing 9d0887402b FreeBSD: add knlist_init_sx() for exclusive locks
This will be used to implement kqfilter support for zvol cdevs.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Wing <rew@FreeBSD.org>
Closes #13773
2022-09-06 09:48:57 -07:00
Richard Yao 11df48ab8b Cleanup Raid-Z Typo fixes
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13834
2022-09-06 09:43:21 -07:00
Samuel 7c0e3941cd Fix column width in 'zpool iostat -v' and 'zpool list -v'
This commit fixes a minor spacing issue caused when
enumerating vdev names, which originated from #13031

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Samuel Wycliffe <samuelwycliffe@gmail.com>
Closes #13811
2022-09-06 09:37:47 -07:00
Umer Saleem 59767479ac Add DD_FIELD string for snapshots_changed property
This commit adds DD_FIELD string used in extensified dsl_dir zap object
for snapshots_changed property.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #13819
2022-09-02 13:33:50 -07:00
Andriy Gapon ee9f3bca55 Add zfs.sync.snapshot_rename
Only the single snapshot rename is provided.
The recursive or more complex rename can be scripted.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
Closes #13802
2022-09-02 13:31:19 -07:00
Ryan Moeller 7bb707ffaf FreeBSD: Organize sysctls
FreeBSD had a few platform-specific ARC tunables in the wrong place:

- Move FreeBSD-specifc ARC tunables into the same vfs.zfs.arc node as
  the rest of the ARC tunables.
- Move the handlers from arc_os.c to sysctl_os.c and add compat sysctls
  for the legacy names.

While here, some additional clean up:

- Most handlers are specific to a particular variable and don't need a
  pointer passed through the args.
- Group blocks of related variables, handlers, and sysctl declarations
  into logical sections.
- Match variable types for temporaries in handlers with the type of the
  global variable.
- Remove leftover comments.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #13756
2022-09-02 13:26:24 -07:00
Ryan Moeller 4723eba8c0 FreeBSD: Mark ZFS_MODULE_PARAM_CALL as MPSAFE
ZFS_MODULE_PARAM_CALL handlers implement their own locking if needed
and do not require Giant.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #13756
2022-09-02 13:26:04 -07:00
Ameer Hamza 899355d293 Add zilstat script to report zil kstats in a user friendly manner
Added a python script to process both global and per dataset
zil kstats and report them in a user friendly manner similar
to arcstat and dbufstat.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #13704
2022-09-02 13:24:07 -07:00
Alexander Motin f933b3fd4d Apply arc_shrink_shift to ARC above arc_c_min
It makes sense to free memory in smaller chunks when approaching
arc_c_min to let other kernel subsystems to free more, since after
that point we can't free anything.  This also matches behavior on
Linux, where to shrinker reported only the size above arc_c_min.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #13794
2022-09-02 13:21:18 -07:00
Richard Yao 0b30dc484f FreeBSD: Cleanup dead code from VFS
The vfs_*_feature() macros turn anything that uses them into dead code,
so we can delete all of it.

As a side effect, zfs_set_fuid_feature() is now identical in
module/os/freebsd/zfs/zfs_vnops_os.c and
module/os/linux/zfs/zfs_vnops_os.c. A few other functions are identical
too. Future cleanup could move these into a common file.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13832
2022-09-02 13:20:10 -07:00
Andrew Innes 58e8054bce Alloc zdb_cd_t to fix stack issue
Alloc zdb_cd_t since it is too large for the stack on windows
which results in `zdb` crashing immediately.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrew Innes <andrew.c12@gmail.com>
Co-authored-by: Jorgen Lundman <lundman@lundman.net>
Closes #13807
2022-09-02 13:15:18 -07:00
George Wilson 2d5622f5be Importing from cachefile can trip assertion
When importing from cachefile, it is possible that the builtin retry
logic will trip an assertion because it also fails to find the pool.
This fix addresses that case and returns the correct error message to
the user.

Reviewed-by: Richard Yao <ryao@gentoo.org>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes #13781
2022-08-26 14:04:27 -07:00
Christian Schwarz 5bc0318047 ZTS: zvol_stress: fix race condition with zinject usage
In automated ZTS runs, I'd occasionally hit

    log_fail "Expected to see some write errors"

because there weren't any write errors.

The reason is that we're not syncing the zpool before `zinject -c`.
If the writes by `dd` aren't synced out at the time `zinject -c` runs,
they will not hit an error and we'll hit the log_fail above.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com>
Closes #13793
2022-08-25 14:22:10 -07:00
Brian Behlendorf 9f346abbe8 Revert "Avoid panic with recordsize > 128k, raw sending and no large_blocks"
This reverts commit 80a650b7bb.  This change
inadvertently introduced a regression in ztest where one of the new ASSERTs
is triggered in dsl_scan_visitbp().

Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #12275 
Closes #13799
2022-08-25 13:33:32 -07:00
Umer Saleem a582d52993 Updates for snapshots_changed property
Currently, snapshots_changed property is stored in dd_props_zapobj, due
to which the property is assumed to be local. This causes a difference
in behavior with respect to other readonly properties.

This commit stores the snapshots_changed property in dd_object. Source
is not set to local in this case, which makes it consistent with other
readonly properties.

This commit also updates the date string format to include seconds.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #13785
2022-08-24 14:20:43 -07:00
George Amanakis 0c4064d9a0 Fix zpool status in case of unloaded keys
When scrubbing an encrypted filesystem with unloaded key still report an
error in zpool status.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #13675
Closes #13717
2022-08-22 17:42:01 -07:00
Paul Dagnelie 17e212652d Prevent zevent list from consuming all of kernel memory
There are a couple changes included here. The first is to introduce 
a cap on the size the ZED will grow the zevent list to. One million 
entries is more than enough for most use cases, and if you are 
overflowing that value, the problem needs to be addressed another 
way. The value is also tunable, for those who want the limit to be 
higher or lower. 
 
The other change is to add a kernel module parameter that allows 
snapshot creation/deletion to be exempted from the history logging; 
for most workloads, having these things logged is valuable, but for 
some workloads it produces large quantities of log spam and isn't 
especially helpful.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Issue #13374 
Closes #13753
2022-08-22 12:36:22 -07:00
gregory-lee-bartholomew d22dd77c4d contrib: dracut: zfs-snapshot-bootfs: exit status fix
When the zfs-snapshot-bootfs service attempts to create a snapshot 
that already exists, the exit status of the command is non-zero and 
the service reports failed to the systemd service manager. This is a 
common occurrence if bootfs.snapshot is left set on the kernel command 
line and it should not be considered a failure. 
 
This service was originally set to ignore this error by prefixing 
the command with - on the ExecStart line, but the leading - appears 
to have been dropped in #13359.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gregory Bartholomew <gregory.lee.bartholomew@gmail.com>
Closes #13769
2022-08-12 14:28:15 -07:00
r-ricci e713b69e51 arcstat: fix -p option
When the -p option is used, a list of floats is passed to sep.join(),
which expects strings. Fix this by converting each value to a string.

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Roberto Ricci <ricci@disroot.org>
Closes #12916 
Closes #13767
2022-08-12 14:21:52 -07:00
George Melikov fbc210fab2 Enable relatime by default
Linux sets relatime on mount by default for any file system,
but relatime=off in ZFS disables it explicitly.

Let's be consistent with other file systems on Linux.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #13614
2022-08-12 14:20:25 -07:00
Tony Hutter b3d0568cfd ZTS: Fix zpool_expand_001_pos
`zpool_expand_001_pos` was often failing due to not seeing autoexpand
commands in the `zpool history`.  During testing, I found this to be
unreliable (sometimes the "online" wouldn't appear in `zpool history`)
and unnecessary, as we could simply check that the pool increased in
size.

This commit revamps the test to check for the expanded pool size
and corresponding new free space.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #13743
2022-08-09 13:26:46 -07:00
Christian Schwarz 91983265b6 Add comment on acb_zio_dummy
Thanks to George Wilson for clarifying this on Slack.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com>
Closes #13698
2022-08-08 16:55:13 -07:00
Coleman Kane ad0967638b Linux 6.0 compat: register_shrinker() now var-arg
The 6.0 kernel added a printf-style var-arg for args > 0 to the
register_shrinker function, in order to add names to shrinkers, in
commit e33c267ab70de4249d22d7eab1cc7d68a889bac2. This enables the
shrinkers to have friendly names exposed in /sys/kernel/debug/shrinker/.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #13748
2022-08-08 16:18:30 -07:00
Ryan Moeller 947465b984 libzfs: Remove unused zpool_get_physpath()
This is an oddly specific function that has never had any consumers in
the history of this repo.  Get rid of it and the pile of helper
functions that exist for it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #13724
2022-08-04 17:04:09 -07:00
Stéphane Lesimple 4fc1ea9c6c zpool: fix redundancy check after vdev removal
The presence of indirect vdevs was confusing get_redundancy(), which
considered a pool with e.g. only mirror top-level vdevs and at least
one indirect vdev (due to the removal of a previous vdev) as already
having a broken redundancy, which is not the case. This lead to the
possibility of compromising the redundancy of a pool by adding
mismatched vdevs without requiring the use of `-f`, and with no
visible notice or warning.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Stéphane Lesimple <speed47_github@speed47.net>
Closes #13705
Closes #13711
2022-08-04 17:02:57 -07:00
Brian Behlendorf c26045b435 Linux 5.20 compat: blk_cleanup_disk()
As of the Linux 5.20 kernel blk_cleanup_disk() has been removed,
all callers should use put_disk().

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13728
2022-08-04 16:57:49 -07:00
Brian Behlendorf bebdf52a16 Linux 5.20 compat: bdevname()
As of the Linux 5.20 kernel bdevname() has been removed, all
callers should use snprintf() and the "%pg" format specifier.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13728
2022-08-04 16:57:33 -07:00
Paul Dagnelie 673aa7e6cf Don't double-zero buffers in fault management nvlists
This is a small cleanup for a trivial problem which happened to
be noticed while another issue was being investigated.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #13730
2022-08-04 16:53:47 -07:00
Umer Saleem 9681de4657 Add snapshots_changed as property
Make dd_snap_cmtime property persistent across mount and unmount
operations by storing in ZAP and restore the value from ZAP on hold
into dd_snap_cmtime instead of updating it.

Expose dd_snap_cmtime as 'snapshots_changed' property that provides a
mechanism to quickly determine whether snapshot list for dataset has
changed without having to mount a dataset or iterate the snapshot list.

It specifies the time at which a snapshot for a dataset was last
created or deleted. This allows us to be more efficient how often we
query snapshots.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #13635
2022-08-02 16:45:30 -07:00
Ryan Moeller 5ad44a0ce9 FreeBSD: Ignore symlink to i386 includes
A symlink to i386 includes is created in the build dir on amd64 since
freebsd/freebsd-src@d07600c563

Tell git to ignore it like the other include links.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #13719
2022-08-02 16:34:23 -07:00
Brian Behlendorf 2f157cbe86 Linux 5.19 compat: META
Update the META file to reflect compatibility with the 5.19 kernel.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13715
2022-08-02 10:04:38 -07:00
Tino Reichardt 68aa3379ec Skip checksum benchmarks on systems with slow cpu
The checksum benchmarking on module load may take a really long time
on embedded systems with a slow cpu. Avoid all benchmarks >= 1MiB on
systems, where EdonR is slower then 300 MiB/s.

This limit is currently hardcoded via the define LIMIT_PERF_MBS.

This is the new benchmark output of a slow Intel Atom:

```
 implementation    1k    4k   16k   64k  256k    1m    4m   16m
 edonr-generic    209   257   268   259   262     0     0     0
 skein-generic    129   150   151   150   150     0     0     0
 sha256-generic    50    55    56    56    56     0     0     0
 sha512-generic    76    86    88    89    88     0     0     0
 blake3-generic    63    62    62    62    61     0     0     0
 blake3-sse2      114   292   301   307   309     0     0     0
```

Reviewed-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13695
2022-08-01 09:51:45 -07:00
Tino Reichardt 51946eda70 Fix checkstyle warning: E275 missing whitespace after keyword
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13710
2022-08-01 09:49:35 -07:00
Alek P e8cf3a4f76 Implement a new type of zfs receive: corrective receive (-c)
This type of recv is used to heal corrupted data when a replica
of the data already exists (in the form of a send file for example).
With the provided send stream, corrective receive will read from
disk blocks described by the WRITE records. When any of the reads
come back with ECKSUM we use the data from the corresponding WRITE
record to rewrite the corrupted block.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Alek Pinchuk <apinchuk@axcient.com>
Closes #9372
2022-07-28 15:52:46 -07:00
Tino Reichardt 5fae33e047 FreeBSD compile fix
The file module/os/freebsd/zfs/zfs_ioctl_compat.c fails compiling
because of this error: 'static' is not at beginning of declaration

This commit fixes the three places within that file.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13702
2022-07-28 14:19:41 -07:00
Brian Behlendorf 34aa0f0487 ZTS: Fix io_uring support check
Not all Linux distribution kernels enable io_uring support by
default.  Update the run time check to verify that the booted
kernel was built with CONFIG_IO_URING=y.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Co-authored-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13648
Closes #13685
2022-07-26 14:39:23 -07:00
Ameer Hamza 3a1ce49141 Add createtxg sort support for simple snapshot iterator
- When iterating snapshots with name only, e.g., "-o name -s name",
libzfs uses simple snapshot iterator and results are displayed
in alphabetic order. This PR adds support for faster version of
createtxg sort by avoiding nvlist parsing for properties. Flags
"-o name -s createtxg" will enable createtxg sort while using
simple snapshot iterator.
- Added support to read createtxg property directly from zfs handle
for filesystem, volume and snapshot types instead of parsing nvlist.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #13577
2022-07-25 14:04:46 -07:00
Brian Behlendorf 8792dd24cd ZTS: Fix occasional inherit_001_pos.ksh failure
The mountpoint may still be busy when the `zfs unmount -a` command
is run causing an unexpected failure.  Retry the unmount a couple
of times since it should not remain busy for long.

    19:10:50.29 NOTE: Reading state from .../inheritance/state021.cfg
    19:10:50.32 cannot unmount '/TESTPOOL': pool or dataset is busy
    19:10:50.32 ERROR: zfs unmount -a exited 1

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13686
2022-07-25 09:52:42 -07:00
Christian Schwarz bf61a507a2 zdb: dump spill block pointer if present
Output will look like so:

  $ sudo zdb -dddd -vv testpool/fs 2
  Dataset testpool/fs [ZPL], ID 260, cr_txg 8, 25K, 7 objects, rootbp DVA[0]=<0:1800be00:200> DVA[1]=<0:1c00be00:200> [L0 DMU objset] fletcher4 lz4 unencrypted LE contiguous unique double size=1000L/200P birth=16L/16P fill=7 cksum=d03b396cd:489ca835517:d4b04a4d0a62:1b413aac454d53

      Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
           2    1   128K    512     1K     512    512    0.00  ZFS plain file (K=inherit) (Z=inherit=lz4)
                                                 192   bonus  System attributes
      dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED SPILL_BLKPTR
      dnode maxblkid: 0
      path    /testfile
      uid     0
      gid     0
      atime   Fri Jul 15 12:36:35 2022
      mtime   Fri Jul 15 12:36:35 2022
      ctime   Fri Jul 15 12:36:51 2022
      crtime  Fri Jul 15 12:36:35 2022
      gen 10
      mode    100600
      size    0
      parent  34
      links   1
      pflags  840800000004
      SA xattrs: 248 bytes, 2 entries

          security.selinux = nutanix_u:object_r:unlabeled_t:s0\000
          user.foo = xbLQJjyVvEVPGGuRHV/gjkFFO1MdehKnLjjd36ZaoMVaUqtqFoMMYT5Ya9yywHApJNoK/1hNJfO3\012XCJWv9/QUTKamoWW9xVDE7yi8zn166RNw5QUhf84cZ3JNLnw6oN

Spill block: 0:10005c00:200 0:14005c00:200 200L/200P F=1 B=16/16 cksum=1cdfac47a4:910c5caa557:195d0493dfe5a:332b6fde6ad547
Indirect blocks:

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com>
Closes #13640
2022-07-20 17:16:29 -07:00
ixhamza fb087146de Add support for per dataset zil stats and use wmsum counters
ZIL kstats are reported in an inclusive way, i.e., same counters are
shared to capture all the activities happening in zil. Added support
to report zil stats for every datset individually by combining them
with already exposed dataset kstats.

Wmsum uses per cpu counters and provide less overhead as compared
to atomic operations. Updated zil kstats to replace wmsum counters
to avoid atomic operations.

Reviewed-by: Christian Schwarz <christian.schwarz@nutanix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #13636
2022-07-20 17:14:06 -07:00
Alexander Motin 33dba8c792 Fix scrub resume from newly created hole
It may happen that scan bookmark points to a block that was turned
into a part of a big hole.  In such case dsl_scan_visitbp() may skip
it and dsl_scan_check_resume() will not be called for it.  As result
new scan suspend won't be possible until the end of the object, that
may take hours if the object is a multi-terabyte ZVOL on a slow HDD
pool, stretching TXG to all that time, creating all sorts of problems.

This patch changes the resume condition to any greater or equal block,
so even if we miss the bookmarked block, the next one we find will
delete the bookmark, allowing new suspend.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13643
2022-07-20 17:02:36 -07:00
Tino Reichardt 97fd1ea42a Fix memory allocation for the checksum benchmark
Allocation via kmem_cache_alloc() is limited to less then 4m for
some architectures.

This commit limits the benchmarks with the linear abd cache to 1m
on all architectures and adds 4m + 16m benchmarks via non-linear
abd_alloc().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13669
Closes #13670
2022-07-20 17:01:32 -07:00
ixhamza f371cc18f8 Expose ZFS dataset case sensitivity setting via sb_opts
Makes the case sensitivity setting visible on Linux in /proc/mounts.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #13607
2022-07-14 10:38:16 -07:00
Tony Hutter 9fe2f262aa zed: Look for NVMe DEVPATH if no ID_BUS
We tried replacing an NVMe drive using autoreplace, only
to see zed reject it with:

zed[27955]: zed_udev_monitor: /dev/nvme5n1 no devid source

This happened because ZED saw that ID_BUS was not set by udev
for the NVMe drive, and thus didn't think it was "real drive".
This commit allows NVMe drives to be autoreplaced even if
ID_BUS is not set.

Reviewed-by: Don Brady <don.brady@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #13512 
Closes #13646
2022-07-14 10:19:37 -07:00
Tino Reichardt 1d3ba0bf01 Replace dead opensolaris.org license link
The commit replaces all findings of the link:
http://www.opensolaris.org/os/licensing with this one:
https://opensource.org/licenses/CDDL-1.0

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13619
2022-07-11 14:16:13 -07:00
Tony Hutter e4ab3f40df zed: Ignore false 'atari' partitions in autoreplace
libudev will sometimes falsely identify an 'atari' partition on a
blank disk, preventing it from being used in an autoreplace.  This
seems to be a known issue.  The workaround is to just ignore the
fake partition and continue with the autoreplace.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #13497
Closes #13632
2022-07-11 13:35:19 -07:00
Tony Hutter 677ca1e825 rpm: Silence "unversioned Obsoletes" warnings on EL 9
Get rid of RPM warnings on AlmaLinux 9:

"It's not recommended to have unversioned Obsoletes"

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #13584
Closes #13638
2022-07-11 11:35:01 -07:00
Brian Behlendorf e6489be347 Linux: Align MODULE_LICENSE macro text
Specify the lua and zstd license text in the manor in which the
kernel MODULE_LICENSE macro requires it.  The now duplicate entries
were merged and a comment added to make it clear what they apply to.

Reviewed-by: Christian Schwarz <christian.schwarz@nutanix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13641
2022-07-11 11:29:12 -07:00
Finix1979 cb01da6805 Call nvlist_free before return
Fixes a small kernel memory leak which would occur if a pool failed
to import because the `DMU_POOL_VDEV_ZAP_MAP` key can't be read from 
a presumably damaged MOS config.  In the case of a missing key there 
was no leak.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Finix1979 <yancw@info2soft.com>
Closes #13629
2022-07-07 11:43:58 -07:00
Alexander Motin 74230a5bc1 Avoid memory copy when verifying raidz/draid parity
Before this change for every valid parity column raidz_parity_verify()
allocated new buffer and copied there existing data, then recalculated
the parity and compared the result with the copy.  This patch removes
the memory copy, simply swapping original buffer pointers with newly
allocated empty ones for parity recalculation and comparison. Original
buffers with potentially incorrect parity data are then just freed,
while new recalculated ones are used for repair.

On a pool of 12 4-wide raidz vdevs, storing 1.5TB of 16MB blocks, this
change reduces memory traffic during scrub by 17% and total unhalted
CPU time by 25%.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13613
2022-07-05 16:27:29 -07:00
Alexander Motin 1ac7d194e5 Avoid memory copies during mirror scrub
Issuing several scrub reads for a block we may use the parent ZIO
buffer for one of child ZIOs.  If that read complete successfully,
then we won't need to copy the data explicitly.  If block has only
one copy (typical for root vdev, which is also a mirror inside),
then we never need to copy -- succeed or fail as-is.  Previous
code also copied data from buffer of every successfully completed
child ZIO, but that just does not make any sense.

On healthy N-wide mirror this saves all N+1 (or even more in case
of ditto blocks) memory copies for each scrubbed block, allowing
CPU to focus mostly on check-summing.  For other vdev types it
should save one memory copy per block copy at root vdev.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13606
2022-07-05 16:26:20 -07:00
наб 6fca6195cd Re-fix -Wwrite-strings on FreeBSD
Follow up fix for a926aab902.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13348
Closes #13610
2022-06-30 11:31:09 -07:00
Toyam Cox eefe83eaa6 dracut: fix boot on non-zfs-root systems
Simply prevent overwriting root until it needs to be overwritten.

Dracut could change this value before this module is called, but won't
change the kernel command line.

Reviewed-by: Andrew J. Hesford <ajh@sideband.org>
Signed-off-by: Toyam Cox <vaelatern@voidlinux.org>
Closes #13592
2022-06-30 10:47:58 -07:00
gregory-lee-bartholomew 5a4dd3a262 contrib: dracut: README.md
Change zfs-snapshot-bootfs.service to zfs-rollback-bootfs.service in
cmdline point 4 of README.md.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gregory Bartholomew <gregory.lee.bartholomew@gmail.com>
Closes #13609
2022-06-30 10:43:27 -07:00
George Amanakis 34e5423f83 Fix dnode byteswapping
If a dnode has a spill pointer, and we use DN_SLOTS_TO_BONUSLEN() then
we will possibly include the spill pointer in the len calculation and it
will be byteswapped. Then dnode_byteswap() will carry on and swap the
spill pointer again. Fix this by using DN_MAX_BONUS_LEN() instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #13002 
Closes #13015
2022-06-29 17:06:16 -07:00
gregory-lee-bartholomew 2d434e8ae4 contrib: dracut: zfs-{rollback,snapshot}-bootfs: explicit snapname fix
Due to a missing semicolon on the ExecStart line, it wasn't possible
to specify the snapshot name on the bootfs.{rollback,snapshot}
kernel parameters if the boot dataset name was obtained from the
root=zfs:... kernel parameter.

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gregory Bartholomew <gregory.lee.bartholomew@gmail.com>
Closes #13585
2022-06-29 16:56:04 -07:00
Brian Behlendorf 07f2793e86 dracut: fix typo in mount-zfs.sh.in
Format the `zpool get` command correctly.  The -o option must
be followed by "all" or the requested field name.

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13602
2022-06-29 15:33:38 -07:00
наб 60e389ca10 module: lua: ldo: fix pragma name
/home/nabijaczleweli/store/code/zfs/module/lua/ldo.c:175:32: warning:
unknown option after ‘#pragma GCC diagnostic’ kind [-Wpragmas]
  175 | #pragma GCC diagnostic ignored "-Winfinite-recursion"a
      |                                ^~~~~~~~~~~~~~~~~~~~~~

Fixes: a6e8113fed ("Silence
-Winfinite-recursion warning in luaD_throw()")

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13348
2022-06-29 14:08:59 -07:00
наб 404601aca0 linux: libzfs: util: don't fallthrough to to end-of-switch
lib/libzfs/os/linux/libzfs_util_os.c:262:3: error: fallthrough
annotation does not directly precede switch label
                zfs_fallthrough;
                ^
./lib/libspl/include/sys/feature_tests.h:34:26: note: expanded from
macro 'zfs_fallthrough'

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13348
2022-06-29 14:08:59 -07:00
наб a2f6bff976 tests: modernise zdb_decompress
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13348
2022-06-29 14:08:59 -07:00
наб dd66857d92 Remaining {=> const} char|void *tag
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13348
2022-06-29 14:08:59 -07:00
наб a926aab902 Enable -Wwrite-strings
Also, fix leak from ztest_global_vars_to_zdb_args()

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13348
2022-06-29 14:08:54 -07:00
gaoyanping e7d90362e5 Fix znode group permission different from acl mask
Zp->z_mode is set at the same time inode->i_mode
is being changed. This has the effect of keeping both
in sync without relying on zfs_znode_update_vfs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: yanping.gao <yanping.gao@xtaotech.com>
Closes #13581
2022-06-29 13:38:46 -07:00
Kristof Provost 325096545a FreeBSD: only define B_FALSE/B_TRUE if NEED_SOLARIS_BOOLEAN is not set
If NEED_SOLARIS_BOOLEAN is defined we define an enum boolean_t, which
defines B_TRUE/B_FALSE as well. If we have both the define and the enum
things don't build (because that translates to
'enum { 0, 1 }     boolean_t').

While here also remove an incorrect '#else'. With it in place we only
parse a section if the include guard is triggered. So we'd only use that
code if this file is included twice. This is clearly unintended, and
also means we don't get the 'boolean_t' definition. Fix this.

Reviewed-by: Warner Losh <imp@bsdimp.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Kristof Provost <kprovost@netgate.com>
Sponsored-By: Rubicon Communications, LLC ("Netgate")
Closes #13596
2022-06-28 14:11:38 -07:00
Alexander Motin 827322991f Fix and disable blocks statistics during scrub
Block statistics calculation during scrub I/O issue in case of sorted
scrub accounted ditto blocks several times.  Embedded blocks on other
side were not accounted at all.  This change moves the accounting from
issue to scan stage, that fixes both problems and also allows to avoid
pool-wide locking and the lock contention it created.

Since this statistics is quite specific and is not even exposed now
anywhere, disable its calculation by default to not waste CPU time.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13579
2022-06-28 11:23:31 -07:00
Brian Behlendorf 43569ee374 Fix objtool: missing int3 after ret warning
Resolve straight-line speculation warnings reported by objtool
for x86_64 assembly on Linux when CONFIG_SLS is set.  See the
following LWN article for the complete details.

https://lwn.net/Articles/877845/

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13528
Closes #13575
2022-06-27 14:19:43 -07:00
Brian Behlendorf 8aceded193 Fix -Wformat-overflow warning in zfs_project_handle_dir()
Switch to using asprintf() to satisfy the compiler and resolve the
potential format-overflow warning.  Not the conditional before the
sprintf() would have prevented this regardless.

    cmd/zfs/zfs_project.c: In function ‘zfs_project_handle_dir’:
    cmd/zfs/zfs_project.c:241:38: error: ‘/’ directive writing
    1 byte into a region of size between 0 and 4352
    [-Werror=format-overflow=]
    cmd/zfs/zfs_project.c:241:17: note: ‘sprintf’ output between
    2 and 4609 bytes into a destination of size 4352

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13528
Closes #13575
2022-06-27 14:19:38 -07:00
Brian Behlendorf f11431a317 Fix -Wformat-truncation warning in upgrade_set_callback()
Extend the buffer slightly resolve the warning.

    cmd/zfs/zfs_main.c: In function ‘upgrade_set_callback’:
    cmd/zfs/zfs_main.c:2446:22: error: ‘%llu’ directive output
    may be truncated writing between 1 and 20 bytes into a
    region of size 16 [-Werror=format-truncation=]
    cmd/zfs/zfs_main.c:2445:24: note: ‘snprintf’ output between
    2 and 21 bytes into a destination of size 16

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13528
Closes #13575
2022-06-27 14:19:31 -07:00
Brian Behlendorf c175f5ebb2 Fix -Wuse-after-free warning in dbuf_destroy()
Move the use of the db pointer after it is freed.  It's only used as
a tag so a dereference would never occur, but there's no reason we
can't invert the order to resolve the warning.

    module/zfs/dbuf.c: In function 'dbuf_destroy':
    module/zfs/dbuf.c:2953:17: error:
    pointer 'db' may be used after 'free' [-Werror=use-after-free]

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13528
Closes #13575
2022-06-27 14:19:24 -07:00
Brian Behlendorf 9619bcdefb Fix -Wuse-after-free warning in dbuf_issue_final_prefetch_done()
Move the use of the private pointer after it is freed.  It's only
used as a tag so a dereference would never occur, but there's no
harm in inverting the order to resolve the warning.

    module/zfs/dbuf.c: In function 'dbuf_issue_final_prefetch_done':
    module/zfs/dbuf.c:3204:17: error:
    pointer 'private' may be used after 'free' [-Werror=use-after-free]

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13528
Closes #13575
2022-06-27 14:19:19 -07:00
Brian Behlendorf ff7e405f83 Fix -Wattribute-warning in dsl layer
The memcpy(), memmove(), and memset() functions have been annotated
to perform bounds checking when using FORTIFY_SOURCE.  A warning is
now generted when writing beyond the end of the specified field.

Alternately, the new struct_group() macro could be used to create
an anonymous union member for use by memcpy().  However, since this
is the only place the macro would be helpful it's preferable to
restructure the code slights to avoid the need for additional
compatibility code when the macro does not exist.

https://lore.kernel.org/lkml/20211118183807.1283332-1-keescook@chromium.org/T/

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13528
Closes #13575
2022-06-27 14:19:12 -07:00
Brian Behlendorf 18df6afdfc Fix -Wattribute-warning in edonr
The wrong union memory was being accessed in EdonRInit resulting in
a write beyond size of field compiler warning.  Reference the correct
member to resolve the warning.  The warning was correct and this in
case the mistake was harmless.

    In function ‘fortify_memcpy_chk’,
    inlined from ‘EdonRInit’ at zfs/module/icp/algs/edonr/edonr.c:494:3:
    ./include/linux/fortify-string.h:344:25: error: call to
    ‘__write_overflow_field’ declared with attribute warning:
    detected write beyond size of field (1st parameter);
    maybe use struct_group()? [-Werror=attribute-warning]

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13528
Closes #13575
2022-06-27 14:19:04 -07:00
Brian Behlendorf b0f7dd276c Fix -Wattribute-warning in zfs_log_xvattr()
Restructure the code in zfs_log_xvattr() to use a lr_attr_end
structure when accessing lr_attr_t elements located after the
variable sized array.  This makes the code more understandable
and resolves the accessing beyond the end of the field warnings.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13528
Closes #13575
2022-06-27 14:18:57 -07:00
Brian Behlendorf a6e8113fed Silence -Winfinite-recursion warning in luaD_throw()
This code should be kept inline with the upstream lua version as much
as possible.  Therefore, we simply want to silence the warning.  This
check was enabled by default as part of -Wall in gcc 12.1.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13528
Closes #13575
2022-06-27 14:18:50 -07:00
George Amanakis 80a650b7bb Avoid panic with recordsize > 128k, raw sending and no large_blocks
The current codebase does not support raw sending buffers with block
size > 128kB when large_blocks is not active. This can happen in the
codepath dsl_dataset_sync()->dmu_objset_sync()->zio_nowait() which
calls back dmu_objset_write_done()->dsl_dataset_block_born(). If
dsl_dataset_sync() completes its run before dsl_dataset_block_born() is
called, we will end up not activating some of the necessary flags, while
having blocks based on those flags written in the filesystem. A
subsequent send will then panic.

Fix this by directly deciding in dmu_objset_sync() whether these flags
need to be activated later by dsl_dataset_sync(). Instead of panicking
due to a NULL pointer dereference in dmu_dump_write() in case of a send,
print out an error message. Also during scrub verify there are no
contradicting filesystem flags.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #12275
Closes #12438
2022-06-27 14:17:25 -07:00
Alexander Motin 1cd72b9c13 Avoid two 64-bit divisions per scanned block
Change math to make it like the ARC, using multiplications instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13591
2022-06-27 11:08:21 -07:00
Alexander Motin c0bf952c84 Several B-tree optimizations
- Introduce first element offset within a leaf.  It allows to reduce
by ~50% average memmove() size when adding/removing elements.  If the
added/removed element is in the first half of the leaf, we may shift
elements before it and adjust the bth_first instead of moving more
elements after it.
 - Use memcpy() instead of memmove() when we know there is no overlap.
 - Switch from uint64_t to uint32_t.  It does not limit anything,
but 32-bit arches should appreciate it greatly in hot paths.
 - Store leaf capacity in struct btree to avoid 64-bit divisions.
 - Adjust zfs_btree_insert_into_leaf() to always result in balanced
leaves after splitting, no matter where the new element was inserted.
Not that we care about it much, but it should also allow B-trees with
as little as two elements per leaf instead of 4 previously.

When scrubbing pool of 12 SSDs, storing 1.5TB of 4KB zvol blocks this
reduces amount of time spent in memmove() inside the scan thread from
13.7% to 5.7% and total scrub time by ~15 seconds out of 9 minutes.
It should also reduce spacemaps load time, but I haven't measured it.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13582
2022-06-24 13:55:58 -07:00
Alan Somers ccf89b39fe Add a "zstream decompress" subcommand
It can be used to repair a ZFS file system corrupted by ZFS bug #12762.
Use it like this:

zfs send -c <DS> | \
zstream decompress <OBJECT>,<OFFSET>[,<COMPRESSION_ALGO>] ... | \
zfs recv <DST_DS>

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Alan Somers <asomers@gmail.com>
Sponsored-by:  Axcient
Workaround for #12762
Closes #13256
2022-06-24 13:28:42 -07:00
Alexander Motin 1c0c729ab4 Several sorted scrub optimizations
- Reduce size and comparison complexity of q_exts_by_size B-tree.
Previous code used two 64-bit divisions and many other operations to
compare two B-tree elements.  It created enormous overhead.  This
implementation moves the math to the upper level and stores the score
in the B-tree elements themselves.  Since all that we need to store in
that B-tree is the extent score and offset, those can fit into single
8 byte value instead of 24 bytes of q_exts_by_addr element and can be
compared with single operation.
 - Better decouple secondary tree logic from main range_tree by moving
rt_btree_ops and related functions into dsl_scan.c as ext_size_ops.
Those functions are very small to worry about the code duplication and
range_tree does not need to know details such as rt_btree_compare.
 - Instead of accounting number of pending bytes per pool, that needs
atomic on global variable per block, account the number of non-empty
per-vdev queues, that change much more rarely.
 - When extent scan is interrupted by TXG end, continue it in the next
TXG instead of selecting next best extent.  It allows to avoid leaving
one truncated (and so likely not the best any more) extent each TXG.

On top of some other optimizations this saves about 1.5 minutes out of
10 to scrub pool of 12 SSDs, storing 1.5TB of 4KB zvol blocks.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <caputit1@tcnj.edu>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13576
2022-06-24 09:50:37 -07:00
Toomas Soome 83691bebf0 Use macros for quotes and such
Use Dq,Pq/Po/Pc macros. illumos dumpadm is now in section 8.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #13586
2022-06-24 09:48:10 -07:00
Brian Behlendorf ad8b9f940c Scrub mirror children without BPs
When scrubbing a raidz/draid pool, which contains a replacing or
sparing mirror with multiple online children, only one child will
be read.  This is not normally a serious concern because the DTL
records are used to determine where a good copy of the data is.
As long as the data can be read from one child the mirror vdev
will use it to repair gaps in any of its children.  Furthermore,
even if the data which was read is corrupt the raidz code will
detect this and issue its own repair I/O to correct the damage
in the mirror vdev.

However, in the scenario where the DTL is wrong due to silent
data corruption (say due to overwriting one child) and the scrub
happens to read from a child with good data, then the other damaged
mirror child will not be detected nor repaired.

While this is possible for both raidz and draid vdevs, it's most
pronounced when using draid.  This is because by default the zed
will sequentially rebuild a draid pool to a distributed spare,
and the distributed spare half of the mirror is always preferred
since it delivers better performance.  This means the damaged
half of the mirror will go undetected even after scrubbing.

For system administrations this behavior is non-intuitive and in
a worst case scenario could result in the only good copy of the
data being unknowingly detached from the mirror.

This change resolves the issue by reading all replacing/sparing
mirror children when scrubbing.  When the BP isn't available for
verification, then compare the data buffers from each child.  They
must all be identical, if not there's silent damage and an error
is returned to prompt the top-level vdev to issue a repair I/O to
rewrite the data on all of the mirror children.  Since we can't
tell which child was wrong a checksum error is logged against the
replacing or sparing mirror vdev.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13555
2022-06-23 10:36:28 -07:00
Tino Reichardt deb1213098 Fix memory allocation issue for BLAKE3 context
The kmem_alloc(sizeof (*ctx), KM_NOSLEEP) call on FreeBSD can't be
used in this code segment. Work around this by pre-allocating a percpu
context array for later use.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13568
2022-06-21 14:32:09 -07:00
Matthew Thode b17663f571 Remove install of zfs-load-module.service for dracut
The zfs-load-module.service service is not currently provided by
the OpenZFS repository so we cannot safely assume it exists.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Thode <mthode@mthode.org>
Closes #13574
2022-06-21 10:37:20 -07:00
Alexander Motin d51f4ea5f9 FreeBSD: Improve crypto_dispatch() handling
Handle crypto_dispatch() return values same as crp->crp_etype errors.
On FreeBSD 12 many drivers returned same errors both ways, and lack
of proper handling for the first ended up in assertion panic later.
It was changed in FreeBSD 13, but there is no reason to not be safe.

While there, skip waiting for completion, including locking and
wakeup() call, for sessions on synchronous crypto drivers, such as
typical aesni and software.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13563
2022-06-17 15:38:51 -07:00
Andrew f609739985 expose snapshot count via stat(2) of .zfs/snapshot (#13559)
Increase nlinks in stat results of ./zfs/snapshot based on snapshot
count. This provides quick and efficient method for administrators to
get snapshot counts without having to use libzfs or list the snapdir
contents.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrew Walker <awalker@ixsystems.com>
Closes #13559
2022-06-17 11:44:49 -07:00
ixhamza 10891b37fa libzfs: Prevent overridding of error code
zfs_send_cb_impl fails to report error for some flags.

Use second error variable for send_conclusion_record.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #13558
2022-06-15 14:26:12 -07:00
Alexander Motin dd8671459f Reduce ZIO io_lock contention on sorted scrub
During sorted scrub multiple threads (one per vdev) are issuing many
ZIOs same time, all using the same scn->scn_zio_root ZIO as parent.
It causes huge lock contention on the single global lock on that ZIO.
Improve it by introducing per-queue null ZIOs, children to that one,
and using them instead as proxy.

For 12 SSD pool storing 1.5TB of 4KB blocks on 80-core system this
dramatically reduces lock contention and reduces scrub time from 21
minutes down to 12.5, while actual read stages (not scan) are about
3x faster, reaching 100K blocks per second per vdev.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13553
2022-06-15 14:25:08 -07:00
crass bc00d2c711 Add support for ARCH=um for x86 sub-architectures
When building modules (as well as the kernel) with ARCH=um, the options
-Dsetjmp=kernel_setjmp and -Dlongjmp=kernel_longjmp are passed to the C
preprocessor for C files. This causes the setjmp and longjmp used in
module/lua/ldo.c to be kernel_setjmp and kernel_longjmp respectively in
the object file. However, the setjmp and longjmp that is intended to be
called is defined in an architecture dependent assembly file under the
directory module/lua/setjmp. Since it is an assembly and not a C file,
the preprocessor define is not given and the names do not change. This
becomes an issue when modpost is trying to create the Module.symvers
and sees no defined symbol for kernel_setjmp and kernel_longjmp. To fix
this, if the macro CONFIG_UML is defined, then setjmp and longjmp
macros are undefined.

When building with ARCH=um for x86 sub-architectures, CONFIG_X86 is not
defined. Instead, CONFIG_UML_X86 is defined. Despite this, the UML x86
sub-architecture can use the same object files as the x86 architectures
because the x86 sub-architecture UML kernel is running with the same
instruction set as CONFIG_X86. So the modules/Kbuild build file is
updated to add the same object files that CONFIG_X86 would add when
CONFIG_UML_X86 is defined.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Glenn Washburn <development@efficientek.com>
Closes #13547
2022-06-15 14:22:52 -07:00
Damian Szuberski 9884319666 Fix clang 13 compilation errors
```
os/linux/zfs/zvol_os.c:1111:3: error: ignoring return value of function
  declared with 'warn_unused_result' attribute [-Werror,-Wunused-result]
                add_disk(zv->zv_zso->zvo_disk);
                ^~~~~~~~ ~~~~~~~~~~~~~~~~~~~~

zpl_xattr.c:1579:1: warning: no previous prototype for function
  'zpl_posix_acl_release_impl' [-Wmissing-prototypes]
```

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #13551
2022-06-15 14:20:28 -07:00
Allan Jude 4ff7a8fa2f Replace ZPROP_INVAL with ZPROP_USERPROP where it means a user property
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Sponsored-by: Klara Inc.
Closes #12676
2022-06-14 11:27:53 -07:00
Ryan Moeller 9e605cf155 spl: Use a clearer name for the user namespace fd
This fd has nothing to do with cleanup, that's just the name of the
field in zfs_cmd_t that was used to pass it to the kernel.

Call it what it is, an fd for a user namespace.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #13554
2022-06-14 08:14:19 -07:00
Ryan Moeller def1a401f4 libzfs: zfs_userns: Don't leak the namespace fd
zfs_userns opens a file descriptor for the kernel to look up a
namespace, but does not close it.

Close the fd when we're done with it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #13554
2022-06-14 08:13:38 -07:00
Julian Brunner 482505fd42 Add weekly and monthly systemd timers for trimming
On machines using systemd, trim timers can be enabled on a per-pool
basis. Weekly and monthly timer units are provided. Timers can be
enabled as follows:

systemctl enable zfs-trim-weekly@rpool.timer --now
systemctl enable zfs-trim-monthly@datapool.timer --now

Each timer will pull in zfs-trim@${poolname}.service, which is not
schedule-specific.

The manpage zpool-trim has been updated accordingly.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Julian Brunner <julian.brunner@gmail.com>
Closes #13544
2022-06-10 18:22:14 -07:00
Alexander Motin 87b46d63b2 Improve sorted scan memory accounting
Since we use two B-trees q_exts_by_size and q_exts_by_addr, we should
count 2x sizeof (range_seg_gap_t) per node.  And since average B-tree
memory efficiency is about 75%, we should increase it to 3x.

Previous code under-counted up to 30% of the memory usage.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13537
2022-06-10 10:01:46 -07:00
Will Andrews 4ed5e25074 Add Linux namespace delegation support
This allows ZFS datasets to be delegated to a user/mount namespace
Within that namespace, only the delegated datasets are visible
Works very similarly to Zones/Jailes on other ZFS OSes

As a user:
```
 $ unshare -Um
 $ zfs list
no datasets available
 $ echo $$
1234
```

As root:
```
 # zfs list
NAME                            ZONED  MOUNTPOINT
containers                      off    /containers
containers/host                 off    /containers/host
containers/host/child           off    /containers/host/child
containers/host/child/gchild    off    /containers/host/child/gchild
containers/unpriv               on     /unpriv
containers/unpriv/child         on     /unpriv/child
containers/unpriv/child/gchild  on     /unpriv/child/gchild

 # zfs zone /proc/1234/ns/user containers/unpriv
```

Back to the user namespace:
```
 $ zfs list
NAME                             USED  AVAIL     REFER  MOUNTPOINT
containers                       129M  47.8G       24K  /containers
containers/unpriv                128M  47.8G       24K  /unpriv
containers/unpriv/child          128M  47.8G      128M  /unpriv/child
```

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Will Andrews <will.andrews@klarasystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>
Co-authored-by: Allan Jude <allan@klarasystems.com>
Co-authored-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>
Sponsored-by: Buddy <https://buddy.works>
Closes #12263
2022-06-10 09:51:46 -07:00
Allan Jude a1aa8f14c8 Revert parts of 938cfeb0f2
When read and writing the UID/GID, we always want the value
relative to the root user namespace, the kernel will take care
of remapping this to the user namespace for us.

Calling from_kuid(user_ns, uid) with a unmapped uid will return -1
as that uid is outside of the scope of that namespace, and will result
in the files inside the namespace all being owned by 'nobody' and not
being allowed to call chmod or chown on them.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #12263
2022-06-10 09:51:32 -07:00
Alexander Motin fc5200aa9b AVL: Remove obsolete branching optimizations
Modern Clang and GCC can successfully implement simple conditions
without branching with math and flag operations.  Use of arrays for
translation no longer helps as much as it was 14+ years ago.

Disassemble of the code generated by Clang 13.0.0 on FreeBSD 13.1,
Clang 14.0.4 on FreeBSD 14 and GCC 10.2.1 on Debian 11 with this
change still shows no branching instructions.

Profiling of CPU-bound scan stage of sorted scrub shows reproducible
reduction of time spent inside avl_find() from 6.52% to 4.58%.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13540
2022-06-09 15:27:36 -07:00
Ryan Moeller cdeb98a116 libzfs: Rename msg bufs to errbuf for consistency
`libzfs_pool.c` uses the name `msg` where everywhere else in libzfs uses
`errbuf` for the error message buffer.

Use the name consistent with the rest of libzfs and use ERRBUFLEN
instead of 1024.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #13539
2022-06-09 15:24:51 -07:00
Ryan Moeller d68a2bfa99 libzfs: Define the defecto standard errbuf size
Every errbuf array in libzfs is 1024 chars.

Define ERRBUFLEN in a shared header, and use it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #13539
2022-06-09 15:24:24 -07:00
Tony Hutter 6f73d02168 zvol: Support blk-mq for better performance
Add support for the kernel's block multiqueue (blk-mq) interface in
the zvol block driver.  blk-mq creates multiple request queues on
different CPUs rather than having a single request queue.  This can
improve zvol performance with multithreaded reads/writes.

This implementation uses the blk-mq interfaces on 4.13 or newer
kernels.  Building against older kernels will fall back to the
older BIO interfaces.

Note that you must set the `zvol_use_blk_mq` module param to
enable the blk-mq API.  It is disabled by default.

In addition, this commit lets the zvol blk-mq layer process whole
`struct request` IOs at a time, rather than breaking them down
into their individual BIOs.  This reduces dbuf lock contention
and overhead versus the legacy zvol submit_bio() codepath.

	sequential dd to one zvol, 8k volblocksize, no O_DIRECT:

	legacy submit_bio()     292MB/s write  453MB/s read
	this commit             453MB/s write  885MB/s read

It also introduces a new `zvol_blk_mq_chunks_per_thread` module
parameter. This parameter represents how many volblocksize'd chunks
to process per each zvol thread.  It can be used to tune your zvols
for better read vs write performance (higher values favor write,
lower favor read).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #13148
Issue #12483
2022-06-09 08:10:38 -06:00
Tino Reichardt 985c33b132 Introduce BLAKE3 checksums as an OpenZFS feature
This commit adds BLAKE3 checksums to OpenZFS, it has similar
performance to Edon-R, but without the caveats around the latter.

Homepage of BLAKE3: https://github.com/BLAKE3-team/BLAKE3
Wikipedia: https://en.wikipedia.org/wiki/BLAKE_(hash_function)#BLAKE3

Short description of Wikipedia:

  BLAKE3 is a cryptographic hash function based on Bao and BLAKE2,
  created by Jack O'Connor, Jean-Philippe Aumasson, Samuel Neves, and
  Zooko Wilcox-O'Hearn. It was announced on January 9, 2020, at Real
  World Crypto. BLAKE3 is a single algorithm with many desirable
  features (parallelism, XOF, KDF, PRF and MAC), in contrast to BLAKE
  and BLAKE2, which are algorithm families with multiple variants.
  BLAKE3 has a binary tree structure, so it supports a practically
  unlimited degree of parallelism (both SIMD and multithreading) given
  enough input. The official Rust and C implementations are
  dual-licensed as public domain (CC0) and the Apache License.

Along with adding the BLAKE3 hash into the OpenZFS infrastructure a
new benchmarking file called chksum_bench was introduced.  When read
it reports the speed of the available checksum functions.

On Linux: cat /proc/spl/kstat/zfs/chksum_bench
On FreeBSD: sysctl kstat.zfs.misc.chksum_bench

This is an example output of an i3-1005G1 test system with Debian 11:

implementation      1k      4k     16k     64k    256k      1m      4m
edonr-generic     1196    1602    1761    1749    1762    1759    1751
skein-generic      546     591     608     615     619     612     616
sha256-generic     240     300     316     314     304     285     276
sha512-generic     353     441     467     476     472     467     426
blake3-generic     308     313     313     313     312     313     312
blake3-sse2        402    1289    1423    1446    1432    1458    1413
blake3-sse41       427    1470    1625    1704    1679    1607    1629
blake3-avx2        428    1920    3095    3343    3356    3318    3204
blake3-avx512      473    2687    4905    5836    5844    5643    5374

Output on Debian 5.10.0-10-amd64 system: (Ryzen 7 5800X)

implementation      1k      4k     16k     64k    256k      1m      4m
edonr-generic     1840    2458    2665    2719    2711    2723    2693
skein-generic      870     966     996     992    1003    1005    1009
sha256-generic     415     442     453     455     457     457     457
sha512-generic     608     690     711     718     719     720     721
blake3-generic     301     313     311     309     309     310     310
blake3-sse2        343    1865    2124    2188    2180    2181    2186
blake3-sse41       364    2091    2396    2509    2463    2482    2488
blake3-avx2        365    2590    4399    4971    4915    4802    4764

Output on Debian 5.10.0-9-powerpc64le system: (POWER 9)

implementation      1k      4k     16k     64k    256k      1m      4m
edonr-generic     1213    1703    1889    1918    1957    1902    1907
skein-generic      434     492     520     522     511     525     525
sha256-generic     167     183     187     188     188     187     188
sha512-generic     186     216     222     221     225     224     224
blake3-generic     153     152     154     153     151     153     153
blake3-sse2        391    1170    1366    1406    1428    1426    1414
blake3-sse41       352    1049    1212    1174    1262    1258    1259

Output on Debian 5.10.0-11-arm64 system: (Pi400)

implementation      1k      4k     16k     64k    256k      1m      4m
edonr-generic      487     603     629     639     643     641     641
skein-generic      271     299     303     308     309     309     307
sha256-generic     117     127     128     130     130     129     130
sha512-generic     145     165     170     172     173     174     175
blake3-generic      81      29      71      89      89      89      89
blake3-sse2        112     323     368     379     380     371     374
blake3-sse41       101     315     357     368     369     364     360

Structurally, the new code is mainly split into these parts:
- 1x cross platform generic c variant: blake3_generic.c
- 4x assembly for X86-64 (SSE2, SSE4.1, AVX2, AVX512)
- 2x assembly for ARMv8 (NEON converted from SSE2)
- 2x assembly for PPC64-LE (POWER8 converted from SSE2)
- one file for switching between the implementations

Note the PPC64 assembly requires the VSX instruction set and the
kfpu_begin() / kfpu_end() calls on PowerPC were updated accordingly.

Reviewed-by: Felix Dörre <felix@dogcraft.de>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Co-authored-by: Rich Ercolani <rincebrain@gmail.com>
Closes #10058
Closes #12918
2022-06-08 15:55:57 -07:00
Brian Behlendorf b9d98453f9 autoconf: AC_MSG_CHECKING consistency
Make the wording more consistent for the kernel AC_MSG_CHECKING
output (e.g. "checking whether ...".).  Additionally, group some
of the VFS interface checks with the others.  No functional change.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13529
2022-06-01 09:59:37 -07:00
Brian Behlendorf 4c6526208d Linux 5.19 compat: asm/fpu/internal.h
As of the Linux 5.19 kernel the asm/fpu/internal.h header was
entirely removed.  It has been effectively empty since the 5.16
kernel and provides no required functionality.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13529
2022-06-01 09:59:15 -07:00
Alexander Motin 42cf2ad0e4 Remove wrong assertion in log spacemap
It is typical, but not generally true that if log summary has more
blocks it must also have unflushed metaslabs.  Normally with metaslabs
flushed in order it works, but there are known exceptions, such as
device removal or metaslab being loaded during its flush attempt.

Before 600a02b884 if spa_flush_metaslabs() hit loading metaslab it
usually stopped (unless memlimit is also exceeded), but now it may
flush more metaslabs, just skipping that particular one.  This
increased chances of assertion to fire when the skipped metaslab is
flushed on next iteration if all other metaslabs in that summary
entry are already flushed out of order.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13486 
Closes #13513
2022-06-01 09:54:35 -07:00
Rich Ercolani bc8192cd5b Corrected parameters for zstd early abort
That'll teach me to try and recall them from the definition.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #13519
2022-05-31 15:41:33 -07:00
Allan Jude 2310dba9eb Fix typo in zil_commit() comment block
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #13518
2022-05-31 15:37:46 -07:00
Brian Behlendorf a70e613070 Linux 5.18 compat: META
Update the META file to reflect compatibility with the 5.18 kernel.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13527
2022-05-31 14:38:00 -07:00
Brian Behlendorf 91350681b8 Linux 5.19 compat: zap_flags_t conflict
As of the Linux 5.19 kernel an identically named zap_flags_t typedef
is declared in the include/linux/mm_types.h linux header.  Sadly,
the inclusion of this header cannot be easily avoided.  To resolve
the conflict a #define is used to remap the name in the OpenZFS
sources when building against the Linux kernel.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13515
2022-05-31 12:04:39 -07:00
Brian Behlendorf d41e864181 Linux 5.19 compat: bdev_start_io_acct() / bdev_end_io_acct()
As of the Linux 5.19 kernel the disk_*_io_acct() helper functions
have been replaced by the bdev_*_io_acct() functions.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13515
2022-05-31 12:04:35 -07:00
Brian Behlendorf c2c2e7bb8b Linux 5.19 compat: aops->read_folio()
As of the Linux 5.19 kernel the readpage() address space operation
has been replaced by read_folio().

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13515
2022-05-31 12:04:31 -07:00
Brian Behlendorf a12a5cb5b8 Linux 5.19 compat: blkdev_issue_secure_erase()
Linux 5.19 commit torvalds/linux@44abff2c0 splits the secure
erase functionality from the blkdev_issue_discard() function.
The blkdev_issue_secure_erase() must now be issued to issue
a secure erase.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13515
2022-05-31 12:04:26 -07:00
Brian Behlendorf e2c31f2bc7 Linux 5.19 compat: bdev_max_secure_erase_sectors()
Linux 5.19 commit torvalds/linux@44abff2c0 removed the
blk_queue_secure_erase() helper function.  The preferred
interface is to now use the bdev_max_secure_erase_sectors()
function to check for discard support.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13515
2022-05-31 12:04:22 -07:00
Brian Behlendorf 5e4aedaca7 Linux 5.19 compat: bdev_max_discard_sectors()
Linux 5.19 commit torvalds/linux@70200574cc removed the
blk_queue_discard() helper function.  The preferred interface
is to now use the bdev_max_discard_sectors() function to check
for discard support.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13515
2022-05-31 12:04:17 -07:00
Brian Behlendorf 5f264996f4 Linux 5.18 compat: bio_alloc()
As for the Linux 5.18 kernel bio_alloc() expects a block_device struct
as an argument.  This removes the need for the bio_set_dev() compatibility
code for 5.18 and newer kernels.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13515
2022-05-31 12:04:03 -07:00
Kevin Jin 152d6fda54 Fix inflated quiesce time caused by lwb_tx during zil_commit()
In current zil_commit() process, transaction lwb_tx is assigned in
zil_lwb_write_issue(), and is committed in zil_lwb_flush_vdevs_done().
Thus, during lwb write out process, the txg is held in open or quiesing
state, until zil_lwb_flush_vdevs_done() is called. If the zil's zio
latency is high, it will cause txg_sync_thread() to starve.

The goal here is to defer waiting for zil_lwb_flush_vdevs_done to the
'syncing' txg state. That is, in zil_sync().

In this patch, it achieves the goal without holding transaction.
A new function zil_lwb_flush_wait_all() is introduced. It waits for
the completion of all the zil_lwb_flush_vdevs_done() by given txg.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes #12321
2022-05-26 09:36:14 -07:00
Brian Behlendorf d98a67a53a Replace EXTRA_DIST with dist_noinst_DATA
The EXTRA_DIST variable is ignored when used in the FALSE conditional
of a Makefile.am.  This results in the `make dist` target omitting
these files from the generated tarball unless CONFIG_USER is defined.
This issue can be avoided by switching to use the dist_noinst_DATA
variable which is handled as expected by autoconf.

This change also adds support for --with-config=dist as an alias
for --with-config=srpm and updates the GitHub workflows to use it.

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13459
Closes #13505
2022-05-26 09:24:50 -07:00
Ryan Moeller b62829295e Silence unused-but-set-variable warning
This was breaking the kmod port build on FreeBSD with Clang 13.

Use the same trick as we do for ASSERT() to make DNODE_VERIFY() use
its parameter at compile time without actually using it at run time
in non-debug builds.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #13507
2022-05-25 17:26:59 -07:00
Alexander Motin 6aa8c21a2a More speculative prefetcher improvements
- Make prefetch distance adaptive: up to 4MB prefetch doubles for
every, hit same as before, but after that it grows by 1/8 every time
the prefetch read does not complete in time to satisfy the demand.
My tests show that 4MB is sufficient for wide NVMe pool to saturate
single reader thread at 2.5GB/s, while new 64MB maximum allows the
same thread to reach 1.5GB/s on wide HDD pool.  Further distance
increase may increase speed even more, but less dramatic and with
higher latency.

 - Allow early reuse of inactive prefetch streams: streams that never
saw hits can be reused immediately if there is a demand, while others
can be reused after 1s of inactivity, starting with the oldest.  After
2s of inactivity streams are deleted to free resources same as before.
This allows by several times increase strided read performance on HDD
pool in presence of simultaneous random reads, previously filling the
zfetch_max_streams limit for seconds and so blocking most of prefetch.

 - Always issue intermediate indirect block reads with SYNC priority.
Each of those reads if delayed for longer may delay up to 1024 other
block prefetches, that may be not good for wide pools.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13452
2022-05-25 10:12:52 -07:00
наб 1d89b989c1 automake: don't install /e/d/zfs or /e/z/zfs-functions +x
_SCRIPTS means it's made +x when installing; _DATA is made -x.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13496
Closes #13503
2022-05-25 09:29:47 -07:00
Paul Dagnelie 7829b465a7 Cancel in-progress rebuilds when we finish removal
This issue was discovered by zloop runs. When a mirror or other 
redundant top-level vdev has a disk failure, and the disk is replaced, 
the rebuild process occurs. A removal can happen while this is in 
progress. If the removal completes before the rebuild does, the 
removal process will try to free the vdev that is still in use.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #13498
2022-05-25 09:25:13 -07:00
Umer Saleem b37093a188 rpm: Keep debug symbols if configured with '--enable-debuginfo'
Do not strip debug information from packages if '--enable-debuginfo' is
configured.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #13500
2022-05-25 09:22:11 -07:00
Brian Behlendorf 61ef68727b Standardize RHEL version check in packages
This is a follow up to 3c35662299 which standardizes how the RHEL
version check is done.  This simpler "0%{?rhel}" check is used
elsewhere in the packages so we do the same here.

Reviewed-by: Neal Gompa <ngompa@datto.com>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13501
2022-05-25 09:20:17 -07:00
Rich Ercolani 3bbc26097e Unbreak zstd build on sparc64
It turns out that wrapping the atomic macro in () breaks build
on Linux/SPARC64. Oops.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #13506
2022-05-25 09:18:49 -07:00
Brian Behlendorf 82aa4f6f85 Switch sed -E to -r for better portability
GNU sed 4.1.2 does not support the -E flag and this version is used by
some cross-compiling tool chains.  Switch -E to -r which is understood.

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13502
2022-05-25 09:13:51 -07:00
Neal Gompa (ニール・ゴンパ) 4dc1c8a0b8 rpm: Use the correct version-release information in dependencies
This tightly links the subpackages together and ensures that everything
is upgraded together.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Neal Gompa <ngompa@datto.com>
Closes #13489
2022-05-24 14:07:01 -07:00
Alexander Motin 84d0a03f3e Refactor Log Size Limit
Original Log Size Limit implementation blocked all writes in case of
limit reached until the TXG is committed and the log is freed.  It
caused huge delays and following speed spikes in application writes.

This implementation instead smoothly throttles writes, using exactly
the same mechanism as used for dirty data.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: jxdking <lostking2008@hotmail.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Issue #12284
Closes #13476
2022-05-24 09:46:35 -07:00
Rich Ercolani f375b23c02 Tiered early abort, zstd edition
It turns out that "do LZ4 and zstd-1 both fail" is a great heuristic
for "don't even bother trying higher zstd tiers".

By way of illustration:
$ cat /incompress | mbuffer | zfs recv -o compression=zstd-12 evenfaster/lowcomp_1M_zstd12_normal
summary: 39.8 GiByte in  3min 40.2sec - average of  185 MiB/s
$ echo 3 | sudo tee /sys/module/zzstd/parameters/zstd_lz4_pass
3
$ cat /incompress | mbuffer -m 4G | zfs recv -o compression=zstd-12 evenfaster/lowcomp_1M_zstd12_patched
summary: 39.8 GiByte in 48.6sec - average of  839 MiB/s
$ sudo zfs list -p -o name,used,lused,ratio evenfaster/lowcomp_1M_zstd12_normal evenfaster/lowcomp_1M_zstd12_patched
NAME                                         USED        LUSED  RATIO
evenfaster/lowcomp_1M_zstd12_normal   39549931520  42721221632   1.08
evenfaster/lowcomp_1M_zstd12_patched  39626399744  42721217536   1.07
$ python3 -c "print(39626399744 - 39549931520)"
76468224
$

I'll take 76 MB out of 42 GB for > 4x speedup.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #13244
2022-05-24 09:43:22 -07:00
Ryan Moeller 2e05765006 FreeBSD: libspl: Add locking around statfs globals
Makes getmntent and getmntany thread-safe for external consumers of
libzfs zpool_disable_datasets, zfs_iter_mounted, libzfs_mnttab_update,
libzfs_mnttab_find.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #13484
2022-05-24 09:40:20 -07:00
Rich Ercolani 3c35662299 Modified ncompress requirement in RPM to exclude RHEL9
The bug this was working around is no longer present.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #13480 
Closes #13490
2022-05-24 09:39:32 -07:00
Brian Behlendorf cf70c0f8ae zed: Take no action on scrub/resilver checksum errors
When scrubbing/resilvering a pool it can be counter productive to
cancel the scan and kick of a replace operation to a hot spare
when encountering checksum errors.  In this case, the best course
of action is to allow the scrub/resilver to complete as quickly
as possible and to keep the vdevs fully online if possible.

Realistically, this is less of an issue for a RAIDZ since a
traditional resilver must be used and checksums will be verified.
However, this is not the case for a mirror or dRAID pool which is
sequentially resilvered and checksum verification is deferred
until after the replace operation completes.

Regardless, we apply this policy to all pool types since it's
a good idea for all vdevs.  Degrading additional vdevs has the
potential to make a bad situation worse.  Note the checksum
errors will still be reported as both an event and by
`zpool status`.  This change only prevents the ZED from
proactively taking any action.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13499
2022-05-24 09:36:07 -07:00
Brian Behlendorf 2cd0f98f4a Verify BPs in spa_load_verify_cb() and dsl_scan_visitbp()
We want `zpool import` to be highly robust and never panic, even
when encountering corrupt metadata.  This is already handled in the
arc_read() code path, which covers most cases, but spa_load_verify_cb()
relies on zio_read() and is responsible for verifying the block pointer.

During import it is also possible to encounter blocks pointers which
contain ZIO_COMPRESS_INHERIT and ZIO_CHECKSUM_INHERIT values.  Relax
the verification function slightly to allow this.

Futhermore, extend dsl_scan_recurse() to verify the block pointer
contents of level zero blocks which are not of type DMU_OT_DNODE or
DMU_OT_OBJSET.  This is handled by arc_read() in the other cases.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13124 
Closes #13360
2022-05-20 10:36:14 -07:00
Mark Johnston 03df6bad94 zdb: Fix handling of nul termination in symlink targets
The SA attribute containing the symlink target does not include a nul
terminator, so when printing the target zdb would sometimes include
garbage at the end of the string.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #13482
2022-05-20 10:32:49 -07:00
наб e02d84d3a5 linux: libshare: smb: don't swallow net(1) errors
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13191
Closes #13470
2022-05-18 15:56:38 -07:00
наб 2b4f2fc93c libzfs: return (allocated) strings instead of filling buffers
This also expands the zfs version output from 127 characters to However
Many Are Actually Set

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13330
2022-05-18 12:52:10 -07:00
наб 38f4d99f76 linux: libzfs: simplify module-loaded check
The short-path is now one access() call,
we always modprobe zfs (ZFS_MODULE_LOADING which doesn't use the libzfs
boolean parsing is gone),
and we use a simple inotify IN_CREATE loop with a timerfd timeout
rather than 10ms kernel-style polling

There's one substantial difference: ZFS_MODULE_TIMEOUT=-1
now means "never give up", rather than "wait 10 minutes"

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13330
2022-05-18 12:51:42 -07:00
наб 89e81bc6ad Remove final K&R definitions
Clang trunk now warns -Wstrict-prototypes on this, and they're removed
in C2x

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13447
2022-05-18 12:10:47 -07:00
наб 6b575417e2 libspl/include: remove unused/empty headers
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13447
2022-05-18 12:10:43 -07:00
наб 2f713390d1 kmodtool: cleanup
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13447
2022-05-18 12:10:39 -07:00
наб 7062a956f7 rpm: don't spec obsolete_name/version anymore
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13447
2022-05-18 12:10:35 -07:00
наб 7506f5af92 Add make regen-tests to regenerate the test bundle
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13447
2022-05-18 12:10:09 -07:00
heeplr 08b32c6fa9 zed: support subject as header in zed_notify_email()
Some minimal MUAs don't support passing the subjects as cmdline option.
This commit checks if "@SUBJECT@" is missing in ZED_EMAIL_OPTS and then
prepends a subject header to the notification message.
Also set a default for ${subject}.

Reviewed-by: Ahelenia Ziemia<C5><84>ska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Daniel Hiepler <d-git@coderdu.de>
Closes #13440
2022-05-18 10:27:53 -07:00
Andrew 00ac77464e Expose zpool guids through kstats
There are times when end-users may wish to have
a fast and convenient method to get zpool guid
without having to use libzfs. This commit
exposes the zpool guid via kstats in similar
manner to the zpool state.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrew Walker <awalker@ixsystems.com>
Closes #13466
2022-05-18 10:25:33 -07:00
Coleman Kane c0cf6ed679 Fix compiler warnings about zero-length arrays in inline bitops
The compiler appears to be expanding the unused NULL pointer into a
zero-length array via the inline bitops code. When -Werror=array-bounds
is used, this causes a build failure. Recommended solution is allocate
temporary structures, fill with zeros (to avoid uninitialized data use
warnings), and pass the pointer to those to the inline calls.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #13463 
Closes #13465
2022-05-17 13:07:39 -07:00
наб 276b08cb00 linux: libzutil: zfs_strip_path: only strip known prefixes
This mirrors FreeBSD:
  # zpool create -o cachefile= testpsko media/testpsko
  # zpool create -o cachefile= testpsko2 $PWD/testpsko2
  $ ./zpool list -v
  NAME                                              SIZE  ALLOC   FREE
  filling                                          25.5T  6.85T  18.6T
    mirror-0                                       3.64T   500G  3.15T
      ata-HGST_HUS726T4TALE6L4_V6K2L4RR                -      -      -
      ata-HGST_HUS726T4TALE6L4_V6K2MHYR                -      -      -
    raidz1-1                                       21.8T  6.36T  15.5T
      ata-HGST_HUS728T8TALE6L4_VDKT237K                -      -      -
      ata-HGST_HUS728T8TALE6L4_VDGY075D                -      -      -
      ata-HGST_HUS728T8TALE6L4_VDKVRRJK                -      -      -
  cache                                                -      -      -
    nvme0n1p4                                      63.0G  12.8G  50.2G
  tarta-boot                                        240M  50.0M   190M
    mirror-0                                        240M  50.0M   190M
      tarta-boot                                       -      -      -
      tarta-boot-nvme                                  -      -      -
  tarta-zoot                                       55.5G  6.96G  48.5G
    mirror-0                                       55.5G  6.96G  48.5G
      tarta-zoot                                       -      -      -
      tarta-zoot-nvme                                  -      -      -
  testpsko                                         39.5G   744K  39.5G
    media/testpsko1                                39.5G   744K  39.5G
  testpsko2                                        39.5G   130K  39.5G
    /home/nabijaczleweli/store/code/zfs/testpsko2  39.5G   130K  39.5G

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13413
Closes #9771
2022-05-16 15:56:57 -07:00
наб 5ac80603bd libzfs: constify zfs_strip_partition(), zfs_strip_path()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13413
2022-05-16 15:56:53 -07:00
наб ca3b105cd9 libzfs: pool: zpool_vdev_name: use libzfs_envvar_is_set
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13413
2022-05-16 15:56:48 -07:00
наб e9072c76f8 zpool: max_width: monomorphise subtype iteration
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13413
2022-05-16 15:56:27 -07:00
наб de82164518 linux: spl: generic: ddi_strto*: match solaris ddi_strto*(9)
Recognise initial whitespace, + in both cases,
and - also in unsigneds

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13434
2022-05-13 10:15:47 -07:00
наб 354a1bfb8e linux: spl: generic: ddi_strtou##type: elide unused flag
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13434
2022-05-13 10:15:44 -07:00
наб c25b281378 Remove hw_serial, ddi_strtoul()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13434
2022-05-13 10:15:31 -07:00
Mateusz Piotrowski 0c11a2738d Fix typos in zfs-bookmark examples
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Piotrowski <0mp@FreeBSD.org>
Closes #13456
2022-05-12 09:34:24 -07:00
наб 53600767ee linux: libshare/nfs: don't do anything unless exportfs is available
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13165
Closes #13324
2022-05-12 09:27:12 -07:00
наб 086af23e68 linux: libshare/smb: cache smb_available
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13165
2022-05-12 09:27:08 -07:00
наб 1ec9218faa libzfs: zfs_unshare: minor cleanup
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13165
2022-05-12 09:27:04 -07:00
наб 88d5580e51 tests: add zfs_unshare_008_pos checking whitespace escaping
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13165
2022-05-12 09:27:00 -07:00
наб 9b06aa634a libshare/nfs: escape mount points when needed
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13165
Closes #13153
2022-05-12 09:26:54 -07:00
наб a31fcd4bad linux: libshare/nfs: bsearch() over valid keys
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13165
2022-05-12 09:26:50 -07:00
наб 5b14feec06 libzfs: mount: zfs_unshare: don't reallocate mountpoint
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13165
2022-05-12 09:26:47 -07:00
наб b4d9a82f62 Replace libzfs sharing _nfs() and _smb() APIs with protocol lists
With the additional benefit of removing all the _all() functions and
treating a NULL list as "all" ‒ the remaining all function is for all
/datasets/, which is consistent with the rest of the API

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13165
2022-05-12 09:26:42 -07:00
наб 471e9a108e Publish libshare protocols, use enum-based API
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13165
2022-05-12 09:26:38 -07:00
наб 21d976a621 libshare: delineate obsolete errors
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13165
2022-05-12 09:26:34 -07:00
наб 63ce6dd988 libshare: use AVL tree with static data, pass all data in arguments
This makes it so we don't leak a consistent 64 bytes anymore,
makes the searches simpler and faster, removes /all allocations/
from the driver (quite trivially, since they were absolutely needless),
and makes libshare thread-safe (except, maybe, linux/smb, but that only
does pointer-width loads/stores so it's also mostly fine, except for
leaking smb_shares)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13165
2022-05-12 09:26:25 -07:00
наб 4ccbb23971 libshare: interface: {=> const} char *
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13165
2022-05-12 09:26:22 -07:00
наб 2faf05612f libshare/smb: cleanup
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13165
2022-05-12 09:26:18 -07:00
наб 566e4a58b7 libshare/nfs: destaticify nfs_lock_fd
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13165
2022-05-12 09:26:13 -07:00
наб ee668b8331 freebsd: libshare/nfs: write directly in translate_opts()
This renders it thread-safe

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13165
2022-05-12 09:26:08 -07:00
наб 5f0c1c4ebd freebsd: libshare/nfs: constify static const data
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13165
2022-05-12 09:25:31 -07:00
Brian Behlendorf 45588be285 Add missing AC_MSG_RESULT(no) to configure
When the HAVE_IOPS_MKDIR_USERNS check fails output result
as required.
 
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13454
2022-05-12 09:12:32 -07:00
Brian Behlendorf 196e7c7617 ztest: reduce runtile of zloop.sh in CI
The zloop.sh script is primarily designed to randomly stress
the DMU and SPA layers.  This can result in some unrealistic
(or even impossible) scenarios being tested which then fail.

Since the longer we run zloop.sh the more likely this is to occur
this commit reduces the runtime.  The intention being that normally
this will result in a clean CI run unless the PR does introduce
serious breaking change.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13453
2022-05-12 09:11:29 -07:00
Rich Ercolani bd88c036e6 Added a workaround for Linux KASAN builds
Linux passes -Wframe-larger-than=1024, which breaks
our build in a number of places with -Werror.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #13450
2022-05-11 13:26:55 -07:00
наб 5a142533f8 udev: zvol_id: simplify/modernise
zero-alloc, sensibler errors, don't close (or free) before exit.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13337
2022-05-11 10:58:19 -07:00
наб 1f5bc12893 ztest: O_CLOEXEC ztest_fd_rand
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13411
2022-05-11 10:33:12 -07:00
наб 888914486e ztest: take -B ./path/to/ztest, LD_LIBRARY_PATH=./path/lib:$L_L_P
This changes the behaviour of -B from the illumos one which would,
in the example in the manual, take just ./chroots/lenny;
this, however, is more versatile, and scales much better for systems
with ZFS in /usr/local, for example

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13411
Closes #1770
2022-05-11 10:33:12 -07:00
наб 951a3889d1 tests: many_fds: simplify, modernise
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13411
2022-05-11 10:33:12 -07:00
наб 510ee280c0 Remove enable_extended_FILE_stdio()
Even on Illumos it's only available in the 32-bit programming
environment, and, quoth enable_extended_FILE_stdio(3C):
> Historically, 32-bit Solaris applications have been limited to using
> only the file descriptors 0 through 255 with the standard I/O
> functions (see stdio(3C)) in the C library. The extended FILE
> facility allows well-behaved 32-bit applications to use any
> valid file descriptor with the standard I/O functions.
where "well-behaved" means that it
> does not directly access any fields in the FILE structure pointed
> to by the FILE pointer associated with any standard I/O stream,

And the stdio/flush.c implementation reads:
  /*
   * if this is not an internal extended FILE then check
   * if _file is being changed from underneath us.
   * It should not be because if
   * it is then then we lose our ability to guard against
   * silent data corruption.
   */
  if (!iop->__xf_nocheck && bad_fd > -1 && iop->_magic != bad_fd) {
      (void) fprintf(stderr,
          "Application violated extended FILE safety mechanism.\n"
          "Please read the man page for extendedFILE.\nAborting\n");
      abort();
  }

This appears to be an insane workaround for broken implementation with
exposed FILE internals and _file being an u8, both only on non-LP64;
it's shimmed out on all LP64 targets in Illumos,
and we shim it out as well: just get rid of it

This appears to've been originally fixed in illumos-gate
a5f69788de7ac07553de47f7fec8c05a9a94c105 ("PSARC 2006/162 Extended FILE
space for 32-bit Solaris processes", "1085341 32-bit stdio routines
should support file descriptors >255"), which also bears extendedFILE
and enable_extended_FILE_stdio(3C):
  -       unsigned char   _file;  /* UNIX System file descriptor */
  +       unsigned char   _magic; /* Old home of the file descriptor */
  +                               /* Only fileno(3C) can retrieve the
  				value now */
and
  +/*
  + * Macros to aid the extended fd FILE work.
  + * This helps isolate the changes to only the 32-bit code
  + * since 64-bit Solaris is not affected by this.
  + */
  +#ifdef  _LP64
  +#define        GET_FD(iop)             ((iop)->_file)
  +#define        SET_FILE(iop, fd)       ((iop)->_file = (fd))
  +#else
  +#define        GET_FD(iop)             \
  +               (((iop)->__extendedfd) ? _file_get(iop) : (iop)->_magic)
  +#define        SET_FILE(iop, fd)       (iop)->_magic = (fd); (iop)->__extendedfd = 0
  +#endif

Also remove the 1k setrlimit(NOFILE) calls: that's the default on Linux,
with 64k on Illumos and 171k on FreeBSD

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13411
2022-05-11 10:33:12 -07:00
szubersk e0911f7b7f autoconf: Fail when __copy_from_user_inatomic is a non-GPL symbol
A followup to 849c14e048
Fix https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1009242

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #13389
2022-05-11 10:32:51 -07:00
Brian Atkinson f567d67fda Adding ZTS test for O_APPEND
Commit 63b18e4 fixed an issue in zpl_aio_write() to make sure that
kiocb->ki_pos was updated correctly when opening a file with O_APPEND.
Adding a test to verify O_APPEND functionality with lseek can make
sure that all other distros/kernel versions also have the correct
behavior.

Also moved the threadappends_001_pos test into this append test
directory in functional ZTS directory. This way the two append tests
are together for organization purposes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #13424
2022-05-11 08:38:16 -07:00
наб 3ed04d66aa Remove constrained path on clean
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:21:13 -07:00
наб e8ca724393 ztest: fix in-tree detection for automatic zdb path
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:21:09 -07:00
наб c6a5d7d997 ztest: use $ZDB instead of $ZDB_PATH for zdb
Which actually gets zdb as set in common.sh

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:21:04 -07:00
наб 1a58941275 scripts: zpool.sh: cleanup
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:21:00 -07:00
наб c982359460 cppcheck: explicitly exclude kernel code from userspace checks
Thus extracting the final shred of utility

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:20:55 -07:00
наб c1851bf081 Makefile: respect V=1 for checks
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:20:51 -07:00
наб 779ac93e96 autogen.sh: paper over automake <1.14's lack of %reldir% support
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:20:46 -07:00
наб 612b8dff5b contrib: bash_completion.d: install, fix dist
dist diff:
  -zfs-2.1.99/contrib/bash_completion.d/zfs

install diff:
  +destdir/usr/local/etc/bash_completion.d
  +destdir/usr/local/etc/bash_completion.d/zfs

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:20:40 -07:00
наб 0a9aaa7f0c cmd: move single-file binaries up, extract udev programs to udev/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:20:34 -07:00
наб eaf94bda6c Replace config/config.awk with simple sed invocation
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:20:28 -07:00
наб ea04cc4a22 autoconf: use include directives instead of recursing down test data
We drop /multiple/ seconds off the generation, a dozen off a clean
rebuild, 185 files, and trivialise the distribution,
which can now be trivially generated via the provided snippets

Dist diff:
  -zfs-2.1.99/tests/zfs-tests/tests/functional/pam/utilities.kshlib
  +zfs-2.1.99/tests/zfs-tests/tests/functional/pam/utilities.kshlib.in

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:20:19 -07:00
наб 0425d58852 autoconf: use include directives instead of recursing down tests (mostly)
Only down to tests/zfs-tests/tests, but pull out C programs into the
main Makefile ‒ this means we get correct dependency tracking for all
programs (and parallelise across them)

dist diff:
  -zfs-2.1.99/tests/zfs-tests/tests/stress/
  -zfs-2.1.99/tests/zfs-tests/tests/stress/Makefile.am
  -zfs-2.1.99/tests/zfs-tests/tests/stress/Makefile.in

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:20:09 -07:00
наб 48f4379974 autoconf: use include directives instead of recursing down etc
dist diff:
  -zfs-2.1.99/etc/systemd/system/50-zfs.preset.in
  +zfs-2.1.99/etc/systemd/system/50-zfs.preset

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:19:58 -07:00
наб 50d2c9e4fd autoconf: use include directives instead of recursing down contrib
Also make the pyzfs build actually out-of-tree and quiet by default

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Rapptz <rapptz@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:19:44 -07:00
наб 674a9f3727 autoconf: use include directives instead of recursing down udev
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:19:36 -07:00
наб 2820719800 autoconf: use include directives instead of recursing down rpm
Also simplify .gitignore

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:19:25 -07:00
наб 93a8c85e7b Move test-runner.1 into top-level man/
dist delta:
  +zfs-2.1.99/man/man1/test-runner.1
  -zfs-2.1.99/tests/test-runner/man/
  -zfs-2.1.99/tests/test-runner/man/test-runner.1

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:19:18 -07:00
наб 0f6c4fd00e autoconf: use include directives instead of recursing down man
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:19:10 -07:00
наб 0a8b1fc625 autoconf: use include directives instead of recursing down scripts
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:19:02 -07:00
наб 09a7ad38a5 autoconf: single-step includes
Still descend, but only once: we get a lot of mileage out of nodist_

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:18:51 -07:00
наб 5cdca5b1da autoconf: use include directives instead of recursing down cmd
No installation diff, dist lost
  -zfs-2.1.99/cmd/fsck_zfs/fsck.zfs
which was distributed erroneously, since it's generated

Also clean gitrev on clean

Also add -e 'any possible bashisms' to default checkbashisms flags,
and fully parallelise it and shellcheck, and it works out-of-tree, too

Also align the Release in the dist META file correctly

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:18:38 -07:00
наб 3ff81c4601 cmd: zvol_id: don't build with -fno-stack-protector anymore
#569 was opened in 2012 and closed in 2015;
if the issue was still there we'd presumably've seen it?

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:18:29 -07:00
наб c8970f52ed autoconf: use include directives instead of recursing down lib
As a bonus, this also adds zfs-mount-generator (previously undescended
down) and libzstd (not included) to CppCheck

As a bonus bonus, abigail rules work out-of-tree, too

Against current trunk:
  $ diff -U0 ./destdir.listing ~/store/code/zfs/destdir.listing
  -destdir/usr/local/include/libspl/sscanf.h

  $ diff --color -U0 ./zfs-2.1.99.tar.gz.listing ../oot/zfs-2.1.99.tar.gz.listing | grep -v @@ | grep -v /Makefile
  -zfs-2.1.99/config/Abigail.am
  -zfs-2.1.99/lib/libspl/include/util/
  -zfs-2.1.99/lib/libspl/include/util/sscanf.h

  $ diff --color -U0 ./zfs-2.1.99.tar.gz.listing ../oot/zfs-2.1.99.tar.gz.listing | grep -v @@ | grep /Makefile
  -zfs-2.1.99/lib/libavl/Makefile.in
  -zfs-2.1.99/lib/libefi/Makefile.in
  -zfs-2.1.99/lib/libicp/Makefile.in
  -zfs-2.1.99/lib/libnvpair/Makefile.in
  -zfs-2.1.99/lib/libshare/Makefile.in
  -zfs-2.1.99/lib/libspl/include/Makefile.in
  -zfs-2.1.99/lib/libspl/include/os/freebsd/Makefile.am
  -zfs-2.1.99/lib/libspl/include/os/freebsd/Makefile.in
  -zfs-2.1.99/lib/libspl/include/os/freebsd/sys/Makefile.am
  -zfs-2.1.99/lib/libspl/include/os/freebsd/sys/Makefile.in
  -zfs-2.1.99/lib/libspl/include/os/linux/Makefile.am
  -zfs-2.1.99/lib/libspl/include/os/linux/Makefile.in
  -zfs-2.1.99/lib/libspl/include/os/linux/sys/Makefile.am
  -zfs-2.1.99/lib/libspl/include/os/linux/sys/Makefile.in
  -zfs-2.1.99/lib/libspl/include/os/Makefile.am
  -zfs-2.1.99/lib/libspl/include/os/Makefile.in
  -zfs-2.1.99/lib/libspl/include/rpc/Makefile.am
  -zfs-2.1.99/lib/libspl/include/rpc/Makefile.in
  -zfs-2.1.99/lib/libspl/include/sys/dktp/Makefile.am
  -zfs-2.1.99/lib/libspl/include/sys/dktp/Makefile.in
  -zfs-2.1.99/lib/libspl/include/sys/Makefile.am
  -zfs-2.1.99/lib/libspl/include/sys/Makefile.in
  -zfs-2.1.99/lib/libspl/include/util/Makefile.am
  -zfs-2.1.99/lib/libspl/include/util/Makefile.in
  -zfs-2.1.99/lib/libspl/Makefile.in
  -zfs-2.1.99/lib/libtpool/Makefile.in
  -zfs-2.1.99/lib/libunicode/Makefile.in
  -zfs-2.1.99/lib/libuutil/Makefile.in
  -zfs-2.1.99/lib/libzfsbootenv/Makefile.in
  -zfs-2.1.99/lib/libzfs_core/Makefile.in
  -zfs-2.1.99/lib/libzfs/Makefile.in
  -zfs-2.1.99/lib/libzpool/Makefile.in
  -zfs-2.1.99/lib/libzstd/Makefile.in
  -zfs-2.1.99/lib/libzutil/Makefile.in
  -zfs-2.1.99/lib/Makefile.in

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:18:11 -07:00
наб 6fc34371e1 libzfs: pool: fix false-positives -Wmaybe-uninitialised
As noted by gcc (Debian 10.2.1-6) 10.2.1 20210110

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:18:06 -07:00
наб be91239efa module: Makefile: cppcheck: zfs_config.h lives in builddir
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:18:02 -07:00
наб ae43120abc config: user: drop ZONENAME, avoid lying about being Linux-only
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:17:55 -07:00
наб 0fb1bf6586 libspl: include: remove [util/]sscanf.h
Unused, empty, installs in a weird location

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:17:49 -07:00
наб 841fd303b8 copy-builtin: add hooks with sed/>>
The order in fs/Makefile doesn't matter,
the order in fs/Kconfig is preserved (ext2 is included as the first
thing in the first if BUILD block, and only once), but I don't think it
matters much either, realistically

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:17:43 -07:00
наб c6f923fba4 autogen.sh: explicitly build its containing directory
No changes in currently-accepted usages (no-argument), but allows
  /src/path/autogen.sh && /src/path/configure
for simpler out-of-tree builds

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:17:36 -07:00
наб ea532b0e6e Remove compat spl-x.y.z from the distribution
This was added in 93ce2b4ca5 ("Update
build system and packaging"), which merged the SPL and ZFS trees,
and included in 0.8.0; "the next major release" was 2.0.0

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:17:26 -07:00
наб 5fc332ae4b Makefile: zfs-tests lives in srcdir; zfs-tests: accept common.sh path
This fixes out-of-tree builds

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:16:47 -07:00
Rich Ercolani ecaccf764a Workaround broken VDEV_UPATH
Sometimes, for reasons I haven't looked into yet, VDEV_UPATH
gets set to /dev/(null), breaking all these scripts.

It'd be nice to have a fallback case to avoid total failure.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #13436
2022-05-10 10:14:07 -07:00
Rich Ercolani a30927f763 Add workaround for broken Linux pipes
Linux has an unresolved hang if you resize a pipe with bytes
in it.

Since there's no obvious way to detect this happening, added a
workaround to disable resizing the pipe buffer if you set an
environment variable.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #13309
2022-05-09 16:33:55 -07:00
hping a18d13c200 abd_os: remove redundant refcount creation for abd_children
Refcount creation for abd_zero_scatter->abd_children is redundant in
abd_alloc_zero_scatter, as it has been done in abd_init_struct.

In addition, abd_children is undefined when ZFS_DEBUG is disabled, the
reference of abd_children in abd_alloc_zero_scatter breaks build of
libzpool when ZFS_DEBUG is disabled.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Ping Huang <huangping@smartx.com>
Closes #13429
2022-05-09 16:30:16 -07:00
наб d1fd6dbf6d man: zpool-import.8: -d -or -c
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13437
2022-05-09 16:03:17 -07:00
Aidan Harris 493b6e5607 Fix functions without a prototype
clang-15 emits the following error message for functions without
a prototype:

fs/zfs/os/linux/spl/spl-kmem-cache.c:1423:27: error:
  a function declaration without a prototype is deprecated
  in all versions of C [-Werror,-Wstrict-prototypes]

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Aidan Harris <me@aidanharr.is>
Closes #13421
2022-05-06 11:57:37 -07:00
наб e77d59ebb2 zpool: guard vs_noalloc and vs_pspace with VDEV_STAT_VALID()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13412
2022-05-05 09:45:12 -07:00
Rich Ercolani 7bf06f7262 Corrected edge case in uncompressed ARC->L2ARC handling
I genuinely don't know why this didn't come up before,
but adding the LZ4 early abort pointed out this flaw,
in which we're allocating a buffer of one size, and
then telling the compressor that we're handing it buffers
of a different size, which may be Very Different - say,
allocating 512b and then telling it the inputs are 128k.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #13375
2022-05-04 11:59:30 -07:00
Mateusz Guzik 81b8b2d004 FreeBSD: use zero_region instead of allocating a dedicated page
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #13406
2022-05-04 11:46:37 -07:00
WHR a894ae75b8 Fix incorrect use of unit prefix names in man pages
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: WHR <msl0000023508@gmail.com>
Closes #13363
2022-05-04 11:44:12 -07:00
Alexander Motin c55b293287 Improve mg_aliquot math
When calculating mg_aliquot alike to #12046 use number of unique data
disks in the vdev, not the total number of children vdev.  Increase
default value of the tunable from 512KB to 1MB to compensate.

Before this change each disk in striped pool was getting 512KB of
sequential data, in 2-wide mirror -- 1MB, in 3-wide RAIDZ1 -- 768KB.
After this change in all the cases each disk should get 1MB.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #13388
2022-05-04 11:33:42 -07:00
Brian Behlendorf 34dbc618f5 Reduce dbuf_find() lock contention
Holding a dbuf is a common operation which can become highly contended
in dbuf_find() when acquiring the dbuf hash mutex.  This is particularly
true on Linux when reading/writing volumes since by default up to 32
threads from the zvol_taskq may be taking a hold of the same dbuf.
This should also be observable on FreeBSD as long as there are enough
processes accessing the volume concurrently.

This is further aggregrated by the fact that only the block id will
be unique when calculating the dbuf hash for a single volume.  The
objset id, object id, and level will be the same for data blocks.
This has been observed to result in a somehwat less than uniform hash
distribution and a longer than expected max hash chain depth (~20)
on a large memory system (256 GB) using volumes.

This commit improves the siutation by switching the hash mutex to
an rwlock to allow concurrent lookups, and increasing DBUF_RWLOCKS
from 2048 to 8192 to further reduce the odds of a hash collision.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13405
2022-05-04 11:17:29 -07:00
Shaan Nobee 411f4a018d Speed up WB_SYNC_NONE when a WB_SYNC_ALL occurs simultaneously
Page writebacks with WB_SYNC_NONE can take several seconds to complete 
since they wait for the transaction group to close before being 
committed. This is usually not a problem since the caller does not 
need to wait. However, if we're simultaneously doing a writeback 
with WB_SYNC_ALL (e.g via msync), the latter can block for several 
seconds (up to zfs_txg_timeout) due to the active WB_SYNC_NONE 
writeback since it needs to wait for the transaction to complete 
and the PG_writeback bit to be cleared.

This commit deals with 2 cases:

- No page writeback is active. A WB_SYNC_ALL page writeback starts 
  and even completes. But when it's about to check if the PG_writeback 
  bit has been cleared, another writeback with WB_SYNC_NONE starts. 
  The sync page writeback ends up waiting for the non-sync page 
  writeback to complete.

- A page writeback with WB_SYNC_NONE is already active when a 
  WB_SYNC_ALL writeback starts. The WB_SYNC_ALL writeback ends up 
  waiting for the WB_SYNC_NONE writeback.

The fix works by carefully keeping track of active sync/non-sync 
writebacks and committing when beneficial.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shaan Nobee <sniper111@gmail.com>
Closes #12662
Closes #12790
2022-05-03 13:23:26 -07:00
Pawel Jakub Dawidek a64d757aa4 FreeBSD: Clean up the use of ioflags
- Prefer O_* flags over F* flags that mostly mirror O_* flags anyway,
  but O_* flags seem to be preferred.
- Simplify the code as all the F*SYNC flags were defined as FFSYNC flag.
- Don't define FRSYNC flag, so we don't generate unnecessary ZIL commits.
- Remove EXCL define, FreeBSD ignores the excl argument for zfs_create()
  anyway.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #13400
2022-05-02 16:26:28 -07:00
Jitendra Patidar 159c6fd154 Add missing replay entry in zvol_replay_vector for TX_SETSAXATTR
Commit 361a7e8 (log xattr=sa create/remove/update to ZIL) introduced a
TX_SETSAXATTR, but missed to add a corresponding entry in
zvol_replay_vector. Adding a missing replay entry in zvol_replay_vector.

Reviewed-by: Christian Schwarz <christian.schwarz@nutanix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jitendra Patidar <jitendra.patidar@nutanix.com>
Closes #13396
Closes #13395
2022-05-02 11:01:26 -07:00
Brian Behlendorf 1bf3abc634 Silence unused-but-set-variable warnings
Clang 13.0.0 added support for `Wunused-but-set-parameter` and
`-Wunused-but-set-variable` which correctly detects two unused
variables in zstd resulting in a build failure.  This commit
annotates these instances accordingly.

  https://releases.llvm.org/13.0.1/tools/clang/docs/ReleaseNotes.html#id6

In FSE_createCTable(), malloc() is intentionally defined as NULL when
compiled in the kernel so the variable is unused.

  zstd/lib/compress/fse_compress.c:307:12: error: variable 'size'
  set but not used [-Werror,-Wunused-but-set-variable]

Additionally, in ZSTD_seqDecompressedSize() the assert is compiled
out similarly resulting in an unused variable.

  zstd/lib/compress/zstd_compress_superblock.c:412:12: error: variable
  'litLengthSum' set but not used [-Werror,-Wunused-but-set-variable]

Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13382
2022-04-29 14:21:11 -07:00
Rich Ercolani f2330bd156 Default zfs_max_recordsize to 16M
Increase the default allowed maximum recordsize from 1M to 16M.
As described in the zfs(4) man page, there are significant costs
which need to be considered before using very large blocks.
However, there are scenarios where they make good sense and
it should no longer be necessary to artificially restrict their
use behind a module option.

Note that for 32-bit platforms we continue to leave this
restriction in place due to the limited virtual address space
available (256-512MB).  On these systems only a handful
of blocks could be cached at any one time severely impacting
performance and potentially stability.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12830
Closes #13302
2022-04-28 15:12:24 -07:00
Brian Behlendorf 63b18e4097 Fix O_APPEND for Linux 3.15 and older kernels
When using a Linux kernel which predates the iov_iter interface the
O_APPEND flag should be applied in zpl_aio_write() via the call to
generic_write_checks().  The updated pos variable  was incorrectly
ignored resulting in the current offset being used.

This issue should only realistically impact the RHEL/CentOS 7.x
kernels which are based on Linux 3.10.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13370 
Closes #13377
2022-04-27 12:56:17 -07:00
Satadru Pramanik 7dde17e860 Linux 5.18 compat: replace __set_page_dirty_nobuffers
Replace __set_page_dirty_nobuffers with filemap_dirty_folio.

Upstream-commit: 6b1f86f8e9c7f9de7ca1cb987b2cf25e99b1ae3a
("Merge tag 'folio-5.18b' of
git://git.infradead.org/users/willy/pagecache ")

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Authored-by: Satadru Pramanik <satadru@gmail.com>
Signed-off-by: Satadru Pramanik <satadru@gmail.com>
Closes #13325
Closes #13380
2022-04-27 12:54:17 -07:00
наб 2a70a09072 zfs: holds: dequadratify
Before:
  15  0m0.177s
  30  0m0.653s
  45  0m1.289s
  60  0m2.129s
  75  0m3.264s
  90  0m4.397s
  100 0m5.996s
  117 0m8.552s

After:
  30  0m0.053s
  117 0m0.125s

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13372
Closes #13373
2022-04-27 11:55:40 -07:00
наб 55d36e47c8 zfs: holds: general cleanup
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13373
2022-04-27 11:55:25 -07:00
Damian Szuberski 849c14e048 PPC get_user workaround
Linux 5.12 PPC 5.12 get_user() and __copy_from_user_inatomic()
inline helpers very indirectly include a reference to the GPL'd
array mmu_feature_keys[] and fails to build. Workaround this by
using copy_from_user() and throwing EFAULT for any calls to
__copy_from_user_inatomic(). This is a workaround until a fix
for Linux commit 7613f5a66becfd0e43a0f34de8518695888f5458
"powerpc/64s/kuap: Use mmu_has_feature()" is fully addressed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Authored-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #11958
Closes #12590
Closes #13367
2022-04-26 10:52:40 -07:00
Damian Szuberski a0dfd98a25 autoconf: Pretend CONFIG_MODULES is always on
- Unconditionally inject `CONFIG_MODULES` make variable
  and `#define CONFIG_MODULES` to Kbuild in `ZFS_LINUX_COMPILE`
  autoconf function to emulate loadable kernel modules support.
  This allows OpenZFS to perform Linux checks despite
  `CONFIG_MODULES=n` in the actual Linux config.

- Add `ZFS_AC_KERNEL_CONFIG_MODULES` check which encompasses
  the logic from `ZFS_AC_KERNEL_TEST_MODULE` with additional
  diagnostic messages to the user

- Removed `ZFS_AC_KERNEL_TEST_MODULE` as it merely duplicates
  every check in `ZFS_AC_KERNEL_CONFIG_DEFINED`

- Moved `ZFS_AC_MODULE_SYMVERS` after `ZFS_AC_KERNEL_CONFIG_DEFINED`
  so the user has a chance to see the proper diagnostic from the
  steps before.

A workaround for Linux's

```
commit 3e3005df73b535cb849cf4ec8075d6aa3c460f68
Author: Masahiro Yamada <masahiroy@kernel.org>
Date:   Wed Mar 31 22:38:03 2021 +0900

kbuild: unify modules(_install) for in-tree and external modules

If you attempt to build or install modules ('make modules(_install)'
with CONFIG_MODULES disabled, you will get a clear error message, but
nothing for external module builds.

Factor out the modules and modules_install rules into the common part,
so you will get the same error message when you try to build external
modules with CONFIG_MODULES=n.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
```

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #10832
Closes #13361
2022-04-26 10:47:09 -07:00
Alexander Motin 600a02b884 Improve log spacemap load time
Previous flushing algorithm limited only total number of log blocks to
the minimum of 256K and 4x number of metaslabs in the pool.  As result,
system with 1500 disks with 1000 metaslabs each, touching several new
metaslabs each TXG could grow spacemap log to huge size without much
benefits.  We've observed one of such systems importing pool for about
45 minutes.

This patch improves the situation from five sides:
 - By limiting maximum period for each metaslab to be flushed to 1000
TXGs, that effectively limits maximum number of per-TXG spacemap logs
to load to the same number.
 - By making flushing more smooth via accounting number of metaslabs
that were touched after the last flush and actually need another flush,
not just ms_unflushed_txg bump.
 - By applying zfs_unflushed_log_block_pct to the number of metaslabs
that were touched after the last flush, not all metaslabs in the pool.
 - By aggressively prefetching per-TXG spacemap logs up to 16 TXGs in
advance, making log spacemap load process for wide HDD pool CPU-bound,
accelerating it by many times.
 - By reducing zfs_unflushed_log_block_max from 256K to 128K, reducing
single-threaded by nature log processing time from ~10 to ~5 minutes.

As further optimization we could skip bumping ms_unflushed_txg for
metaslabs not touched since the last flush, but that would be an
incompatible change, requiring new pool feature.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12789
2022-04-26 10:44:21 -07:00
George Amanakis 0409d33273 Improve zpool status output, list all affected datasets
Currently, determining which datasets are affected by corruption is
a manual process.

The primary difficulty in reporting the list of affected snapshots is
that since the error was initially found, the snapshot where the error
originally occurred in, may have been deleted. To solve this issue, we
add the ID of the head dataset of the original snapshot which the error
was detected in, to the stored error report. Then any time a filesystem
is deleted, the errors associated with it are deleted as well. Any time
a clone promote occurs, we modify reports associated with the original
head to refer to the new head. The stored error reports are identified
by this head ID, the birth time of the block which the error occurred
in, as well as some information about the error itself are also stored.

Once this information is stored, we can find the set of datasets
affected by an error by walking back the list of snapshots in the given
head until we find one with the appropriate birth txg, and then traverse
through the snapshots of the clone family, terminating a branch if the
block was replaced in a given snapshot. Then we report this information
back to libzfs, and to the zpool status command, where it is displayed
as follows:

 pool: test
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 00:00:00 with 800 errors on Fri Dec  3
08:27:57 2021
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          sdb       ONLINE       0     0 1.58K

errors: Permanent errors have been detected in the following files:

        test@1:/test.0.0
        /test/test.0.0
        /test/1clone/test.0.0

A new feature flag is introduced to mark the presence of this change, as
well as promotion and backwards compatibility logic. This is an updated
version of #9175. Rebase required fixing the tests, updating the ABI of
libzfs, updating the man pages, fixing bugs, fixing the error returns,
and updating the old on-disk error logs to the new format when
activating the feature.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Co-authored-by: TulsiJain <tulsi.jain@delphix.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #9175
Closes #12812
2022-04-25 17:25:42 -07:00
наб 95146fd7ba tests: cli_user: zfs_001_neg: print the problematic lines
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13352
2022-04-25 12:49:56 -07:00
наб 6625262c70 man: zfs-send.8: fix -X synopses and description
Also clean up the horrendously verbose -X handling in zfs_main()

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13352
2022-04-25 12:46:22 -07:00
Richard Laager 7eba3891e9 zvol_wait: Ignore locked zvols
"When an encrypted zvol is locked the zfs-volume-wait service does not
start.  The /sbin/zvol_wait should not wait for links when the volume
has property keystatus=unavailable."
-- https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1888405

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Thanks: James Dingwall <james-launchpad@dingwall.me.uk>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #10662
2022-04-22 15:37:03 -07:00
наб 0cdda2edb3 Linux 5.18 compat: kobj_type.default_attrs replaced with default_groups
Upstream-commit: cdb4f26a63c391317e335e6e683a614358e70aeb ("kobject:
 kobj_type: remove default_attrs")
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13357
2022-04-22 14:27:10 -07:00
наб 520a7c8a85 linux: module: zfs: sysfs: constify types and attrs
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13357
2022-04-22 14:27:02 -07:00
наб c8871ff784 scripts: zfs.sh: explicitly unload all modules via rmmod
modprobe -r only works for depmodded modules, but this also means we
have to re-iterate legacy modules, and in the right order

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13356
2022-04-21 16:33:12 -07:00
наб bee24d6929 scripts: zfs.sh: explicitly ignore unloaded modules when unloading
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13356
2022-04-21 16:32:28 -07:00
Damian Szuberski 2f31b585e7 Strengthen Linux kernel capabilities detection
- Add `CONFIG_BLOCK` Linux config requirement to
  `ZFS_AC_KERNEL_CONFIG_DEFINED`. OpenZFS won't compile without
  that block device support due to large amount of functional
  dependencies on it.

- Remove dependency on `groups_alloc()` in
  `ZFS_AC_KERNEL_SRC_GROUP_INFO_GID` to circumvent the missing stub
  in Linux 4.X kernel headers.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #13351
2022-04-21 09:37:11 -07:00
наб 47a02e3972 contrib: dracut: remove getargbool polyfill
It was originally released in dracut 008 in February 2011;
we can probably drop it now

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291
2022-04-20 16:45:54 -07:00
наб e3fc330d6c Add dracut.zfs.7
Thorough documentation with a dracut.bootup(7)-style flowchart,
dracut.cmdline(7)-style cmdline listing,
and per-file docs like the old README

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291
2022-04-20 16:45:47 -07:00
наб 1cc9cc2f89 contrib: dracut: zfs-needshutdown: don't list
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291
2022-04-20 16:45:39 -07:00
наб 6ebdb0b20d contrib: dracut: zfs-{rollback,snapshot}-bootfs: order after key loading
This fixes at least one race I got with an encrypted root

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291
2022-04-20 16:45:32 -07:00
наб 30c6dce7f7 contrib: dracut: don't require essentials to be under the same encroot
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291
2022-04-20 16:45:25 -07:00
наб eaf1e06045 contrib: dracut: inline single-use import_pool, move single-use ask_for_password
Also don't set ROOTFS_MOUNTED; the final mention was removed in dracut
011 from July 2011

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291
2022-04-20 16:45:18 -07:00
наб dac0b0785a contrib: dracut: zfs-lib: remove find_bootfs
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291
2022-04-20 16:45:11 -07:00
наб 5d31169d7c contrib: dracut: zfs-lib: simplify ask_for_password
The only user is mount-zfs.sh (non-systemd systems),
so reduce it to what it needs

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291
2022-04-20 16:45:04 -07:00
наб fec2c613a4 contrib; dracut: flatten zfs-load-key, simplify zfs-env-bootfs
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291
2022-04-20 16:44:55 -07:00
наб 245529d85f contrib; dracut: centralise root= parsing, actually support root=s
So far, everything parsed root= manually, which meant that while
zfs-parse.sh was updated, and supposedly supported + -> ' ' conversion,
it meant nothing

Instead, centralise parsing, and allow:
  root=
  root=zfs
  root=zfs:
  root=zfs:AUTO

  root=ZFS=data/set
  root=zfs:data/set
  root=zfs:ZFS=data/set (as a side-effect; allowed but undocumented)

  rootfstype=zfs AND root=data/set <=> root=data/set
  rootfstype=zfs AND root=         <=> root=zfs:AUTO

So rootfstype=zfs /also/ behaves as expected, and + decoding works

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291
2022-04-20 16:44:47 -07:00
наб 2c74617bcf contrib: dracut: parse-zfs: stop pretending we support FILESYSTEM=
It was added in the original ae26d0465a ("Add dracut support") commit
in 2011, and was then broken a bit later with the advent of
dracut-zfs-generator, or maybe earlier as part of other churn

Either way, it's broken, and has been in 2.0+ as well, and no-one
complained. Stop pretending we support it at all

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291
2022-04-20 16:44:37 -07:00
наб 47636f5661 contrib: dracut: parse-zfs: drop initqueue-finished for i/f
The switch was released in dracut 009 in March 2011,
we can safely get rid of the compatibility hook

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13291
2022-04-20 16:44:25 -07:00
Rich Ercolani b657f2c592 Corrected oversight in ZERO_RANGE behavior
It turns out, no, in fact, ZERO_RANGE and PUNCH_HOLE do
have differing semantics in some ways - in particular,
one requires KEEP_SIZE, and the other does not.

Also added a zero-range test to catch this, corrected a flaw
that made the punch-hole test succeed vacuously, and a typo
in file_write.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #13329 
Closes #13338
2022-04-20 16:07:03 -07:00
Alexander Motin 9209ea69bc FreeBSD: Fix translation from ABD to physical pages
In hypothetical case of non-linear ABD with single segment, multiple
to page size but not aligned to it, vdev_geom_fill_unmap_cb() could
fill one page less into bio_ma array.

I am not sure it is exploitable, but better to be safe than sorry.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reported-by: Mark Johnston <markj@FreeBSD.org>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #13345
2022-04-20 16:05:38 -07:00
наб e37e7dd6a6 man: ... -> … again
zfs-program.8 is left, but that's literal Lua syntax

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13255
2022-04-20 14:31:10 -07:00
Damian Szuberski b1fb4e1ba4 rpm -> deb doesn't fail when optional packages are missing
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #13331
Closes #13336
2022-04-20 13:43:42 -07:00
наб 92295af800 Document zfs inherit -S's interaction with noninheritable properties
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11894
Closes #13335
2022-04-20 13:39:11 -07:00
Low-power 115a92ca6a zpool_history_unpack: return correct errno on nvlist_unpack failure
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: WHR <msl0000023508@gmail.com>
Closes #13321
2022-04-20 13:35:44 -07:00
наб 9b80d9e6f9 linux: module: uninstall legacy modules on (un)installation
This can be reverted once we're sure nobody's using them anymore
(post-3.0 release?)

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13274
2022-04-20 13:28:54 -07:00
наб 469b8481aa scripts: zfs.sh: unload zfs with dependencies
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13274
2022-04-20 13:28:48 -07:00
наб d18da59d30 scripts: zfs.sh: make usage make sense
We don't pass the arguments as arguments

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13274
2022-04-20 13:28:39 -07:00
наб f6f505c4d6 scripts: zfs.sh: remove cat
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13274
2022-04-20 13:28:28 -07:00
наб ad9e767657 linux: module: weld all but spl.ko into zfs.ko
Originally it was thought it would be useful to split up the kmods
by functionality.  This would allow external consumers to only load
what was needed.  However, in practice we've never had a case where
this functionality would be needed, and conversely managing multiple
kmods can be awkward.  Therefore, this change merges all but the
spl.ko kmod in to a single zfs.ko kmod.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13274
2022-04-20 13:28:24 -07:00
Allan Jude 310ab9d261 Improve the inline descriptions of the ARC module parameters
These are displayed as the descriptions of the sysctl's on FreeBSD

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #13334
2022-04-20 13:16:25 -07:00
Paul Dagnelie 4d46272913 Fix style checking error introduced by zfs-mount changes
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #13347
2022-04-19 13:55:04 -07:00
Brian Behlendorf 026f126b83 Linux 5.17 compat: GENHD_FL_EXT_DEVT / GENHD_FL_NO_PART_SCAN
As of the 5.17 kernel the GENHD_FL_EXT_DEVT flag has been removed
and the GENHD_FL_NO_PART_SCAN flag renamed GENHD_FL_NO_PART. Update
zvol_alloc() to set GENHD_FL_NO_PART for the newer kernels which
is sufficient.  The behavior for prior kernels remains unchanged.

1ebe2e5f ("block: remove GENHD_FL_EXT_DEVT")
46e7eac6 ("block: rename GENHD_FL_NO_PART_SCAN to GENHD_FL_NO_PART")

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13294
Closes #13297
2022-04-19 10:38:04 -07:00
omni ec4d860eb0 init.d/zfs-mount: Don't fsck or mount/umount fstab entries
This is better handled by existing OS toolset.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: omni <omni+vagant@hack.org>
Issue #7374
Closes #12780
2022-04-15 14:26:44 -07:00
Low-power 4dced31b98 Fix 'zpool history' sometimes fails without reporting any error
The corresponding function 'zpool_get_history' in libzfs would printing
an error messages only when the ioctl call failed.

Add missing error reporting, specifically memory allocation failures
and error from 'zpool_history_unpack'.

Also avoid possibly reading of uninitialized 'err' variable in case
the requested offset pasts EOF.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: WHR <msl0000023508@gmail.com>
Issue #13322 
Closes #13320
2022-04-15 14:16:07 -07:00
наб 0dd34a1955 libzfs: import: zpool_clear_label: bool for boolean status
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13304
2022-04-13 11:37:18 -07:00
наб 74e4bfbcff libzfs: import: zpool_clear_label: don't allocate another time for L2ARC header
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13304
2022-04-13 11:37:10 -07:00
наб a4e0cee178 libzfs: import: zpool_clear_label: actually fail if clearing l2arc header fails
Found with -Wunused-but-set-variable on Clang trunk

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13304
2022-04-13 11:37:03 -07:00
наб 16c3290bbe module: zfs: vdev_removal: remove unused num_indirect
Found with -Wunused-but-set-variable on Clang trunk

Fixes: a1d477c24c ("OpenZFS 7614, 9064 - zfs device evacuation/removal")
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13304
2022-04-13 11:36:47 -07:00
наб 63cb3413ea tests: cmd: draid: remove unused and undocumented -v
Found with -Wunused-but-set-variable on Clang trunk

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13304
2022-04-13 11:36:29 -07:00
szubersk 4620342a32 zfs(8): remove errors reported by mandoc 1.14.6
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #13315
2022-04-13 11:34:46 -07:00
szubersk 4f0be2bdb0 zfs(8): add requirements towards fs names
Provide explicit requirements towards file system naming convention
in OpenZFS man pages.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Mitigates #13310
Closes #13315
2022-04-13 11:34:28 -07:00
Brian Behlendorf 3d149db1f3 ZTS: Retry auto_spare_multiple.ksh
The auto_spare_multiple.ksh test may incorrectly fail for a similar
reason as the auto_spare_shared.ksh test.  Add it to known list of
exceptions which should be retried to prevent failures in the CI.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13318
2022-04-13 09:44:54 -07:00
Brian Behlendorf 5846f71182 ZTS: Retry redundancy_draid_spare[1,3].ksh
The redundancy_draid_spare1.ksh and redundancy_draid_spare3.ksh test
cases are a little to strict for the sequential resilver case.  While
unlikely it is possible that a handful of correctable checksum errors
will be reported resulting in a test failure.  Update the zts-report.py
script to allow this the test case to be retried if requested.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13318
2022-04-13 09:44:33 -07:00
Mark Johnston 7dcb8ed23d FreeBSD: Return Mach error codes from VOP_(GET|PUT)PAGES
FreeBSD's memory management system uses its own error numbers and gets
confused when these VOPs return EIO.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reported-by: Peter Holm <pho@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #13311
2022-04-13 09:43:15 -07:00
Mark Johnston e9084d0712 FreeBSD: Parameterize ZFS_ENTER/ZFS_VERIFY_VP with an error code
For legacy reasons, a couple of VOPs have to return error numbers that
don't come from the usual errno namespace.  To handle the cases where
ZFS_ENTER or ZFS_VERIFY_ZP fail, we need to be able to override the
default error return value of EIO.  Extend the macros to permit this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #13311
2022-04-13 09:42:51 -07:00
Damian Szuberski 35d81a75a8 initramfs: use mount.zfs instead of mount
A followup to d7a67402a8

For `mount -t zfs -o opts ds mp` command line
some implementations of `mount(8)`, e. g. Busybox in Debian
work as follows:

```
newfstatat(AT_FDCWD, "ds", 0x7fff826f4ab0, 0) = -1
mount("ds", "mp", "zfs", MS_SILENT, NULL) = 0
```

The logic above skips completely `mount.zfs` and prevents us
from reading filesystem properties and applying mount options.

For comparison, the coreutils `mount(8)` implementation does:

```
openat(AT_FDCWD, "/proc/filesystems", O_RDONLY|O_CLOEXEC) = 3
// figure out that zfs is a `nodev` filesystem and look for a helper
newfstatat(AT_FDCWD, "/sbin/mount.zfs" ...) = 0
execve("/sbin/mount.zfs" ...) = 0
```

Using `mount.zfs` in initramfs would help circumvent deficiencies
of some of `mount(8)` implementations. `mount -t zfs` translates
to `mount.zfs` invocation, except for cases when explicitly disabled
by `-i`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #13305
2022-04-11 15:51:23 -07:00
наб ed715283de cmd: zed: rc: drop "should be owned by root and 0600"
It doesn't matter, 0600 are Weird Permissions, and it's even weirder to
spec them for no reason ‒ it's perfectly fine if it's the usual 0:0 644,
or literally anything else, so long as unprivileged users can't edit it
(which (a) 644 accomplishes and (b) is at the administrator's
 discretion, it's not unheard of to have adm users and having it
 be 664 in that case is just as good; it's not our place to say)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12544
Closes #13276
2022-04-05 13:50:55 -07:00
Jorgen Lundman 4d972ab5ae Prefer ATTR_ in shared codebase over AT_
An earlier commit introduces AT_MODE into the shared kernel sources,
instead of the preferred existing ATTR_MODE use.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Jorgen lundman <lundman@lundman.net>
Closes #13293
2022-04-05 13:02:17 -07:00
наб e3e4c30b0a libzfs: sendrecv: use common progress thread killer
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13284
2022-04-05 09:46:07 -07:00
наб 1217fd1ff8 libzfs: sendrecv: always cancel progress thread in zfs_send_one()
This is in line with all the other uses of the progress thread

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11560
Closes #13284
2022-04-05 09:45:55 -07:00
наб 3c69b539fe Forbid asctime{,_r}(), gmtime(), localtime()
ctime() is only used in binary main threads, which is fine

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13284
2022-04-05 09:45:46 -07:00
наб 9185303b32 libspl: zed: event: use localtime_r()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13284
2022-04-05 09:45:39 -07:00
наб ae3683ce11 libspl: print_timestamp: use localtime_r()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13284
2022-04-05 09:45:33 -07:00
наб 8ab279484f libzfs: sendrecv: send_progress_thread: use localtime_r()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13284
2022-04-05 09:45:24 -07:00
наб d5036491e6 zdb: standardise on ctime()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13284
2022-04-05 09:45:05 -07:00
наб 95babbe573 scripts: cstyle: remove unused -h
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13285
2022-04-05 09:36:13 -07:00
наб 14ed1f2424 cstyle.1: note -g, $CI being equivalent to -g
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13285
2022-04-05 09:36:06 -07:00
наб 93f3a3517c scripts: cstyle: remove unused -C
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13285
2022-04-05 09:35:46 -07:00
наб 9292cf761e Fix string/index variables being unflexed by default
I got the status backward (B_FALSE for fixed, rather than B_TRUE for
flex); before:

  $ zfs get mountpoint tarta-zoot -r
  NAME                                 PROPERTY    VALUE       SOURCE
  tarta-zoot                           mountpoint  /           local
  tarta-zoot/PAGEFILE.SYS              mountpoint  -           -
  tarta-zoot/etc                       mountpoint  /etc        inherited from tarta-zoot
  tarta-zoot/home                      mountpoint  /home       inherited from tarta-zoot
  tarta-zoot/home/xspon                mountpoint  /home/xspon  inherited from tarta-zoot
  tarta-zoot/home/nabijaczleweli       mountpoint  /home/nabijaczleweli  inherited from tarta-zoot
  tarta-zoot/home/nabijaczleweli/tftp  mountpoint  /home/nabijaczleweli/tftp  inherited from tarta-zoot
  tarta-zoot/home/root                 mountpoint  /root       local

after:

  $ zfs get mountpoint tarta-zoot -r
  NAME                                 PROPERTY    VALUE                      SOURCE
  tarta-zoot                           mountpoint  /                          local
  tarta-zoot/PAGEFILE.SYS              mountpoint  -                          -
  tarta-zoot/etc                       mountpoint  /etc                       inherited from tarta-zoot
  tarta-zoot/home                      mountpoint  /home                      inherited from tarta-zoot
  tarta-zoot/home/xspon                mountpoint  /home/xspon                inherited from tarta-zoot
  tarta-zoot/home/nabijaczleweli       mountpoint  /home/nabijaczleweli       inherited from tarta-zoot
  tarta-zoot/home/nabijaczleweli/tftp  mountpoint  /home/nabijaczleweli/tftp  inherited from tarta-zoot
  tarta-zoot/home/root                 mountpoint  /root                      local

Fixes: be8e1d81bf ("Flex
 non-pretty-printed properties and raw-/pretty-print remaining ones")
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13286
2022-04-05 09:34:46 -07:00
Riccardo Schirone 7d524c068d Linux 5.18 compat: use address_space_operations->readahead
->readpages was removed and replaced by ->readahead. Define
zpl_readahead for kernels that don't have ->readpages.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Riccardo Schirone <rschirone91@gmail.com>
Closes #13278
2022-04-04 09:35:24 -07:00
Riccardo Schirone 036e846abc Linux 5.18 compat: blkg_tryget is moved to private headers
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Riccardo Schirone <rschirone91@gmail.com>
Closes #13278
2022-04-04 09:35:11 -07:00
Ryan Moeller b61507ec1d FreeBSD: Use NDFREE_PNBUF if available
NDF_ONLY_PNBUF has been removed from FreeBSD in favor of NDFREE_PNBUF.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #13277
2022-04-02 12:10:55 -07:00
наб 533aef2b3c tests: zts-report: add #13215 to known list for FreeBSD
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:03:32 -07:00
наб e78b7a73fe tests: zts-report: issue numbers are numbers
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:03:27 -07:00
наб 6ef2151c80 tests: clean out more temporary files
What remains is a bunch of anonymous untraceable /tmp/tmp.XXXXXXXXXX
files and bak.root.receive.staff1.3835 from an error branch, testdir.1,
testdir.3, and testroot454470 (with children) in testroot

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:03:21 -07:00
наб 22ddc7f5ff scripts: zfs-tests: fix setup script detection when ran with -t /absolute
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:03:14 -07:00
наб fbe811b054 tests: remove unused functions
As found by
  git -C tests/ grep ^function | grep -vFe '.lua:' -e '.zcp:' | while IFS=":$IFS" read -r _ _ fn _; do [ $(git -C tests/ grep -wF $fn | head -2 | wc -l) -eq 1 ] && echo $fn; done
after all rounds this comes out to, sorted:
  check_slog_state
  chgusr_exec
  cksum_files
  cleanup_pools
  compare_modes
  count_ACE
  dataset_set_defaultproperties
  ds_is_snapshot
  get_ACE
  get_group
  get_min
  get_mode
  get_owner
  get_rand_checksum
  get_rand_checksum_any
  get_rand_large_recsize
  get_rand_recsize
  get_user_group
  getitem
  indirect_vdev_mapping_size
  is_dilos
  log_noresult
  log_notinuse
  log_other
  log_timed_out
  log_uninitiated
  log_warning
  num_jobs_by_cpu
  plus_sign_check_l
  plus_sign_check_v
  record_cksum
  rwx_node
  seconds_mmp_waits_for_activity
  set_cur_usr
  setup_mirrors
  setup_raidzs
  showshares_smb
  zfs_zones_setup

This, of course, doesn't catch recursive ones, or ones that log with
their own function name as a prefix, but

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:03:05 -07:00
наб 5c9f744b1a tests: include: use already-set $UNAME instead of shelling out to uname each time
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:02:59 -07:00
наб ff0fc5af12 tests: pam: use absolute path to module .so
This is a valid configuration and both (a) skips the tests if it's
unbuilt/not installed and (b) makes it work even if installed outside
the system directory (like in /u/l/l/s instead of /l/s)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:02:54 -07:00
наб 7b6c4cbb1f tests: zts-report: simplify
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:02:48 -07:00
наб abccb4b584 tests: zts-report: prune unused reasons
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:02:43 -07:00
наб 6b2616435b tests: zts-report: only known-SKIP zfs_unshare_002_pos on Linux
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:02:36 -07:00
наб 56692e77e2 linux: libshare: smb: fix more than one smb_is_share_active() call
This also fixes zfs_unshare_006_pos, which exposed this

Fixes: 2f71caf2d9 ("Allow zfs unshare
 <protocol> -a")

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:02:24 -07:00
наб 34abca3e2c tests: zfs_003_neg: handle failures correctly
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:02:19 -07:00
наб 61f1502246 tests: zfs_share_005: don't fail open
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:02:14 -07:00
наб 7b875ee601 module: zstd: zfs_zstd: staticify zstd_ksp
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:02:09 -07:00
наб b1b5843ae4 tests: lua_core: use herewords for single-line programs
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:02:03 -07:00
наб 598fed7ecd tests: revert back to original coredump patterns on Linux, too
Otherwise, they leak past the tests and contaminate the running system,
breaking coredumps entirely

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Co-authored-by: Yannick Le Pennec <yannick.lepennec@live.fr>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:01:51 -07:00
наб 912d2aa7d7 tests: zdb_args_pos.ksh: fix indentation
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:01:46 -07:00
наб 20093de25c tests: move C test helpers into test cmd
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:01:39 -07:00
наб 1948f6dbbf tests: echo-with-arguments review
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:01:31 -07:00
наб b1579689a9 tests: rsend/send-c_props: make it chooch
Original error:
23:47:40.59 SUCCESS: eval zfs receive -dFv testpool2 < /mnt/testroot/backdir-rsend/pool-final-p
23:47:40.61 1,23d0
23:47:40.61 < type      filesystem      -
23:47:40.61 < origin    POOL@psnap      -
23:47:40.61 < volblocksize      -       -
23:47:40.61 < acltype   nfsv4   inherited from POOL
23:47:40.61 < dnodesize legacy  inherited from POOL
23:47:40.61 < atime     off     local
23:47:40.61 < canmount  off     local
23:47:40.61 < checksum  off     local
23:47:40.61 < compression       off     local
23:47:40.61 < copies    3       local
23:47:40.61 < devices   off     local
23:47:40.61 < exec      off     local
23:47:40.61 < quota     none    default
23:47:40.61 < readonly  on      local
23:47:40.61 < recordsize        128K    local
23:47:40.61 < reservation       none    default
23:47:40.61 < setuid    off     local
23:47:40.61 < snapdir   hidden  local
23:47:40.61 < version   5       -
23:47:40.61 < volsize   -       -
23:47:40.61 < xattr     off     local
23:47:40.61 < mountpoint        /PREFIX inherited from POOL
23:47:40.61 < jailed    on      local
23:47:40.62 cannot open 'testpool2/pclone': dataset does not exist
23:47:40.62 ERROR: cmp_ds_prop testpool/pclone testpool2/pclone exited 1

So: (a) actually send all the datasets in -p mode and
    (b) drop origin for clones sent with -p:

00:38:05.46 SUCCESS: eval zfs receive -dFv testpool2 < /mnt/testroot/backdir-rsend/pool-final-p
00:38:05.48 2c2
00:38:05.48 < origin    POOL@psnap
00:38:05.48 ---
00:38:05.48 > origin    POOL
00:38:05.49 ERROR: cmp_ds_prop testpool/pclone testpool2/pclone nosource exited 1

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13250
Closes #13259
2022-04-01 18:01:16 -07:00
наб a5addbbb7d tests: rsend.kshlib: cmp_ds_prop: anonymise "inherited from" sourcces
Fixes rsend_012_pos:
00:04:59.68 SUCCESS: cmp_ds_prop testpool/testfs/vol testpool2/testfs/vol nosource
00:04:59.70 4c4
00:04:59.70 < acltype   posix   inherited from testpool
00:04:59.70 ---
00:04:59.70 > acltype   posix   inherited from testpool2
00:04:59.70 11,12c11,12
00:04:59.70 < devices   off     inherited from testpool
00:04:59.70 < exec      on      inherited from testpool
00:04:59.70 ---
00:04:59.70 > devices   off     inherited from testpool2
00:04:59.70 > exec      on      inherited from testpool2
00:04:59.70 17c17
00:04:59.70 < setuid    off     inherited from testpool
00:04:59.70 ---
00:04:59.70 > setuid    off     inherited from testpool2
00:04:59.70 ERROR: cmp_ds_prop testpool@final testpool2@final exited 1

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13250
Closes #13259
2022-04-01 18:00:59 -07:00
наб b2c5291b7e tests: prune cat (ab)uses
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:00:52 -07:00
наб f7cc8dddf7 tests: rsend_012_pos: backup/restore in one invocation
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 18:00:44 -07:00
наб 9da14f3981 tests: rsend.kshlib: cmp_ds_prop: allow skipping source
This fixes rsend_012_pos:
20:28:50.50 SUCCESS: eval zfs receive -d -F testpool2 < /mnt/testroot/backdir-rsend/pool-final-R
20:28:50.53 4,6c4,6
20:28:50.53 < acltype   off     local
20:28:50.53 < dnodesize 4k      local
20:28:50.53 < atime     off     local
20:28:50.53 ---
20:28:50.53 > acltype   off     received
20:28:50.53 > dnodesize 4k      received
20:28:50.53 > atime     off     received
20:28:50.53 8,13c8,13
20:28:50.53 < checksum  sha256  local
20:28:50.53 < compression       off     local
20:28:50.53 < copies    2       local
20:28:50.53 < devices   on      local
20:28:50.53 < exec      on      local
20:28:50.53 < quota     1G      local
20:28:50.53 ---
20:28:50.53 > checksum  sha256  received
20:28:50.53 > compression       off     received
20:28:50.53 > copies    2       received
20:28:50.53 > devices   on      received
20:28:50.53 > exec      on      received
20:28:50.53 > quota     1G      received
20:28:50.53 15c15
20:28:50.53 < recordsize        128K    local
20:28:50.53 ---
20:28:50.53 > recordsize        128K    received
20:28:50.53 17,18c17,18
20:28:50.53 < setuid    off     local
20:28:50.53 < snapdir   visible local
20:28:50.53 ---
20:28:50.53 > setuid    off     received
20:28:50.53 > snapdir   visible received
20:28:50.53 ERROR: cmp_ds_prop testpool testpool2 exited 1

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13250
Closes #13259
2022-04-01 18:00:26 -07:00
наб 91933eb977 tests: rsend.kshlib: cmp_ds_prop: mask out differences in origin pool
This fixes rsend_011_pos:
20:28:26.94 SUCCESS: cmp_ds_prop testpool/testfs/fs1/fs2 testpool2/testfs/fs1/fs2
20:28:26.96 2c2
20:28:26.96 < origin    testpool@psnap  -
20:28:26.96 ---
20:28:26.96 > origin    testpool2@psnap -
20:28:26.97 ERROR: cmp_ds_prop testpool/pclone testpool2/pclone exited 1

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13250
Closes #13259
2022-04-01 17:59:41 -07:00
наб 964d41806f tests: review all wc(1) invocations
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:59:35 -07:00
наб 23914a3b91 tests: review every instance of $?
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:59:30 -07:00
наб 6586085673 tests: include: math: simplify bc conditions, review $?
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:59:24 -07:00
наб caccfc870f tests: clean out unused/single-use/useless commands from the list
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:59:18 -07:00
наб df6e0f0092 zed: functions: zed_log_err: forward to zed_log_msg
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:59:12 -07:00
наб be866839f0 zed: lets: all-debug: printenv -> env
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:59:06 -07:00
наб ee798fee45 scripts: zfs-tests.sh: cleanup, optimise
The single-stage losetup doesn't work on busybox,
but then most of the testsuite doesn't

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:58:59 -07:00
наб 0990228ab8 tests: mixed_create_failure: explicitly note the error
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue: #13215
Closes #13259
2022-04-01 17:58:44 -07:00
наб 50a33b8c69 tests: logapi: don't cat excessively
This also fixes line welding in test error output

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:58:37 -07:00
наб e294acdba8 tests: cmd: don't recurse
This confers an >10x speedup on t/z-t/cmd builds (12s -> 1.1s),
gets rid of 23 redundant identical automake specs and gitignores,
and groups the binaries with their common headers

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:58:23 -07:00
наб caeab993ec tests: {read,write}_dos_attributes: despaghettify
Also: actually accept all the flags in write_d_a

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:58:17 -07:00
наб d30577c9dd fgrep -> grep -F
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:58:11 -07:00
наб f63c9dc70a egrep -> grep -E
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:58:07 -07:00
наб 270d0a5a16 tests: nawk -> awk
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:58:01 -07:00
наб 75746e9a40 tests: review every awk(1) invocation
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:57:55 -07:00
наб 72f3516094 tests: zfs_rollback_commit: talkative failures
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:57:50 -07:00
наб d7976a073e tests: README: note non-default mountd requirement
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:57:45 -07:00
наб d5cc930b8f tests: README: note that -d *must* be 777 for the deleg tests
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:57:40 -07:00
наб 41ebf40375 tests: vdev_zaps: cleanup library
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:57:35 -07:00
наб 25aeffadb2 tests: vdev_zaps_007: actually test the new pool
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:57:30 -07:00
наб 592cf7f1e2 tests: get rid of which
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:57:25 -07:00
наб 62c5ccdf92 tests: don't >-redirect without eval
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:57:19 -07:00
наб 053dac9e7d tests: vdev_zaps_007: log_must with > must eval
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:57:14 -07:00
наб b9b763111c tests: redacted_send: explicitly assume instant /dev on non-Linux
The users spew udevadm ENOENTs on FreeBSD

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:57:08 -07:00
наб 9423c932d4 tests: replace sum(1) with cksum(1)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:57:03 -07:00
наб c2fcc55d97 tests: snapshot_00[15]_pos: use cksum instead of sum -r
This only worked by accident on FreeBSD, where sum doesn't take flags

POSIX guarantees us cksum (Ethernet CRC) ‒ use that instead

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:56:53 -07:00
наб 7aa7e6bd0a tests: zfs_create_nomount: undefined local -> typeset
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:56:45 -07:00
наб f806d67846 tests: mixed_create_failure: undefined log_err -> log_fail
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #13215
Closes #13259
2022-04-01 17:55:49 -07:00
наб 2d9da5e1c8 tests: don't fail if no fio or python3.sysctl
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:55:45 -07:00
наб bd328a588b tests: nonspecific cleanup
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:55:40 -07:00
наб 33c319eb1e tests: zfs_share_concurrent_shares: don't use log_musts in subprocesses
This thoroughly destroys logapi and races to the log files horribly

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:55:35 -07:00
наб 3886e7081a tests: zfs_unshare_006: log_unsupported iff usershares are actually off
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:55:30 -07:00
наб a3dcc0aa0c zfs, zpool: safe_malloc() duplicate argv
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:55:24 -07:00
наб 84a3eab6aa zfs: simplify usage_prop_cb values
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:55:19 -07:00
наб 5b0e75caef tests: don't use share/unshare exportfs aliases, support FreeBSD NFS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:55:14 -07:00
наб c22c872036 tests: don't always skip zfs_unshare tests on FreeBSD
Previously, they'd all be skipped on FreeBSD where share is showmount

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:55:07 -07:00
наб 261f10b717 tests: zfs_unshare_001_pos: print which filesystem failed
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:55:02 -07:00
наб bf228f3de0 tests: standardise on no-arg uname with *) case for illumos
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:54:52 -07:00
Andrew eebfd28e9d Linux optimize access checks when ACL is trivial
Bypass check of ZFS aces if the ACL is trivial. When an ACL is
trivial its permissions are represented by the mode without any
loss of information. In this case, it is safe to convert the
access request into equivalent mode and then pass desired mask
and inode to generic_permission(). This has the added benefit
of also checking whether entries in a POSIX ACL on the file grant
the desired access.

This commit also skips the ACL check on looking up the xattr dir
since such restrictions don't exist in Linux kernel and it makes
xattr lookup behavior inconsistent between SA and file-based
xattrs. We also don't want to perform a POSIX ACL check while
looking up the POSIX ACL if for some reason it is located in
the xattr dir rather than an SA.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Andrew Walker <awalker@ixsystems.com>
Closes #13237
2022-04-01 09:53:54 -07:00
Rich Ercolani 6a2dda8f05 Ask libtool to stop hiding some errors
For #13083, curiously, it did not print the actual error, just
that the compile failed with "Error 1".

In theory, this flag should cause it to report errors twice sometimes.
In practice, I'm pretty okay with reporting some twice if it avoids
reporting some never.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #13086
2022-03-31 10:09:18 -07:00
hpingfs 4d04e41e4d zfs_ctldir: fix incorrect argument type of rw_destroy
The argument type of rw_destroy is (krwlock_t *) while currently
krwlock_t is passed in zfs_ctldir.c. This error is hidden because
rw_destroy is defined as ((void) 0) in linux. But anyway, this
mismatch should be fixed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ping Huang <huangping@smartx.com>
Closes #13272
2022-03-30 15:40:31 -07:00
hpingfs abdcef47d2 zvol_os: suppress compiler warning for zvol_open_timeout_ms
When HAVE_BLKDEV_GET_ERESTARTSYS is defined, compiler will complain
"defined but not used" warning for zvol_open_timeout_ms.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ping Huang <huangping@smartx.com>
Closes #13270
2022-03-30 15:39:55 -07:00
наб 7dc782e5c5 cstyle: remove unused -o
Remove handling for allowing doxygen- and embedding in splint(?)-style
comments.  This functionality is unused by OpenZFS.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13264
2022-03-30 15:37:28 -07:00
наб 3cfbeb4e90 zfs: main: don't NULL-check infallible safe_malloc()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13229
2022-03-30 15:30:51 -07:00
наб 18dbf5c8c3 libzfs: don't NULL-check infallible allocations
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13229
2022-03-30 15:30:16 -07:00
наб bc3f12bfac config: user: check for <aio.h>
And always zpool_read_label_slow() on non-conformant libcs

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: José Luis Salvador Rufo <salvador.joseluis@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13207
Closes #13254
2022-03-28 10:24:22 -07:00
наб b61595ff86 man: zfs-rename.8: import examples from zfs.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:14 -07:00
наб 4d4fa5215d man: zfs-snapshot.8: import examples from zfs.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:14 -07:00
наб bc2fe49a4f man: zfs-promote.8: import examples from zfs.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:14 -07:00
наб d842ca3ff2 man: zfs-create.8: import examples from zfs.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:14 -07:00
наб 3b887e485c man: zfs-destroy.8: import examples from zfs.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:14 -07:00
наб 9bda775bd9 man: zfs-rollback.8: import examples from zfs.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:14 -07:00
наб c6fd08797f man: zfs-clone.8: import examples from zfs.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:14 -07:00
наб 52f3aebdb2 man: zfs-list.8: import examples from zfs.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:14 -07:00
наб 2057b47b91 man: zfs-send.8, zfs-receive.8: import examples from zfs.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:14 -07:00
наб b7b3f19935 man: zfs-diff.8: import examples from zfs.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:14 -07:00
наб 1e164f89e5 man: zfs-bookmark.8: import examples from zfs.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:14 -07:00
наб c522339fd7 man: zfs-set.8: import examples from zfs.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:13 -07:00
наб 575e20602c man: zfs-allow.8: import examples from zfs.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:13 -07:00
наб 69b06a77b6 man: zpool-iostat.8: import examples from zpool.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:13 -07:00
наб b36069ab88 man: zpool-remove.8: import examples from zpool.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:13 -07:00
наб d8dd89acd1 man: zpool-upgrade.8: import examples from zpool.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:13 -07:00
наб 8ccb3fd6b1 man: zpool-import.8: import examples from zpool.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:13 -07:00
наб 4849523af0 man: zpool-export.8: import examples from zpool.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:13 -07:00
наб 4b660ffe7b man: zpool-destroy.8: import examples from zpool.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:13 -07:00
наб 74181ca44c man: zpool-list.8: import examples from zpool.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:13 -07:00
наб c34eb6f7a1 man: zpool-add.8: import examples from zpool.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:13 -07:00
наб 9cdb067539 man: zpool-create.8: import examples from zpool.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
Closes #13140
2022-03-28 10:13:13 -07:00
наб a2550c9df9 man: Examples: use subsections instead of lists
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:13 -07:00
наб 1ebe4c482d man: zpool.8: Examples: unmirrored -> non-redundant
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:13 -07:00
наб 6571873f1b man: zpool.8: Examples: use Pa for disks and scripts
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13228
2022-03-28 10:13:13 -07:00
наб 7db823bd61 module: zfs: dsl_bookmark: silence false-positive maybe-uninitialised
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13247
Closes #13258
2022-03-28 10:03:13 -07:00
Jeremy Visser 063c83a853 zfs-dkms rpm: simplify scriptlets, fix uninstall
Two problems led to unexpected behaviour of the scriptlets:

1) Newer DKMS versions change the formatting of "dkms status":

   (old) zfs, 2.1.2, 5.14.10-300.fc35.x86_64, x86_64: installed
   (new) zfs/2.1.2, 5.14.10-300.fc35.x86_64, x86_64: installed

   Which broke a conditional determining whether to uninstall.

2) zfs_config.h not packaged properly, but was attempted to be read
   in the %preun scriptlet:

   CONFIG_H="/var/lib/dkms/zfs/2.1.2/*/*/zfs_config.h"

   Which broke the uninstallation of the module, which left behind a
   dangling symlink, which broke DKMS entirely with this error:

     Error! Could not locate dkms.conf file.
     File: /var/lib/dkms/zfs/2.1.1/source/dkms.conf does not exist.

This change attempts to simplify life by:

*  Avoiding parsing anything (less prone to future breakage)
*  Uses %posttrans instead of %post for module installation, because
   %post happens before %preun, while %posttrans happens afterwards
*  Unconditionally reinstall module on upgrade, which is less
   efficient but the trade-off is that it's more reliable

Alternative approaches could involve fixing the existing parsing bugs
or improving the logic, but this comes at the cost of complexity and
possible future bugs.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jeremy Visser <jeremyvisser@google.com>
Closes #10463
Closes #13182
2022-03-28 09:55:57 -07:00
наб 86690775b0 Linux 5.18 compat: replace genhd.h with blkdev.h includes
blkdev.h includes genhd.h since dawn of upstream git, so this is
globally safe

Upstream-commit: 322cbb50de711814c42fb088f6d31901502c711a ("block:
 remove genhd.h")

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13251
2022-03-28 09:52:55 -07:00
наб d1325b4fa2 Linux 5.18 compat: 4-argument bio_alloc()
bio_alloc(gfp_t gfp_mask, unsigned short nr_iovecs)

became

  bio_alloc(struct block_device *bdev, unsigned short nr_vecs,
            unsigned int opf, gfp_t gfp_mask)
passing NULL/0 continues previous behaviour

Upstream-commit: 07888c665b405b1cd3577ddebfeb74f4717a84c4 ("block:
 pass a block_device and opf to bio_alloc")

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13251
2022-03-28 09:51:59 -07:00
George Melikov 427fad7f24 CI: Log test name to /dev/kmsg in ZTS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes  #13249
2022-03-24 08:01:26 -06:00
наб 6322a77ce7 linux: libzutil: zfs_path_order: don't strdup ZPOOL_IMPORT_PATH
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13223
2022-03-23 08:55:38 -07:00
наб 8a2ed86001 libzutil: zfs_resolve_shortname: don't strdup() ZPOOL_IMPORT_PATH
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13223
2022-03-23 08:55:18 -07:00
George Melikov 018937884f zpoolconcepts.7: fix comma typo
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #13239
2022-03-23 08:53:19 -07:00
Brian Behlendorf 706ba5dc1d Linux 5.17 compat: META
Update the META file to reflect compatibility with the 5.17 kernel.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13243
2022-03-23 08:52:30 -07:00
Brian Behlendorf 460748d4ae Switch from _Noreturn to __attribute__((noreturn))
Parts of the Linux kernel build system struggle with _Noreturn.  This
results in the following warnings when building on RHEL 8.5, and likely
other environments.  Switch to using the __attribute__((noreturn)).

  warning: objtool: dbuf_free_range()+0x2b8:
    return with modified stack frame
  warning: objtool: dbuf_free_range()+0x0:
    stack state mismatch: cfa1=7+40 cfa2=7+8
  ...
  WARNING: EXPORT symbol "arc_buf_size" [zfs.ko] version generation
    failed, symbol will not be versioned.
  WARNING: EXPORT symbol "spa_open" [zfs.ko] version generation
    failed, symbol will not be versioned.
  ...

Additionally, __thread_exit() has been renamed spl_thread_exit() and
made a static inline function.  This was needed because the kernel
will generate a warning for symbols which are __attribute__((noreturn))
and then exported with EXPORT_SYMBOL.

While we could continue to use _Noreturn in user space I've also
switched it to __attribute__((noreturn)) purely for consistency
throughout the code base.

Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13238
2022-03-23 08:51:00 -07:00
Tony Hutter b73505c7e0 ZTS: Log test name to /dev/kmsg on Linux
Add a -K option to the test suite to log each test name to /dev/kmsg
(on Linux), so if there's a kernel warning we'll be able to match
it up to a particular test.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes  #13227
2022-03-23 09:15:02 -06:00
Brian Behlendorf 6b444cb971 Linux 5.16 compat: restore FSR and FSAVE
Commit 3b52ccd introduced a flaw where FSR and FSAVE are not restored
when using a Linux 5.16 kernel.  These instructions are only used when
XSAVE is not supported by the processor meaning only some systems will
encounter this issue.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13210
Closes #13236
2022-03-19 12:46:33 -07:00
Sean Eric Fagan 565089f592 Allow zfs send to exclude datasets
Add support for a -exclude/-X option to `zfs send` to allow dataset 
hierarchies to be excluded.

Snapshots can be excluded using a channel program; however,
this can result in failures with 'zfs send -R'; this option allows 
them to be excluded.  Fortunately, this required a change only to 
cmd/zfs/zfs_main.c, using the already-existing callback argument 
to zfs_send() that is currently unused.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Christian Schwarz <christian.schwarz@nutanix.com>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Sean Eric Fagan <kithrup@mac.com>
Signed-off-by: Sean Eric Fagan <kithrup@mac.com>
Closes #13158
2022-03-18 17:02:12 -07:00
ColMelvin 3ce3d30532 RPM: Split out pam_zfs_key into separate package
Create a separate `pam_zfs_key` package for the PAM module components,
an optional addition to the deliverables, in much the same way as the
Python bindings are released as a separate `python#-pyzfs` package.

This makes it clear when the PAM module is shipped with the package,
since it's now in its own package.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Chris Lindee <chris.lindee+github@gmail.com>
Closes: #13026
2022-03-18 16:54:50 -07:00
наб 045aeabce6 module: zfs: arc: hdr_full_crypt_dest: drop unevaulated-only variable
This explodes as -Wunused-variable on GCC 8.5.0, despite it being used,
just not in an evaluated context

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13195
2022-03-18 16:53:05 -07:00
наб 4629dfe963 config: always-arch: don't subst TARGET_CPU
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13225
2022-03-18 16:50:52 -07:00
наб bd0955cd8e config: always-arch: prune unused TARGET_CPU_*, add TARGET_CPU catch-all
This fixes (harmless) error spew from configuring on, e.g., armv6l

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13193
Closes #13225
2022-03-18 16:49:08 -07:00
Tony Hutter 515710a1e1 zed: Fix mpath autoreplace on Centos 7
A prior commit included a udev check for MPATH_DEVICE_READY to
determine if a path was multipath when doing an autoreplace:

    f2f6c18 zed: Misc multipath autoreplace fixes

However, MPATH_DEVICE_READY is not provided by the older version of
udev that's on Centos 7 (it is on Centos 8).

This patch instead looks for 'mpath-' in the UUID, which works on
both Centos 7 and 8.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #13222
2022-03-18 14:06:40 -07:00
Ryan Moeller d42979c6ef Fix ACL checks for NFS kernel server
This PR changes ZFS ACL checks to evaluate
fsuid / fsgid rather than euid / egid to avoid
accidentally granting elevated permissions to
NFS clients.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Andrew Walker <awalker@ixsystems.com>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #13221
2022-03-18 06:47:57 -06:00
Mateusz Guzik a5920d24c0 FreeBSD: add missing replay check to an assert in zfs_xvattr_set
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #13219
2022-03-17 10:30:10 -07:00
Kyle Evans bee314a798 module: freebsd: avoid a taking a destroyed lock in zfs_zevent bits
At shutdown time, we drain all of the zevents and set the
ZEVENT_SHUTDOWN flag.  On FreeBSD, we may end up calling
zfs_zevent_destroy() after the zevent_lock has been destroyed while
the sysevent thread is winding down; we observe ESHUTDOWN, then back
out.

Events have already been drained, so just inline the kmem_free call in
sysevent_worker() to avoid the race, and document the assumption that
zfs_zevent_destroy doesn't do anything else useful at that point.

This fixes a panic that can occur at module unload time.

Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kyle Evans <kevans@FreeBSD.org>
Closes #13220
2022-03-17 10:14:00 -07:00
наб 6ef00196db module: zstd: check we don't leak symbols; regenerate symbol map
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12988 
Closes #13209
2022-03-15 16:10:10 -07:00
наб 1c41d8941c tests: validate getsubopt(3) expulsion
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:14:50 -07:00
наб 77cdc63d09 zfs-list.8: mention -S after -s it calls back to
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:14:46 -07:00
наб 40f09cb0f4 zfs: list: only accept whole type for -t, not tp[=whatever]
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:14:42 -07:00
наб 539d16c35e libzfs: tokenise consistently with zfs and zpool
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:14:38 -07:00
наб 4695230021 zfs: get: only accept whole type for -t, not tp[=whatever]
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:14:33 -07:00
наб 7867f430b4 zfs: get: only accept whole source for -s, not src[=whatever]
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:14:29 -07:00
наб 7c17e82cbe zfs: get: only accept whole column for -o, not col[=whatever]
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:14:23 -07:00
наб c79787845d zpool: get: there's one fewer column than in zfs get
The current code allows -o name,property,value,source,name

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:14:18 -07:00
наб 15aca3ad59 zfs: wait: only accept whole activity for -t, not act[=whatever]
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:14:14 -07:00
наб 66cd170db3 zpool: get: only accept whole columns for -o, not col[=whatever]
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:14:06 -07:00
наб 675508f608 zpool: wait: only accept whole columns for -t, not col[=whatever]
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:14:02 -07:00
наб c21819026f Replace FMD_B_{TRUE,FALSE} with B_{TRUE,FALSE}
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:13:58 -07:00
наб a9e2b22efb Integrate carcass of libspl/i/s/vtoc.h into i/s/efi_partition.h
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:13:54 -07:00
наб d465fc5844 Forbid b{copy,zero,cmp}(). Don't include <strings.h> for <string.h>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:13:48 -07:00
наб 861166b027 Remove bcopy(), bzero(), bcmp()
bcopy() has a confusing argument order and is actually a move, not a
copy; they're all deprecated since POSIX.1-2001 and removed in -2008,
and we shim them out to mem*() on Linux anyway

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:13:42 -07:00
наб 1d77d62f5a libspl: include: sys/vtoc.h: reduce to absolute barest minimum
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:13:36 -07:00
наб 26bbce8173 tests: replace explicit $? || log_fail with log_must
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:13:33 -07:00
наб 85c2cce51c tests: zfs_002_pos: simplify ZFS_ABORT tests
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:13:21 -07:00
наб e09762c6c2 tests: zfs_set_common: check_prop_inherit: print faulty values
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:13:00 -07:00
Brian Behlendorf 2feba9a6d8 Linux 5.16 compat: META
Update the META file to reflect compatibility with the 5.16 kernel.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13212
Closes #13218
2022-03-14 16:39:07 -07:00
Ryan Moeller 8bb9ecf4bf libzfs: Convert to fnvpair functions
Improves readability.  No functional change intended.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #13197
2022-03-14 15:44:56 -07:00
Brian Atkinson becc717f61 Adding ZERO_PAGE detection
On some architectures ZERO_PAGE is unavailable because it references
a GPL exported symbol of empty_zero_page. Originally e08b993 removed
the call to PAGE_ZERO(0) for assignment to the abd_zero_page. However,
a simple check can be done to avoid a kernel allocation and free for
the abd_zero_page if ZERO_PAGE is available.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #13199
2022-03-14 12:37:39 -07:00
наб dad2b19fff module: zfs: zio_inject: zio_match_handler: don't << -1
Caught by UBSAN: ZI_NO_DVA is passed explicitly in
zio_handle_decrypt_injection() and can be an ENOENT from zio_match_dva()

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13146
Closes #13190
2022-03-13 13:18:17 -07:00
Brian Behlendorf 56a0699e5e ZTS: Fix send_partial_dataset.ksh
The send_partial_dataset test verifies that partial send streams
can be resumed.  This test may occasionally fail with a "token is
corrupt" error if the `mess_send_file` truncates a send stream
below the size of the DRR_BEGIN record.  Update this function to
set a minimum size to ensure there is at least an intact DDR_BEGIN
record which allows for the receiving dataset to be created.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13177
2022-03-13 13:16:49 -07:00
Harry Sintonen ebcf12f763 get_key_material_https: removed bogus free() call
The get_key_material_https() function error code path had a bogus
free() call, either resulting in double-free or free() of undefined
pointer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Co-authored-by: Harry Sintonen <sintonen@iki.fi>
Signed-off-by: Harry Sintonen <sintonen@iki.fi>
Closes #13198
2022-03-13 13:15:40 -07:00
Ryan Moeller 76bcffb7dc libzfs: FreeBSD doesn't resize partitions for you
This code can be failure prone on FreeBSD, where zfsd will pass a guid
as the vdev path to online.  The guid causes zfs_resolve_shortname to
fail because it expects a path.  We can just skip the whole ordeal.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #12083
2022-03-11 08:52:49 -08:00
Akash B 1282274f33 Add physical device size to SIZE column in 'zpool list -v'
Add physical device size/capacity only for physical devices in
'zpool list -v' instead of displaying "-" in the SIZE column.
This would make it easier to see the individual device capacity and
to determine which spares are large enough to replace which devices.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Dipak Ghosh <dipak.ghosh@hpe.com>
Signed-off-by: Akash B <akash-b@hpe.com>
Closes #12561
Closes #13106
2022-03-08 16:20:41 -08:00
Attila Fülöp ce7a5dbf4b Linux x86 SIMD: factor out unneeded kernel dependencies
Cleanup the kernel SIMD code by removing kernel dependencies.

 - Replace XSTATE_XSAVE with our own XSAVE implementation for all
   kernels not exporting kernel_fpu{begin,end}(), see #13059

 - Replace union fpregs_state by a uint8_t * buffer and get the size
   of the buffer from the hardware via the CPUID instruction

 - Replace kernels xgetbv() by our own implementation which was
   already there for userspace.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #13102
2022-03-08 16:19:15 -08:00
наб a86e089415 libzfs_core: lzc_send_wraper: maximise pipe buffer for existing pipes
This reduces scheduler overhead by letting the reader consume bigger
chunks (64k => 128k at full throttle)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13133
2022-03-08 09:33:12 -08:00
Rich Ercolani 7eb179be18 ZTS: /dev/null: accept no substitutes
Instead of writing to "devnull" and rming it later, just
> /dev/null to not have to cleanup later.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Co-authored-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13133
2022-03-08 09:33:11 -08:00
наб dd899641ee Revert "ZTS: Avoid piping send directly to /dev/null"
This reverts commit 1a79f7e860.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13133
2022-03-08 09:33:10 -08:00
наб d1a5e9594c Revert "Added error for writing to /dev/ on Linux"
This reverts commit 860051f1d1.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13133
2022-03-08 09:33:09 -08:00
наб 3a909fe33e libzfs, libzfs_core: send: always write to pipe
By introducing lzc_send_wrapper() and routing all ZFS_IOC_SEND*
users through it, we fix a Linux 5.10-introduced bug (see comment)

This is all /transparent/ to the users API, ABI, and usage-wise,
and disabled on FreeBSD and if the output is already a pipe,
and transparently nestable (i.e. zfs_send_one() is wrapped,
but so is lzc_send_redacted() it calls to ‒ this wouldn't be strictly
necessary if ZFS_IOC_SEND_PROGRESS wasn't strictly denominational w.r.t.
the descriptor the send is happening on)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Co-authored-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11445
Closes #13133
2022-03-08 09:33:08 -08:00
наб fbbea09db9 libzfs_core: simplify max_pipe_buffer(), return new max
We do still check and only optionally set because all F_SETPIPE_SZ
directives reallocate the pipe buffer

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13133
2022-03-08 09:33:07 -08:00
наб 8a3d77358a libzfs: migrate single-use libzfs_set_pipe_max() to libzfs_core
Notably, this also means that the pipe is expanded before each
dataset is received, so updates to /p/s/f/pipe-max-size are reflected
for each new dataset

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13133
2022-03-08 09:33:04 -08:00
Brian Atkinson 08eb2309ce Cleaning up a couple of ZTS tests setup scripts
With the zfs_destroy ZTS test case the setup script needed to call
default_setup_noexit so compression could be turned off. Also, added
log_must to setting compression off in the reservation setup script for
turning off compression.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #13173
2022-03-08 09:20:33 -08:00
Brian Atkinson 1b609d4b03 Added noexit variant for Raidz setup in ZTS tests
The regular default_raidz_setup function in the ZFS test suite called
log_pass after creating the zpool. However, with compression now being
on by default 56fa4aa, there is no way to turn compression off in the
setup.ksh scripts when creating a raidz VDEV. The addition of the
function default_raidz_setup_noexit allows for a raidz VDEV to be
created, additional zfs property settings to be applied and for the
setup.ksh script itself to call log_pass.

With the addition of default_raidz_setup_noexit some stray log_pass
calls were removed from any setup.ksh scripts that call
default_raidz_setup.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #13173
2022-03-08 09:20:00 -08:00
Brian Behlendorf 6df43169b3 Fix ENOSPC when unlinking multiple files from full pool
When unlinking multiple files from a pool at 100% capacity, it was
possible for ENOSPC to be returned after the first unlink.  e.g.

    rm -f /mnt/fs/test1.0.0 /mnt/fs/test1.1.0 /mnt/fs/test1.2.0
    rm: cannot remove '/mnt/fs/test1.1.0': No space left on device
    rm: cannot remove '/mnt/fs/test1.2.0': No space left on device

After waiting for the pending deferred frees from the first unlink to
be processed the remaining files can then be unlinked.  This is caused
by the quota limit in dsl_dir_tempreserve_impl() being temporarily
decreased to the allocatable pool capacity less any deferred free
space.

This is resolved using the existing mechanism of returning ERESTART
when over quota as long as we know enough space will shortly be
available after processing the pending deferred frees.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13172
2022-03-08 09:16:35 -08:00
Umer Saleem 39a4daf742 Expose additional file level attributes
ZFS allows to update and retrieve additional file level attributes for
FreeBSD. This commit allows additional file level attributes to be
updated and retrieved for Linux. These include the flags stored in the
upper half of z_pflags only.

Two new IOCTLs have been added for this purpose. ZFS_IOC_GETDOSFLAGS
can be used to retrieve the attributes, while ZFS_IOC_SETDOSFLAGS can
be used to update the attributes.

Attributes that are allowed to be updated include ZFS_IMMUTABLE,
ZFS_APPENDONLY, ZFS_NOUNLINK, ZFS_ARCHIVE, ZFS_NODUMP, ZFS_SYSTEM,
ZFS_HIDDEN, ZFS_READONLY, ZFS_REPARSE, ZFS_OFFLINE and ZFS_SPARSE.
Flags can be or'd together while calling ZFS_IOC_SETDOSFLAGS.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #13118
2022-03-07 17:52:03 -08:00
Windel Bouwman 9955b9ba2e Handle aarch64 defines seperate from arm
aarch64 is a different architecture than arm. Some
compilers might choke when both __arm__ and __aarch64__
are defined.

This change separates the checks for arm and for
aarch64 in the isa_defs.h header files.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Windel Bouwman <windel@windel.nl>
Closes #10335 
Closes #13151
2022-03-07 17:49:34 -08:00
Alejandro Colomar db7f1a91de Use _Noreturn (C11; GNU89) properly
A function that returns with no value is a different thing from a
function that doesn't return at all.  Those are two orthogonal
concepts, commonly confused.

pthread_create(3) expects a pointer to a start routine that has a
very precise prototype:

    void *(*start_routine)(void *);

However, other thread functions, such as kernel ones, expect:

    void (*start_routine)(void *);

Providing a different one is incorrect, and has only been working
because the ABIs happen to produce a compatible function.

We should use '_Noreturn void', since it's the natural type, and
then provide a '_Noreturn void *' wrapper for pthread functions.

For consistency, replace most cases of __NORETURN or
__attribute__((noreturn)) by _Noreturn.  _Noreturn is understood
by -std=gnu89, so it should be safe to use everywhere.

Ref: https://github.com/openzfs/zfs/pull/13110#discussion_r808450136
Ref: https://software.codidact.com/posts/285972
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Closes #13120
2022-03-04 16:25:22 -08:00
наб 06b8050678 zpool: main: list: don't pay for printf
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13125
2022-03-04 12:08:34 -08:00
наб cdb3eb5aa4 zpool: main: list: -v: update dash spacing for special vdevs
Before:
nabijaczleweli@tarta:~/store/code/zfs$ /sbin/zpool list -v file
NAME                                                    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
file                                                    118G   150K   118G        -         -     0%     0%  1.00x    ONLINE  -
  /mnt/filling/store/nabijaczleweli/code/zfs/file      39.5G      0  39.5G        -         -     0%  0.00%      -    ONLINE
dedup                                                      -      -      -        -         -      -      -      -  -
  /mnt/filling/store/nabijaczleweli/code/zfs/adedup    39.5G      0  39.5G        -         -     0%  0.00%      -    ONLINE
special                                                    -      -      -        -         -      -      -      -  -
  /mnt/filling/store/nabijaczleweli/code/zfs/aspecial  39.5G   150K  39.5G        -         -     0%  0.00%      -    ONLINE
logs                                                       -      -      -        -         -      -      -      -  -
  /mnt/filling/store/nabijaczleweli/code/zfs/alog      39.5G      0  39.5G        -         -     0%  0.00%      -    ONLINE
cache                                                      -      -      -        -         -      -      -      -  -
  /mnt/filling/store/nabijaczleweli/code/zfs/acache    40.0G  1.50K  40.0G        -         -     0%  0.00%      -    ONLINE
spare                                                      -      -      -        -         -      -      -      -  -
  /mnt/filling/store/nabijaczleweli/code/zfs/aspare        -      -      -        -         -      -      -      -     AVAIL

After:
nabijaczleweli@tarta:~/store/code/zfs$ cmd/zpool/zpool list -v file
NAME                                                    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
file                                                    118G   150K   118G        -         -     0%     0%  1.00x    ONLINE  -
  /mnt/filling/store/nabijaczleweli/code/zfs/file      39.5G      0  39.5G        -         -     0%  0.00%      -    ONLINE
dedup                                                      -      -      -        -         -      -      -      -         -
  /mnt/filling/store/nabijaczleweli/code/zfs/adedup    39.5G      0  39.5G        -         -     0%  0.00%      -    ONLINE
special                                                    -      -      -        -         -      -      -      -         -
  /mnt/filling/store/nabijaczleweli/code/zfs/aspecial  39.5G   150K  39.5G        -         -     0%  0.00%      -    ONLINE
logs                                                       -      -      -        -         -      -      -      -         -
  /mnt/filling/store/nabijaczleweli/code/zfs/alog      39.5G      0  39.5G        -         -     0%  0.00%      -    ONLINE
cache                                                      -      -      -        -         -      -      -      -         -
  /mnt/filling/store/nabijaczleweli/code/zfs/acache    40.0G  1.50K  40.0G        -         -     0%  0.00%      -    ONLINE
spare                                                      -      -      -        -         -      -      -      -         -
  /mnt/filling/store/nabijaczleweli/code/zfs/aspare        -      -      -        -         -      -      -      -     AVAIL

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13123
Closes #13125
2022-03-04 12:08:33 -08:00
наб cf5a9d8c39 zpool: main: use ARRAY_SIZE(class_name) instead of 3
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13125
2022-03-04 12:08:33 -08:00
наб f4826a2d46 libzfs: util: don't check for allocation errors from infallible zfs_*()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13125
2022-03-04 12:08:33 -08:00
наб be8e1d81bf Flex non-pretty-printed properties and raw-/pretty-print remaining ones
Before:
nabijaczleweli@tarta:~/store/code/zfs$ /sbin/zpool list -Td -o name,size,alloc,free,ckpoint,expandsz,guid,load_guid,frag,cap,dedup,health,altroot,guid,dedupditto,load_guid,maxblocksize,maxdnodesize 2>/dev/null
Sun 20 Feb 03:57:44 CET 2022
NAME         SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   GUID  LOAD_GUID   FRAG    CAP  DEDUP    HEALTH  ALTROOT   GUID  DEDUPDITTO  LOAD_GUID  MAXBLOCKSIZE  MAXDNODESIZE
filling     25.5T  6.52T  18.9T        -       64M  11512889483096932869  11656109927366648364     1%    25%  1.00x    ONLINE  -        11512889483096932869           0  11656109927366648364       1048576         16384
tarta-boot   240M  50.6M   189M        -         -  2372068846917849656  7752280792179633787    12%    21%  1.00x    ONLINE  -        2372068846917849656           0  7752280792179633787       1048576           512
tarta-zoot  55.5G  6.42G  49.1G        -         -  12971868889665384604  8622632123393589527    17%    11%  1.00x    ONLINE  -        12971868889665384604           0  8622632123393589527       1048576         16384

nabijaczleweli@tarta:~/store/code/zfs$ /sbin/zfs list -o name,guid,keyguid,ivsetguid,createtxg,objsetid,pbkdf2iters,refratio -r tarta-zoot
NAME                                  GUID  KEYGUID  IVSETGUID  CREATETXG  OBJSETID  PBKDF2ITERS  REFRATIO
tarta-zoot                           1110930838977259561     659P          -          1        54            0     1.03x
tarta-zoot/PAGEFILE.SYS              2202570496672997800    3.20E          -       2163      1539            0     1.07x
tarta-zoot/dupa                      16941280502417785695    9.81E          -    2274707      1322  1000000000000     1.00x
tarta-zoot/etc                       17029963068508333530    12.9E          -       3663      1087            0     1.52x
tarta-zoot/home                      3508163802370032575    8.50E          -       3664       294            0     1.00x
tarta-zoot/home/misio                7283672744014848555    13.0E          -       3665       302            0     2.28x
tarta-zoot/home/nabijaczleweli       12286744508078616303    5.15E          -       3666       200            0     2.05x
tarta-zoot/home/nabijaczleweli/tftp  13551632689932817643    5.16E          -       3667      1095            0     1.00x
tarta-zoot/home/root                 5203106193060067946    15.4E          -       3668       698            0     2.86x
tarta-zoot/home/shared-config        8866040021005142194    14.5E          -       3670      2069            0     1.20x
tarta-zoot/home/tymek                9472751824283011822    4.56E          -       3671      1202            0     1.32x
tarta-zoot/oldboot                   10460192444135730377    13.8E          -    2268398      1232            0     1.01x
tarta-zoot/opt                       9945621324983170410    5.84E          -       3672      1210            0     1.00x
tarta-zoot/opt/icecc                 13178238931846132425    9.04E          -       3673      1103            0     2.83x
tarta-zoot/opt/swtpm                 10172962421514870859    4.13E          -     825669    145132            0     1.87x
tarta-zoot/srv                       217179989022738337    3.90E          -       3674      2469            0     1.00x
tarta-zoot/usr                       12214213243060765090    15.0E          -       3675      2477            0     2.58x
tarta-zoot/usr/local                 7542700368693813134     941P          -       3676      2484            0     2.33x
tarta-zoot/var                       13414177124447929530    10.2E          -       3677      2492            0     1.57x
tarta-zoot/var/lib                   6969944550407159241    5.28E          -       3678      2499            0     2.34x
tarta-zoot/var/tmp                   6399468088048343912    1.34E          -       3679      1218            0     3.95x

After:
nabijaczleweli@tarta:~/store/code/zfs$ cmd/zpool/zpool list -Td -o name,size,alloc,free,ckpoint,expandsz,guid,load_guid,frag,cap,dedup,health,altroot,guid,dedupditto,load_guid,maxblocksize,maxdnodesize 2>/dev/null
Sun 20 Feb 03:57:42 CET 2022
NAME         SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ                  GUID             LOAD_GUID   FRAG    CAP  DEDUP    HEALTH  ALTROOT                  GUID  DEDUPDITTO             LOAD_GUID  MAXBLOCKSIZE  MAXDNODESIZE
filling     25.5T  6.52T  18.9T        -       64M  11512889483096932869  11656109927366648364     1%    25%  1.00x    ONLINE  -        11512889483096932869           0  11656109927366648364            1M           16K
tarta-boot   240M  50.6M   189M        -         -   2372068846917849656   7752280792179633787    12%    21%  1.00x    ONLINE  -         2372068846917849656           0   7752280792179633787            1M           512
tarta-zoot  55.5G  6.42G  49.1G        -         -  12971868889665384604   8622632123393589527    17%    11%  1.00x    ONLINE  -        12971868889665384604           0   8622632123393589527            1M           16K

nabijaczleweli@tarta:~/store/code/zfs$ cmd/zfs/zfs list -o name,guid,keyguid,ivsetguid,createtxg,objsetid,pbkdf2iters,refratio -r tarta-zoot
NAME                                                 GUID               KEYGUID  IVSETGUID  CREATETXG  OBJSETID    PBKDF2ITERS  REFRATIO
tarta-zoot                            1110930838977259561    741529699813639505          -          1        54              0     1.03x
tarta-zoot/PAGEFILE.SYS               2202570496672997800   3689529982640017884          -       2163      1539              0     1.07x
tarta-zoot/dupa                      16941280502417785695  11312442953423259518          -    2274707      1322  1000000000000     1.00x
tarta-zoot/etc                       17029963068508333530  14852574366795347233          -       3663      1087              0     1.52x
tarta-zoot/home                       3508163802370032575   9802810070759776956          -       3664       294              0     1.00x
tarta-zoot/home/misio                 7283672744014848555  14983161489316798151          -       3665       302              0     2.28x
tarta-zoot/home/nabijaczleweli       12286744508078616303   5937870537299886218          -       3666       200              0     2.05x
tarta-zoot/home/nabijaczleweli/tftp  13551632689932817643   5950522828900813054          -       3667      1095              0     1.00x
tarta-zoot/home/root                  5203106193060067946  17718025091255443518          -       3668       698              0     2.86x
tarta-zoot/home/shared-config         8866040021005142194  16716354482778968577          -       3670      2069              0     1.20x
tarta-zoot/home/tymek                 9472751824283011822   5251854710505749954          -       3671      1202              0     1.32x
tarta-zoot/oldboot                   10460192444135730377  15894065034622168157          -    2268398      1232              0     1.01x
tarta-zoot/opt                        9945621324983170410   6737735639539098405          -       3672      1210              0     1.00x
tarta-zoot/opt/icecc                 13178238931846132425  10425145983015238428          -       3673      1103              0     2.83x
tarta-zoot/opt/swtpm                 10172962421514870859   4764783754852521469          -     825669    145132              0     1.87x
tarta-zoot/srv                         217179989022738337   4492810461439647259          -       3674      2469              0     1.00x
tarta-zoot/usr                       12214213243060765090  17306702395865262834          -       3675      2477              0     2.58x
tarta-zoot/usr/local                  7542700368693813134   1059954157997659784          -       3676      2484              0     2.33x
tarta-zoot/var                       13414177124447929530  11764397504176937123          -       3677      2492              0     1.57x
tarta-zoot/var/lib                    6969944550407159241   6084753728494937404          -       3678      2499              0     2.34x
tarta-zoot/var/tmp                    6399468088048343912   1548692824635344277          -       3679      1218              0     3.95x

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13122
Closes #13125
2022-03-04 12:08:33 -08:00
наб a9a89755fa module: zcommon: zprop: common: zprop_width: namespace exceptions
Before this, /all/ numerical properties 1 (ZFS_PROP_CREATION,
ZPOOL_PROP_SIZE, VDEV_PROP_CAPACITY) would be non-fixed and
/all/ numerical properties 5 (ZFS_PROP_COMPRESSRATIO,
ZPOOL_PROP_HEALTH, VDEV_PROP_PSIZE) would be 8-wide

Realistically, this doesn't appear to be much of a problem

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13125
2022-03-04 12:08:30 -08:00
наб 6d8a00ff1f contrib/dracut: README: note rootfstype=zfs
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13093
2022-03-03 10:45:31 -08:00
наб 6a41310c70 contrib/dracut: zfs-lib: export_all: replace with inline zpool export -a
07a3312f17, which introduced this in
October of 2014, didn't have zpool export -a available; we do

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13093
2022-03-03 10:45:24 -08:00
наб 93669c3812 contrib/dracut: export-zfs: simplify
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13093
2022-03-03 10:45:19 -08:00
наб 497bf14a4a zdb.8: cleanup
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13093
2022-03-03 10:44:49 -08:00
Rich Ercolani 56fa4aa96e Default to ON for compression
A simple change, but so many tests break with it,
and those are the majority of this.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #13078
2022-03-03 10:43:38 -08:00
Brian Behlendorf 29a0ffe795 ZTS: Fix import_devices_missing.ksh
Related to commit 90b77a036.  Retry the `zpool export` if the pool
is "busy" indicating there is a process accessing the mount point.
This can happen after an import, allowing it to be retried will
avoid spurious test failures.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13169
2022-03-02 11:03:53 -08:00
Rich Ercolani fe2ea67ddd Re-apply 6ba2e72b, silence lint
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12978
2022-03-01 13:56:00 -08:00
Rich Ercolani e220635995 Re-apply a78f19d3
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12978
2022-03-01 13:55:51 -08:00
Rich Ercolani 234e9605c1 Explode zstd 1.4.5 into separate upstream files
It's much nicer to import from upstream this way, and compiles
faster too.

Everything in lib/ is unmodified 1.4.5.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12978
2022-03-01 13:55:12 -08:00
Aleksa Sarai 669683c4cb ZTS: switch to rsync for directory diffs
While "diff -r" is the most straightforward way of comparing directory
trees for differences, it has two major issues:

 * File metadata is not compared, which means that subtle bugs may be
   missed even if a test is written that exercises the buggy behaviour.
 * diff(1) doesn't know how to compare special files -- it assumes they
   are always different, which means that a test using diff(1) on
   special files will always fail (resulting in such tests not being
   added).

rsync can be used in a very similar manner to diff (with the -ni flags),
but has the additional benefit of being able to detect and resolve many
more differences between directory trees. In addition, rsync has a
standard set of features and flags while diffs feature set depends on
whether you're using GNU or BSD binutils.

Note that for several of the test cases we expect that file timestamps
will not match. For example, the ctime for a file creation or modify
event is stored in the intent log but not the mtime. Thus when replaying
the log the correct ctime is set but the current mtime is used. This is
the expected behavior, so to prevent these tests from failing, there's a
replay_directory_diff function which ignores those kinds of changes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Closes #12588
2022-03-01 10:05:32 -08:00
Brian Behlendorf 19229e5f1e ZTS: Modify receive-o-x_props_override.ksh exception
As previously noted in #12272 the receive-o-x_props_override.ksh test
reliably fails on FreeBSD.  Since we don't expect this test to pass
move the exception from the "maybe" to "known" section.  This way we
don't retry the FAILED test when it is not expected to pass.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13167
2022-03-01 08:47:30 -08:00
Brian Behlendorf a6bf7ea16a ZTS: Move largest_pool_001_pos.ksh to Linux runfile
On FreeBSD pools are not allowed to be created using vdevs which are
backed by ZFS volumes.  This configuration is not recommended for any
supported platform, nevertheless the largest_pool_001_pos.ksh test
case makes use of it as a convenience.  This causes the test case to
fail reliably on FreeBSD.  The layout is still tolerated on Linux
so only perform this test on Linux.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13166
2022-03-01 08:46:00 -08:00
Savyasachee Jha b3ab290855 dracut: skip zfsexpandknoweldge when zfs_devs is present in dracut
PR 1711 (https://github.com/dracutdevs/dracut/pull/1711) adds a zfs_devs
function to dracut to detect the physical devices backing zfs pools. If
this function exists in the version of dracut this module is being
called from, then it does not need to run.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Savyasachee Jha <hi@savyasacheejha.com>
Closes #13121
2022-02-28 14:35:25 -08:00
наб 9e532d17f3 config: gcc != cc
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13145
Closes #13152
2022-02-26 11:28:26 -08:00
наб b08153f1c1 freebsd: libzfs: zmount: void-cast unused assert(3) variables
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13145
Closes #13152
2022-02-26 11:28:18 -08:00
наб 1ea68b6d38 config: check for -Wno-cast-function-type
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13145
Closes #13152
2022-02-26 11:26:55 -08:00
Paul Dagnelie 82226e4f44 Fix erroneous zstreamdump warning
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #13154
2022-02-26 11:24:27 -08:00
Brian Behlendorf ce91f973ec ztest: Fix ASSERT in ztest_objset_destroy_cb()
The dsl_destroy_snapshot() call in ztest_objset_destroy_cb() may
encounter a runtime error when the pool is out of space.  This is
similar to the error handling for the dsl_destroy_head() case,
but since dsl_destroy_snapshot() is implemented as a channel
program ECHRNG is returned instead of ENOSPC.  ECHRNG may also
be returned instead of EBUSY if there is a hold on the snapshot.

Reviewed by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13155
2022-02-26 11:20:32 -08:00
наб 4f453dcc1f Fix FreeBSD reporting on reruns
Turns out, when your test-suite fails on FreeBSD the rerun logic
would fail as follows:

Results Summary
PASS	 1358
FAIL	   7
SKIP	  47

Running Time:	04:00:02
Percent passed:	96.2%
Log directory:	/var/tmp/test_results/20220225T092538
mktemp: illegal option -- p
usage: mktemp [-d] [-q] [-t prefix] [-u] template ...
       mktemp [-d] [-q] [-u] -t prefix
mktemp: illegal option -- p
usage: mktemp [-d] [-q] [-t prefix] [-u] template ...
       mktemp [-d] [-q] [-u] -t prefix
/usr/local/share/zfs/zfs-tests.sh: cannot create :
                                   No such file or directory
...

This change resolves a flaw from the original commit, 2320e6eb4
("Add zfs-test  facility to automatically rerun failing tests")

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13156
2022-02-26 11:19:05 -08:00
Tony Hutter f2f6c18f17 zed: Misc multipath autoreplace fixes
We recently had a case where our operators replaced a bad
multipathed disk, only to see it fail to autoreplace.  The
zed logs showed that the multipath replacement disk did not pass
the 'is_dm' test in zfs_process_add() even though it should have.
is_dm is set if there exists a sysfs entry for to the
underlying /dev/sd* paths for the multipath disk.  It's
possible this path didn't exist due to a race condition where
the sysfs paths weren't created at the time the udev event came
in to zed, but this was never verified.

This patch updates the check to look for udev properties that
indicate if the new autoreplace disk is an empty multipath disk,
rather than looking for the underlying sysfs entries. It also
adds in additional logging, and fixes a bug where zed allowed
you to use an already zfs-formatted disk from another pool
as a multipath auto-replacement disk.

Furthermore, while testing this patch, I also ran across a case
where a force-faulted disk did not have a ZPOOL_CONFIG_PHYS_PATH
entry in its config.  This prevented it from being autoreplaced.
I added additional logic to derive the PHYS_PATH from the PATH if
the PATH was a /dev/disk/by-vdev/ path.  For example, if PATH
was /dev/disk/by-vdev/L28, then PHYS_PATH would be L28.  This is
safe since by-vdev paths represent physical locations and do not
change between boots.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #13023
2022-02-24 11:43:39 -08:00
Damian Szuberski e25bcf906a Fix directory detection in dkms.mkconf
Fix `zfs-dkms` installation on Debian-derived distributions by
aligning the directory detection logic to #13096.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #11449
Closes #13141
2022-02-24 10:33:48 -08:00
Damian Szuberski 78fad47cb3 Add Linux kmemleak support to ZTS
- Kmemleak `clear` is invoked right before every test case run.
- Kmemleak `scan` is requested right after each test case is finished.
- Kmemleak instrumentation is not used for
  setup/cleanup/pretest/posttest/failsafe stages to shorten the test
  case execution time.
- Kmemleak periodic scan is disabled (`scan=0`) before the test suite
  run to avoid interfering with the on-demand scan results.
- There are unavoidable potential false positives coming from kernel
  areas other than OpenZFS module.
- The ZTS with kmemleak enabled duration is increased by ~50%.

Example run
```
Running Time:   07:12:13
Percent passed: 98.3%

unreferenced object 0xffff9da82aea5410 (size 80):
  comm "kworker/u32:10", pid 942206, jiffies 4296749716 (age 2615.516s)
  hex dump (first 32 bytes):
    00 30 30 00 00 00 00 00 ff 8f 30 00 00 00 00 00  .00.......0.....
    51 e6 77 05 a8 9d ff ff 00 00 00 00 00 00 00 00  Q.w.............
  backtrace:
    [<000000005cf1fea2>] alloc_extent_state+0x1d/0xb0 [btrfs]
    [<0000000083f78ae5>] set_extent_bit+0x2ff/0x670 [btrfs]
    [<00000000de29249e>] lock_extent_bits+0x6b/0xa0 [btrfs]
    [<00000000b241f424>] lock_and_cleanup_extent_if_need+0xaf/0x1c0
       [btrfs]
    [<0000000093ca72b5>] btrfs_buffered_write+0x297/0x7d0 [btrfs]
    [<000000002c2938c8>] btrfs_file_write_iter+0x127/0x390 [btrfs]
    [<00000000b888f720>] do_iter_readv_writev+0x152/0x1b0
    [<00000000320f0bcc>] do_iter_write+0x7c/0x1c0
    [<000000000b5a8fe0>] lo_write_bvec+0x62/0x150 [loop]
    [<000000009aa03c73>] loop_process_work+0x250/0xbd0 [loop]
    [<00000000c7487d8a>] process_one_work+0x1f1/0x390
    [<000000000b236831>] worker_thread+0x53/0x3e0
    [<0000000023cb3e57>] kthread+0x127/0x150
    [<000000002d48676a>] ret_from_fork+0x22/0x30
```

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #13084
2022-02-24 10:21:13 -08:00
Attila Fülöp 4d14a285b3 Linux 5.11 compat: x86 SIMD: fix kernel_fpu_{begin,end}() detection
Linux 5.11 changed kernel_fpu_begin() to an inlined function and
moved the functionality to kernel_fpu_begin_mask(). This breaks the
existing detection mechanism since it checks if kernel_fpu_begin is
an exported kernel symbol, which isn't the case for an inlined
function.

To avoid assumptions about internal implementation, replace
ZFS_LINUX_TEST_RESULT_SYMBOL in favor of  ZFS_LINUX_TEST_RESULT
which already makes sure kernel_fpu_{begin,end}() is usable by us.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #13147
2022-02-24 09:23:41 -08:00
Jitendra Patidar 361a7e8211 log xattr=sa create/remove/update to ZIL
As such, there are no specific synchronous semantics defined for
the xattrs. But for xattr=on, it does log to ZIL and zil_commit() is
done, if sync=always is set on dataset. This provides sync semantics
for xattr=on with sync=always set on dataset.

For the xattr=sa implementation, it doesn't log to ZIL, so, even with
sync=always, xattrs are not guaranteed to be synced before xattr call
returns to caller. So, xattr can be lost if system crash happens, before
txg carrying xattr transaction is synced.

This change adds xattr=sa logging to ZIL on xattr create/remove/update
and xattrs are synced to ZIL (zil_commit() done) for sync=always.
This makes xattr=sa behavior similar to xattr=on.

Implementation notes:
The actual logging is fairly straight-forward and does not warrant
additional explanation.
However, it has been 14 years since we last added new TX types
to the ZIL [1], hence this is the first time we do it after the
introduction of zpool features. Therefore, here is an overview of the
feature activation and deactivation workflow:

1. The feature must be enabled. Otherwise, we don't log the new
    record type. This ensures compatibility with older software.
2. The feature is activated per-dataset, since the ZIL is per-dataset.
3. If the feature is enabled and dataset is not for zvol, any append to
    the ZIL chain will activate the feature for the dataset. Likewise
    for starting a new ZIL chain.
4. A dataset that doesn't have a ZIL chain has the feature deactivated.

We ensure (3) by activating on the first zil_commit() after the feature
was enabled. Since activating the features requires waiting for txg
sync, the first zil_commit() after enabling the feature will be slower
than usual. The downside is that this is really a conservative
approximation: even if we never append a 'TX_SETSAXATTR' to the ZIL
chain, we pay the penalty for feature activation. The upside is that the
user is in control of when we pay the penalty, i.e., upon enabling the
feature.

We ensure (4) by hooking into zil_sync(), where ZIL destroy actually
happens.

One more piece on feature activation, since it's spread across
multiple functions:

zil_commit()
  zil_process_commit_list()
    if lwb == NULL // first zil_commit since zil_open
      zil_create()
        if no log block pointer in ZIL header:
          if feature enabled and not active:
	    // CASE 1
            enable, COALESCE txg wait with dmu_tx that allocated the
	    log block
         else // log block was allocated earlier than this zil_open
          if feature enabled and not active:
	    // CASE 2
            enable, EXPLICIT txg wait
    else // already have an in-DRAM LWB
      if feature enabled and not active:
        // this happens when we enable the feature after zil_create
	// CASE 3
        enable, EXPLICIT txg wait

[1] https://github.com/illumos/illumos-gate/commit/da6c28aaf62fa55f0fdb8004aa40f88f23bf53f0

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Christian Schwarz <christian.schwarz@nutanix.com>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jitendra Patidar <jitendra.patidar@nutanix.com>
Closes #8768 
Closes #9078
2022-02-22 13:06:43 -08:00
Krzysztof Piecuch ccdcc1dbe8 systemd: read initconfdir
Systemd units do not read @initconfdir@ but refer to variables defined
there, also a minor fixup in zfs-scrub service file.

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: Krzysztof Piecuch <piecuch@kpiecuch.pl>
Closes #12946
2022-02-22 12:59:11 -08:00
наб 7454ca413c zpoo-features.7: raidz -> RAID-Z near dRAID
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13116
2022-02-22 10:56:21 -08:00
наб 1f4376eb01 zpool-features.7: never-return-enabled consistency
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13116
2022-02-22 10:56:14 -08:00
наб 58a4848132 zpool-features.7: zfs sendstreams
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13116
2022-02-22 10:56:06 -08:00
наб 6f763af270 zpool-features.7: spurious line break in enabled_txg
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13116
2022-02-22 10:55:58 -08:00
наб 2d232ca806 man: full stop at EOL
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13116
2022-02-22 10:55:51 -08:00
наб 6985944e2f man: -based when -based
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13116
2022-02-22 10:55:44 -08:00
наб a737b415d6 man: IO -> I/O; I/Os -> I/O operations again
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13116
2022-02-22 10:55:37 -08:00
наб aafb89016b man: final RAIDZ -> RAID-Z
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13116
2022-02-22 10:55:29 -08:00
наб 6bf6f5196b man: final VDEV -> vdev
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13116
2022-02-22 10:55:18 -08:00
Brian Behlendorf a5b3fab341 ZTS: Retry in import_rewind_config_changed.ksh
As explained by the disclaimer in the test case,

    "This test can fail since nothing guarantees that old
    MOS blocks aren't overwritten."

This behavior is expected and correct, but results in a
flaky test case which is problematic for the CI.  The best
we can do to resolve this is to retry the sub-test which
failed when the MOS blocks have clearly been overwritten.

When testing failures were rare enough that a single retry
should normally be sufficient.  However, we allow up to
five for good measure.

Reviewed by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13119
2022-02-20 19:21:31 -08:00
Damian Szuberski 806739f991 Correct compilation errors reported by GCC 10/11
New `zfs_type_t` value `ZFS_TYPE_INVALID` is introduced.
Variable initialization is now possible to make GCC happy.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #12167
Closes #13103
2022-02-20 19:20:00 -08:00
Ryan Moeller e41013078a libzfs: Fail making a dataset handle gracefully
When a dataset is in the process of being received it gets marked as
inconsistent and should not be used.  We should check for this when
opening a dataset handle in libzfs and return with an appropriate error
set, rather than hitting an abort because of the incomplete data.

zfs_open() passes errno to zfs_standard_error() after observing
make_dataset_handle() fail, which ends up aborting if errno is 0.
Set errno before returning where we know it has not been set already.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #13077
2022-02-18 13:09:03 -08:00
Damian Szuberski a014378dd0 spl: make 'spl_panic_halt' working for all cases
The default behavior where the serious ZFS errors cause FS thread to
stuck is very bad for some production scenario.

In some production scenarios (Linux), it is recommended to make real
kernel PANIC, where system can be rebooted by watchdog or kernel itself.
This patch enables coherent handling of spl_panic_halt parameter.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Authored-by: Wojciech Nizinski <w.nizinski@grinn-global.com>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #12120
Closes #13109
2022-02-18 11:43:11 -08:00
наб 15b982492a cstyle: forbid ARGSUSED
Reviewed-by: Alejandro Colomar <alx.manpages@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13110
2022-02-18 09:35:06 -08:00
наб a2995e7641 config: add -Wextra (sans sign-compare and missing-field-initializers)
Reviewed-by: Alejandro Colomar <alx.manpages@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13110
2022-02-18 09:35:02 -08:00
наб 642827ecda module: zfs: zcp_get: fix uninitialised warning
Reviewed-by: Alejandro Colomar <alx.manpages@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13110
2022-02-18 09:34:56 -08:00
наб b7c42ce5b2 raidz_test: silence unsigned >=0 warnings
Reviewed-by: Alejandro Colomar <alx.manpages@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13110
2022-02-18 09:34:52 -08:00
наб a215c3e834 libzpool: kernel: silence unrelated-function-pointer cast warning
Reviewed-by: Alejandro Colomar <alx.manpages@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13110
2022-02-18 09:34:47 -08:00
наб d8552689d1 libnvpair: json: suppress wchar_t >=0 warnings
Reviewed-by: Alejandro Colomar <alx.manpages@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13110
2022-02-18 09:34:43 -08:00
наб 4e1f1035d0 config: prune unused -Wno-bool-compare checks
Reviewed-by: Alejandro Colomar <alx.manpages@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13110
2022-02-18 09:34:37 -08:00
наб 72154bd6c9 libtpool: -Wno-clobbered
Also remove -Wno-unused-but-set-variable

Upstream-bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61118
Reviewed-by: Alejandro Colomar <alx.manpages@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13110
2022-02-18 09:34:25 -08:00
наб 0ea6510aa0 module: icp: remove useless assert
Which produces a warning since uints are, by definition, >=0

Reviewed-by: Alejandro Colomar <alx.manpages@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13110
2022-02-18 09:34:18 -08:00
наб 5cf3c24fd8 libzfs: sendrecv: fix NULL arithmetic UB
Reviewed-by: Alejandro Colomar <alx.manpages@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13110
2022-02-18 09:34:14 -08:00
наб 46c7a80280 userspace: mark arguments used
Reviewed-by: Alejandro Colomar <alx.manpages@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13110
2022-02-18 09:34:08 -08:00
наб ef70eff198 module: mark arguments used
Reviewed-by: Alejandro Colomar <alx.manpages@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13110
2022-02-18 09:34:03 -08:00
наб 51c747de43 linux: module/zfs: vnops: make null_xattr static
Reviewed-by: Alejandro Colomar <alx.manpages@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13110
2022-02-18 09:33:20 -08:00
Brian Behlendorf 7901b62685 ZTS: Fix vdev_zaps_004_pos.ksh
When attaching a vdev to a mirror wait for the resilver to complete
before invoking `zdb` to inspect the pool.  This ensures the pool is
essentially idle which allows `zdb` to open the imported pool reliably.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13112 
Closes #6935
2022-02-17 12:09:06 -08:00
Savyasachee Jha 5eae5a88b2 Multiple dracut module install script cleanups
- Replaced intances of `dracut_install` with `inst_simple`
- Removed calls to `test -x mark_hostonly` because the function is an
inbuilt dracut function
- Removed redundant installation of `systemd-ask-password` and
`systemd-tty-ask-password-agent` because they are already installed by
the systemd module. There is no need to install them again
- Removed multiple calls to the `mark_hostonly` function because the
`inst_simple` has a command-line switch for it
- Cleaned up the installation of the `zpool.cache`, `vdev_id.conf` and
`hostid` files to make the logic easier to follow
- Cleaned up and simplified the systemd service installation logic by
invoking systemctl instead of creating symlinks manually
- Replaced various hard-coded paths with dracut equivalents to better
conform with expected dracut behaviour
- Removed redundant call to `mkdir` (`inst_simple` creates the parent
directory if it does not exist on the destination initrd)

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Andrew J. Hesford <ajh@sideband.org>
Signed-off-by: Savyasachee Jha <hi@savyasacheejha.com>
Closes #13010
2022-02-16 15:28:18 -08:00
Savyasachee Jha ed66eafd05 Remove absolute paths to udev rules and binaries for dracut
Since dracut functions can locate both udev rules and binaries, there is
no point in keeping absolute paths in the module setup script. It also
breaks the --sysroot option in dracut. This commit removes mentions to
absolute paths for binaries and udev rules.

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Andrew J. Hesford <ajh@sideband.org>
Signed-off-by: Savyasachee Jha <hi@savyasacheejha.com>
Closes #13010
2022-02-16 15:26:08 -08:00
Savyasachee Jha 7f5a01bdd3 Make dracut fail if essential files cannot be installed
Dracut will now fail in initramfs generation if essential files cannot
be installed.

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Andrew J. Hesford <ajh@sideband.org>
Signed-off-by: Savyasachee Jha <hi@savyasacheejha.com>
Closes #13010
2022-02-16 15:25:59 -08:00
Savyasachee Jha 44b0380a35 Make better use of dracut functions when building initramfs
Setting up the module involves multiple redundant calls to a bunch of
dracut functions wheich can be combined into one. Additionally, the mass
of code required to load libgcc_s.so* can be replaced with one dracut
function. This has the additional effect of removing errors involving
the non-installation of libgcc_s.so* which are seen on debian bullseye
when using version 2.1.2-1~bpo11+1 from the backports repository.

The systemd binaries are separated out into their own `dracut_install`
function call so they do not get pulled in when dracut does not load the
systemd module.

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Andrew J. Hesford <ajh@sideband.org>
Signed-off-by: Savyasachee Jha <hi@savyasacheejha.com>
Closes #13010
2022-02-16 15:23:30 -08:00
Damian Szuberski 72f3c8b130 Fix Linux kernel directories detection
Most modern Linux distributions have separate locations for bare
source and prebuilt ("build") files. Additionally, there are `source`
and `build` symlinks in `/lib/modules/$(KERNEL_VERSION)` pointing to
them. The order of directory search is now:
- `configure` command line values if both `--with-linux` and
  `--with-linux-obj` were defined
- If only `--with-linux` was defined, `--with-linux-obj` is assumed
  to have the same value as `--with-linux`
- If neither `--with-linux` nor `--with-linux-obj` were defined
  autodetection is used:
  - `/lib/modules/$(uname -r)/{source,build}` respectively, if exist
  - The first directory in `/lib/modules` with the highest version
    number according to `sort -V` which contains `source` and `build`
    symlinks/directories
  - The first directory matching `/usr/src/kernels/*` and
    `/usr/src/linux-*` with the highest version number according to
    `sort -V`. Here the source and prebuilt directories are assumed
    to be the same.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #9935
Closes #13096
2022-02-16 15:17:16 -08:00
George Amanakis 52a36bd41a Enable encrypted raw sending to pools with greater ashift
Raw sending from pool1/encrypted with ashift=9 to pool2/encrypted with
ashift=12 results to failure when mounting pool2/encrypted (Input/Output
error). Notably, the opposite, raw sending from a greater ashift to a
lower one does not fail.

This happens because zio_compress_write() falsely checks only
ZIO_FLAG_RAW_COMPRESS and not ZIO_FLAG_RAW_ENCRYPT which is also set in
encrypted raw send streams. In this case it rounds up the psize and if
not equal to the zio->io_size it modifies the block by zeroing out
the extra bytes. Because this happens in a SA attr. registration object
(type=46), the decryption fails upon mounting the filesystem, and zpool
status falsely reports an error.

Fix this by checking both ZIO_FLAG_RAW_COMPRESS and ZIO_FLAG_RAW_ENCRYPT
before deciding whether to zero-pad a block.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #13067 
Closes #13074
2022-02-16 11:52:02 -08:00
Damian Szuberski ba6005175e checkabi/storeabi relevant only to x86_64
The stored ABI files are for the x86_64 architecture.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #11345
Closes #13104
2022-02-16 11:48:01 -08:00
Rich Ercolani 0df22afea2 Colorize the Github test output
Let's color our workflow test output for better readability.

Reviewed by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #13000
2022-02-16 11:40:25 -08:00
наб 67de71e644 libzfs: calculate receive times with sub-second precision
Provide two digits of precision when reporting send/receive
times.  Tiny snapshots may take significantly less than a second
and rounding up to a full second can introduce a significant error.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13047
2022-02-15 16:42:30 -08:00
Ryan Moeller 5c0061345b Cross-platform xattr user namespace compatibility
ZFS on Linux originally implemented xattr namespaces in a way that is
incompatible with other operating systems.  On illumos, xattrs do not
have namespaces.  Every xattr name is visible.  FreeBSD has two
universally defined namespaces: EXTATTR_NAMESPACE_USER and
EXTATTR_NAMESPACE_SYSTEM.  The system namespace is used for protected
FreeBSD-specific attributes such as MAC labels and pnfs state.  These
attributes have the namespace string "freebsd:system:" prefixed to the
name in the encoding scheme used by ZFS.  The user namespace is used
for general purpose user attributes and obeys normal access control
mechanisms.  These attributes have no namespace string prefixed, so
xattrs written on illumos are accessible in the user namespace on
FreeBSD, and xattrs written to the user namespace on FreeBSD are
accessible by the same name on illumos.

Linux has several xattr namespaces.  On Linux, ZFS encodes the
namespace in the xattr name for every namespace, including the user
namespace.  As a consequence, an xattr in the user namespace with the
name "foo" is stored by ZFS with the name "user.foo" and therefore
appears on FreeBSD and illumos to have the name "user.foo" rather than
"foo".  Conversely, none of the xattrs written on FreeBSD or illumos
are accessible on Linux unless the name happens to be prefixed with one
of the Linux xattr namespaces, in which case the namespace is stripped
from the name.  This makes xattrs entirely incompatible between Linux
and other platforms.

We want to make the encoding of user namespace xattrs compatible across
platforms.  A critical requirement of this compatibility is for xattrs
from existing pools from FreeBSD and illumos to be accessible by the
same names in the user namespace on Linux.  It is also necessary that
existing pools with xattrs written by Linux retain access to those
xattrs by the same names on Linux.  Making user namespace xattrs from
Linux accessible by the correct names on other platforms is important.
The handling of other namespaces is not required to be consistent.

Add a fallback mechanism for listing and getting xattrs to treat xattrs
as being in the user namespace if they do not match a known prefix.

Do not allow setting or getting xattrs with a name that is prefixed
with one of the namespace names used by ZFS on supported platforms.

Allow choosing between legacy illumos and FreeBSD compatibility and
legacy Linux compatibility with a new tunable.  This facilitates
replication and migration of pools between hosts with different
compatibility needs.

The tunable controls whether or not to prefix the namespace to the
name.  If the xattr is already present with the alternate prefix,
remove it so only the new version persists.  By default the platform's
existing convention is used.

Reviewed-by: Christian Schwarz <christian.schwarz@nutanix.com>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11919
2022-02-15 16:35:30 -08:00
наб 666749806d module: icp: remove provider stats
These were all folded into a single kstat at
  /proc/spl/kstat/kcf/NONAME_provider_stats
with no way to know which one it actually was,
and only the AES and SHA (so not Skein) ones were ever updated

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:26:08 -08:00
наб bf86638687 module: icp: enforce KCF_{OPS_CLASSSIZE,MAXMECHTAB}
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:26:04 -08:00
наб e013057492 module: icp: remove unused pd_{remove_cv,hash_limit}
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:26:00 -08:00
наб de0ec5e7df module: icp: remove vestigia of crypto sessions
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:25:56 -08:00
наб cf497e18df module: icp: remove unused (and mostly faked) cm_{{min,max}_key_length,mech_flags}
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:25:52 -08:00
наб 11320b4cdf module: icp: remove unused crypto_provider_handle_t
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:25:46 -08:00
наб f5e7d918a7 module: icp: remove pre-set entries from mechtabs
They don't do anything except clogging up the AVL tree

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:25:41 -08:00
наб df7b54f1d9 module: icp: rip out insane crypto_req_handle_t mechanism, inline KM_SLEEP
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:25:37 -08:00
наб 15ec086396 include: crypto: clean out api.h
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:25:32 -08:00
наб 42dbc2025a module: icp: remove unused headers. Migrate {ops => sched}_impl
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:25:28 -08:00
наб 1949be46c3 include: crypto: clean out unused SYSCALL32 and flags
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:25:24 -08:00
наб f43748f6e1 module: icp: remove algorithm name defines used only in the default mechtab
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:25:21 -08:00
наб d223af9bbc include: crypto: remove unused algorithm name defines
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:25:17 -08:00
наб 64e82cea13 module: icp: remove set-but-unused cd_miscdata
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:25:13 -08:00
наб 739afd9475 module: icp: fold away all key formats except CRYPTO_KEY_RAW
It's the only one actually used

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:25:07 -08:00
наб 1018e81e30 module: icp: remove unused CRYPTO_* error codes
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:25:03 -08:00
наб 7eacb87112 module: icp: rip out modhash. Replace the one user with AVL
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:24:59 -08:00
наб 1cb6fa2cb8 module: icp: remove unused me_mutex
It only needs to be locked if dynamic changes can occur. They can't.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:24:54 -08:00
наб cb6e9c3f5f module: icp: remove unused me_threshold
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:24:50 -08:00
наб 7f90cf3043 module: icp: remove unused struct crypto_ctx::cc_{session,flags,opstate}
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:24:46 -08:00
наб a288428d83 module: icp: remove unused gswq, kcfpool, [as]req_cache, reqid_table, obsolete kstat
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:24:42 -08:00
наб 1c17d2940c module: icp: remove unused notification framework
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:24:37 -08:00
наб 3fd5ead75e module: icp: remove unused kcf_op_{group,type}, req_params, ...
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:24:33 -08:00
наб f3c3a6d47e module: icp: remove unused p[di]_flags
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:24:29 -08:00
наб d77702035a module: icp: remove unused CRYPTO_{NOTIFY_OPDONE,SKIP_REQID,RESTRICTED}
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:24:24 -08:00
наб eb1e09b7ec module: icp: remove unused CRYPTO_ALWAYS_QUEUE
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:24:19 -08:00
наб 65a613b70d module: icp: remove unused kcf_digest.c
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:24:14 -08:00
наб 255bc38e6f module: icp: drop software provider generation numbers
We register all providers at once, before anything happens

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:24:09 -08:00
наб bf3fffe70d module: icp: remove unused kcf_mac operations
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:24:04 -08:00
наб 2c2f955aae module: icp: remove unused kcf_cipher operations
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:23:59 -08:00
наб 710657f51d module: icp: remove other provider types
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:23:53 -08:00
наб 167ced3fb1 module: icp: use original mechanisms
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:23:49 -08:00
наб bcee18d4e0 module: icp: use original description
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:23:45 -08:00
наб b0502ab097 module: icp: guarantee the ops vector is persistent
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:23:40 -08:00
наб d59a7fae40 module: icp: have a static 8 providers
This is currently twice the amount we actually have (sha[12], skein,
aes), and 512 * sizeof(void*) = 4096: 128x more than we need and a waste
of most of a page in the kernel address space

Plus, there's no need to actually allocate it dynamically: it's always
got a static size. Put it in .data

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:23:33 -08:00
наб 464700ae02 module: icp: spi: crypto_ops_t: remove unused op types
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:23:28 -08:00
наб f5896e2bdf module: icp: spi: flatten struct crypto_ops, crypto_provider_info
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:23:24 -08:00
наб 959b9d6392 module: icp: spi: remove crypto_control_ops_t
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:23:19 -08:00
наб 9cdf015d0a module: icp: spi: remove crypto_{provider,op}_notification()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12901
2022-02-15 16:22:57 -08:00
Jorgen Lundman 4759342a5e Add spa _os() hooks
Add hooks for when spa is created, exported, activated and
deactivated. Used by macOS to attach iokit, and lock
kext as busy (to stop unloads).

Userland, Linux, and, FreeBSD have empty stubs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12801
2022-02-15 15:54:25 -08:00
George Amanakis 2fb52853dc Avoid dirtying the final TXGs when exporting a pool
There are two codepaths than can dirty final TXGs:

1) If calling spa_export_common()->spa_unload()->
   spa_unload_log_sm_flush_all() after the spa_final_txg is set, then
   spa_sync()->spa_flush_metaslabs() may end up dirtying the final
   TXGs. Then we have the following panic:
   Call Trace:
    <TASK>
    dump_stack_lvl+0x46/0x62
    spl_panic+0xea/0x102 [spl]
    dbuf_dirty+0xcd6/0x11b0 [zfs]
    zap_lockdir_impl+0x321/0x590 [zfs]
    zap_lockdir+0xed/0x150 [zfs]
    zap_update+0x69/0x250 [zfs]
    feature_sync+0x5f/0x190 [zfs]
    space_map_alloc+0x83/0xc0 [zfs]
    spa_generate_syncing_log_sm+0x10b/0x2f0 [zfs]
    spa_flush_metaslabs+0xb2/0x350 [zfs]
    spa_sync_iterate_to_convergence+0x15a/0x320 [zfs]
    spa_sync+0x2e0/0x840 [zfs]
    txg_sync_thread+0x2b1/0x3f0 [zfs]
    thread_generic_wrapper+0x62/0xa0 [spl]
    kthread+0x127/0x150
    ret_from_fork+0x22/0x30
    </TASK>

2) Calling vdev_*_stop_all() for a second time in spa_unload() after
   spa_export_common() unnecessarily delays the final TXGs beyond what
   spa_final_txg is set at.

Fix this by performing the check and call for
spa_unload_log_sm_flush_all() before the spa_final_txg is set in
spa_export_common(). Also check if the spa_final_txg has already been
set in spa_unload() and skip those calls in this case.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
External-issue: https://www.illumos.org/issues/9081
Closes #13048 
Closes #13098
2022-02-15 15:48:59 -08:00
Jorgen Lundman 9a70e97fe1 Rename fallthrough to zfs_fallthrough
Unfortunately macOS has obj-C keyword "fallthrough" in the OS headers.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #13097
2022-02-15 08:58:59 -08:00
наб ae07fc1393 libzfs: sendrecv: fix missing error output for invalid properties
Fixes: 7633c0aedd ("libzfs: sendrecv:
fix unused, remove argsused")

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13100
Closes #13101
2022-02-14 13:07:56 -08:00
наб 0c8aab9586 zfs-receive.8: properly unlight = in option setting
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13101
2022-02-14 13:07:50 -08:00
наб 4631bf7bfb zfs-receive.8: fix Op Fl x Ar encryption in running text
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13101
2022-02-14 13:07:45 -08:00
Rich Ercolani dec1eef4c5 Silence uninitialized warnings in dsl_dataset.c
On newer compilers, dsl_dataset.c now warns (or, on DEBUG, errors)
on uninitialized variable usage.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #13083
2022-02-14 10:04:50 -08:00
Brian Behlendorf 9f734e81f4 ZTS: Fix checkpoint_ro_rewind.ksh
Related to commit 90b77a036.  Retry the `zpool export` if the pool is
"busy" indicating there is a process accessing the mount point.  This
can happen after an import and allowing it to be retried will avoid
spurious test failures.

Reviewed by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13092
2022-02-13 14:22:49 -08:00
Brian Behlendorf b7baf49bd3 ZTS: Fix zpool_expand_001_pos
The dRAID section of the zpool_expand_001_pos test would reliably fail
because the calculated expansion size assumed the dRAID top-level vdev
was created with a distributed spare.  Create the vdev as expected to
resolve the test failure.

This test case flaw was accidentally caused by changing the default
number of dRAID distributed spares from one to zero while dRAID was
being developed.

Additionally, remove zpool_expand_005_pos from the list of possible
faulty tests.  It appears to be passing consistently in my testing.

Reviewed by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13091
2022-02-13 14:22:00 -08:00
Brian Behlendorf 10271af05c Fix gcc warning in kfpu_begin()
Observed when building on CentOS 8 Stream.  Remove the `out`
label at the end of the function and instead return.

  linux/simd_x86.h: In function 'kfpu_begin':
  linux/simd_x86.h:337:1: error: label at end of compound statement

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13089
2022-02-11 14:31:45 -08:00
Paul Zuchowski fe804dc412 ZTS: Fix problem with zdb_objset_id test
Use large numbers for datasets with numeric names to avoid name
and id collisions.  Sporadic test failures were observed when the
test would create $TESTPOOL/100 with an objset ID of 100.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #13087
2022-02-11 13:32:08 -08:00
наб 4ea4a46c27 zpool-import.8: WARNING should be emphasised
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13082
2022-02-11 11:50:54 -08:00
наб a9e62ec368 zpool-import.8: newpool is Ar, not Sy
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13082
2022-02-11 11:50:14 -08:00
наб 17fd8f6b0a zpoolprops.7: document leaked
It's noted very scarcely in the code as it stands, indeed the only
actual comment on this is

  /*
   * We have finished background destroying, but there is still
   * some space left in the dp_free_dir. Transfer this leaked
   * space to the dp_leak_dir.
   */

Introduced in fbeddd60b7 ("Illumos 4390 -
I/O errors can corrupt space map when deleting fs/vol"),
which explains, alongside the references, that this can only happen
with a corrupted pool

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13081
2022-02-11 11:49:07 -08:00
Zhu Chuang 0a222408f1 Correct a typo in zfs-receive.8
Should be  `-o keyformat=passphrase` instead of `-o -keyformat=passphrase`

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chuang Zhu <chuang@melty.land>
Closes #13072
2022-02-11 11:47:35 -08:00
наб 4a4e168392 contrib: rename initrd READMEs to README.md
It's an unhelpful naming scheme and one that breaks GitHub autoreadme.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13017
2022-02-11 11:44:27 -08:00
наб b8d9679f36 contrib: dracut: zfs-{rollback,snapshot}-bootfs: don't shell for test
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13017
2022-02-11 11:43:07 -08:00
наб 34ec548980 dracut: README: rewrite
The documentation in the dracut README has grown stale and inaccurate.
Remove the stale content and write a short and useful reference manual.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13012
Closes #13017
2022-02-11 11:42:07 -08:00
Brian Behlendorf 399159f7fb ZTS: Fix zvol_misc_volmode test
Changing volmode may need to remove minors, which could be open, so
call udev_wait() before we "zfs set volmode=<value>".  This ensures
no udev process has the zvol open (i.e. blkid) and the kernel
zvol_remove_minor_impl() function won't skip removing the in use
device.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13075
2022-02-09 17:00:03 -08:00
drowfx 3819aaaff9 Add dataset_kstats_update.. to mmap read/write paths
This allows reads/writes caused by accesses to mmap files to be
accounted correctly in the per-dataset kstats for both Linux and
FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Matthias Blankertz <matthias@blankertz.org>
Closes #12994 
Closes #13044
2022-02-09 14:41:42 -08:00
Attila Fülöp 68ddc06b61 Receive checks should allow unencrypted child datasets
dmu_recv_begin_check() unconditionally sets the DS_HOLD_FLAG_DECRYPT
flag before calling dsl_dataset_hold_flags(). If the key on the
receiving side isn't loaded or the send stream contains embedded
blocks, the receive check fails for a stream which is perfectly
valid and could be received without any problem. This seems like
a remnant of the initial design, where unencrypted datasets below
encrypted ones weren't allowed.

Add a condition to set `DS_HOLD_FLAG_DECRYPT` only for encrypted
datasets, modify an existing test to detect this regression and add
a test for raw replication streams.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Co-authored-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #13033 
Closes #13076
2022-02-09 14:38:33 -08:00
Jorgen Lundman c28d6ab08b Rename EMPTY_TASKQ into taskq_empty
To follow a change in illumos taskq

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12802
2022-02-09 15:04:26 -07:00
Damian Szuberski 4eea717c4f Propagate KERNEL_* to *.spec
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Authored-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: Peter Levine <plevine457@gmail.com>
Closes #13046
2022-02-09 13:27:11 -08:00
Peter Levine b66140c6ad Add support for $KERNEL_{CC,LD,LLVM} variables
Currently, $(CC), $(LD), and $(LLVM) variables aren't passed to kbuild
while building modules.  This causes modules to build with the default
GNU GCC toolchain and prevents experimenting with other toolchains such
as CLANG/LLVM.  It can also lead to build failure if the CFLAGS/LDFLAGS
passed are incompatible with gcc/ld.

Pass $KERNEL_CC, $KERNEL_LD, and $KERNEL_LLVM as $(CC), $(LD), and
$(LLVM), respectively, to kbuild for each that is defined in the
environment.  This should take care of the majority of alternative
toolchain use cases.

Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Peter Levine <plevine457@gmail.com>
Closes #13046
2022-02-09 13:24:22 -08:00
Attila Fülöp 8e94ac0e36 Linux 5.16 compat: don't use XSTATE_XSAVE to save FPU state
Linux 5.16 moved XSTATE_XSAVE and XSTATE_XRESTORE out of our reach,
so add our own XSAVE{,OPT,S} code and use it for Linux 5.16.

Please note that this differs from previous behavior in that it
won't handle exceptions created by XSAVE an XRSTOR. This is sensible
for three reasons.

 - Exceptions during XSAVE and XRSTOR can only occur if the feature
   is not supported or enabled or the memory operand isn't aligned
   on a 64 byte boundary. If this happens something else went
   terribly wrong, and it may be better to stop execution.

 - Previously we just printed a warning and didn't handle the fault,
   this is arguable for the above reason.

 - All other *SAVE instruction also don't handle exceptions, so this
   at least aligns behavior.

Finally add a test to catch such a regression in the future.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #13042
Closes #13059
2022-02-09 12:50:10 -08:00
Damian Szuberski e2909fae8f Remove unused files from GitHub runners
Majority of the software installed by default in GitHub runners is
irrelevant to OpenZFS. Reclaimed space could be used for more/bigger
vdev files. File deletion happens in the background, leveraging
`systemd-run` - the workflow is not significantly slowed down.

Before
```
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        84G   53G   31G  63% /
```

After
```
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        84G   15G   70G  18% /
```

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #13066
2022-02-09 11:50:13 -08:00
Damian Szuberski d7a67402a8 mount.zfs -o zfsutil leverages zfs_mount_at()
Using `zfs_mount_at()` gives opportunity to properly propagate
mountopts from what's stored in a pool to the `mount(2)` syscall
invocation. It fixes cases when mount options are set to incorrect
values and rectification is impossible (e. g. Linux initrd boot
sequence in #7947).
Moved debug information printing after all variables are
initialized - printed text reflects what is passed to `mount(2)`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Issue #7947 
Closes #13021
2022-02-08 10:53:23 -08:00
Tomohiro Kusumi 5f65d008e9 Remove unneeded "extern inline" function declarations
All of these externs are already #included as static inline
functions via corresponding headers.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #13073
2022-02-08 10:48:57 -08:00
Damian Szuberski 8df0bde321 Add --enable=all to ShellCheck by default
Change enforced shell type from `dash` to `sh` and excluded
`SC2039` and `SC3043` by default. `local` keyword is accepted by all
POSIX shells from practical point of view. There is no need anymore
to enforce dash so `local` is accepted.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #13020
2022-02-07 11:59:09 -08:00
Damian Szuberski add15e9539 Extract workflows dependencies
- Move build dependencies moved to
  `.github/workflows/build-dependencies.txt` shared among workflows.

- Change `ubuntu-latest` -> `ubuntu-20.04` to avoid unexpected
  runner environment updates in `zloop` workflow.

- Change `ubuntu-20.04` -> `ubuntu-latest` to track changes in
  runner environment in `checkstyle` workflow.

- Kernel buffer is flushed before ZTS invocation to avoid storing
  the same data after each test case run.

- `make` is invoked with consistent set of options to reduce
  clutter in logs.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #13037
2022-02-07 11:44:17 -08:00
Christian Schwarz 1dccfd7a38 zvol: make calls to platform ops static
There's no need to make the platform ops dynamic dispatch.

This change replaces the dynamic dispatch with static calls to the
platform-specific functions.
To avoid name collisions, prefix all platform-specific functions
with `zvol_os_`.
I actually find `zvol_..._os` slightly nicer to read in the calling
code, but having it as a prefix is useful.

Advantage:
- easier jump-to-definition / grepping
- potential benefits to static analysis
- better legibility

Future work: also prefix remaining `static` functions in zvol_os.c.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com>
Closes #12965
2022-02-07 10:24:38 -08:00
Alexander Motin f2c5bc150e Add more control/visibility to spa_load_verify().
Use error thresholds from policy to control whether to scrub data
and/or metadata.  If threshold is set to UINT64_MAX, then caller
probably does not care about result and we may skip that part.

By default import neither set the data error threshold nor read
the error counter, so skip the data scrub for faster import.
Metadata are still scrubbed and fail if even single error found.

While there just for symmetry return number of metadata errors in
case threshold is not set to zero and we haven't reached it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #13022
2022-02-04 13:06:38 -08:00
Christian Schwarz 2f14adacaa zfs_set_prop_nvlist: make it easier to spot the call to dsl_props_set
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com>
Closes #12963
2022-02-04 11:52:10 -08:00
Christian Schwarz db87580076 dsl_dir_tempreserve_impl: remove unused deferred variable
The following commit moved the users of `deferred` into function
dsl_pool_unreserved_space:

    commit d2734cce68
    Author: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
    Date:   Fri Dec 16 14:11:29 2016 -0800

        OpenZFS 9166 - zfs storage pool checkpoint

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com>
Closes #13056
2022-02-04 10:33:34 -08:00
Brian Behlendorf 100b8950f4 ZTS: Update enospc_002_pos test case
The on-disk cost of creating a snapshot or bookmark is sufficiently low
that it is difficult to make it reliably fail even when the pool is
"full".  In order to avoid false positives remove these two checks from
the test case.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13060
2022-02-04 09:36:46 -08:00
Pawel Jakub Dawidek 3d244b4881 Fix clearing set-uid and set-gid bits on a file when replying a write
POSIX requires that set-uid and set-gid bits to be removed when an
unprivileged user writes to a file and ZFS does that during normal
operation.

The problem arrises when the write is stored in the ZIL and replayed.
During replay we have no access to original credentials of the process
doing the write, so zfs_write() will be performed with the root
credentials. When root is doing the write set-uid and set-gid bits
are not removed from the file.

To correct that, log a separate TX_SETATTR entry that removed those bits
on first write to such file.

Idea from:	Christian Schwarz

Add test for ZIL replay of setuid/setgid clearing.

Improve various edge cases when clearing setid bits:
- The setid bits can be readded during a single write, so make sure to check
  for them on every chunk write.
- Log TX_SETATTR record at most once per transaction group (if the setid bits
  are keep coming back).
- Move zfs_log_setattr() outside of zp->z_acl_lock.

Reviewed-by: Dan McDonald <danmcd@joyent.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Christian Schwarz <me@cschwarz.com>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #13027
2022-02-03 14:37:57 -08:00
Damian Szuberski 63652e1546 Add --enable-asan and --enable-ubsan switches
`configure` now accepts `--enable-asan` and `--enable-ubsan` switches
which results in passing `-fsanitize=address`
and `-fsanitize=undefined`, respectively, to the compiler. Those
flags are enabled in GitHub workflows for ZTS and zloop. Errors
reported by both instrumentations are corrected, except for:

- Memory leak reporting is (temporarily) suppressed. The cost of
  fixing them is relatively high compared to the gains.

- Checksum computing functions in `module/zcommon/zfs_fletcher*`
  have UBSan errors suppressed. It is completely impractical
  to enforce 64-byte payload alignment there due to performance
  impact.

- There's no ASan heap poisoning in `module/zstd/lib/zstd.c`. A custom
  memory allocator is used there rendering that measure
  unfeasible.

- Memory leaks detection has to be suppressed for `cmd/zvol_id`.
  `zvol_id` is run by udev with the help of `ptrace(2)`. Tracing is
  incompatible with memory leaks detection.

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #12928
2022-02-03 14:35:38 -08:00
Phil Kauffman aa9905d89b zed-functions.sh: escape newline to produce valid json
This was discovered when using Discords Slack compatible webhook.

Slack webhooks works without the escape, however Discord rightly refuses
the POST as it contains invalid JSON.

https://discord.com/developers/docs/resources/webhook#execute-slackcompatible-webhook

Valid (while escaping the newline:
```
+ msg_json='{"text": "*ZFS scrub_finish error for test on quartz*\nZFS has detected a data error:\n\n   eid: 124\n class: scrub_finish\n  host: quartz\n  time: \n error: \n objid: :\n  pool: test\n"}'
```

Invalid (no escape):
```
+ msg_json='{"text": "*ZFS scrub_finish error for test on quartz*
ZFS has detected a data error:\n\n   eid: 124\n class: scrub_finish\n  host: quartz\n  time: \n error: \n objid: :\n  pool: test\n"}'
```
The new line gets rendered and not sent inside the JSON as intended.

```
++ curl -X POST https://discord.com/api/webhooks/{webhook.id}/{webhook.token}/slack --header 'Content-Type: application/json' --data-binary '{"text": "*ZFS scrub_finish error for test on quartz*
ZFS has detected a data error:\n\n   eid: 124\n class: scrub_finish\n  host: quartz\n  time: \n error: \n objid: :\n  pool: test\n"}'
+ msg_out='{"message": "Cannot send an empty message", "code": 50006}'
```

Test method:
`root@quartz:/etc/zfs/zed.d# export ZED_ZEDLET_DIR=/etc/zfs/zed.d; export ZEVENT_EID=124; export ZEVENT_SUBCLASS=scrub_finish; export ZEVENT_POOL=test; export ZED_NOTIFY_DATA=1; bash -x ./data-notify.sh`

Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Philip Kauffman <philip@kauffman.me>
Closes #13049
2022-02-03 14:31:57 -08:00
Akash B 7b468ed2d8 Add enumerated vdev names to 'zpool iostat -v' and 'zpool list -v'
This commit adds enumerated names to disambiguate between the
different vdevs. Previously only 'zpool status' showed enumerated
vdev names, now 'zpool list -v' and 'zpool iostat -v' also shows
the enumerated vdev names.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Dipak Ghosh <dipak.ghosh@hpe.com>
Signed-off-by: Akash B <akash-b@hpe.com>
Closes #12510
Closes #13031
2022-02-03 14:29:29 -08:00
George Amanakis f3b08dfd7f Report dnodes with faulty bonuslen
In files created/modified before 4254acb there may be a corruption of
xattrs which is not reported during scrub and normal send/receive. It
manifests only as an error when raw sending/receiving. This happens
because currently only the raw receive path checks for discrepancies
between the dnode bonus length and the spill pointer flag.

In case we encounter a dnode whose bonus length is greater than the
predicted one, we should report an error. Modify in this regard
dnode_sync() with an assertion at the end, dump_dnode() to error out,
dsl_scan_recurse() to report errors during a scrub, and zstream to
report a warning when dumping. Also added a test to verify spill blocks
are sent correctly in a raw send.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #12720 
Closes #13014
2022-02-03 14:28:19 -08:00
наб 142a5010ab Notice if the test-runner dies
Currently, we seem to only care if the results collector errors.
We also should care if the test-runner died.

Authored-by: Rich Ercolani <rincebrain@gmail.com>
Co-authored-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by:  Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12998
2022-02-02 14:17:46 -08:00
Tomohiro Kusumi 955bf4dc04 Fix trivial calloc(3) arguments order
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #13052
2022-02-02 11:27:35 -08:00
Ryan Moeller 15aa38690e Simplify resume token generation
* Improve naming.
* Reduce indentation.
* Avoid boilerplate logic duplication.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12967
2022-02-01 17:04:08 -08:00
Ryan Moeller d1a38ee742 libzfs_sendrecv: Factor out lzc_flags_from_resume_nvl
Improve the readability of zfs_send_resume_impl by moving resume nvl
decoding into a separate helper function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12967
2022-02-01 17:04:01 -08:00
Ryan Moeller 102eb6733c libzfs_sendrecv: Refactor find_redact_book
Factor out get_bookmarks, find_redact_pair, and get_redact_complete
helper functions to improve the readability of find_redact_book.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12967
2022-02-01 17:03:53 -08:00
Ryan Moeller 2fafbfde9d libzfs_sendrecv: Fix some comment style nits
* Capitalize and punctuate complete sentences.
* Add a blank line between functions.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12967
2022-02-01 17:03:45 -08:00
Ryan Moeller 45932229d5 libzfs_sendrecv: Style pass on dump_filesystems
* Add a high level comment.
* Eliminate unnecessarily void arg.
* Capitalize and punctuate complete sentences in comments.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12967
2022-02-01 17:03:38 -08:00
Ryan Moeller 1910a30848 libzfs_sendrecv: Style pass on dump_filesystem
* Add high level comments.
* Eliminate unnecessarily void arg.
* Avoid unnecessary line wrapping.
* Initialize sdd fields with the correct types.
* Remove extra whitespace.
* Refactor replication checks for clarity.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12967
2022-02-01 17:03:30 -08:00
Ryan Moeller 1ae7835177 libzfs_sendrecv: Style pass on dump_snapshot
* Add a high level comment.
* Avoid unnecessary line wrapping.
* Simplify size accounting logic.
* Eliminate unnecessary buffer on the stack.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12967
2022-02-01 17:03:21 -08:00
Ryan Moeller 447e90e360 libzfs_sendrecv: Style pass on send_print_verbose
* Add missing dgettext calls.
* Avoid unnecessary line wraps.
* Factor out duplicated parsable check.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12967
2022-02-01 17:03:13 -08:00
Ryan Moeller fbcc25c9b2 libzfs_sendrecv: Pull header line out of loop
This makes the header print before the sleep as well, which is fine.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12967
2022-02-01 17:03:05 -08:00
Ryan Moeller 24f1aa023a libzfs_sendrecv: Initialize in case of failure
In zfs_send_progress, initialize \*bytes_written and \*blocks_visited
in case we have to return early due to ioctl failure.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12967
2022-02-01 17:02:58 -08:00
Ryan Moeller 25074b472a libzfs_sendrecv: Style pass on dump_ioctl
* Don't bother building a debug nvlist if we can't return it.
* Save errno after ioctl failure in case snprintf clobbers it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12967
2022-02-01 17:02:50 -08:00
Ryan Moeller a8747c0403 libzfs_sendrecv: Style pass on zfs_send_space
* Reduce indentation.
* Move locals closer to use.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12967
2022-02-01 17:02:41 -08:00
Ryan Moeller 0c1c746a74 libzfs_sendrecv: Style pass on send_iterate_fs
* Capitalize and punctuate complete sentences in comments.
* Separate out a group of locals to add a comment on their purpose.
* Remove unnecessary line wrapping.
* Make it clear that dds_origin is a string by using explicit character
  comparison to check for an empty string, rather than implictly
  treating it as a boolean.
* Reorganize manipulation of props and holds nvlists to improve
  clarity.
* There's no need to initialize the snapname buffer with zeros, we're
  immediately overwriting it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12967
2022-02-01 17:02:33 -08:00
Ryan Moeller 2b6b7111f4 libzfs_sendrecv: Style pass on send_iterate_prop
* Add a high level comment.
* Move locals closer to point of use.
* Use fnv* routines rather than explicit verification of success.
* Factor out duplicated code by introducing isspacelimit to clarify
  behavior.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12967
2022-02-01 17:02:25 -08:00
Ryan Moeller dd59c422d3 libzfs_sendrecv: Style pass on send_iterate_snap
* Add a high level comment.
* Use local variables to reduce line wrapping.
* Remove extra braces and insert space for clarity.
* Assert precondition that the dataset name contains '@' for sanity.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12967
2022-02-01 17:02:13 -08:00
Ryan Moeller f2b36b2db0 libzfs_sendrecv: Fix leaked holds nvlist
There is no need to allocate a holds nvlist.  lzc_get_holds does that
for us.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12967
2022-02-01 17:02:05 -08:00
Ryan Moeller af2b1fbda6 libzfs_sendrecv: Avoid extra avl_find
avl_add does avl_find internally, then avl_insert.  We're already doing
the avl_find, so using avl_insert directly avoids repeating the search.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12967
2022-02-01 17:01:56 -08:00
Ryan Moeller 1488e822f2 libzfs_sendrecv: Simplify out guid temporary
De-clutter the clode and make it clear the guid is only used here.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12967
2022-02-01 17:01:48 -08:00
Ryan Moeller 01a0039ac9 libzfs_sendrecv: Use size_t for payload_len
We won't be passing a negative value here, so make it clear.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12967
2022-02-01 17:01:21 -08:00
наб 3c617c7921 libzfs: mount: don't leak mnt_param_t if mnt_func fails
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12968
2022-02-01 16:57:16 -08:00
наб 61b5b27853 libtpool: remove useless CSTYLED
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12968
2022-02-01 16:57:09 -08:00
наб 1ef686905b linux: libspl: zone: () -> (void)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12968
2022-02-01 16:57:00 -08:00
наб a401a2c54a zpool: misc. cleanup
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12968
2022-02-01 16:56:52 -08:00
наб 0481eabfa1 libzfs: share_proto: constify, staticify
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12968
2022-02-01 16:56:40 -08:00
наб bde0f7d4c6 zpool: main: print hostids with PRIx64
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12968
2022-02-01 16:56:31 -08:00
наб 0f7a0cc7c2 libzfs: const correctness
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12968
2022-02-01 16:56:18 -08:00
наб bf12053cb2 zpool: main: const correctness, use time_t for time
The cast will explode on 32-bit big-endian architectures

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12968
2022-02-01 16:55:11 -08:00
Mark Johnston fdcb79b52e spl: Don't check FreeBSD rwlocks for double initialization (#13019)
This checking breaks KMSAN since it effectively loads from uninitialized
memory to see if the lock is already initialized.  This happens in
dnode_cons() for example.  This checking is not very useful, partly due
to UMA's memory trashing, and is already disabled for mutexes.  Make
mutexes and rwlocks consistent: remove double-initialization checking
for rwlocks, and pass SX_NEW to disable the same checking in
lock_init().

No functional change intended, this affects only debug builds.

As a side note, kmem cache constructors/destructors are implemented
suboptimally on FreeBSD.  FreeBSD's slab allocator, UMA, supports two
pairs of constructors/destructors: ctor/dtor and init/fini.  The former
are called upon every allocation and free of an item, while the latter
are called when an item is imported or released from a zone,
respectively.  That is, when a slab is allocated to a particular cache,
it is subdivided into items, and init is called on each.  fini is called
when the slab is being prepared to be freed back to the system.  The
intent is for them to initialize static fields such as locks, which
do not need to be initialized upon each allocation of an item.

In illumos, kmem_cache constructors/destructors correspond to UMA's
init/fini callbacks.  However, in the SPL they are implemented as UMA
ctor/dtors, meaning that they get called far more often than necessary.
This may be difficult to fix, since new code may assume the kmem cache
ctor/dtors are in fact called upon each allocation/free, and there
doesn't seem to be a clear way to implement the intended semantics on
Linux.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #13019
2022-01-31 10:58:45 -08:00
наб 73e972af7a ZTS: explicitly strip whitespace for broken wc(1) implementations
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13032
2022-01-28 16:59:52 -08:00
наб 7633c0aedd libzfs: sendrecv: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12906
2022-01-28 08:34:48 -07:00
наб c70bb2f610 Replace *CTASSERT() with _Static_assert()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12993
2022-01-26 11:38:52 -08:00
наб 7ada752a93 Clean up CSTYLEDs
69 CSTYLED BEGINs remain, appx. 30 of which can be removed if cstyle(1)
had a useful policy regarding
  CALL(ARG1,
  	ARG2,
  	ARG3);
above 2 lines. As it stands, it spits out *both*
  sysctl_os.c: 385: continuation line should be indented by 4 spaces
  sysctl_os.c: 385: indent by spaces instead of tabs
which is very cool

Another >10 could be fixed by removing "ulong" &al. handling.
I don't foresee anyone actually using it intentionally
(does it even exist in modern headers? why did it in the first place?).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12993
2022-01-26 11:38:52 -08:00
наб a3fecf4f10 icp: asm_linkage.h: clean out unused bits, CSTYLED
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12993
2022-01-26 11:38:28 -08:00
наб be93512152 libspl: include: ia32: remove
It hosts only asm_linkage.h, which is entirely unused,
and has slightly diverged from the one that's actually used
(module/icp/include/sys/ia32/asm_linkage.h)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12993
2022-01-26 11:37:45 -08:00
наб 7467281594 tests: simplify check_bg_procs_limit_num(), inline into setup
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12979
2022-01-26 11:30:20 -08:00
наб d9fdba124d tests: prune remaining xargs(1), add missing zfs-project -c0 note
-c0 suppresses diagnoses ‒ it's not just -c but with NULs; cf.
  http://build.zfsonlinux.org/builders/Debian%2010%20x86_64%20%28TEST%29/builds/10605/steps/shell_4/logs/log
search for the second "zfs project -s -p" instance

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12979
2022-01-26 11:30:09 -08:00
наб e0c5a48b3f tests: simplify find_vfstab_dev()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12979
2022-01-26 11:30:03 -08:00
наб a67a187335 Makefile: simplify away xargs(1)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12979
2022-01-26 11:29:58 -08:00
наб c5d8cd63ae module: Makefile: simplify clean and install jobs
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12979
2022-01-26 11:29:23 -08:00
Ryan Moeller 3158c2e3cb FreeBSD: Fix zvol_cdev_open locking
First open locking changes were correctly applied to zvol_geom_open but
incorrectly applied to zvol_cdev_open, causing spa_namespace_lock to be
held indefinitely.

Make the first open locking in zvol_cdev_open match zvol_geom_open.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #13016
2022-01-26 11:23:39 -08:00
ColMelvin 8d6a598a20 RPM: Add missing BuildRequires for PAM component
When the optional PAM binaries are included in a build, ./configure will
look for security/pam_modules.h and - if it doesn't find it - recommend
the user install `libpam0g-dev`.  On Red Hat systems, `pam-devel` is the
package that supplies this requirement; `libpam0g-dev` does not exist.

By encoding this requirement in the spec file, we give packagers more
appropriate (and timely) recommendations for completing the build.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Lindee <chris.lindee+github@gmail.com>
Closes #13001
2022-01-25 13:14:43 -08:00
Finix1979 a3fbe2b942 Linux <4.8 compat: submit_bio() rw arg
When using the two argument version of submit_bio() in kernel's prior
to 4.8 the first argument should be specified.  It's used by block
dump to report the bio direction.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Finix Yan <yancw@info2soft.com>
Closes #13006
2022-01-25 13:12:49 -08:00
наб 5dccd7e8f3 Linux 5.17 compat: PDE_DATA() renamed to pde_data()
Upstream commit 359745d78351c6f5442435f81549f0207ece28aa
("proc: remove PDE_DATA() completely")

Link: https://lore.kernel.org/all/20211124081956.87711-2-songmuchun@bytedance.com/T/#u

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13004
Closes #12989
2022-01-25 12:53:00 -08:00
наб a46237106c Linux 5.17 compat: dequeue_signal() takes a 4th argument
Linux 5.17's dequeue_signal() takes an additional enum pid_type *
output argument

Upstream commit 5768d8906bc23d512b1a736c1e198aa833a6daa4
("signal: Requeue signals in the appropriate queue")

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12989
2022-01-25 12:52:51 -08:00
наб a9856574cf Linux 5.17 compat: detect complete_and_exit() rename
Linux 5.17 sees a rename from complete_and_exit()
to kthread complete_and_exit()

Upstream commit cead18552660702a4a46f58e65188fe5f36e9dfe
("exit: Rename complete_and_exit to kthread_complete_and_exit")

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12989
2022-01-25 12:52:12 -08:00
наб a9e2788ffe libspl: cast to uintptr_t instead of !!ing
This led to these two warning types:
  debug.h:139:67: warning: the address of ‘ARC_anon’
  will always evaluate as ‘true’ [-Waddress]
    139 | #define ASSERT3P(x, y, z)
              ((void) sizeof (!!(x)), (void) sizeof (!!(z)))
        |                                               ^
  arc.c:1591:2: note: in expansion of macro ‘ASSERT3P’
   1591 |  ASSERT3P(hdr->b_l1hdr.b_state, ==, arc_anon);
        |  ^~~~~~~~
and
  arc.h:66:44: warning: ‘<<’ in boolean context,
  did you mean ‘<’? [-Wint-in-bool-context]
     66 | #define HDR_GET_LSIZE(hdr)
              ((hdr)->b_lsize << SPA_MINBLOCKSHIFT)
  debug.h:138:46: note: in definition of macro ‘ASSERT3U’
    138 | #define ASSERT3U(x, y, z)
              ((void) sizeof (!!(x)), (void) sizeof (!!(z)))
        |                        ^
  arc.c:1760:12: note: in expansion of macro ‘HDR_GET_LSIZE’
   1760 |   ASSERT3U(HDR_GET_LSIZE(hdr), !=, 0);
        |            ^~~~~~~~~~~~~

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13009
2022-01-24 17:05:42 -08:00
Rich Ercolani cd26b217dc Add explicit timeout to test step
If we die from timeout of the whole GH action run, we don't run the
collect step afterward, which can make it hard to investigate the
timeout.

If we timeout first in the test action, though, it qualifies as
failure, and collects appropriately.

(330 minutes seems like an acceptable tradeoff between the 6h
timeout by default on the action and the 4h and change "functional"
usually takes.)

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12999
2022-01-24 15:54:52 -08:00
Rich Ercolani 4372e96f4b Add support for FALLOC_FL_ZERO_RANGE
For us, I think it's always just FALLOC_FL_PUNCH_HOLE with a fake
mustache on.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12975
2022-01-24 12:59:25 -08:00
Rich Ercolani 299fbf75ec Linux 5.16 compat: Added mapping for iov_iter_fault_in_readable
Linux decided to rename this for some reason. At some point, we
should probably invert this mapping, but for now...

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12975
2022-01-24 12:59:09 -08:00
Rich Ercolani 12fa250d84 Linux 5.16 compat: Added add_disk check for return
add_disk went from void to must-check int return.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12975
2022-01-24 12:59:01 -08:00
Rich Ercolani e86f88aa0b Linux 5.16 compat: Check slab.h for kvmalloc
As it says on the tin - the folio work moved a bunch out of mm.h.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12975
2022-01-24 12:57:50 -08:00
наб 17b2ae0b24 Fix test-runner on FreeBSD
CLOCK_MONOTONIC_RAW is only a thing on Linux and macOS. I'm not
actually sure why the previous hardcoding of a constant didn't
error out, but when we removed it, it sure does now.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12995
2022-01-21 15:37:46 -08:00
Mark Johnston 063daa8350 Fix handling of errors from dmu_write_uio_dbuf() on FreeBSD
FreeBSD's implementation of zfs_uio_fault_move() returns EFAULT when a
page fault occurs while copying data in or out of user buffers.  The VFS
treats such errors specially and will retry the I/O operation (which may
have made some partial progress).

When the FreeBSD and Linux implementations of zfs_write() were merged,
the handling of errors from dmu_write_uio_dbuf() changed such that
EFAULT is not handled as a partial write.  For example, when appending
to a file, the z_size field of the znode is not updated after a partial
write resulting in EFAULT.

Restore the old handling of errors from dmu_write_uio_dbuf() to fix
this.  This should have no impact on Linux, which has special handling
for EFAULT already.

Reviewed-by: Andriy Gapon <avg@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12964
2022-01-21 11:54:05 -08:00
George Amanakis 63a26454ba Introduce a flag to skip comparing the local mac when raw sending
Raw receiving a snapshot back to the originating dataset is currently
impossible because of user accounting being present in the originating
dataset.

One solution would be resetting user accounting when raw receiving on
the receiving dataset. However, to recalculate it we would have to dirty
all dnodes, which may not be preferable on big datasets.

Instead, we rely on the os_phys flag
OBJSET_FLAG_USERACCOUNTING_COMPLETE to indicate that user accounting is
incomplete when raw receiving. Thus, on the next mount of the receiving
dataset the local mac protecting user accounting is zeroed out.
The flag is then cleared when user accounting of the raw received
snapshot is calculated.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #12981 
Closes #10523
Closes #11221
Closes #11294
Closes #12594
Issue #11300
2022-01-21 11:41:17 -08:00
Mark Johnston 6e2a59181e Avoid memory allocations in the ARC eviction thread
When the eviction thread goes to shrink an ARC state, it allocates a set
of marker buffers used to hold its place in the state's sublists.

This can be problematic in low memory conditions, since
1) the allocation can be substantial, as we allocate NCPU markers;
2) on at least FreeBSD, page reclamation can block in
   arc_wait_for_eviction()

In particular, in stress tests it's possible to hit a deadlock on
FreeBSD when the number of free pages is very low, wherein the system is
waiting for the page daemon to reclaim memory, the page daemon is
waiting for the ARC eviction thread to finish, and the ARC eviction
thread is blocked waiting for more memory.

Try to reduce the likelihood of such deadlocks by pre-allocating markers
for the eviction thread at ARC initialization time.  When evicting
buffers from an ARC state, check to see if the current thread is the ARC
eviction thread, and use the pre-allocated markers for that purpose
rather than dynamically allocating them.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12985
2022-01-21 10:28:13 -08:00
наб bc40713a8f libspl: ASSERT*: !! for sizeof
sizeof(bitfield.member) is invalid, and this shows up in some FreeBSD
build configurations: work around this by !!ing ‒
this makes the sizeof target the ! result type (_Bool), instead

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Fixes: 42aaf0e ("libspl: ASSERT*: mark arguments as used")
Closes #12984
Closes #12986
2022-01-21 10:20:11 -08:00
Paul Zuchowski 5a4d282f55 Fix problem with zdb -d
zdb -d <pool>/<objset ID> does not work when
other command line arguments are included i.e.
zdb -U <cachefile> -d <pool>/<objset ID>
This change fixes the command line parsing
to handle this situation.  Also fix issue
where zdb -r <dataset> <file> does not handle
the root <dataset> of the pool. Introduce -N
option to force <objset ID> to be interpreted
as a numeric objsetID.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #12845
Closes #12944
2022-01-20 10:28:55 -07:00
наб e1c720de7d libefi: remove efi_type()
All it is right now is some #if 0ed Solaris code that returns ENOSYS,
and is only applicable for the Solaris blockdev layer.
In the Illumos gate, there's a single user: rmformat(1);
I recommend a read of the manual as a blast from the past, but, well

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #12844
Closes #12969
2022-01-18 14:40:43 -08:00
наб 18168da727 module/*.ko: prune .data, global .rodata
Evaluated every variable that lives in .data (and globals in .rodata)
in the kernel modules, and constified/eliminated/localised them
appropriately. This means that all read-only data is now actually
read-only data, and, if possible, at file scope. A lot of previously-
global-symbols became inlinable (and inlined!) constants. Probably
not in a big Wowee Performance Moment, but hey.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12899
2022-01-14 15:37:55 -08:00
Brian Behlendorf 7adc190098 Clarify failmode=wait documentation
Nowhere in the description of the failmode property does it
clearly state how to bring a suspended pool back online.
Add a few words to property description and the zpool-clear(8)
man page.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12907
Closes #9395
2022-01-14 15:30:07 -08:00
Ryan Moeller 5a57d6f73b FreeBSD: Touch up comments in zvol_os
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12934
2022-01-14 12:43:31 -08:00
Ryan Moeller 020545a95d FreeBSD: Fix zvol_*_open() locking
These are the changes for FreeBSD corresponding to the changes made for
Linux in #12863, see that PR for details.

Changes from #12863 are applied for zvol_geom_open and zvol_cdev_open
on FreeBSD.  This also adds a check for the zvol dying which we had
in zvol_geom_open but was missing in zvol_cdev_open.  The check causes
the open to fail early with ENXIO when we are in the middle of changing
volmode.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12934
2022-01-14 12:43:05 -08:00
Ryan Moeller 69c5e694a1 FreeBSD: Fix leaked strings in libspl mnttab
The FreeBSD implementations of various libspl functions for getting
mounted device information were found to leak several strings which
were being allocated in statfs2mnttab but never freed.

The Solaris getmntany(3C) and related interfaces are expected to return
strings residing in static buffers that need to be copied rather than
freed by the caller.

Use static thread-local storage to stash the mnttab structure strings
from FreeBSD's statfs info rather than strings allocated on the heap by
strdup(3).

While here, remove some stray commented out lines.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12961
2022-01-14 12:38:32 -08:00
наб 12bd322dde man: zfs.4: miscellaneous cleanup
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12941
2022-01-13 11:09:03 -08:00
наб 4737a9eb70 linux: libzfs: mount: fix uninitialised flags
They're later |=d with constants, but never reset

Caught by valgrind while investigating
https://github.com/openzfs/zfs/pull/12928#issuecomment-1007496550

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12954
2022-01-13 11:07:54 -08:00
Damian Szuberski 836136961a Add ShellCheck's --enable=all inside scripts/
Strengthen static code analysis for shell scripts.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #12914
2022-01-13 11:09:19 -07:00
Damian Szuberski 8a7c4efd3c Removed Python 2 and Python 3.5- support
Deprecation of Python versions below 3.6 gives opportunity to unify the
build and install requirements for OpenZFS packages. The minimal
supported Python version is 3.6 as this is the most recent Python
package CentOS/RHEL 7 users can get.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #12925
2022-01-13 09:51:12 -07:00
Mark Maybee da9c6c0333 Remove VERIFY() in vdev_props_set_sync()
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mark Maybee <mark.maybee@delphix.com>
Closes #12951
2022-01-12 16:15:30 -08:00
Rich Ercolani 63f4bfd6ac lz4: Cherrypick fix for CVE-2021-3520
There should be no risk of us accidentally hitting this since
we'd need maliciously malformed data to wind up in the pipeline,
or a very unfortunate random bit flip at exactly the right moment.
Still since we can handle it we should.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12947
2022-01-12 16:14:36 -08:00
Rich Ercolani d6c1bbdd65 Updated the lz4 decompressor
As an experiment, I stole the lz4 decompressor from
upstream lz4 (1.9.3), and landed it.

Feedback suggested that keeping the vendor lz4 code isolated and
unlinted was probably reasonable, so I lobbed it into its own file.

It also seemed reasonable to put the mostly-untouched* code into
lz4.c proper, and relegate the integrated and ZFS-specific code to
lz4_zfs.c.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12805
2022-01-07 10:36:49 -08:00
Ryan Hirasaki 862d5dfc84 README: Update OpenZFS website url
This change is to first replace the OpenZFS website in the README to
point to openzfs.org as this is what open-zfs.org redirects to.
Along with replacing the URL, the protocol is also upgraded
from http to https.

These changes should prevent web browsers such as Firefox from
complaining about visiting a http site, if the proper security
settings are enabled, when it will still end up on a https page
after the redirect.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Hirasaki <ryanhirasaki@gmail.com>
Closes #12939
2022-01-06 16:25:01 -08:00
Tino Reichardt a798b485ae Remove sha1 hashing from OpenZFS, it's not used anywhere.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12895
Closes #12902
2022-01-06 16:16:28 -08:00
наб 3e310f099d module: icp: remove solaris crypto and cryptoadm ioctl definitions
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12895
Closes #12902
2022-01-06 16:16:19 -08:00
наб 5c8389a8cb module: icp: rip out the Solaris loadable module architecture
After progressively folding away null cases, it turns out there's
/literally/ nothing there, even if some things are part of the
Solaris SPARC DDI/DKI or the seventeen module types (some doubled for
32-bit userland), or the entire modctl syscall definition.
Nothing.

Initialisation is handled in illumos-crypto.c,
which calls all the initialisers directly

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12895
Closes #12902
2022-01-06 16:14:04 -08:00
Damian Szuberski c1d3be19d7 Add ShellCheck's --enable=all inside cmd/
The only exception is `cmd/vdev_id/vdev_id` which might be a subject of
refactoring (see #12084)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #12912
2022-01-06 16:07:54 -08:00
Christian Schwarz a8f27ec6c5 l2arc_write_buffers: remove redundant asserts
Probably introduced inadvertently in b525630 (Native Encryption).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com>
Closes #12935
2022-01-06 14:39:22 -08:00
Damian Szuberski ae66d3aa90 Add ShellCheck's --enable=all inside etc/
Strengthen static code analysis for shell scripts.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #12913
2022-01-06 14:36:04 -08:00
наб ccc421ec39 module: Makefile: flatten subdir loop, use $PWD instead of pwd
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #12899
2022-01-06 14:29:41 -08:00
наб 1add1a5b3c FreeBSD: remove unused variable
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #12899
2022-01-06 12:46:49 -08:00
наб f788aaeb0e config: check for parallel(1), use it for cstyle
Before:
$ time make cstyle
real    0m23.118s
user    0m23.002s
sys     0m0.114s

After:
$ time make cstyle
real    0m4.577s
user    0m31.487s
sys     0m0.699s

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #12899
2022-01-06 12:46:42 -08:00
наб 95aab0b7d2 libfetch: unquote @LIBFETCH_SONAME@ subst
@LIBFETCH_SONAME@ is no longer quoted. The C define still is.

Ref: 153f7c9f72
Ref: https://github.com/openzfs/zfs/pull/12835#discussion_r776833743
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12922
2022-01-06 11:26:40 -08:00
наб 7c2eb1c875 zvol: remove unused variable
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12917
2022-01-06 11:20:13 -08:00
наб c25e639f2b fm: remove unused variables
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12917
2022-01-06 11:20:13 -08:00
наб 7c41df4c77 zvol: remove unused variable
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12917
2022-01-06 11:20:06 -08:00
наб 43dbf88178 FreeBSD: vfsops: use setgen for error case
Fix from https://github.com/openzfs/zfs/pull/12844#discussion_r774179413

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12905
2022-01-06 11:15:08 -08:00
Paul Dagnelie 399b98198a Revert "zfs list: Allow more fields in ZFS_ITER_SIMPLE mode"
This reverts commit f6a0dac84a.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #12938
2022-01-06 11:12:53 -08:00
chrisrd 0175272f64 man: speling
Fix spelling.

Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Closes #12911
2022-01-06 11:00:01 -08:00
Allan Jude 7454275a53 ZTS: normalize on use of sync_pool and sync_all_pools
- Replaces use of manual `zpool sync`
- Don't use `log_must sync_pool` as `sync_pool` uses it internally
- Replace many (but not all) uses of `sync` with `sync_pool`

This makes the tests more consistent, and makes searching easier.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #12894
2022-01-06 10:57:09 -08:00
Manoj Joseph 6b2e32019e Long opts for zdb
This change introduces long options for zdb. It updates the usage
message as well to include the long options.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Manoj Joseph <manoj.joseph@delphix.com>
Closes #12818
2022-01-06 10:54:32 -08:00
chrisrd 2a149775b4 zfs_prune: reset sc.nr_to_scan
sc.nr_to_scan is an input to super_cache_clean (via
shrinker->scan_objects), used to set the number of objects to scan
in the various caches. However super_cache_scan also modifies
sc.nr_to_scan, so when used in a loop we need to reset
sc.nr_to_scan back to our desired nr_to_scan for the next
iteration.

Issue discovered and solution suggested by
Tenzin Lhakhang @tlhakhan.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Issue #12433
Closes #12908
2022-01-04 17:07:33 -08:00
Brian Behlendorf 3c80e0742a Verify dRAID empty sectors
Verify that all empty sectors are zero filled before using them to
calculate parity.  Failure to do so can result in incorrect parity
columns being generated and written to disk if the contents of an
empty sector are non-zero.  This was possible because the checksum
only protects the data portions of the buffer, not the empty sector
padding.

This issue has been addressed by updating raidz_parity_verify() to
check that all dRAID empty sectors are zero filled.  Any sectors
which are non-zero will be fixed, repair IO issued, and a checksum
error logged.  They can then be safely used to verify the parity.

This specific type of damage is unlikely to occur since it requires
a disk to have silently returned bad data, for an empty sector, while
performing a scrub.  However, if a pool were to have been damaged
in this way, scrubbing the pool with this change applied will repair
both the empty sector and parity columns as long as the data checksum
is valid.  Checksum errors will be reported in the `zpool status`
output for any repairs which are made.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12857
2022-01-04 16:46:32 -08:00
наб 1135d0a5ff FreeBSD: fix unpropagated error
When performing I/O on FreeBSD using a file based vdev ensure all
errors encountered when reading/writing are propagated through the
zio pipeline.  

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12904
2021-12-23 11:39:29 -08:00
наб 18e4f67960 module: icp: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:42:47 -08:00
наб 14e4e3cb9f module: zfs: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:42:47 -08:00
наб 868998220e module: zfs: freebsd: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:42:47 -08:00
наб 66cd33e09b module: zfs: linux: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:42:47 -08:00
наб dac85132c6 module: zcommon: zprop: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:42:47 -08:00
наб 2447a1e59c module: avl: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:42:47 -08:00
наб 1f182103aa libzfs: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:42:47 -08:00
наб 0cad373e5c libnvpair: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:42:47 -08:00
наб 9028f99a79 libzutil: import: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:42:47 -08:00
наб 56050599c9 libzpool: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:42:47 -08:00
наб 57bff80ebf libshare: nfs: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:42:47 -08:00
наб bfc1789703 libzfs_core: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:42:47 -08:00
наб 7fe47a2386 libefi: efi_type: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:42:47 -08:00
наб c2f94afa0e include: dmu.h: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:42:47 -08:00
наб 855e49e881 module: zfs: vdev: shim out vdev_indirect_mapping_verify()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:42:41 -08:00
наб ce767d69b0 module: zfs: vdev: shim out vdev_indirect_births_verify()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:42:29 -08:00
наб 36542b065d module: zfs: spa: shim out vdev_count_verify_zaps()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:36:50 -08:00
наб 2c1988e96f module: zfs: multilist: shim out multilist_d2l()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:36:45 -08:00
наб 89495a427f module: zfs: dsl: pool: shim out dsl_early_sync_task_verify()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:36:36 -08:00
наб 16a32ce402 module: zfs: dnode: use debug-only in debug mode only
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:36:31 -08:00
наб 83719bd68c include: sys/arc.h: shim out arc_referenced()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:36:26 -08:00
наб 29d033c6b5 include: qat.h: mark unused macro arguments as used
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:36:21 -08:00
наб da5f5ec508 libspl: kmem.h: mark unused kmem_*() macro arguments used
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:36:12 -08:00
наб 42aaf0e7c4 libspl: ASSERT*: mark arguments as used
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12844
2021-12-23 09:35:47 -08:00
Brian Behlendorf d6885f3209 ZTS: Fix enospc_002_pos.ksh again
This is a follow up commit for e03a41a60 which aimed to resolve
this same test failure.  The core "problem" here is that it takes
very little space to perform a clone/snapshot/bookmark, which
means if we want these commands to reliably fail the pool must
truely have exhausted all free space.

This commit increases the number of fill iterations to try and
consume every block which we can.  This still can't guarantee
the clone/snapshot/bookmark will fail, but it significantly
improves the odds.  The exception was kept since it's still
not a sure thing.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12903
2021-12-23 09:21:40 -08:00
Alexander Motin 462217d1c2 Reduce number of arc_prune threads
On FreeBSD vnode reclamation is single-threaded, protected by single
global lock.  Linux seems to be able to use a thread per mount point,
but at this time it creates more harm than good.

Reduce number of threads to 1, adding tunable in case somebody wants
to try more.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12896
Issue #9966
2021-12-22 17:07:13 -08:00
Brian Behlendorf d2f374c3f2 ZTS: Fix rollback_003_pos.ksh
Under Linux when rolling back a mounted filesystem negative dentries
may not be dropped from the cache.  This can result in an ENOENT
being incorrectly returned on first access.  Issuing a `df` before
the unmount results in the negative dentries being invalidated and
side steps the issue.

This is solely a workaround for the test case on Linux and not
correct behavior.  The core issue of invalidating negative dentries
needs to be handled with a kernel side change.  This is being
tracked as issue #6143.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12898 
Issue #6143
2021-12-22 11:05:07 -08:00
Brian Behlendorf 9ba5d8d204 ZTS: Fix refreserv_raidz.ksh
The rerefreserv_raidz test was failing on Linux because the sync being
issued doesn't guarantee a pool sync.  Switch to using the sync_pool
function and remove the ZTS exception for Linux.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12897
2021-12-22 09:37:27 -08:00
Georgy Yakovlev 2f411512be zfs-test/mmap_seek: fix build on musl
The build on musl needs linux/fs.h for SEEK_DATA and friends,
and sys/sysmacros.h for P2ROUNDUP.  Add the needed headers.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Closes #12891
2021-12-21 16:44:18 -08:00
shodanshok bfb3b0d77a zed: send notification email by default
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #12806
2021-12-21 16:24:05 -08:00
наб 153f7c9f72 contrib/initrd hooks: properly quote @LIBFETCH_SONAME@
Bullseye shellcheck picks these up as SC2140, and it's right!
@LIBFETCH_SONAME@ is already quoted, so dracut had
  "$d/"libcurl.so.4""
and i-t had
  ""libcurl.so.4""

Partially reverts 34eef3e9a7 (#12760),
which broke this

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:12 -08:00
наб 1f63c64559 contrib/pam_zfs_key: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:12 -08:00
наб cf8d708b7a tests: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:12 -08:00
наб 1fedd01247 zfs: iter: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:12 -08:00
наб d8f32c0eb7 ztest: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:12 -08:00
наб 058cd62432 zdb: il: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:12 -08:00
наб 6fa118b857 zdb: main: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:12 -08:00
наб 724e6c68e5 zed: agents: fmd_api: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:11 -08:00
наб d6fccfe62e zed: agents: zfs_retire: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:11 -08:00
наб 491165c079 zed: agents: zfs_mod: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:11 -08:00
наб e265a082eb zed: agents: zfs_diagnosis: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:11 -08:00
наб 16529f305a zed: agents: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:11 -08:00
наб e2a59aa701 zed: exec: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:11 -08:00
наб 008f30c730 zed: main: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:11 -08:00
наб ab860757f5 zpool: vdev_os: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:11 -08:00
наб e40ca391f8 zpool: main: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:11 -08:00
наб b487738d34 zpool: iter: zpool_compare: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:11 -08:00
наб 964e6a497b zinject: cancel_one_handler: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:11 -08:00
наб 63b6c3e1d1 zhack: space_delta_cb: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:11 -08:00
наб 876b60dcfb raidz_test: init_rand: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:11 -08:00
Brian Behlendorf ff1acbac30 ZTS: speed up rsend tests
With some minor tweaks several of rsend tests can be sped up
considerably without significantly reducing test coverage.

* send-c_verify_ratio:  ~120s -> ~60s
* send_realloc_*_files: ~330s -> ~65s

For the send_realloc* tests this also has the advantage of removing
(most of) the linux/freebsd conditional logic.  Note that for this
test more passes, and thus more incremental send/recvs, are preferable
to a larger number of files.

Total run time of the rsend test group was reduced from roughly 20 to
11 minutes in an environment similar to what's used by the CI.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12876
2021-12-21 11:12:38 -08:00
Brian Behlendorf 7b5d783a46 ZTS: rsend_007_pos failures
The rsend_007_pos test reliably fails on Linux in the cleanup
function.  This is caused by an unmount error when attempting to
recursively destroy the newly received datasets.  Invoking `df`
prior to the `zfs destroy` interestingly avoids the unmont error.

Why this should matter is unclear and should be investigated.
However, this minor tweak may allow us to remove the ZTS rsend
exceptions.  The subsequent rsend_010_pos and rsend_011_pos
failures were a result of this initial failure.  The other
"maybe" failures I was unable to reproduce and have not been
recently observed in the master branch.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5665
Closes #6086
Closes #6087
Closes #6446
Closes #12876
2021-12-21 11:11:07 -08:00
Martin Matuška 20f5c5b912 FreeBSD: fix world build after 143476ce8
Do not redefine the fallthrough macro when building with libcpp.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #12880
2021-12-20 14:28:43 -08:00
Philipp Riederer 8623bd962d Fix error propagation from lzc_send_redacted
Any error from lzc_send_redacted is overwritten by the error of
send_conclusion_record; skip writing the conclusion record if there
was an earlier error.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Philipp Riederer <philipp@riederer.email>
Closes #12766
2021-12-20 10:50:46 -08:00
Ryan Moeller 3fa5266d72 Linux: Implement FS_IOC_GETVERSION
Provide access to file generation number on Linux.

Add test coverage.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12856
2021-12-17 16:18:37 -08:00
наб 82e414f1b2 libshare: nfs: always try to mkdir()
This also works out to one syscall if the directory exists,
but is one syscall shorter if it doesn't.

Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12067
2021-12-17 12:54:25 -08:00
наб 1e78b4eee0 libshare: nfs: set export file 644
The shares are publicly known anyway and can be interrogated by any
user, so this is a debugging aid more than anything.

Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12067
2021-12-17 12:54:14 -08:00
наб 9d4a44f0b8 linux/libshare: nfs: don't needlessly strdup() hostspec
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12067
2021-12-17 12:54:09 -08:00
наб 605e03e51a libshare: nfs: share nfs_is_shared()
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12067
2021-12-17 12:54:04 -08:00
наб 4e225e7316 libshare: nfs: share nfs_copy_entries()
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12067
2021-12-17 12:54:00 -08:00
наб c53f2e9b50 libshare: nfs: open temporary file once
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12067
2021-12-17 12:53:54 -08:00
наб f50697f95b libshare: nfs: retry flock() when interrupted
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12067
2021-12-17 12:53:49 -08:00
наб bdf6464c6c freebsd/libshare: nfs: don't send SIGHUP to all processes
pidfile_open() sets *pidptr to -1 if the process currently holding
the lock is between pidfile_open() and pidfile_write(),
the subsequent kill(mountdpid) would potentially SIGHUP all
non-system processes except init: just sleep for half a millisecond
and try again in that case

Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12067
2021-12-17 12:53:25 -08:00
наб cf65c33c9c zfs-share.8: document -l flag
Description stolen from zfs-mount.8

Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12067
2021-12-17 12:52:26 -08:00
наб 3a661613df contrib/initrd: systemd-ask-password --no-tty before argument
In systemd 249 (sid), sd-a-p processes its arguments in getopt + mode,
so "systemd-ask-password zupa --no-tty" prompts for "zupa --no-tty",
not "zupa" not on the tty, as expected (bullseye, 247).

Ref: https://github.com/systemd/systemd/commit/4b1c842d95bfd6ab352ade1a4655f9e512f35185
Ref: https://github.com/systemd/systemd/pull/19806
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12870
2021-12-17 12:44:23 -08:00
Rich Ercolani f68b9c81c8 Workaround Debian's fake System.map behavior
Debian ships fake System.map files by default, leading to the
invocation of depmod with them to flood you with errors about
missing symbols.

Let's notice and not do that.

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12862
2021-12-17 12:43:13 -08:00
Brian Behlendorf eecd3f1a21 ZTS: alloc_class.ksh must wait for the process to exit
The alloc_class_* tests may fail on Linux with an EBUSY error if
`zfs destroy` is run before the `dd` process has had a chance to
terminate.  Wait on the pid after the `kill -9` to make sure.

When testing I didn't observe any failures for the alloc_class
tests.  Remove them from the exceptions list, the CI was used to
verify the tests pass on all platforms.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12873
2021-12-17 12:40:34 -08:00
Rich Ercolani 1a79f7e860 ZTS: Avoid piping send directly to /dev/null
Unfortunately, #11445 means while we fail gracefully now, we still
fail, unless people want to implement a complex workaround just to
support /dev/null.

So let's just use the cheap workaround in a test for now.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12872
2021-12-17 12:39:10 -08:00
Tony Hutter 9aa0915f87 ZTS: Fix zpool_reopen_[1-5] on Fedora 35
The zpool_reopen_[1-5] tests are failing Fedora 35 with:

zpool_reopen_001_pos.ksh[64]: log_must[67]: log_pos[270]:
wait_for_resilver_end[98]: wait_for_action: line 71: func: is read only

Renaming 'func' -> 'funct' fixes the issue.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #12871
2021-12-17 12:37:21 -08:00
Brian Behlendorf 8a02d01e85 Fix zvol_open() lock inversion
When restructuring the zvol_open() logic for the Linux 5.13 kernel
a lock inversion was accidentally introduced.  In the updated code
the spa_namespace_lock is now taken before the zv_suspend_lock
allowing the following scenario to occur:

    down_read <=== waiting for zv_suspend_lock
    zvol_open <=== holds spa_namespace_lock
    __blkdev_get
    blkdev_get_by_dev
    blkdev_open
    ...

     mutex_lock <== waiting for spa_namespace_lock
     spa_open_common
     spa_open
     dsl_pool_hold
     dmu_objset_hold_flags
     dmu_objset_hold
     dsl_prop_get
     dsl_prop_get_integer
     zvol_create_minor
     dmu_recv_end
     zfs_ioc_recv_impl <=== holds zv_suspend_lock via zvol_suspend()
     zfs_ioc_recv
     ...

This commit resolves the issue by moving the acquisition of the
spa_namespace_lock back to after the zv_suspend_lock which restores
the original ordering.

Additionally, as part of this change the error exit paths were
simplified where possible.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12863
2021-12-17 09:52:13 -08:00
Alan Somers ca1b2bb4b5 FreeBSD: Update argument types for VOP_READDIR
A recent commit to FreeBSD changed the type of
vop_readdir_args.a_cookies to a uint64_t**.  There is no functional
impact to ZFS because ZFS only uses 32-bit cookies, which will be
zero-extended to 64-bits by the existing code.

https://github.com/freebsd/freebsd-src/commit/b214fcceacad6b842545150664bd2695c1c2b34f

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #12874
2021-12-17 09:50:12 -08:00
наб eb51a9d747 zcommon: pre-iterate over sysfs instead of statting every feature
If sufficient memory (<2K, realistically) is available, libzfs_init()
can be significantly shorted by iterating over the correct sysfs
directory before registrations, we can turn 168 stats into 15/18
syscalls (3 opens (6 if built in), 3 fstats, 6 getdentses, and 3
closes), a tenfoldish reduction; this is probably a bit faster, too.

The list is always optional, and registration functions (and one-off
users) can simply pass NULL, which will fall back to the previous
mechanism

Also, don't allocate in zfs_mod_supported_impl, and use use access()
instead of stat(), since existence is really what we care about

Also, fix pre-prop-checking compat in fallback for built-in ZFS

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12089
2021-12-16 16:43:10 -08:00
наб 8fdc6f618c zcommon: *_prop: make all zprop_index_t tables const
They're already static, and there's no point in them being R/W
and living outside .rodata

Reviewed-by: RageLtMan <rageltman@sempervictus>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12836
2021-12-16 13:26:04 -08:00
Ryan Moeller 92a9e8c618 FreeBSD: Provide correct file generation number
va_seq was actually a thin veil over va_gen, so z_gen is a more
appropriate value than z_seq to populate the field with.

Drop the unnecessary compat obfuscation and provide the correct
file generation number.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <freqlabs@freebsd.org>
Closes #12851
2021-12-16 13:22:15 -08:00
Allan Jude f6a0dac84a zfs list: Allow more fields in ZFS_ITER_SIMPLE mode
If the fields to be listed and sorted by are constrained
to those populated by dsl_dataset_fast_stat(), then
zfs list is much faster, as it does not need to open each
objset and reads its properties.

A previous optimization by Pawel Dawidek
(0cee24064a) took advantage
of this to make listing snapshot names sorted only by name
much faster.

However, it was limited to `-o name -s name`, this work
extends this optimization to work with:
  - name
  - guid
  - createtxg
  - numclones
  - inconsistent
  - redacted
  - origin
and could be further extended to any other properties
supported by dsl_dataset_fast_stat() or similar, that do
not require extra locking or reading from disk.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #11080
2021-12-16 11:56:22 -08:00
Georgy Yakovlev 2300621dc7 systemd: add weekly and monthly scrub timers
Timers can be enabled as follows:

systemctl enable zfs-scrub-weekly@rpool.timer --now
systemctl enable zfs-scrub-monthly@datapool.timer --now

Each timer will pull in zfs-scrub@${poolname}.service, which is not
schedule-specific.

Added PERIODIC SCRUB section to zpool-scrub.8.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Closes #12193
2021-12-16 11:47:22 -08:00
наб f291fa658e t/z_diff/socket, zfs: main: fix unused argument warnings, ARGSUSED tags
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12829
2021-12-13 15:50:47 -08:00
наб b7ef2340c2 libzfs: diff: simplify superfluous stdio
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12829
2021-12-13 15:50:38 -08:00
наб 9bdf0c592b libzfs: diff: print_what() can return the symbol => get_what()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12829
2021-12-13 15:50:29 -08:00
наб a72129edcb libzfs: diff: stream_bytes: use fputc, %hho formats chars
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12829
2021-12-13 15:50:20 -08:00
наб 1cfb6ef36e libzfs: zpool_set_vdev_prop: remove unused vprop
Found by clang 14 with -Wunused-but-set-variable

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12829
2021-12-13 15:50:09 -08:00
наб 9e184b7c35 linux: libspl: getmntany: remove unused argument
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12829
2021-12-13 15:49:59 -08:00
наб 344bbc82e7 zfs, libzfs: diff: accept -h/ZFS_DIFF_NO_MANGLE, disabling path escaping
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12829
2021-12-13 15:49:40 -08:00
ogelpre f04b976200 Add init script to load keys
Add new init scripts which allow automatic loading of keys if
keylocation property is set to a URI.

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Benedikt Neuffer <ogelpre@itfriend.de>
Closes #11659
Closes #11662
2021-12-12 11:17:14 -08:00
Till Maas 4a5b6ced41 zfs-dkms rpm: Fix scriptlets dependencies
To ensure that the necessary packages are available during the %post and
%preun scriptlets, require them properly.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Till Maas <opensource@till.name>
Closes #12822
Closes #12832
2021-12-12 11:15:25 -08:00
Ryan Moeller 23cee221b7 FreeBSD: Add vop_standard_writecount_nomsync
https://cgit.freebsd.org/src/commit?id=3ffcfa599e29686cf2b3c1a6087408c37acaed78

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12828
2021-12-12 11:13:18 -08:00
Mark Johnston cdf74673bc zfs: Fix a deadlock between page busy and the teardown lock
When rolling back a dataset, ZFS has to purge file data resident in the
system page cache.  To do this, it loops over all vnodes for the
mountpoint and calls vn_pages_remove() to purge pages associated with
the vnode's VM object.  Each page is thus exclusively busied while the
dataset's teardown write lock is held.

When handling a page fault on a mapped ZFS file, FreeBSD's page fault
handler busies newly allocated pages and then uses VOP_GETPAGES to fill
them.  The ZFS getpages VOP acquires the teardown read lock with vnode
pages already busied.  This represents a lock order reversal which can
lead to deadlock.

To break the deadlock, observe that zfs_rezget() need only purge those
pages marked valid, and that pages busied by the page fault handler are,
by definition, invalid.  Furthermore, ZFS pages always transition from
invalid to valid with the teardown lock held, and ZFS never creates
partially valid pages.  Thus, zfs_rezget() can use the new
vn_pages_remove_valid() to skip over pages busied by the fault handler.

PR:		258208
Tested by:	pho
Reviewed by:	avg, sef, kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32931

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12828
2021-12-12 11:13:18 -08:00
Ryan Moeller d172264d1c FreeBSD: Catch up with more VFS changes
Unused thread argument was removed from NDINIT*

https://cgit.freebsd.org/src/commit?id=7e1d3eefd410ca0fbae5a217422821244c3eeee4

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12828
2021-12-12 11:13:18 -08:00
наб 510885a84c FreeBSD supports edonr follow up
This chases 269b5dadcf (#12735),
which touched the actual code but didn't fix the comment

Additionally, ignore the name.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12823
2021-12-08 17:01:36 -08:00
Brian Behlendorf 9699e45d57 Linux 5.15 compat: META (#12824)
The final 5.15 kernel is available and has been tested. 

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2021-12-07 15:35:42 -08:00
наб 5ece420f03 contrib/bash_completion.d: fix error spew from __zfs_match_snapshot()
Given:
  /sbin/zfs list filling/a-zvol<TAB> -o space,refratio
The rest of the cmdline gets vored by:
  /sbin/zfs list filling/a-zvolcannot open 'filling/a-zvol':
  operation not applicable to datasets of this type

With -x (fragment):
  + COMPREPLY=($(compgen -W "$(__zfs_match_snapshot)" -- "$cur"))
  +++ __zfs_match_snapshot
  +++ local base_dataset=filling/dziadtop-nowe-duchy
  +++ [[ filling/dziadtop-nowe-duchy != filling/dziadtop-nowe-duchy ]]
  +++ [[ filling/dziadtop-nowe-duchy != '' ]]
  +++ __zfs_list_datasets filling/dziadtop-nowe-duchy
  +++ /sbin/zfs list -H -o name -s name -t filesystem
                     -r filling/dziadtop-nowe-duchy
  +++ tail -n +2
  cannot open 'filling/dziadtop-nowe-duchy':
  operation not applicable to datasets of this type
  +++ echo filling/dziadtop-nowe-duchy
  +++ echo filling/dziadtop-nowe-duchy@
  ++ compgen -W 'filling/dziadtop-nowe-duchy

This properly completes with:
  $ /sbin/zfs list filling/a-zvol<TAB> -o space,refratio
  filling/a-zvol   filling/a-zvol@
  $ /sbin/zfs list filling/a-zvol<cursor> -o space,refratio

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12820
2021-12-07 12:30:10 -08:00
Coleman Kane a78f19d3ec Linux 5.16: Resolve ZSTD_isError symbol collision in Linux kernel
Newer zstd code introduced in the main kernel tree now creates a symbol
collision with ZSTD_isError in our ZSTD code. This change relabels our
implementation with a ZFS-specific symbol name, and undoes some
macro-based micro-optimizations that conflict with the attempt to rename
our internal-use version.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12819
2021-12-07 12:28:22 -08:00
Coleman Kane 1e767532f2 Linux 5.16: The blk-cgroup.h header is where struct blkcg_gq is defined
The definition of struct blkcg_gq was moved into blk-cgroup.h, which is
a header that's been in Linux since 2015. This is used by
vdev_blkg_tryget() in module/os/linux/zfs/vdev_disk.c. Since the kernel
for CentOS 7 and similar-generation releases doesn't have this header,
its inclusion is guarded by a configure test.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12819
2021-12-07 12:28:12 -08:00
Coleman Kane d08b99aca0 Linux 5.16: bio_set_dev is no longer a helper macro
This change adds a confiugre check to determine if bio_set_dev is a
helper macro or not. If not, then the attempt to override its internal
call to bio_associate_blkg(), with a macro definition to our own
version, is no longer possible, as the compiler won't use it when
compiling the new inline function replacement implemented in the header.
This change also creates a new vdev_bio_set_dev() function that performs
the same work, and also performs the work implemented in
vdev_bio_associate_blkg(), as it is the only thing calling that function
in our code. Our custom vdev_bio_associate_blkg() is now only compiled
if the bio_set_dev() is a macro in the Linux headers.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12819
2021-12-07 12:28:00 -08:00
Coleman Kane f6e22561d2 Linux 5.16: type member of iov_iter renamed iter_type
The iov_iter->type member was renamed iov_iter->iter_type. However,
while looking into this, realized that in 2018 a iov_iter_type(*iov)
accessor function was introduced. So if that is present, use it,
otherwise fall back to trying the existing behavior of directly
accessing type from iov_iter.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12819
2021-12-07 12:27:49 -08:00
Coleman Kane 435a451e5c Linux 5.16: block_device_operations->submit_bio now returns void
The return type for the submit_bio member of struct
block_device_operations was changed to no longer return a value.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12819
2021-12-07 12:27:27 -08:00
Paul Dagnelie 376027331d ZFS send/recv with ashift 9->12 leads to data corruption
Improve the ability of zfs send to determine if a block is compressed
or not by using information contained in the blkptr.

Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Matthew Ahrens <matthew.ahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #12770
2021-12-07 11:27:59 -07:00
Arshad Hussain b6fc42b5e1 Update "tests/README.md"
This patch adds detail section on adding and running
test-case. It also changes markdown number list to
more readeable headers

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Closes #12737
2021-12-07 09:49:25 -07:00
Paul Dagnelie 795075e638 Add const to nvlist functions to properly expose their real behavior
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #12728
2021-12-06 18:19:13 -07:00
Brian Behlendorf 14ba514af6 ZTS: import_rewind_device_replaced reliably fails
The import_rewind_device_replaced.ksh test was never entirely reliable
because it depends on MOS data not being overwritten.  The MOS data is
not protected by the snapshot so occasional failures were always
expected.  However, this test is now failing reliably on all platforms
indicating something has changed in the code since the test was marked
"maybe".  Convert the test to a "known" failure until the root cause
is identified and resolved.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12821
2021-12-06 09:45:17 -08:00
Rich Ercolani df42e20ac6 Corrected a case where we could read uninited ABD memory
For my sins, I started running valgrind over ztest to try and fix
that pesky intermittent "zloop dies with malloc errors" problem.

This one seemed exciting enough to merit cutting a PR for before
the rest get polished.

Suggested-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12214
2021-12-03 13:13:21 -08:00
John Wren Kennedy ddc026f59b Strip colons from all test result filenames
The upload artifact functionality in github can't handle colons in
filenames. The current code handles this for files under the most
recent set of results. With the ability to rerun failed tests, now
there can be multiple sets of results, and they all need to be
processed in the same way.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes #12815
2021-12-01 16:18:45 -08:00
Brian Behlendorf 77e2756de0 Linux 5.13 compat: retry zvol_open() when contended
Due to a possible lock inversion the zvol open call path on Linux
needs to be able to retry in the case where the spa_namespace_lock
cannot be acquired.

For Linux 5.12 an older kernel this was accomplished by returning
-ERESTARTSYS from zvol_open() to request that blkdev_get() drop
the bdev->bd_mutex lock, reaquire it, then call the open callback
again.  However, as of the 5.13 kernel this behavior was removed.

Therefore, for 5.12 and older kernels we preserved the existing
retry logic, but for 5.13 and newer kernels we retry internally in
zvol_open().  This should always succeed except in the case where
a pool's vdev are layed on zvols, in which case it may fail.  To
handle this case vdev_disk_open() has been updated to retry when
opening a device when -ERESTARTSYS is returned.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #12301
Closes #12759
2021-12-01 17:07:12 -07:00
John Wren Kennedy 31d2f42b2a Temporarily remove tests from sanity runfile
With the addition of functionality to rerun failing tests, some
tests that fail only sometimes still fail often enough to degrade
the reliability of the sanity runs. Remove them from the runfile
until they reliably pass.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes #12814
2021-12-01 13:22:52 -08:00
Paul Dagnelie 2320e6eb43 Add zfs-test facility to automatically rerun failing tests
This was a project proposed as part of the Quality theme for the
hackthon for the 2021 OpenZFS Developer Summit. The idea is to improve
the usability of the automated tests that get run when a PR is created
by having failing tests automatically rerun in order to make flaky
tests less impactful.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #12740
2021-12-01 10:38:53 -07:00
Attila Fülöp 861dca065e get_key_material: fix style
Reviewed-by: Felix Dörre <felix@dogcraft.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #12765
2021-11-30 11:54:36 -08:00
Harald van Dijk 85638aa870 get_key_material: skip passphrase validation when loading keys
The restriction that an encryption key must be at least
MIN_PASSPHRASE_LEN characters long make sense when changing the
encryption key, but not when loading: as this restriction is not
enforced in the libraries, it is possible to bypass zfs change-key's
restrictions and end up with a key that becomes impossible to load with
zfs load-key, for example through pam_zfs_key.

Reviewed-by: Felix Dörre <felix@dogcraft.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Harald van Dijk <harald@gigawatt.nl>
Closes #12765
2021-11-30 11:54:06 -08:00
Attila Fülöp 4234812d1a pam_zfs_key: tests: check if zfs load-key works on short passphrases
The pam_zfs_key pam module does not enforce a minimum password
length while changing the user password and thus the users home
dataset passphrase. To not end up with a dateset `zfs load-key`
can't load the key for, `zfs load-key` should not enforce a minimum
passphrase length. This adds a test for that.

Reviewed-by: Felix Dörre <felix@dogcraft.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #12765
Closes #12651
Closes #12656
2021-11-30 11:52:21 -08:00
Attila Fülöp 307db92823 pam_zfs_key: tests: clean up the generated pam service config file
Remove the generated pam service config file
`/etc/pam.d/pam_zfs_key_test` on test cleanup, since the tests
shouldn't alter system state.

While here, move the pam service config file name into a variable.

Reviewed-by: Felix Dörre <felix@dogcraft.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #12765
2021-11-30 11:51:45 -08:00
jokersus b5c16861e9 Remove REMAKE_INITRD
The option has been deprecated in dkms and will break packaging in
future versions. See https://github.com/dell/dkms/commit/7114c62

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: jokersus <jokersus.cava@gmail.com>
Closes #12781
2021-11-30 11:09:15 -08:00
Brian Behlendorf 05b3eb6d23 Default to zfs_dmu_offset_next_sync=1
Strict hole reporting was previously disabled by default as a
performance optimization.  However, this has lead to confusion
over the expected behavior and a variety of workarounds being
adopted by consumers of ZFS.  Change the default behavior to
always report holes and force the TXG sync.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12746
2021-11-30 10:38:09 -08:00
Rich Ercolani 5dc6fc2b73 Stop segfaulting on unmount error case
After interrupting ZTS runs that errored out, I found that
"zpool export testpool2" was segfaulting.

This seems unnecessary.

Reviewed-by: szubersk <szuberskidamian@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12804
2021-11-30 10:36:36 -08:00
Pawel Jakub Dawidek 547df81641 Code cleanups
- Allocate ve_search on the stack, so we avoid allocating memory for
  every I/O even if the VDEV cache is disabled.
- Reduce lock scope.
- Avoid locking in vdev_cache_read() when the VDEV cache is disabled.
- Sort file names properly.
- Correct comment.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #12749
2021-11-30 10:32:38 -08:00
maxz cfc62062ae Replace wrong occurrences of affect by effect in the man pages
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Max Zettlmeißl <max@zettlmeissl.de>
Closes #12784
2021-11-30 10:28:57 -08:00
Rich Ercolani c5ccbbdcf6 Allow printing special vdev metaslab groups
Sometimes, we'd like to know info about the metaslab groups
on special vdevs too. So let's make -MM do something useful.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12750
2021-11-30 10:26:45 -08:00
Damian Szuberski 34eef3e9a7 Pass --enable=all to shellcheck within contrib/
- Remove `SHELLCHECK_IGNORE` in favor of inline suppressions
  and more general `SHELLCHECK_OPTS`.

- Exclude `SC2250` (turned on by `--enable=all`) globally

- Pass `--enable=all` to shellcheck for scripts in contrib/: it's
  very important to catch errors early in areas that are not easily
  testable.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #12760
2021-11-30 10:23:10 -08:00
наб 4325de09cd etc/systemd/zfs-mount-generator: serialise, handle keylocation=http[s]://
* etc/systemd/zfs-mount-generator: serialise

The wins for a relatively normal workload are rather slim:
	real	0.02119s/0.00985s=2.15029x
	user	0.02130s/0.00346s=6.15560x
	sys	0.03858s/0.00643s=6.00062x

	wall-total	0.014518s/0.005925s=2.45009x
	wall-init	0.014518s/0.002457s=5.90684x
	wall-real	0.014518s/0.003467s=4.18668x

But this is a big win on machines with a lot of datasets and expensive
forks.

For example, the gain on a VM on my work laptop with 900+ legacy-mount
Docker datasets, the original gains from the C rewrite were
only five-fold:
	real    0.516s/0.102s=5.05882x
	user    0.237s/0.143s=1.65734x
	sys     0.287s/0.100s=2.87x

And this serial variant gains this back there as well:
	real    0.102s/0.008s=12.75x
	user    0.143s/0.007s=20.42857
	sys     0.100s/0.001s=100x

	wall-total	0.09717s/0.00319s=30.40255x
	wall-init	0.00203s/0.00200s=1.015941x
	wall-real	0.09513s/0.00118s=80.02043x

For a total of
	real    0.516s/0.008s=64.5x
	user    0.237s/0.007s=33.85714x
	sys     0.287s/0.001s=287x

Suggested-by: Richard Laager <rlaager@wiktel.com>

* etc/systemd/zfs-mount-generator: pull in network for keylocation=https

Also simplify RequiresMountsFor= handling
Ref: #11956

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12138
2021-11-30 09:29:50 -07:00
Allan Jude 2a673e76a9 Vdev Properties Feature
Add properties, similar to pool properties, to each vdev.
This makes use of the existing per-vdev ZAP that was added as
part of device evacuation/removal.

A large number of read-only properties are exposed,
many of the members of struct vdev_t, that provide useful
statistics.

Adds support for read-only "removing" vdev property.
Adds the "allocating" property that defaults to "on" and
can be set to "off" to prevent future allocations from that
top-level vdev.

Supports user-defined vdev properties.
Includes support for properties.vdev in SYSFS.

Co-authored-by: Allan Jude <allan@klarasystems.com>
Co-authored-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #11711
2021-11-30 07:46:25 -07:00
pstef 5f64bf7fde Fix typo in zpool.8
Update zpool.8 to avoid parseltongue.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Piotr P. Stefaniak <pstef@freebsd.org>
Closes #12763
2021-11-29 10:52:42 -08:00
Coleman Kane 75b309a938 Linux 5.16 compat: asm/fpu/xcr.h is new location for xgetbv/xsetbv
Linux 5.16 moved these functions into this new header in commit
1b4fb8545f2b00f2844c4b7619d64d98440a477c. This change adds code to look
for the presence of this header, and include it so that the code using
xgetbv & xsetbv will compile again.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12800
2021-11-29 10:49:33 -08:00
Coleman Kane c0fb44c506 Linux 5.16: wait_on_page_bit() no longer available to modules
Instead, linux/pagemap.h offers a number of folio-specific functions to
be called instead. In this case, module/os/linux/zfs/zfs_vnops_os.c
wants to call wait_on_page_bit(pp, PG_writeback). This gets replaced
with folio_wait_bit(folio_page(pp), PG_writeback). This change modifies
the code to conditionally compile that if configure identifies th
presence of the folio_wait_bit() function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12800
2021-11-29 10:48:52 -08:00
Mark Johnston ded851b2e0 Fix several bugs in the FreeBSD rename VOP implementation
- To avoid a use-after-free, zfsvfs->z_log needs to be loaded after the
  teardown lock is acquired with ZFS_ENTER().
- Avoid leaking vnode locks in zfs_rename_relock() and zfs_rename_()
  when the ZFS_ENTER() macros forces an early return.

Refactor the rename implementation so that ZFS_ENTER() can be used
safely.  As a bonus, this lets us use the ZFS_VERIFY_ZP() macro instead
of open-coding its implementation.

Reported-by: Peter Holm <pho@FreeBSD.org>
Tested-by: Peter Holm <pho@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Sponsored-by: The FreeBSD Foundation
Closes #12717
2021-11-19 15:26:39 -07:00
Paul Dagnelie f9e39f98a0 Add notes to system_taskq
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #12771
2021-11-19 10:02:45 -07:00
Rich Ercolani 269b5dadcf Enable edonr in FreeBSD
The code is integrated, builds fine, runs fine, there's not really
any reason not to.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12735
2021-11-16 12:40:10 -07:00
Martin Matuška b8dcfb2c9f FreeBSD: fix world build after de198f2d9
The inline function vn_flush_cached_data() in vnode.h
must not be compiled when building BASE.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #12743
2021-11-15 09:07:39 -07:00
Damian Szuberski 8ac58c3f56 Fix zfs:AUTO autodetection in initramfs scripts
Don't exit early in find_rootfs() when zpool.bootfs
is set to `zfs:AUTO`.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #12658
2021-11-13 08:02:50 -07:00
Pawel Jakub Dawidek ac32854a6e Remove (now unused) td argument from zfs_lookup()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #12748
2021-11-12 17:06:44 -08:00
George Amanakis c9d62d1356 Introduce a tunable to exclude special class buffers from L2ARC
Special allocation class or dedup vdevs may have roughly the same
performance as L2ARC vdevs. Introduce a new tunable to exclude those
buffers from being cacheable on L2ARC.

Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #11761 
Closes #12285
2021-11-11 12:52:16 -08:00
наб 420b44488f Remove basename(1). Clean up/shorten some coreutils pipelines
Basenames that remain, in cmd/zed/zed.d/statechange-led.sh:
	dev=$(basename "$(echo "$therest" | awk '{print $(NF-1)}')")
	vdev=$(basename "$ZEVENT_VDEV_PATH")
I don't wanna interfere with #11988

scripts/zfs-tests.sh:
	SINGLETESTFILE=$(basename "$SINGLETEST")
tests/zfs-tests/tests/functional/cli_user/zfs_list/zfs_list.kshlib:
	ACTUAL=$(basename $dataset)
	ACTUAL=$(basename $dataset)
tests/zfs-tests/tests/functional/cli_user/zpool_iostat/
	zpool_iostat_-c_homedir.ksh:
	typeset USER_SCRIPT=$(basename "$USER_SCRIPT_FULL")
tests/zfs-tests/tests/functional/cli_user/zpool_iostat/
	zpool_iostat_-c_searchpath.ksh:
	typeset CMD_1=$(basename "$SCRIPT_1")
	typeset CMD_2=$(basename "$SCRIPT_2")
tests/zfs-tests/tests/functional/cli_user/zpool_status/
	zpool_status_-c_homedir.ksh:
	typeset USER_SCRIPT=$(basename "$USER_SCRIPT_FULL")
tests/zfs-tests/tests/functional/cli_user/zpool_status/
	zpool_status_-c_searchpath.ksh
	typeset CMD_1=$(basename "$SCRIPT_1")
	typeset CMD_2=$(basename "$SCRIPT_2")
tests/zfs-tests/tests/functional/migration/migration.cfg:
	export BNAME=`basename $TESTFILE`
tests/zfs-tests/tests/perf/perf.shlib:
	typeset logbase="$(get_perf_output_dir)/$(basename \
tests/zfs-tests/tests/perf/perf.shlib:
	typeset logbase="$(get_perf_output_dir)/$(basename \

These are potentially Of Directories, where basename is actually
useful

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12652
2021-11-11 13:27:37 -07:00
Fedor Uporov 49d42425d6 Check l2cache vdevs pending list inside the vdev_inuse()
The l2cache device could be added twice because vdev_inuse() does not
check spa_l2cache for added devices. Make l2cache vdevs inuse checking
logic more closer to spare vdevs.

Reviewed-by: George Amanakis <gamanakis@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes #9153 
Closes #12689
2021-11-11 11:54:15 -08:00
Fedor Uporov d04b5c9e87 zhack: Add repair label option
In case if all label checksums will be invalid on any vdev, the pool
will become unimportable. The zhack with newly added cli options could
be used to restore label checksums and make pool importable again.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes #2510
Closes #12686
2021-11-11 11:26:18 -08:00
Palash Gandhi 637771a066 ZTS: zfs_list_004_neg should not check paths that belong to ZFS
When ZFS is on root, /tmp is a ZFS. This causes zfs_list_004_neg to
fail since `zfs list` on /tmp passes when the test expects it not to.
The fix is to exclude paths that belong to ZFS.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Palash Gandhi <pbg4930@rit.edu>
Closes #12744
2021-11-11 08:46:44 -07:00
Brian Behlendorf c23803be84 Restore dirty dnode detection logic
In addition to flushing memory mapped regions when checking holes,
commit de198f2d95 modified the dirty dnode detection logic to check
the dn->dn_dirty_records instead of the dn->dn_dirty_link.  Relying
on the dirty record has not be reliable, switch back to the previous
method.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #11900 
Closes #12745
2021-11-10 16:14:32 -08:00
Brian Behlendorf 371e0f7754 Exclude zfs_copies_003_pos on Linux
This test case may fail on 5.13 and newer Linux kernels if the
/dev/zvol/ device is not created by udev.

Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #12301
Closes  #12738
2021-11-10 13:56:01 -07:00
Fedor Uporov 2a9c572059 zdb: Report bad label checksum
In case if all label checksums will be invalid on any vdev, the pool
will become unimportable. From other side zdb with -l option will not
provide any useful information why it happened. Add notifications
about corrupted label checksums.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes #2509
Closes #12685
2021-11-10 12:22:00 -07:00
Dimitri John Ledkov 6c8f03232a Upgrade to libabigail 2.0.0
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com>
Closes #12722
Closes #12739
2021-11-09 17:36:42 -08:00
Tony Hutter ae70d628ff zed: Control NVMe fault LEDs
The ZED code currently can only turn on the fault LED for
a faulted disk in a JBOD enclosure.  This extends support
for faulted NVMe disks as well.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #12648
Closes #12695
2021-11-09 16:50:18 -08:00
Fedor Uporov e39fe05b69 Skip spacemaps reading in case of pool readonly import
The only zdb utility require to read metaslab-related data during
read-only pool import because of spacemaps validation. Add global
variable which will allow zdb read spacemaps in case of readonly
import mode.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes #9095
Closes #12687
2021-11-09 12:50:39 -08:00
Brian Atkinson 345196be18 Single IO issue for raidz writes with skip sector
In order to reduce contention on the vq_lock, optional skip sectors
for Raidz writes can be placed into a single IO request. This is done by
padding out the linear ABD for a parity column to contain the skip
sector and by creating gang ABD to contain the data and skip sector for
data columns.

The vdev_raidz_map_alloc() function now contains specific functions for
both reads and write to allocate the ABD's that will be issued down to
the VDEV chldren.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-By: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #12333
2021-11-09 12:51:33 -07:00
Brian Behlendorf 453c63e9b7 Linux 5.16 compat: submit_bio()
The submit_bio() prototype has changed again.  The version is 5.16
still only expects a single argument but the return type has changed
to void.  Since we never used the returned value before update the
configure check to detect both single arg versions.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12725
2021-11-09 11:27:46 -08:00
Brian Behlendorf 1e7d634867 Linux 5.16 compat: linux/elevator.h
Commit https://github.com/torvalds/linux/commit/2e9bc346 moved
the elevator.h header under the block/ directory as part of some
refactoring.  This turns out not to be a problem since there's
no longer anything we need from the header.  This has been the
case for some time, this change removes the elevator.h include
and replaces it with a major.h include.

Reviewed-by: Alexander Lobakin <alobakin@pm.me>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12725
2021-11-09 11:26:35 -08:00
Rich Ercolani 380b072403 Exclude zvol_misc_volmode for now
It keeps failing, on changes which aren't related at all.

So until someone runs down why, I'd like it to stop being the
sole reason for CI failures.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12733
2021-11-08 19:01:19 -07:00
Brian Behlendorf de198f2d95 Fix lseek(SEEK_DATA/SEEK_HOLE) mmap consistency
When using lseek(2) to report data/holes memory mapped regions of
the file were ignored.  This could result in incorrect results.
To handle this zfs_holey_common() was updated to asynchronously
writeback any dirty mmap(2) regions prior to reporting holes.

Additionally, while not strictly required, the dn_struct_rwlock is
now held over the dirty check to prevent the dnode structure from
changing.  This ensures that a clean dnode can't be dirtied before
the data/hole is located.  The range lock is now also taken to
ensure the call cannot race with zfs_write().

Furthermore, the code was refactored to provide a dnode_is_dirty()
helper function which checks the dnode for any dirty records to
determine its dirtiness.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #11900
Closes #12724
2021-11-07 14:27:44 -07:00
Michael Franzl 4b87c1981d Update contrib/initramfs/README.initramfs.markdown
Note that Dropbear supports ed25519 keys since version 2020.79.

See https://github.com/mkj/dropbear/pull/91

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Michael Franzl <michael@franzl.name>
Closes #12715
2021-11-04 09:23:50 -06:00
Rich Ercolani 05679465ac Revert behavior of 59eab109 on not-Linux
It turns out that short-circuiting the EFAULT behavior on a short read
breaks things on FreeBSD. So until there's a nicer solution, let's
just revert the behavior for not-Linux.

Reference:
https://reviews.freebsd.org/R10:70f51f0e474ffe1fb74cb427423a2fba3637544d

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12698
2021-11-04 07:49:40 -06:00
Rich Ercolani e79b6807b8 Workaround issue cleaning up automounted snapshots on Linux
On Linux, sometimes, when ZFS goes to unmount an automounted snap,
it fails a VERIFY check on debug builds, because taskq_cancel_id
returned ENOENT after not finding the taskq it was trying to cancel.

This presumably happens when it already died for some reason; in this
case, we don't really mind it already being dead, since we're just
going to dispatch a new task to unmount it right after.

So we just ignore it if we get back ENOENT trying to cancel here,
retry a couple times if we get back the only other possible condition
(EBUSY), and log to dbgmsg if we got anything but ENOENT or success.

(We also add some locking around taskqid, to avoid one or two cases
of two instances of trying to cancel something at once.)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #11632
Closes #12670
2021-11-03 09:00:08 -06:00
Rich Ercolani a2ffc0e025 Add more explicit warning about dedup being dropped
"has unsupported feature: [number]" seems reasonable when we can't
know what the problem was, but with the send -D removal, we know
what it was, and can explicitly tell people "don't do that; try
this if you must".

So let's.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12708
2021-11-02 15:45:20 -06:00
Damian Szuberski 6d680e61ef Update checkstyle workflow env to ubuntu-20.04
- `checkstyle` workflow uses ubuntu-20.04 environment
- improved `mancheck.sh` readability

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #12713
2021-11-02 14:02:57 -06:00
Paul Dagnelie 58bf6afd3f Fix cpu hotplug atomic sleep issue
We move the spinlock unlock before the thread creation. This should be
safe because the thread creation code doesn't actually manipulate any
taskq data structures; that's done by the thread once it's created.

We also remove the assertion that the maxthreads is the current threads
plus one; that assertion could fail if multiple hotplug events come in
quick succession, and the first new taskq thread hasn't had a chance to
start processing yet.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
eviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #12714
2021-11-02 10:23:48 -06:00
Mike Swanson 321c1b6f39 Disable normalization implicitly when setting "utf8only=off"
When a parent dataset has normalization set to any value other than
"none", and a file system is created with the property "utf8only=off",
implicitly also set "normalization=none" instead of overriding the
desire for a non-UTF8 enforcing file system.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mike Swanson <mikeonthecomputer@gmail.com>
Closes #11892
Closes #12038
2021-10-29 16:59:18 -07:00
Mark Johnston 4d1a3ba6ed Exit the teardown section later in rename on FreeBSD
We have to hold the teardown lock while dereferencing zfsvfs->z_os and,
I believe, when committing to the ZIL.

Note that jumping to the "out" label, "error" is always non-zero.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12704
2021-10-29 16:40:25 -07:00
Mark Johnston 68a7a9edc5 Fix potential use-after-frees in FreeBSD getpages and setattr VOPs
The objset object is reallocated during certain dataset operations, such
as rollbacks, so the objset pointer must be loaded after acquiring the
teardown lock.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12704
2021-10-29 16:39:47 -07:00
D. Ebdrup 28401f6668 zfsprops.7: Add note about comma-separation
This change primarily seeks to make implicit documentation explicit, as
it is not outright stated that options should be comma-separated, nor is
there a reason given for it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Daniel Ebdrup Jensen <debdrup@FreeBSD.org>
Closes #12579
2021-10-29 16:30:44 -07:00
Fedor Uporov 475e41b9f5 Do not print UINT64_MAX value for some of zfs properties
The values of next properties: filesystem_limit, filesystem_count,
snapshot_limit, snapshot_count were returned to user as UINT64_MAX
integers in case if -p cli option is used, return 'none' value instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes #9306 
Closes #12690
2021-10-29 16:18:13 -07:00
Rich Ercolani 1139e170d4 Add explicit error for device_rebuild being disabled
Currently, you get back "can only attach to mirrors and top-level disks"
unconditionally if zpool attach returns ENOTSUP, but that also happens
if, say, feature@device_rebuild=disabled and you tried attach -s.

So let's print an error for that case, lest people go down a rabbit hole
looking into what they did wrong.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #11414 
Closes #12680
2021-10-29 15:55:22 -07:00
Rich Ercolani 4476ccd906 Normalize property names for zfs receive
It turns out, userland is much more happy with aliased property
names than the kernel is.

So let's normalize those to the expected names before we pass
them off.

Added a test case hacked up from the other recv -o/-x test that fails
on unpatched git and passes here.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12607 
Closes #12609
2021-10-29 15:38:10 -07:00
Rich Ercolani adeccfea17 Python 3.10 fixes, part 2
There was a fallback case I overlooked in the initial patch, with
a similarly imperfect version extractor.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12045 
Closes #12673
2021-10-29 15:36:01 -07:00
Peter Levine 6ef28c526b Set DEFAULT_INIT_SHELL to /sbin/openrc-run for Gentoo and Alpine
Gentoo and Alpine always set the rc init scripts' shebang to
#!/sbin/openrc-run, whether or not openrc is installed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Peter Levine <plevine457@gmail.com>
Closes #12683 
Closes #12692
2021-10-29 15:34:37 -07:00
Tony Hutter 4d4998ea39 vdev_id: Fix PHY sorting
One of our developers noticed a bug in vdev_id where we were incorrectly
sorting PHYs using alphabetical sorting (which usually works) instead
of natural sorting (-v).  For example:

	[port-0:0]# ls -d phy*
	phy-0:10  phy-0:11  phy-0:8  phy-0:9

	[port-0:0]# ls -vd phy*
	phy-0:8  phy-0:9  phy-0:10  phy-0:11

This fixes the issue.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #12699
2021-10-29 15:33:34 -07:00
Fedor Uporov d5a5ec4693 Remove unused function zvol_set_volblocksize()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes #12688
2021-10-26 17:07:53 -07:00
Rich Ercolani 6f57f1e382 Make dsl_scan print the pool name in dbgmsg
If you've got multiple scrubs/resilvers going, it's rather helpful
to know which pool each scan line refers to.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes: #12674
2021-10-26 17:24:14 -06:00
Allan Jude 65ad5d1165 spa.c: Replace VERIFY(nvlist_*(...) == 0) with fnvlist_* (#12678)
The fnvlist versions of the functions are fatal if they fail,
saving each call from having to include checking the result.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Allan Jude <allan@klarasystems.com>
2021-10-26 17:15:38 -06:00
Brian Behlendorf 90b77a0364 ZTS: Standardize use of destroy_dataset in cleanup
When cleaning up a test case standardize on using the convention:

    datasetexists $ds && destroy_dataset $ds <flags>

By using 'destroy_dataset' instead of 'log_must zfs destroy' we ensure
that the destroy is retried in the event that a ZFS volume is busy.
This helps ensures ensure tests are fully cleaned up and prevents false
positive test failures on Linux.

Note that all of the tests which used 'zfs destroy' in cleanup have
been updated even if they don't use volumes.  This was done to
clearly establish the expected convention.

Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12663
2021-10-25 15:13:50 -06:00
Rich Ercolani 731fbb5d22 Workaround cloud-init hotplug issue
cloud-init added a hook which triggers on every device add/rm
event, which results in holding open devices for a while after
they're created/destroyed.

So let's shove an exclusion rule for that into the GH workflows
until it gets fixed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12644
Closes #12669
2021-10-25 11:27:05 -06:00
Ryan Moeller 14b69c0929 FreeBSD: Catch up with recent VFS changes
cn_thread is always curthread.

https://cgit.freebsd.org/src/commit?id=b4a58fbf640409a1e507d9f7b411c83a3f83a2f3
https://cgit.freebsd.org/src/commit?id=2b68eb8e1dbbdaf6a0df1c83b26f5403ca52d4c3

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Alan Somers <asomers@gmail.com>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #12668
2021-10-25 09:46:28 -07:00
Attila Fülöp 7cc5cb8083 pam_zfs_key: malloc and mlock/munlock won't match
mlock(2) and munlock(2) operate on memory pages whereas malloc(3)
does not. So if you munlock(2) a malloced memory region, the whole
page containing it is freed. Since this page may contain another
malloced and mlocked memory region, used as a password buffer by a
concurrent running instance of pam_zfs_key, there is a slight chance
of leaking passwords. By using mmap(2) we avoid such problems since
it will return whole pages on page aligned addresses.

Although the above concern may be mostly academical, it is still
better to use mmap(2) for allocating memory since the FreeBSD
documentation suggests to call mlock(2) and munlock(2) on page
aligned addresses, and other implementations even require it.

While here, remove duplicate code in alloc_pw_string() by calling
alloc_pw_size().

Reviewed-by: Felix Dörre <felix@dogcraft.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #12665
2021-10-22 11:42:34 -07:00
Attila Fülöp 50292e2545 pam_zfs_key: mlock(2) and munlock(2) can fail
Since both syscalls can fail, add error handling, including EAGAIN.

Reviewed-by: Felix Dörre <felix@dogcraft.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #12665
2021-10-22 11:42:23 -07:00
Attila Fülöp ee7c30b350 pam_zfs_key: change test user name to conform to standards
The useradd(8) command on my system won't accept login names with
uppercase letters in them, so adjust for that.

Reviewed-by: Felix Dörre <felix@dogcraft.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #12665
2021-10-22 11:42:10 -07:00
youzhongyang ec64fdb93d Skip snapshot in zfs_iter_mounted()
The intention of the zfs_iter_mounted() is to traverse the dataset
and its descendants, not the snapshots. The current code can cause
a mounted snapshot to be included and thus zfs_open() on the snapshot
with ZFS_TYPE_FILESYSTEM would print confusing message such as "cannot
open 'rpool/fs@snap': snapshot delimiter '@' is not expected here".

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #12447
Closes #12448
2021-10-20 16:07:19 -07:00
Tony Hutter 1886cdfcfb vdev_id: Fix enclosure_symlinks feature
The vdev_id.conf "enclosure_symlinks" option persistently creates
and maps /dev/by-enclosure symlinks to dynamic /dev/sg* devices.

This patch fixes two issues:

1. The enclosure_symlinks feature was accidentally broken in:

   vdev_id: Support daisy-chained JBODs in multipath mode

2. Even when working, the feature numbered the enclosure
   sequentially rather than by HBA port number.  That meant that
   if a port was down or didn't appear in sysfs, then the
   enclosure_sumlinks numbers would be numbered wrong.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #12660
2021-10-20 15:48:04 -07:00
felixdoerre 6cb5e1e759 libshare: nfs: pass through ipv6 addresses in bracket notation
Recognize when the host part of a sharenfs attribute is an ipv6
Literal and pass that through without modification.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Felix Dörre <felix@dogcraft.de>
Closes: #11171
Closes #11939
Closes: #1894
2021-10-20 10:40:00 -07:00
Toomas Soome a4cecfbdc9 zpool should call zfs_nicestrtonum() with non-NULL handle
When zfs_nicestrtonum() is called and there will be an error,
the message is left in libzfs handle, if provided. We can use
this message, to provide better feedback for user.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #12650
2021-10-20 10:34:34 -07:00
Pawel Jakub Dawidek a95c82bed8 Remove code duplication
Remove code duplication by moving code responsible for partial block
zeroing to a separate function: dnode_partial_zero().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #12627
2021-10-18 16:50:33 -07:00
Francesco Mazzoli fd778e44c8 Notify on UNAVAIL statechange
`UNAVAIL` is maybe not quite as concerning as `DEGRADED`, but still an
event of notice, in my opinion. For example it is triggered when a
drive goes missing.

Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Francesco Mazzoli <f@mazzo.li>
Closes #12629 
Closes #12630
2021-10-15 15:57:23 -07:00
Teodor Spæren 01b572bc62 zdb: fix overflow of time estimation
The calculation of estimated time remaining in zdb -cc could overflow,
as reported in #10666. This patch fixes this, by using uint64_t instead
of ints in the calculations.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Teodor Spæren <teodor@sparen.no>
Closes #10666 
Closes #12610
2021-10-15 15:55:34 -07:00
Pawel Jakub Dawidek afbc617921 Remove FreeBSD's local copy of the dmu_buf_hold_array() function
Make the main dmu_buf_hold_array() function non-static.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #12628
2021-10-13 11:01:01 -07:00
Teodor Spæren d785245857 zio: use unsigned values for enum
cppcheck complains about the use of 1 << 31, because enums are signed
ints which cannot represent this. As discussed in issue #12611, it
appears that with C99, we can use an unsiged int for the enum, on most
platforms.

I've crafted this commit for just the include/sys/zio.h header, as it's
the only one with a shift of 31. If this is something we want to adopt
in the rest of the project, I will go through and apply it to the rest
of the project.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Teodor Spæren <teodor@sparen.no>
Closes #12611 
Closes #12615
2021-10-11 10:58:06 -07:00
Brian Behlendorf 280d0f0ce4 Export minimal zfs_refcount interfaces
Lustre makes light use of the zfs_refcount interfaces which
isn't a problem when using a non-debug build of OpenZFS. However,
when debugging is enabled the required symbols are not exported.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12613
2021-10-11 10:54:39 -07:00
Brian Behlendorf 648445e007 ZTS: Add known exceptions
Add the following test failures to the exception list for FreeBSD
to ensure we notice new unexpected failures.

   pool_checkpoint/checkpoint_big_rewind
   pool_checkpoint/checkpoint_indirect

And the following for Linux.

   zvol/zvol_misc/zvol_misc_snapdev

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #12621
Issue #12622
Issue #12623
Closes #12624
2021-10-11 10:52:32 -07:00
Brian Behlendorf 72f06d01b5 ZTS: deadman_sync fix
In the CI environment it's possible for events to be slightly
delayed resulting in 4, instead of 5, events appearing in the
log file.  This isn't a problem and should be considered a
success to avoid false positive test results.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12625
2021-10-11 10:49:13 -07:00
nachtgeist a5b464263b initramfs: use correct dataset for rootfs on rollback=1
When booting with root=zfs:rpool/myrootfs@foosnapshot rollback=1,
myrootfs and its descendants get rolled back to foosnapshot, however
ZFS_BOOTFS still contains myrootfs@foosnapshot instead of the
actually desired value of myrootfs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Daniel Reichelt <hacking@nachtgeist.net>
Closes #12585 
Closes #12586
2021-10-08 11:16:31 -07:00
Ryan Moeller 97bbeeb938 Fail invalid incremental recursive send gracefully
zfs send -R -i snap1 pool/ds@snap1 is an invalid invocation of zfs send
because the incremental source and target snapshots are the same.  We
have an error message for this condition, but we don't make it there
because of a failed assert while iterating through the dataset's
snapshots.

Check for NULL to avoid the assert so we can make it to the error
message.

Test this form of invalid send invocation in rsend tests.  Fix the
rsend_016_neg test while here: log_neg itself doesn't fail the test,
and writing to /dev/null is not supported on all Linux kernels.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11121 
Closes #12533
2021-10-08 11:14:26 -07:00
Rich Ercolani 9d1407e8f2 Correct refcount_add in dmu_zfetch
refcount_add_many(foo,N) is not the same as
for (i=0; i < N; i++) { refcount_add(foo); }

Unfortunately, this is only actually true with debug kernels and
reference_tracking_enable=1.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12589 
Closes #12602
2021-10-08 11:10:34 -07:00
Valmiky Arquissandas 2d02bba23d arcstat: Fix integer division with python3
The arcstat script requests compatibility with python2 and python3, but
PEP 238 modified the / operator and results in erroneous output when
run under python3.

This commit replaces instances of / with //, yielding the expected
result in both versions of Python.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Valmiky Arquissandas <foss@kayvlim.com>
Closes #12603
2021-10-08 09:32:27 -06:00
Brian Behlendorf 514498fef6 Simplify and document OpenZFS library dependencies
For those not already familiar with the code base it can be a
challenge to understand how the libraries are laid out.  This
has sometimes resulted in functionality being added in the
wrong place.  To help avoid that in the future this commit
documents the high-level dependencies for easy reference in
lib/Makefile.am.  It also simplifies a few things.

- Switched libzpool dependency on libzfs_core to libzutil.
  This change makes it clear libzpool should never depend
  on the ioctl() functionality provided by libzfs_core.

- Moved zfs_ioctl_fd() from libzutil to libzfs_core and
  renamed it lzc_ioctl_fd().  Normal access to the kmods
  should all be funneled through the libzfs_core library.
  The sole exception is the pool_active() which was updated
  to not use lzc_ioctl_fd() to remove the libzfs_core
  dependency.

- Removed libzfs_core dependency on libzutil.

- Removed the lib/libzfs/os/freebsd/libzfs_ioctl_compat.c
  source file which was all dead code.

- Removed libzfs_core dependency from mkbusy and ctime
  test utilities.  It was only needed for some trivial
  wrapper functions and that code is easy to replicate
  to shed the unneeded dependency.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12602
2021-10-07 11:31:26 -06:00
Serapheim Dimitropoulos 48df24d4ce Remove zdb and libzpool from initramfs image
= Motivation

At Delphix we are heavy users of kernel crash dumps that are captured
through a crash kernel that is spawned whenever the main kernel panics.
The way that this works internally is that a certain amount of memory is
reserved while the main system is running so the initramfs of the crash
kernel can be loaded when a panic occurs.

In order to keep reserved memory at minimum we've been historically
trying to identify the binaries that are part of the kernel's initramfs
that are big and finding ways of either making them smaller or do not
include them in the initramfs image. An example is always stripping the
DWARF info of the ZFS kernel module copy that is included in the
initramfs image of both our running and our crash kernel (the difference
in size there is 76MB vs 4MB).

We've recently identified that libzpool has been the largest binary in
our initramfs images - currently sized around 17MB.

= This Patch

The ZFS scripts do not explicitly copy libzpool to initramfs. They copy
zdb which pulls in libzpool as a dependency. Given that both zdb and
libzpool are not really essential for initramfs (e.g. we'll still have
access to the once the root filesystem is unpacked) this patch removes
them from initramfs.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #12616
2021-10-07 10:52:03 -06:00
Rich Ercolani 543ab04cc3 Document additional -c caveat
One might expect "send data as it is on disk, and cannot trigger
compression changes" to imply "does not attempt to compress data
that was not compressed on the sender."

One would be mistaken.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12570
2021-10-05 11:48:17 -07:00
Tony Hutter 2a8430a260 Rescan enclosure sysfs path on import
When you create a pool, zfs writes vd->vdev_enc_sysfs_path with the
enclosure sysfs path to the fault LEDs, like:

    vdev_enc_sysfs_path = /sys/class/enclosure/0:0:1:0/SLOT8

However, this enclosure path doesn't get updated on successive imports
even if enclosure path to the disk changes.  This patch fixes the issue.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #11950 
Closes #12095
2021-10-04 12:32:16 -07:00
Rich Ercolani aad91df075 Reject zfs send -RI with nonexistent fromsnap
Right now, zfs send -I dataset@nonexistent dataset@existent fails, but
zfs send -RI dataset@nonexistent dataset@existent does not.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12574
Closes #12575
2021-10-04 10:17:30 -07:00
Attila Fülöp 60b618a967 ZFS: Remove a redundant if condition (#12598)
Commit 0c03d21ac9 left in a redundant if condition while
removing some code. Just remove it.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #12598
2021-10-02 12:50:57 -06:00
José Luis Salvador Rufo ed3a3bdb0d Proper support for DESTDIR and INSTALL_MOD_PATH
The environment variables DESTDIR and INSTALL_MOD_PATH must
be mutually exclusive.

https://www.gnu.org/prep/standards/html_node/DESTDIR.html
https://www.kernel.org/doc/Documentation/kbuild/modules.txt

This issue was discussed in this Buildroot thread:
https://lists.buildroot.org/pipermail/buildroot/2021-August/621350.html

I saw this behavior in other different projects, as:

- Yocto Project:
  https://www.yoctoproject.org/pipermail/meta-freescale/2013-August/004307.html

- Google IA Coral:
  https://coral.googlesource.com/linux-imx-debian/+/refs/heads/master/debian/rules

For the above reasons, INSTALL_MOD_PATH will be set as DESTDIR
by default.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: José Luis Salvador Rufo <salvador.joseluis@gmail.com>
Signed-off-by: Romain Naour <romain.naour@gmail.com>
Closes #12577
2021-10-01 10:44:34 -07:00
Ryan Moeller 96ad227a9d ZTS: Minimize udev_wait in zvol_misc tests
The zvol_misc tests, in particular zvol_misc_volmode, make use of a
common udev_wait function to wait for zvol devices in /dev to quiesce
on Linux.  On other platforms this function currently only sleeps for
one second before returning.  This is insufficient, and
zvol_misc_volmode has been flaky on FreeBSD as a result.

Replace udev_wait with block_device_wait, passing through the optional
device parameter where possible.  Rearrange a few checks to strengthen
the verifications we are making and avoid unnecessarily sleeping.  We
must keep udev_wait in a couple places to pass in Github CI workflows.
Remove zvol_misc_volmode from the maybe failing tests on FreeBSD in
zts-report.py.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12583
2021-10-01 09:36:02 -06:00
Érico Nogueira Rolim ce2bdcedf5 libspl: fix warning about missing spl_pagesize declaration
Though it's unlikely anyone will alter its signature, avoids any
possible type mismatch.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Érico Nogueira <erico.erc@gmail.com>
Closes #12567
2021-09-24 16:59:26 -06:00
John Wren Kennedy df5ea74ff6 Assorted parameter changes for performance tests
* Add async runs for sequential_writes, random_readwrite_fixed and
  random_writes
* Remove some larger block sizes that give similar results to others
* Remove nthreads == 4 from random_writes_zil test

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes #12576
2021-09-21 16:17:36 -06:00
Rich Ercolani 59eab1093a Handle partial reads in zfs_read
Currently, dmu_read_uio_dnode can read 64K of a requested 1M in one
loop, get EFAULT back from zfs_uiomove() (because the iovec only holds
64k), and return EFAULT, which turns into EAGAIN on the way out. EAGAIN
gets interpreted as "I didn't read anything", the caller tries again
without consuming the 64k we already read, and we're stuck.

This apparently works on newer kernels because the caller which breaks
on older Linux kernels by happily passing along a 1M read request and a
64k iovec just requests 64k at a time.

With this, we now won't return EFAULT if we got a partial read.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12370 
Closes #12509
Closes #12516
2021-09-20 10:30:50 -07:00
Jorgen Lundman 1d901c3ee5 Upstream: unmount snapshots before destroying them on macOS
Add function zfs_destroy_snaps_nvl_os() call. The main issue is that
macOS needs to unmount any mounted snapshots before they can be
destroyed. Other platforms can handle this in the kernel, but sending
a storm of zed events to unmount seems undesirable when we can do it
in userland to start with.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Co-authored-by: ilovezfs <ilovezfs@icloud.com>
Closes #12550
2021-09-20 09:29:59 -06:00
Rich Ercolani 8a3fe59c03 Added test for being able to read various variants of zstd
As detailed in #12022 and #12008, it turns out the current zstd
implementation is quite nonportable, and results in various
configurations of ondisk header that only each platform can read.

So I've added a test which contains a dataset with a file written by
Linux/x86_64 and one written by FBSD/ppc64.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12030
2021-09-20 09:08:20 -06:00
Alexander Motin 139690d6c3 Really zero the zero page
While switching abd_zero_buf allocation KPI I've missed the fact
that kmem_zalloc() zeroed the allocation, while kmem_cache_alloc()
does not.  Add explicit bzero() after it.

I don't think it should have caused real problems, but leaking one
memory page content all over the pool is not good.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12569
2021-09-17 10:17:18 -07:00
George Amanakis 2a49ebbb4d Avoid panic in case of pool errors and missing L2ARC
In case an ARC buffer is allocated only on L2ARC, and there are
underlying errors in a pool with the cache device in faulty state, a
panic can occur in arc_read_done()->arc_hdr_destroy()->
arc_hdr_l2arc_destroy()->arc_hdr_clear_flags() when trying to free
the ARC buffer.

Fix this by discarding the buffer's identity in arc_hdr_destroy(), in
case the buffer is not empty, before calling arc_hdr_l2hdr_destroy().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #12392
2021-09-16 09:40:15 -07:00
Brian Behlendorf 6065740811 Linux 5.14 compat: META
Increase the Linux-Maximum version in the META file to 5.14.
All of the required compatibility patches have been merged
and the 5.14 kernel has been officially released.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12565
2021-09-15 14:09:31 -07:00
Allan Jude 4a1195ca50 Temporarily use root credentials to mount snapshots in .zfs
When mounting a snapshot in the .zfs/snapshots control directory,
temporarily assume roots credentials to perform the VFS_MOUNT().

This allows regular users and users inside jails to access these
snapshots.

The regular usermount code is not helpful here, since it requires
that the user performing the mount own the mountpoint, which won't
be the case for .zfs/snapshot/<snapname>

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Sponsored-By: Modirum MDPay
Sponsored-By: Klara Inc.
Closes #11312
2021-09-14 17:10:00 -06:00
Brian Behlendorf 6954c22f35 Use fallthrough macro
As of the Linux 5.9 kernel a fallthrough macro has been added which
should be used to anotate all intentional fallthrough paths.  Once
all of the kernel code paths have been updated to use fallthrough
the -Wimplicit-fallthrough option will because the default.  To
avoid warnings in the OpenZFS code base when this happens apply
the fallthrough macro.

Additional reading: https://lwn.net/Articles/794944/

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12441
2021-09-14 10:17:54 -06:00
Jorgen Lundman 7443299fe0 Iterate encrypted clones at zvol_create_minor
Userland figures out which encryption-root keys are required to load,
and issues ZFS_IOC_LOAD_KEY.
The tail section of spa_keystore_load_wkey() will call
zvol_create_minors() on the encryption-root object.

Any clones of the encrypted zvol will not be plumbed. This commits
adds additional logic to detect if zvol has clones, and is encrypted,
then adds these to the list of zvols to call zvol_create_minors() on.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12471
2021-09-13 13:27:07 -07:00
Arun KV f82f0279ed Fixed data integrity issue when underlying disk returns error
Errors in zil_lwb_write_done() are not propagated to
zil_lwb_flush_vdevs_done() which can result in zil_commit_impl()
not returning an error to applications even when zfs was not able
to write data to the disk.

Remove the ZIO_FLAG_DONT_PROPAGATE flag from zio_rewrite() to
allow errors to propagate and consolidate the error handling for
flush and write errors to a single location (rather than having
error handling split between the "write done" and "flush done"
handlers).

Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: Arun KV <arun.kv@datacore.com>
Closes #12391
Closes #12443
2021-09-13 13:02:39 -07:00
Brian Behlendorf 695d4ae815 ZTS: Waiting for zvols to be available
This is a follow up patch for PR #12515 which addresses some
additional ZTS tests which are unreliable are should explicitly
wait for the required zvols to be available.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: @Theo13111
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12553
2021-09-13 12:18:01 -07:00
Brian Behlendorf b9ec4a15e5 Verify embedded blkptr's in arc_read()
The block pointer verification check in arc_read() should also
cover embedded block pointers.  While highly unlikely, accessing
a damaged block pointer can result in panic.  To further harden
the code extend the existing check to include embedded block
pointers and add a comment explaining the rational for this
sanity check.  Lastly, correct a flaw in zfs_blkptr_verify()
so the error count is checked even when checking a untrusted
config to verify the non-pool-specific portions of a block
pointer.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12535
2021-09-09 19:02:07 -06:00
Jorgen Lundman 5a54a4e051 Upstream: Add snapshot and zvol events
For kernel to send snapshot mount/unmount events to zed.

For kernel to send symlink creates/removes on zvol plumbing.
(/dev/run/dsk/zvol/$pool/$zvol -> /dev/diskX)

If zed misses the ENODEV, all errors after are EINVAL. Treat any error
as kernel module failure.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12416
2021-09-09 10:44:21 -07:00
Brian Behlendorf 2079111f42 Linux 5.15 compat: get_acl()
Kernel commits

332f606b32b6 ovl: enable RCU'd ->get_acl()
0cad6246621b vfs: add rcu argument to ->get_acl() callback

Added compatibility code to detect the new ->get_acl() interface
and correctly handle the case where the new rcu argument is set.

Reviewed-by: Coleman Kane <ckane@colemankane.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12548
2021-09-09 09:38:35 -07:00
Allan Jude a68e4b5919 Allow sending corrupt snapshots even if metadata is corrupted
When zfs_send_corrupt_data is set, use the TRAVERSE_HARD flag,
so traverse_visitbp() will not fail with ECKSUM if a blockpointer
cannot be read, but rather will continue and send the objects it can.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Sponsored-By: Klara Inc.
Sponsored-By: WHC Online Solutions Inc.
Closes #12541
2021-09-09 08:17:31 -06:00
Rich Ercolani 7676ffc51f arc: Drop an incorrect assert
Unfortunately, there was an overzealous assertion that was (in pretty
specific circumstances) false, causing failure.  This assertion was
added in error, so we're removing it.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #9897
Closes #12020
Closes #12246
2021-09-08 14:00:03 -07:00
Jorgen Lundman 49d8a99a4d Upstream: zdb inode mapping
Unfortunately macOS reserves inode ID numbers 0-15, and we can
not used them. In macOS port we simply map them really high IDs.

Normally this is hidden inside the _os implementation, but this is
the one place in the common source files.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12530
2021-09-08 13:56:04 -07:00
Paul Dagnelie c634320e51 Compressed receive with different ashift can result in incorrect PSIZE on disk
We round up the psize to the nearest multiple of the asize or to the
lsize, whichever is smaller. Once that's done, we allocate a new
buffer of the appropriate size, zero the tail, and copy the data
into it. This adds a small performance cost to these kinds of writes,
but fixes the bookkeeping problems.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Co-authored-by: Matthew Ahrens <matthew.ahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #12522
Closes #8462
2021-09-08 13:52:28 -07:00
Alexander a3588c68f7 Linux 5.15 compat: standalone <linux/stdarg.h>
Kernel commits

39f75da7bcc8 ("isystem: trim/fixup stdarg.h and other headers")
c0891ac15f04 ("isystem: ship and use stdarg.h")
564f963eabd1 ("isystem: delete global -isystem compile option")

(for now can be found in linux-next.git tree, will land into the
 Linus' tree during the ongoing 5.15 cycle with one of akpm merges)

removed the -isystem flag and disallowed the inclusion of any
compiler header files. They also introduced a minimal
<linux/stdarg.h> as a replacement for <stdarg.h>.
include/os/linux/spl/sys/cmn_err.h in the ZFS source tree includes
<stdarg.h> unconditionally. Introduce a test for <linux/stdarg.h>
and include it instead of the compiler's one to prevent module
build breakage.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Closes #12531
2021-09-08 12:59:43 -07:00
Brian Behlendorf f616605821 Linux 5.15 compat: block device readahead
The 5.15 kernel moved the backing_dev_info structure out of
the request queue structure which causes a build failure.

Rather than look in the new location for the BDI we instead
detect this upstream refactoring by the existance of either
the blk_queue_update_readahead() or disk_update_readahead()
functions.  In either case, there's no longer any reason to
manually set the ra_pages value since it will be overridden
with a reasonable default (2x the block size) when
blk_queue_io_opt() is called.

Therefore, we update the compatibility wrapper to do nothing
for 5.9 and newer kernels.  While it's tempting to do the
same for older kernels we want to keep the compatibility
code to preserve the existing behavior.  Removing it would
effectively increase the default readahead to 128k.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12532
2021-09-08 09:03:13 -06:00
Don Brady ab15b1fc0e Detect iSCSI in the zpool cmd vdev media script
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #12206
2021-09-02 16:11:53 -07:00
George Melikov 3eb3e4d14c CI: don't install abigail-tools
We use docker image instead.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12529
2021-09-02 10:02:27 -07:00
George Melikov 6ea058da16 Update ABI files via new libabigail version
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12529
2021-09-02 10:02:18 -07:00
George Melikov 1aaebea2f5 Libabigail: make .abi files more consistent
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12529
2021-09-02 10:02:08 -07:00
George Melikov d510924520 CI: use fresh libabigail via docker image
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12529
2021-09-02 10:01:58 -07:00
George Melikov a9655fc2bd Check for libabigail version
We need to use 1.8.0+ version, older versions
may segfault and give inconsistent results.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12529
2021-09-02 10:01:45 -07:00
Jorgen Lundman 157a4d05bd autoconf: allow Release to contain hyphen
To avoid clashing with tags and releases, we'll use "zfs-macOS".

Meta:          1
Name:          zfs-macOS

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12437
2021-09-02 07:51:29 -06:00
Ryan Moeller c27c124a88 ZTS: Remove exceptions for flaky zhack on FreeBSD
Issue #11854 has been resolved, so we can remove the exceptions for it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12527
2021-09-01 13:20:00 -07:00
Jorgen Lundman 3e8d5e4ff3 Add zpool_disable_datasets_os() / zfs_unmount_os()
zpool_disable_datasets_os():
macOS needs to do a bunch of work to kick everything off zvols.

zfs_unmount_os():
This allows us to unmount any zvols that may be mounted. Like with
zfs destroy foo/vol

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12436
2021-08-31 09:56:00 -06:00
Ryan Moeller 3b89d9518d FreeBSD: Don't remove SA xattr if not SA znode
We attempt to remove an existing SA xattr when setting a dir xattr, but
this only makes sense if the znode has been upgraded to the SA format.
Otherwise, we will hit an assert in zfs_sa_get_xattr.

Make sure this is an SA znode before attempting to remove the SA xattr.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12514
2021-08-30 16:01:09 -07:00
Rich Ercolani b1a1c64313 Fix cross-endian interoperability of zstd
It turns out that layouts of union bitfields are a pain, and the
current code results in an inconsistent layout between BE and LE
systems, leading to zstd-active datasets on one erroring out on
the other.

Switch everyone over to the LE layout, and add compatibility code
to read both.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12008
Closes #12022
2021-08-30 14:13:46 -07:00
Ka Ho Ng c3cb57ae47 ZTS: Enable punch-hole tests on FreeBSD
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ka Ho Ng <khng@FreeBSD.org>
Sponsored-by: The FreeBSD Foundation
Closes #12458
2021-08-30 13:33:32 -07:00
Ka Ho Ng f3bbeb970e FreeBSD: Implement hole-punching support
This adds supports for hole-punching facilities in the FreeBSD kernel
starting from __FreeBSD_version 1400032.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ka Ho Ng <khng@FreeBSD.org>
Sponsored-by: The FreeBSD Foundation
Closes #12458
2021-08-30 13:33:30 -07:00
Brian Behlendorf 70bf547a98 ZTS: Waiting for zvols to be available
The ZTS block_device_wait helper function should use -e when waiting
for a file to appear since it will be either a block special device
or a symlink.  This didn't cause any failures but when a device path
was specified the function would wait longer than needed.

Additionally update the most flakey test cases to pass the file path
to block_device_wait to try and improve the test reliability.  The
udev behavior on Fedora in particular can result in frequent false
positives.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12515
2021-08-29 09:56:58 -06:00
Ryan Moeller 5bee265908 Correct checking bdev_check_media_change message
We're not looking for bdev_disk_changed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12492
2021-08-27 10:02:54 -07:00
Tony Hutter 4dc115328f Make 'zpool labelclear -f' work on offlined disks
This patch allows you to clear the label on offlined disks in an active
pool with `-f`.  Previously, labelclear wouldn't let you do that.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #12511
2021-08-27 10:26:49 -06:00
Trevor Bautista 00888c0898 Extend zpool-iostat to account for ZIO_PRIORITY_REBUILD (#12319)
Previously, zpool-iostat did not display any data regarding rebuild I/Os
in either the latency/size histograms (-w/-l/-r) or the queue data (-q).
This fix essentially utilizes the existing infrastructure for tracking
rebuild queue data and displays this data in the proper places within
zpool-iostat's output.

Signed-off-by: Trevor Bautista <tbautista@newmexicoconsortium.org>
Signed-off-by: Trevor Bautista <tbautista@lanl.gov>
Co-authored-by: Trevor Bautista <tbautista@newmexicoconsortium.org>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2021-08-26 11:26:49 -07:00
Anton Gubarkov b0f3f393d1 vdev_id: Return an error if config file is not found (#12508)
Signed-off-by: Anton Gubarkov <anton.gubarkov@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2021-08-25 13:01:26 -07:00
Sam Hathaway 2e9e2c8ff5 zpool-remove.8: describe top-level vdev sector size limitation
Document that top-level vdevs cannot be removed unless all top-level
vdevs have the same sector size.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Sam Hathaway <sam@sam-hathaway.com>
Closes #11339
Closes #12472
2021-08-23 14:59:18 -07:00
Mark Johnston 3ee9a997a3 Initialize parity blocks before RAID-Z reconstruction benchmarking
benchmark_raidz() allocates a row to benchmark parity calculation and
reconstruction.  In the latter case, the parity blocks are left
uninitialized, leading to reports from KMSAN.

Initialize parity blocks to 0xAA as we do for the data earlier in the
function.  This does not affect the selected RAID-Z implementation on
any of several systems tested.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12473
2021-08-23 11:10:17 -07:00
Ryan Moeller 8ae86e2edc ZTS: Add tests for creation time
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12432
2021-08-17 10:25:58 -07:00
Richard Yao abbf0bd4eb Linux 4.11 compat: statx support
Linux 4.11 added a new statx system call that allows us to expose crtime
as btime. We do this by caching crtime in the znode to match how atime,
ctime and mtime are cached in the inode.

statx also introduced a new way of reporting whether the immutable,
append and nodump bits have been set. It adds support for reporting
compression and encryption, but the semantics on other filesystems is
not just to report compression/encryption, but to allow it to be turned
on/off at the file level. We do not support that.

We could implement semantics where we refuse to allow user modification
of the bit, but we would need to do a dnode_hold() in zfs_znode_alloc()
to find out encryption/compression information. That would introduce
locking that will have a minor (although unmeasured) performance cost.
It also would be inferior to zdb, which reports far more detailed
information. We therefore omit reporting of encryption/compression
through statx in favor of recommending that users interested in such
information use zdb.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #8507
2021-08-17 10:25:58 -07:00
Gordon Bergling 0f402668f9 zfs.4: Fix typo s/compatiblity/compatibility/
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Gordon Bergling <gbergling@googlemail.com>
Closes #12464
2021-08-17 11:01:07 -06:00
Alexander Motin 6b88b4b501 Remove b_pabd/b_rabd allocation from arc_hdr_alloc()
When a header is allocated for full overwrite it is a waste of time
to allocate b_pabd/b_rabd for it, since arc_write() will free them
without ever being touched.  If it is a read or a partial overwrite
then arc_read() and arc_hdr_decrypt() allocate them explicitly.

Reduced memory allocation in user threads also reduces ARC eviction
throttling there, proportionally increasing it in ZIO threads, that
is not good.  To minimize or even avoid it introduce ARC allocation
reserve, allowing certain arc_get_data_abd() callers to allocate a
bit longer in situations where user threads will already throttle.

Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12398
2021-08-17 10:15:54 -06:00
Alexander Motin 72f0521aba Increase default volblocksize from 8KB to 16KB
Many things has changed since previous default was set many years ago.
Nowadays 8KB does not allow adequate compression or even decent space
efficiency on many of pools due to 4KB disk physical block rounding,
especially on RAIDZ and DRAID.  It effectively limits write throughput
to only 2-3GB/s (250-350K blocks/s) due to sync thread, allocation,
vdev queue and other block rate bottlenecks.  It keeps L2ARC expensive
despite many optimizations and dedup just unrealistic.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12406
2021-08-17 09:59:46 -06:00
Alexander Motin bb7ad5d326 Optimize arc_l2c_only lists assertions
It is very expensive and not informative to call multilist_is_empty()
for each arc_change_state() on debug builds to check for impossible.
Instead implement special index function for arc_l2c_only->arcs_list,
multilists, panicking on any attempt to use it.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12421
2021-08-17 09:55:34 -06:00
Alexander Motin cfe8e960f1 Fix/improve dbuf hits accounting
Instead of clearing stats inside arc_buf_alloc_impl() do it inside
arc_hdr_alloc() and arc_release().  It fixes statistics being wiped
every time a new dbuf is filled from the ARC.

Remove b_l1hdr.b_l2_hits. L2ARC hits are accounted at b_l2hdr.b_hits.
Since the hits are accounted under hash lock, replace atomics with
simple increments.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12422
2021-08-17 09:50:31 -06:00
Alexander Motin 7f9d9e6f39 Avoid vq_lock drop in vdev_queue_aggregate()
vq_lock is already too congested for two more operations per I/O.
Instead of dropping and reacquiring it inside vdev_queue_aggregate()
delegate the zio_vdev_io_bypass() and zio_execute() calls for parent
I/Os to callers, that drop the lock any way to execute the new I/O.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12297
2021-08-17 09:47:00 -06:00
Alexander Motin e829a865bf Use more atomics in refcounts
Use atomic_load_64() for zfs_refcount_count() to prevent torn reads
on 32-bit platforms.  On 64-bit ones it should not change anything.

When built with ZFS_DEBUG but running without tracking enabled use
atomics instead of mutexes same as for builds without ZFS_DEBUG.
Since rc_tracked can't change live we can check it without lock.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12420
2021-08-17 09:44:34 -06:00
Ryan Moeller 5bfc3a99f9 ZTS: Avoid unset $tmpdir in redacted_panic
The redacted_send tests make use of a $tmpdir variable, except in
redacted_send/redacted_panic the variable is never defined.

Use $TEST_BASE_DIR instead.

Clean up the stream file after the test.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12455
2021-08-16 16:38:34 -07:00
Allan Jude e945e8d7f4 Restore FreeBSD sysctl processing for arc.min and arc.max
Before OpenZFS 2.0, trying to set the FreeBSD sysctl vfs.zfs.arc_max
to a disallowed value would return an error.
Since the switch, it instead only generates WARN_IF_TUNING_IGNORED

Keep the ability to set the sysctl's specifically to 0, even though
that is less than the minimum, because some tests depend on this.

Also lost, was the ability to set vfs.zfs.arc_max to a value less
than the default vfs.zfs.arc_min at boot time. Restore this as well.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #12161
2021-08-16 09:35:19 -06:00
Ryan Moeller 41bee4039c zfs: add missed dependency of zfs module on zlib
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Martin Matuska <mm@FreeBSD.org>
Co-authored-by: Konstantin Belousov <kib@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
External-issue: https://reviews.freebsd.org/D31207
Closes #12442
2021-08-13 13:42:45 -07:00
Ryan Moeller 6daf036ab5 Add zfs.sh -r flag to reload modules
zfs.sh already can load and unload, so why not both?

This is convenient when developing changes to the module and you want
to rapidly make some changes, rebuild the module, reload the module,
and test the changes.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12450
2021-08-13 13:37:46 -07:00
Ryan Moeller a7491f9990 Fix usage of find in tests/Makefile.am
The path is not optional on FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12453
2021-08-13 13:13:57 -07:00
Tony Nguyen 6bc61d22c4 Run arc_evict thread at higher priority
Run arc_evict thread at higher priority, nice=0, to give it more CPU
time which can improve performance for workload with high ARC evict
activities.

On mixed read/write and sequential read workloads, I've seen between
10-40% better performance.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Tony Nguyen <tony.nguyen@delphix.com>
Closes #12397
2021-08-10 11:36:26 -06:00
Rich Ercolani f3678d70ff Make get_key_material_file fail more verbosely
It turns out, there are a lot of possible reasons for fopen to fail.
Let's share which reason we failed for today.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12410
2021-08-05 17:48:33 -06:00
Brian Behlendorf ae1e40b329 Enable /proc/diskstats for zvols
The /proc/diskstats accounting needs to be explicitly enabled
for block devices which do not use multi-queue.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12440
Closes #12066
2021-08-05 15:35:34 -06:00
George Melikov e1870be450 Man zpool-scrub.8: describe sequential scrub
Describe sequential scrub and add examples of scrub status.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12429
2021-08-05 15:30:28 -06:00
hedongzhang 4357552785 Modify checksum obtain method of QAT
CpaDcGeneratefooter function that obtain the checksum code
does not support the CPA_DC_STATELESS mode. So we get the
adler32 chencksum of the end of the zlib from dc_results.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chengfei Zhu <chengfeix.zhu@intel.com>
Signed-off-by: hedong.zhang <h_d_zhang@163.com>
Closes #12343
2021-08-03 11:46:33 -06:00
Mark Johnston 46b2623586 Allow disabling of unmapped I/O on FreeBSD
We have a tunable which permits one to disable the use of unmapped I/O
for the buffer cache.  Respect it in ZFS as well.  This is useful for
KMSAN, which cannot easily maintain shadow state for unmapped pages.

No functional change intended, as unmapped I/O is permitted by default
and there's no real reason to disable it in practice except for
debugging.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12446
2021-08-02 13:18:24 -06:00
Alexander Motin 7eebcd2be6 Avoid small buffer copying on write
It is wrong for arc_write_ready() to use zfs_abd_scatter_enabled to
decide whether to reallocate/copy the buffer, because the answer is
OS-specific and depends on the buffer size.  Instead of that use
abd_size_alloc_linear(), moved into public header.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12425
2021-07-27 16:05:47 -07:00
Brian Behlendorf b72611f0f6 Fix format specifier warnings
Commit 5dbf6c5a66 did not address these format specifier warnings
since they were introduced by an unrelated commit which had not
been rebased on 5dbf6c5a66 when merged.  Fix them.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12435
2021-07-27 09:50:00 -07:00
Brian Behlendorf 4bd99c11d7 Remove overlooked __sun_attr__ based macros
The __NORETURN, __CONST, and __PURE macros in the FreeBSD platform
code were based on the __sun_attr__ macro which was removed in
commit 5dbf6c5a6.  This caused a build failure because the
__NORETURN macro was still used in one place in kernel code.
The __CONST and __PURE macros were entirely unused.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12435
2021-07-27 09:49:11 -07:00
George Melikov 9776838cfb Update ABI files with generated in CI worker
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12388
2021-07-26 16:55:18 -07:00
George Melikov 0b072481af storeabi: add no-corpus-path flag
Try to minimize diffs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12388
2021-07-26 16:54:53 -07:00
Jorgen Lundman 273730d5b5 macOS can also set va_type
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12357
2021-07-26 16:38:06 -07:00
Jorgen Lundman 02601d8aa4 Move check_file to os/$platform section
Keep check_file_generic() in shared code base, and allow special case
code in check_file() in os section. In future, macOS will have
additional checks in check_file().

Linux and FreeBSD wrappers just calls check_file_generic().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12385
2021-07-26 16:34:11 -07:00
Alexander Motin dd3bda39cf Add comment on metaslab_class_throttle_reserve() locking
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Issue #12314
Closes #12419
2021-07-26 16:30:20 -07:00
John Wren Kennedy bdd2bfd02c Assorted fixes for the performance tests
- Bail out early if we're running the perf tests and forget to
  specify disks.
- Allow perf tests to run with any number of disks.
- Remove weekly vs. nightly settings
- Move variables with common values to perf.shlib
- Use zinject to clear the ARC over export/import
- Fix dbuf cache size calculation

When the meaning of `dbuf_cache_max_bytes` changed, the performance
test that covers the dbuf cache started to fail. The test would try to
write files for the test using the max possible size of the cache,
inevitably filling the pool and failing. This change uses
`dbuf_cache_shift` to correctly calculate the dbuf cache size.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes #12408
2021-07-26 15:47:08 -06:00
Matthew Ahrens d8381f50d6 Read past end of argv array in zpool_do_import()
`zpool_do_import()` passes `argv[0]`, (optionally) `argv[1]`, and
`pool_specified` to `import_pools()`.  If `pool_specified==FALSE`, the
`argv[]` arguments are not used.  However, these values may be off the
end of the `argv[]` array, so loading them could dereference unmapped
memory.  This error is reported by the asan build:

```
=================================================================
==6003==ERROR: AddressSanitizer: heap-buffer-overflow
READ of size 8 at 0x6030000004a8 thread T0
    #0 0x562a078b50eb in zpool_do_import zpool_main.c:3796
    #1 0x562a078858c5 in main zpool_main.c:10709
    #2 0x7f5115231bf6 in __libc_start_main
    #3 0x562a07885eb9 in _start

0x6030000004a8 is located 0 bytes to the right of 24-byte region
allocated by thread T0 here:
    #0 0x7f5116ac6b40 in __interceptor_malloc
    #1 0x562a07885770 in main zpool_main.c:10699
    #2 0x7f5115231bf6 in __libc_start_main
```

This commit passes NULL for these arguments if they are off the end
of the `argv[]` array.

Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #12339
2021-07-26 12:51:39 -07:00
Václav Skála 31c41aea9c Add missing properties to zfs allow manpage
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Václav Skála <skala@vshosting.cz>
Closes #12402
2021-07-26 12:44:01 -07:00
George Amanakis ab8a8f0745 Fixes in persistent L2ARC
In l2arc_add_vdev() first decide whether the device is eligible for
L2ARC rebuild or whole device trim and then add it to the list of cache
devices. Otherwise l2arc_feed_thread() might already start writing on
the device invalidating previous content as l2ad_hand = l2ad_start.
However l2arc_rebuild_vdev() needs the device present in the cache
device list to figure out its l2arc_dev_t. Fix this by moving most of
l2arc_rebuild_vdev() in a new function l2arc_rebuild_dev() which does
not need to search in the cache device list.

In contrast to l2arc_add_vdev() we do not have to worry about
l2arc_feed_thread() invalidating previous content when onlining a
cache device. The device parameters (l2ad*) are not cleared when
offlining the device and writing new buffers will not invalidate
all previous content. In worst case only buffers that have not had
their log block written to the device will be lost.

Retire persist_l2arc_00{4,5,8} tests since they cover code already
covered by the remaining ones. Test persist_l2arc_006 is renamed to
persist_l2arc_004 and persist_l2arc_007 is renamed to persist_l2arc_005.

Fix a typo in persist_l2arc_004, and remove an assertion that is not
always true from l2arc_arcstats_pos. Also update an assertion in
persist_l2arc_005 and explain why in a comment.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #12365
2021-07-26 12:30:24 -07:00
наб 037af3e0d4 Remove NOTE(CONSTCOND) and note.h
These were mostly used to annotate do {} while(0)s

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #12201
2021-07-26 12:07:53 -07:00
наб 2c69ba6444 Normalise /*FALLTHR{OUGH,U}*/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #12201
2021-07-26 12:07:39 -07:00
наб 90f1c3c946 Prune /*NOTREACHED*/
This includes a simplification of mkbusy and format correctness in zhack
and ztest

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #12201
2021-07-26 12:07:26 -07:00
наб 5dbf6c5a66 Replace /*PRINTFLIKEn*/ with attribute(printf)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #12201
2021-07-26 12:07:15 -07:00
Mark Johnston 1373709450 Initialize dn_next_type[] in the dnode constructor
It seems nothing ensures that this array is zeroed when a dnode is
freshly allocated, so in principle it retains the values from the
previous allocation.  In practice it seems to be the case that the
fields should end up zeroed, but we can zero the field anyway for
consistency.

This was found using KMSAN.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12383
2021-07-26 11:53:47 -07:00
Mark Johnston 3a185275a0 Zero pad bytes following TX_WRITE log data
When logging a TX_WRITE record in the case where file data has to be
copied from the DMU, we pad the log record size to a multiple of 8
bytes.  In this case, any padding bytes should be zeroed, otherwise the
contents of uninitialized memory are written to the ZIL.

This was found using KMSAN.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12383
2021-07-26 11:53:47 -07:00
Mark Johnston 58714c2817 Zero pad bytes when allocating a ZIL record
When allocating a record, we round up the allocation size to a multiple
of 8.  In this case, any padding bytes should be zeroed, otherwise the
contents of uninitialized memory are written to the ZIL.

This was found using KMSAN.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12383
2021-07-26 11:53:47 -07:00
Mark Johnston 03363b2f86 Initialize all fields in zfs_log_xvattr()
When logging TX_SETATTR, we could otherwise fail to initialize part of
the corresponding ZIL record depending on which fields are present in
the xvattr.  Initialize the creation time and the AV scan timestamp to
zero so that uninitialized bytes are not written to the ZIL.

This was found using KMSAN.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12383
2021-07-26 11:53:47 -07:00
Mark Johnston da27b8bc7f Initialize "autoreplace" in spa_ld_get_props()
spa_prop_find() may fail to find the specified property, in which case
it suppresses ENOENT from zap_lookup().  In this case, the return value
is left uninitialized, so spa_autoreplace was being initialized using an
uninitialized stack variable.

This was found using KMSAN.  It appears to be a regression from commit
9eb7b46ed0, which removed the initialization of "autoreplace" from the
definition.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12383
2021-07-26 11:53:47 -07:00
Coleman Kane 1c24bf966c Linux 5.14 compat: explicity assign set_page_dirty
Kernel 5.14 introduced a change where set_page_dirty of
struct address_space_operations is no longer implicitly set to
__set_page_dirty_buffers(), which ended up resulting in a NULL
pointer deref in the kernel when it is attempted to be called.
This change sets .set_page_dirty in the structure to
__set_page_dirty_nobuffers(), which was introduced with the
related patch set. The breaking change was introduce in commit
0af573780b0b13fceb7fabd49dc1b073cee9a507 to torvalds/linux.git.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #12427
2021-07-26 10:55:55 -07:00
Rich Ercolani f1ca7999bb Fix unfortunate NULL in spa_update_dspace
After 1325434b, we can in certain circumstances end up calling
spa_update_dspace with vd->vdev_mg NULL, which ends poorly during
vdev removal.

So let's not do that further space adjustment when we can't.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12380 
Closes #12428
2021-07-26 10:51:30 -07:00
Brian Behlendorf 1b06b03a7b Linux 5.14 compat: blk_alloc_disk()
In Linux 5.14, blk_alloc_queue is no longer exported, and its usage
has been superseded by blk_alloc_disk, which returns a gendisk struct
from which we can still retrieve the struct request_queue* that is
needed in the one place where it is used. This also replaces the call
to alloc_disk(minors), and minors is now set via struct member
assignment.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12362 
Closes #12409
2021-07-23 15:28:03 -07:00
Ryan Moeller 14b43fbd9c zloop: Add a max iterations option, use default run/pass times
It is useful to have control over the number of iterations of zloop so
we can easily produce "x core dumps found *in y iterations*" metrics.

Using random values for run/pass times doesn't improve coverage in a
meaningful way.

Randomizing run time could be seen as a compromise between running a
greater variety of shorter tests versus a smaller variety of longer
tests within a fixed time span.  However, it is not desirable when
running a fixed number of iterations.

Pass time already incorporates randomness within ztest.

Either parameter can be passed to ztest explicitly if the defaults are
not satisfactory.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12411
2021-07-22 15:29:27 -06:00
Alexander Motin 46197dc858 FreeBSD: Ignore make_dev_s() errors
Since errors returned by zvol_create_minor_impl() are ignored by the
common code, it is more convenient to ignore make_dev_s() errors there.
It allows, for example, to get device created for the zvol after later
rename instead of having it further stuck in half-created state.
zvol_rename_minor() already ignores those errors.

While there, switch from MAXPHYS to maxphys in FreeBSD 13+.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12375
2021-07-22 10:22:14 -06:00
Jorgen Lundman c14ad80fcb Remove old orig_fd variable from zfs send
Possibly required in the past, but is currently fills no purpose.
Ordinarily such tiny cleanup is not generally worth it, however
on the macOS port, in a future commit, we do unspeakable things to the
"fd" for send/recv, and it would be easier to only have to deal with
one "fd" instead of two.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12404
2021-07-21 20:22:27 -06:00
Alexander Motin 1b50749ce9 Optimize allocation throttling
Remove mc_lock use from metaslab_class_throttle_*().  The math there
is based on refcounts and so atomic, so the only race possible there
is between zfs_refcount_count() and zfs_refcount_add().  But in most
cases metaslab_class_throttle_reserve() is called with the allocator
lock held, which covers the race.  In cases where the lock is not
held, GANG_ALLOCATION() or METASLAB_MUST_RESERVE are set, and so we
do not use zfs_refcount_count().  And even if we assume some other
non-existing scenario, the worst that may happen from this race is
few more I/Os get to allocation earlier, that is not a problem.

Move locks and data of different allocators into different cache
lines to avoid false sharing.  Group spa_alloc_* arrays together
into single array of aligned struct spa_alloc spa_allocs.  Align
struct metaslab_class_allocator.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12314
2021-07-21 06:40:36 -06:00
George Melikov bc93935ef0 CI: generate ABI files if changed
So commit author can just download them as
artifacts and commit.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12379
2021-07-20 16:21:00 -06:00
Kevin Jin a7bd20e309 Add Module Parameter Regarding Log Size Limit
* Add Module Parameters Regarding Log Size Limit

zfs_wrlog_data_max
The upper limit of TX_WRITE log data. Once it is reached,
write operation is blocked, until log data is cleared out
after txg sync. It only counts TX_WRITE log with WR_COPIED
or WR_NEED_COPY.

Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes #12284
2021-07-20 09:40:24 -06:00
Alexander Motin 8172df643b Minor ARC optimizations
Remove unneeded global, practically constant, state pointer variables
(arc_anon, arc_mru, etc.), replacing them with macros of real state
variables addresses (&ARC_anon, &ARC_mru, etc.). 

Change ARC_EVICT_ALL from -1ULL to UINT64_MAX, not requiring special
handling in inner loop of ARC reclamation.  Respectively change bytes
argument of arc_evict_state() from int64_t to uint64_t.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12348
2021-07-20 08:13:21 -06:00
Jorgen Lundman e04210035e dmu_redact.c does not call bqueue_destroy
Ensure all calls to bqueue_init() has a corresponding call to bqueue_destroy()

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12118
2021-07-20 08:08:45 -06:00
Alexander 23c13c7e80 A few fixes of callback typecasting (for the upcoming ClangCFI)
* zio: avoid callback typecasting
* zil: avoid zil_itxg_clean() callback typecasting
* zpl: decouple zpl_readpage() into two separate callbacks
* nvpair: explicitly declare callbacks for xdr_array()
* linux/zfs_nvops: don't use external iput() as a callback
* zcp_synctask: don't use fnvlist_free() as a callback
* zvol: don't use ops->zv_free() as a callback for taskq_dispatch()

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Closes #12260
2021-07-20 08:03:33 -06:00
Ryan Moeller 65b9293641 Use SET_ERROR for more errors in FreeBSD vnops
We should use SET_ERROR when we first get an error.

Add it in the FreeBSD xattr implementations where missing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12356
2021-07-19 10:52:50 -06:00
Ryan Moeller de12cd2511 Remove unused fields from zvol_task_t
We don't use or need the pool name or value source in the zvol tasks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12361
2021-07-19 10:02:35 -06:00
Alexander Motin eecceeae9f FreeBSD: Switch from MAXPHYS to maxphys on FreeBSD 13+
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12378
2021-07-19 09:56:58 -06:00
Kevin Bowling ca14e08cbf Detect HAVE_LARGE_STACKS at compile time
Move HAVE_LARGE_STACKS definitions to header and set when appropriate.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Kevin Bowling <kbowling@FreeBSD.org>
Closes #12350
2021-07-16 14:28:55 -06:00
George Melikov b17b19943e zpool_influxdb: fix -Werror=stringop-truncation
Use strlcpy instead of problematic strncpy

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #12344
2021-07-16 14:04:00 -06:00
Rich Ercolani b7ec530233 Correct zfs-send(8) on readonly sends
zfs-send(8) claimed in the flags list you could use -pR when sending
a readonly filesystem or volume. You cannot.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12336
2021-07-16 13:58:01 -06:00
Alexander Motin c1b5869bab Introduce dsl_dir_diduse_transfer_space()
Most of dsl_dir_diduse_space() and dsl_dir_transfer_space() CPU time
is a dd_lock overhead and time spent in dmu_buf_will_dirty(). Calling
them one after another is a waste of time and even more contention.
Doing that twice for each rewritten block within dbuf_write_done()
via dsl_dataset_block_kill() and dsl_dataset_block_born() created one
of the biggest CPU overheads in case of small blocks rewrite.

dsl_dir_diduse_transfer_space() combines functionality of these two
functions for cases where it is needed, but without double overhead,
practically for the cost of dsl_dir_diduse_space() or even cheaper.

While there, optimize dsl_dir_phys() calls in dsl_dir_diduse_space()
and dsl_dir_transfer_space().  It seems Clang detects some aliasing
there, repeating dd->dd_dbuf->db_data dereference multiple times,
increasing dd_lock scope and contention.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Author: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12300
2021-07-16 13:39:24 -06:00
Jorgen Lundman 41eba77061 pass handle to do_unmount()
The same change has already been done for domount(). On macOS platform
we need to have access to zhp to handle devdisks and snapshots.
Also, symmetry is pleasing.

In addition, the code in zpool_disable_datasets which sorts the
mountpoints did not sort the related handle, which meant that the
mountpoint, and the handle that it is paired with, was lost.
You'd get a random handle with the mountpoint.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12296
2021-07-15 12:31:00 -06:00
наб d9f0f1582c config/libatomic: require -latomic iff atomic.c doesn't link w/o it
In absence of LTO, and dynamic libatomic, la.so ends up in the needs
section of every toolchain executable; some consider this an issue.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12345 
Closes #12359
2021-07-13 13:50:48 -07:00
Rich Ercolani 1325434b2d Tinker with slop space accounting with dedup
* Tinker with slop space accounting with dedup

Do not include the deduplicated space usage in the slop space
reservation, it leads to surprising outcomes.

* Update spa_dedup_dspace sometimes

Sometimes, we get into spa_get_slop_space() with
spa_dedup_dspace=~0ULL, AKA "unset", while spa_dspace is correctly set.

So call the code to update it before we use it if we hit that case.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12271
2021-07-13 09:47:57 -06:00
Alexander Motin f7de776da2 Fix ARC ghost states eviction accounting
arc_evict_hdr() returns number of evicted bytes in scope of specific
state.  For ghost states it does not mean the amount of really freed
memory, but the logical buffer size.  It is correct for the eviction
process, but not for waking up threads waiting for ARC size reduction,
as added in "Revise ARC shrinker algorithm" commit, causing premature
wakeups while ARC is still overflowed, allowing even bigger overflow,
plus processing overhead when next allocation will also get blocked,
probably also for too short time.

To fix that make arc_evict_hdr() also return the amount of really
freed memory, which for the ghost states is only the header, and use
it to update arc_evict_count instead.  Originally I was thinking to
not return it at all, since arc_get_data_impl() does not account for
the headers, but decided that some slow allocation progress is better
than long waits, reaching on my tests up to 100ms.

To reduce negative latency effects of long time periods when reclaim
thread can free little real memory, start reclamation process earlier,
before we actually reached the overflow threshold, when we have to
throttle new allocations.  We can also do it without taking global
arc_evict_lock, reducing the contention.

Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12279
2021-07-13 09:41:59 -06:00
Brian Behlendorf 07a4c76e90 Update bug report template
- Remove the "SPL Version" line, the repositories have been merged
  since the 0.8 release and we no longer need to ask about this.

- Simply ask for the kernel version / patch level and add a hint
  about how to get this information on Linux and FreeBSD.

- Remove "Status: Triage Needed" from the template, in practice
  we really haven't been using this label so let's step setting it.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes: #12340
2021-07-12 14:05:50 -06:00
George Wilson 958826be7a file reference counts can get corrupted
Callers of zfs_file_get and zfs_file_put can corrupt the reference
counts for the file structure resulting in a panic or a soft lockup.
When zfs send/recv runs, it will add a reference count to the
open file, and begin to send or recv the stream. If the file descriptor
is closed, then when dmu_recv_stream() or dmu_send() return we will
call zfs_file_put to remove the reference we placed on the file
structure. Unfortunately, because zfs_file_put() uses the file
descriptor to lookup the file structure, it may end up finding that
the file descriptor table no longer contains the file struct, thus
leaking the file structure. Or it might end up finding a file
descriptor for a different file and blindly updating its reference
counts. Other failure modes probably exists.

This change reworks the zfs_file_[get|put] interface to not rely
on the file descriptor but instead pass the zfs_file_t pointer around.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Co-authored-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: George Wilson <gwilson@delphix.com>
External-issue: DLPX-76119
Closes #12299
2021-07-10 19:00:37 -06:00
Jorgen Lundman 03dba7ae31 dprintf_dnode: strcpy -> strlcpy
Missed a couple of strcpy() in earlier commit, this is only used with
--enable-debug.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12311
2021-07-07 21:13:40 -06:00
Jorgen Lundman d2119d0e6f Replace strchrnul() with strrchr()
Could have gone either way with this one, either adding it to
macOS/Windows SPL, or returning it to "classic" usage with strrchr().
Since the new special way isn't really used, and only used once,
we have this commit.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes  #12312
2021-07-07 21:08:13 -06:00
Alexander Motin eb5983e1b7 FreeBSD: Use unmapped I/O for scattered/gang ABD buffers
Many FreeBSD disk drivers support "unmapped" I/O mode, when data
buffer represented not with a virtually contiguous KVA-mapped address
range, but with a list of physical memory pages.  Originally it was
designed to do I/O from buffers without KVA mapping (unmapped).  But
moving virtual addresses out of equation allows us to operate even
non-contiguous data buffers with one condition: all buffer discon-
tinuities must be aligned to memory page borders.

Doing I/O to capable GEOM device this patch traverses through non-
linear ABD buffers, validating the chunks borders.  If the condition
is met, it supplies GEOM with the list of original physical memory
pages instead of copying the data into temporary contiguous buffer.
On capable hardware on pools with ashift=12 and default ABD chunk of
4KB it should handle all the I/O without additional memory copying.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12320
2021-07-07 16:39:00 -07:00
Alexander Motin bdd11cbb90 FreeBSD: Hardcode abd_chunk_size to PAGE_SIZE
It makes no sense to set it below PAGE_SIZE, since it increases all
overheads and makes returning memory to OS problematic.  It makes no
sense to set it above PAGE_SIZE, since such allocations and especially
frees are too expensive and cause KVA fragmentation to benefit from
fewer chunks.  After that it makes no sense to keep more complicated
math here.

What may have sense though is just a tunable border between linear and
scatter ABDs, previously also controlled by this tunable.  Retain that
functionality by taking abd_scatter_min_size tunable from Linux, just
with different default value.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12328
2021-07-06 17:39:23 -07:00
Alexander Motin 97752ba22a Move gethrtime() calls out of vdev queue lock
This dramatically reduces the lock contention on systems with slower
(non-TSC) timecounters.  With TSC the difference is minimal, but since
this lock is pretty congested, any improvement counts.  Plus I don't
see any reason to do it under the lock other than the latency of the
lock itself, which this change actually reduces.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12281
2021-07-06 14:38:00 -07:00
Justin Gottula 6e4e3c3ab6 Udev rules: remove zvol compat symlinks (without the leading zvol/)
This is a potentially arguable change, because it removes some
compatibility cruft that certain systems or people may have come to rely
on (either a very long time ago, or unwisely in recent times).

On the other hand, it's been literally over a decade since OpenZFS
switched to the strategy of using opaque numbered /dev/zd* device nodes,
with the canonical zvol access path being a directory tree of symlinks
created by udev rules inside /dev/zvol/*. (See #102.) Even at the time,
the /dev/* scheme was labeled as being for "compatibility".

This commit removes the second tree of symlinks located directly at
/dev/*, under the assumption that anybody with any sense has been using
the intended /dev/zvol/* path for a very very long time now.

(The more I think about this, the more I anticipate that some large
fraction of people will have been blissfully unaware that the intention
has been for them to use the /dev/zvol/* tree all along, and they will
have come to rely upon the /dev/* tree simply because it's been there
this whole time despite being a compat thing.)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Justin Gottula <justin@jgottula.com>
Closes #12303
2021-07-06 13:41:17 -07:00
Justin Gottula f24c7c359e Use substantially more robust program exit status logic in zvol_id
Currently, there are several places in zvol_id where the program logic
returns particular errno values, or even particular ioctl return values,
as the program exit status, rather than a straightforward system of
explicit zero on success and explicit nonzero value(s) on failure.

This is problematic for multiple reasons. One particularly interesting
problem that can arise, is that if any of these values happens to have
all 8 least significant bits unset (i.e., it is a positive or negative
multiple of 256), then although the C program sees a nonzero int value
(presumed to be a failure exit status), the actual exit status as seen
by the system is only the bottom 8 bits of that integer: zero.

This can happen in practice, and I have encountered it myself. In a
particularly weird situation, the zvol_open code in the zfs kernel
module was behaving in such a manner that it caused the open() syscall
to fail and for errno to be set to a kernel-private value (ERESTARTSYS,
which happens to be defined as 512). It turns out that 512 is evenly
divisible by 256; or, in other words, its least significant 8 bits are
all-zero. So even though zvol_id believed it was returning a nonzero
(failure) exit status of 512, the system modulo'd that value by 256,
resulting in the actual exit status visible by other programs being 0!
This actually-zero (non-failure) exit status caused problems: udev
believed that the program was operating successfully, when in fact it
was attempting to indicate failure via a nonzero exit status integer.
Combined with another problem, this led to the creation of nonsense
symlinks for zvol dev nodes by udev.

Let's get rid of all this problematic logic, and simply return
EXIT_SUCCESS (0) is everything went fine, and EXIT_FAILURE (1) if
anything went wrong.

Additionally, let's clarify some of the variable names (error is similar
to errno, etc) and clean up the overall program flow a bit.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Justin Gottula <justin@jgottula.com>
Closes #12302
2021-07-02 13:10:36 -07:00
Justin Gottula b19e2bdfb5 Print zvol_id error messages to stderr rather than stdout
The zvol_id program is invoked by udev, via a PROGRAM key in the
60-zvol.rules.in rule file, to determine the "pretty" /dev/zvol/*
symlink paths paths that should be generated for each opaquely named
/dev/zd* dev node.

The udev rule uses the PROGRAM key, followed by a SYMLINK+= assignment
containing the %c substitution, to collect the program's stdout and then
"paste" it directly into the name of the symlink(s) to be created.

Unfortunately, as currently written, zvol_id outputs both its intended
output (a single string representing the symlink path that should be
created to refer to the name of the dataset whose /dev/zd* path is
given) AND its error messages (if any) to stdout.

When processing PROGRAM keys (and others, such as IMPORT{program}), udev
uses only the data written to stdout for functional purposes. Any data
written to stderr is used solely for the purposes of logging (if udev's
log_level is set to debug).

The unintended consequence of this is as follows: if zvol_id encounters
an error condition; and then udev fails to halt processing of the
current rule (either because zvol_id didn't return a nonzero exit
status, or because the PROGRAM key in the rule wasn't written properly
to result in a "non-match" condition that would stop the current rule on
a nonzero exit); then udev will create a space-delimited list of symlink
names derived directly from the words of the error message string!

I've observed this exact behavior on my own system, in a situation where
the open() syscall on /dev/zd* dev nodes was failing sporadically (for
reasons that aren't especially relevant here). Because the open() call
failed, zvol_id printed "Unable to open device file: /dev/zd736\n" to
stdout and then exited.

The udev rule finished with SYMLINK+="zvol/%c %c". Assuming a volume
name like pool/foo/bar, this would ordinarily expand to
   SYMLINK+="zvol/pool/foo/bar pool/foo/bar"
and would cause symlinks to be created like this:
   /dev/zvol/pool/foo/bar -> /dev/zd736
   /dev/pool/foo/bar      -> /dev/zd736

But because of the combination of error messages being printed to
stdout, and the udev syntax freely accepting a space-delimited sequence
of names in this context, the error message string
   "Unable to open device file: /dev/zd736\n"
in reality expanded to
   SYMLINK+="zvol/Unable to open device file: /dev/zd736"
which caused the following symlinks to actually be created:
   /dev/zvol/Unable -> /dev/zd736
   /dev/to          -> /dev/zd736
   /dev/open        -> /dev/zd736
   /dev/device      -> /dev/zd736
   /dev/file:       -> /dev/zd736
   /dev//dev/zd736  -> /dev/zd736

(And, because multiple zvols had open() syscall errors, multiple zvols
attempted to claim several of those symlink names, resulting in numerous
udev errors and timeouts and general chaos.)

This commit rectifies all this silliness by simply printing error
messages to stderr, as Dennis Ritchie originally intended.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Justin Gottula <justin@jgottula.com>
Closes #12302
2021-07-02 13:10:06 -07:00
Justin Gottula f1aef8d4df Udev rules: use match (==) rather than assign (=) for PROGRAM
Assignment syntax (=) can be used for the PROGRAM key. But the PROGRAM
key is really a match key, not an assign key. The internal logic used by
udev to decide whether a PROGRAM key "matched" or not (which determines
whether the remainder of the rule is evaluated) depends on whether the
operator was OP_MATCH (==) or OP_NOMATCH (!=). [1]

The man page claims that '"=", ":=", and "+=" have the same effect as
"=="' for PROGRAM keys. And, after a brief perusal, the udev source code
does seem to confirm that operators other than OP_MATCH (==) or
OP_NOMATCH (!=) are implicitly converted to OP_MATCH (==). [2] But it's
not entirely clear that this is definitely the case: anecdotal testing
seems to indicate that when OP_ASSIGN (=) is used, the program's exit
status is disregarded and the remainder of the rule is processed
regardless of whether it was, in fact, a successful exit.

The bottom line here is that, if zvol_id hits some snag and returns a
nonzero exit status, then we almost certainly do NOT want to continue on
with the rule and use whatever the stdout contents may have been to
mindlessly create /dev/zvol/* symlinks. Therefore, let's be extra-sure
and use the match (==) operator explicitly, to eliminate any possibility
that udev might do the wrong thing, and ensure that a nonzero exit
status will definitely short-circuit the rest of the rule, bypassing the
SYMLINK+= assignments.

[1]
udev,
 file src/udev/udev-rules.c,
  func udev_rule_apply_token_to_event,
   switch case TK_M_PROGRAM if r != 0 (nonzero exit status):
      return token->op == OP_NOMATCH;
   switch case TK_M_PROGRAM if r == 0 (zero exit status):
      return token->op == OP_MATCH;
   func retval 0 => key is considered to have matched
   func retval 1 => key is considered to have NOT matched

[2]
udev,
 file src/udev/udev-rules.c,
  func parse_token,
   at func start:
      bool is_match = IN_SET(op, OP_MATCH, OP_NOMATCH);
   in else-if case streq(key, "PROGRAM"):
      if (!is_match) op = OP_MATCH;

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Justin Gottula <justin@jgottula.com>
Closes #12302
2021-07-02 13:09:39 -07:00
Justin Gottula 17c794e7b0 Udev rules: replace deprecated $tempnode with $devnode
The $tempnode substitution is so old that it's not even mentioned in the
man page anymore. It is still technically supported by udev, but with
plenty of "deprecated" comments surrounding it.

The preferred modern equivalent of $tempnode is $devnode (or
alternatively, %N).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Justin Gottula <justin@jgottula.com>
Closes #12302
2021-07-02 13:09:09 -07:00
Justin Gottula a5398f8782 Udev rules: use non-ancient comma syntax
This file is old as dirt. It's entirely possible that commas were
optional in udev back at that time. But they're definitely supposed to
be there nowadays.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Justin Gottula <justin@jgottula.com>
Closes #12302
2021-07-02 13:08:42 -07:00
Alexander Motin b192a2c0a1 Remove avl_size field from struct avl_tree
This field is used only by illumos mdb.  On other platforms it only
increases the struct size from 32 to 40 bytes.  For struct vdev_queue
including 13 instances of avl_tree_t size means active cache lines.

Keep the padding in user-space for now to not break the ABI.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12290
2021-07-01 09:32:31 -06:00
Alexander Motin 490c845efe Compact dbuf/buf hashes and lock arrays
With default dbuf cache size of 1/32 of ARC, it makes no sense to have
hash table of the same size (or even bigger on Linux).  Reduce it to
1/8 of ARC's one, still leaving some slack, assuming higher I/O rate
via dbuf cache than via ARC.

Remove padding from ARC hash locks array.  The idea behind padding
is to avoid false sharing between locks.  It would have sense if
there would be a limited number of very busy locks.  But since we
have no limit on the number, using the same memory for more locks we
can achieve even lower lock contention with the same false sharing,
or we can use less memory for the same contention level.

Reduce number of hash locks from 8192 to 2048.  The number is still
big enough to not cause contention, but reduced memory size improves
cache hit rate for mutex_tryenter() in ARC eviction thread, saving
about 1% of the thread time.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12289
2021-07-01 09:30:31 -06:00
Jorgen Lundman c6d1112bf4 Fix abd leak, kmem_free correct size of abd_t
Fix a leak of abd_t that manifested mostly when using
raidzN with at least as many columns as N (e.g. a
four-disk raidz2 but not a three-disk raidz2).
Sufficiently heavy raidz use would eventually run a system
out of memory.

Additionally:

* Switch abd_cache arena to FIRSTFIT, which empirically
improves perofrmance.

* Make abd_chunk_cache more performant and debuggable.

* Allocate the abd_zero_buf from abd_chunk_cache rather
than the heap.

* Don't try to reap non-existent qcaches in abd_cache arena.

* KM_PUSHPAGE->KM_SLEEP when allocating chunks from their
own arena

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Co-authored-by: Sean Doran <smd@use.net>
Closes #12295
2021-07-01 09:28:15 -06:00
Jorgen Lundman eca174527e Upstream: dmu_zfetch_stream_fini leaks refcount
dmu_zfetch_stream_fini() is missing calls to destroy the refcounts,
leaking them and the mutex inside.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #12294
2021-07-01 09:22:16 -06:00
Kevin Jin 50e09eddd0 Optimize txg_kick() process (#12274)
Use dp_dirty_pertxg[] for txg_kick(), instead of dp_dirty_total in
original code. Extra parameter "txg" is added for txg_kick(), thus it
knows which txg to kick. Also txg_kick() call is moved from
dsl_pool_need_dirty_delay() to dsl_pool_dirty_space() so that we can
know the txg number assigned for txg_kick().

Some unnecessary code regarding dp_dirty_total in txg_sync_thread() is
also cleaned up.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: jxdking <lostking2008@hotmail.com>
Closes #12274
2021-07-01 09:20:27 -06:00
Alexander Motin 42afb12da7 Remove refcount from spa_config_*()
The only reason for spa_config_*() to use refcount instead of simple
non-atomic (thanks to scl_lock) variable for scl_count is tracking,
hard disabled for the last 8 years.  Switch to simple int scl_count
reduces the lock hold time by avoiding atomic, plus makes structure
fit into single cache line, reducing the locks contention.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12287
2021-07-01 09:16:54 -06:00
Ryan Moeller cfc564f9b1 ZED: Match added disk by pool/vdev GUID if found (#12217)
This enables ZED to auto-online vdevs that are not wholedisk managed by
ZFS.

Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2021-06-30 07:37:20 -07:00
Brian Behlendorf 4694131a0a Linux 5.13 compat: META
Increase the Linux-Maximum version in the META file to 5.13.
All of the required compatibility patches have been merged
and the 5.13 kernel has been officially released.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2021-06-29 13:16:38 -07:00
Laurențiu Nicola 3482c2b79a zed: fix sending emails (#12292)
Commit 6fc3099 broke the quoting when invoking the mail program, revert
that change.

Signed-off-by: Laurențiu Nicola <lnicola@dend.ro>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2021-06-29 12:33:49 -07:00
Alexander 6a19dea7f6 module/zfs: simplify ddt_stat_add() loop
LLVM's Polly (ISL to be precise) is unhappy with the loop from
ddt_stat_add():

  CC [M]  fs/zfs/zfs/ddt.o
../lib/External/isl/isl_schedule_node.c:2470: cannot insert node
between set or sequence node and its filter children

(building with the custom patch which adds Polly support to Kbuild)

The mentioned loop is rather suboptimal. All that we need is to just
treat ddt_stat_t as an array of u64 and perform 1:1 addition or
substraction. This can be done in simpler for-loop with the
determined index and bounds. Compiler will expand d_end - d into
a number of ddt_stat_t fields at compile time.
This prevents Polly from failing on this file.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Closes #12253
2021-06-29 08:26:11 -06:00
Alexander Motin 5b7053a9a5 Avoid 64bit division in multilist index functions
The number of sublists in a multilist is relatively small. We dont need
64 bits to calculate an index. 32 bits is sufficient and makes the
code more efficient.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> 
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12288
2021-06-29 06:59:14 -06:00
Michal Vasilek f20fb199e6 Fix plymouth passphrase prompt with dracut
plymouth --command splits the command on spaces which means
that zfs-load-key was getting the filesystem name enclosed
in single quotes (since 13c59bb76) and failing. This commit
fixes it by piping the password directly to the command
similar to how it's done in other scripts (initramfs,
dracut without plymouth).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Michal Vasilek <michal@vasilek.cz>
Related-to: #9193
Related-to: #9202
Closes #12147
2021-06-25 22:43:25 -07:00
Rich Ercolani 2084e9f748 Fix build with KASAN
The stock zstd code expects some helpers from ASAN if present.
This works fine in userland, but in kernel, KASAN also gets detected,
and lacks those helpers. So let's make some empty substitutes for
that case.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12232
2021-06-25 22:28:12 -07:00
Alexander Motin 5e2c8338bf Help compiller optimize out abd_verify()
While abd_verify() does nothing when built without debug, compiler
can't optimize it out by itself due to calls to external list_*()
and abd_verify_scatter().  This commit makes it explicit.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12280
2021-06-25 16:38:31 -07:00
Martin Matuška 14d2841b53 FreeBSD: fix compilation of FreeBSD world after 29274c9f6
prng32_bounded() is available to kernel only on FreeBSD 13+.

Call inline random_get_pseudo_bytes() with correct pointer type.
To be consistent, apply to Linux as well.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #12282
2021-06-25 10:28:51 -07:00
Brian Behlendorf 88a4833039 Update cache file when setting compatibility property
Unlike most other properties the 'compatibility' property is stored
in the pool config object and not the DMU_OT_POOL_PROPS object.

This had the advantage that the compatibility information is available
without needing to fully import the pool (it can be read with zdb).
However, this means we need to make sure to update both the copy of
the config in the MOS and the cache file.  This wasn't being done.

This commit adds a call to spa_async_request() to ensure the copy of
the config in the cache file gets updated as well as the one stored
in the pool.  This same change is made for the 'comment' property
which suffers from the same inconsistency.

Reviewed-by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Colm Buckley <colm@tuatha.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12261 
Closes #12276
2021-06-24 14:30:02 -07:00
Paul Dagnelie 8f11b1d26e Fix flag copying in resume case
A couple flags weren't being copied in the case where we're doing size
estimation on a resume.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes: #12266
2021-06-24 13:42:01 -06:00
jumbi77 86f5e0bbce zfs_metaslab_mem_limit should be 25 instead of 75
According to current zfs man page zfs_metaslab_mem_limit should be
25 instead of 75.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: jumbi77@users.noreply.github.com
Closes #12273
2021-06-24 10:02:54 -07:00
Rich Ercolani 126615303d Stop using "zstreamdump" in tests/
zstreamdump was replaced with "zstream dump"; let's stop using the
old name, compat symlink or no.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12277
2021-06-24 09:38:33 -07:00
Jonathon 942121cfcb Update libera webchat client URL
Libera have made a webchat client available. This change builds on #12127.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jonathon Fernyhough <jonathon@m2x.dev>
Closes #12251
2021-06-23 17:14:58 -07:00
Attila Fülöp 1b610ae45f gcc 11 cleanup
Compiling with gcc 11.1.0 produces three new warnings.
Change the code slightly to avoid them.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #12130
Closes #12188
Closes #12237
2021-06-23 17:57:06 -06:00
Brian Behlendorf 63f4b959a6 ZTS: Add known exceptions
The receive-o-x_props_override test case reliably fails on the
FreeBSD main builders (but not on Linux), until the root cause is
understood add this test to the FreeBSD exception list.

On Linux the alloc_class_012_pos test case may occasionally fail.
This is a known false positive which has also been added to the
Linux exception list until the test can be made entirely reliable.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12272
2021-06-23 15:53:13 -07:00
Rich Ercolani 8e739b2c9f Annotated dprintf as printf-like
ZFS loves using %llu for uint64_t, but that requires a cast to not 
be noisy - which is even done in many, though not all, places.
Also a couple places used %u for uint64_t, which were promoted
to %llu. 

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12233
2021-06-22 21:53:45 -07:00
Antonio Russo a81b812495 Revert Consolidate arc_buf allocation checks
This reverts commit 13fac09868.

Per the discussion in #11531, the reverted commit---which intended only
to be a cleanup commit---introduced a subtle, unintended change in
behavior.

Care was taken to partially revert and then reapply 10b3c7f5e4
which would otherwise have caused a conflict.  These changes were
squashed in to this commit.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Suggested-by: @chrisrd
Suggested-by: robn@despairlabs.com
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11531 
Closes #12227
2021-06-22 21:39:15 -07:00
Alexander Motin 29274c9f6d Optimize small random numbers generation
In all places except two spa_get_random() is used for small values,
and the consumers do not require well seeded high quality values.
Switch those two exceptions directly to random_get_pseudo_bytes()
and optimize spa_get_random(), renaming it to random_in_range(),
since it is not related to SPA or ZFS in general.

On FreeBSD directly map random_in_range() to new prng32_bounded() KPI
added in FreeBSD 13.  On Linux and in user-space just reduce the type
used to uint32_t to avoid more expensive 64bit division.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12183
2021-06-22 17:35:23 -06:00
Brian Behlendorf ba91311561 Linux 5.12 compat: META
Increase the Linux-Maximum version in the META file to 5.12.
All of the required compatibility patches have been merged.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2021-06-21 08:22:40 -07:00
Alexander Motin c4c162c1e8 Use wmsum for arc, abd, dbuf and zfetch statistics. (#12172)
wmsum was designed exactly for cases like these with many updates
and rare reads.  It allows to completely avoid atomic operations on
congested global variables.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12172
2021-06-16 18:19:34 -06:00
George Amanakis 9ffcaa370a Avoid deadlock when removing L2ARC devices under I/O
In case we have I/O and try to remove an L2ARC device a deadlock might
occur. arc_read()->zio_read()->zfs_blkptr_verify() waits for SCL_VDEV
to be dropped while holding the hash_lock. However, spa_l2cache_load()
holds SCL_ALL and waits for the hash_lock in l2arc_evict().

Fix this by moving zfs_blkptr_verify() to the top top arc_read() before
the hash_lock is taken. Verify the block pointer and return a checksum
error if damaged rather than halting the system, by using
BLK_VERIFY_LOG instead of BLK_VERIFY_HALT.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #12054
2021-06-16 18:17:42 -06:00
Rich Ercolani ff31750405 Fix importing with symlinks
It turns out that symlinks are heavily used on Linux in /dev/disk.
So let's allow importing from them.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12238
2021-06-14 09:59:54 -07:00
наб 83ba91adf6 systemd: import: expand $ZPOOL_IMPORT_OPTS correctly
Turns out $ZPOOL_IMPORT_OPTS expands in a shell-like fashion,
yielding 'import' '-aN' '-o' 'cachefile=none' for an unset variable,
and 'import' '-aN' '-o' 'cachefile=none' 'word1' 'word2' for a
white-spaced one, but ${ZPOOL_IMPORT_OPTS} expands like "${Z_I_O}"
would in a shell, yielding 'import' '-aN' '-o' 'cachefile=none' ''
(empty) and 'import' '-aN' '-o' 'cachefile=none' 'word1 word2' (spaced)

Fixes eec5ba113e "dracut: 90zfs: respect
zfs_force=1 on systemd systems"

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes: #12231
2021-06-14 09:48:53 -06:00
Matthew Ahrens 069bf406b4 vdev_draid_min_asize() ignores reserved space
vdev_draid_min_asize() returns the minimum size of a child vdev.  This
is used when determining if a disk is big enough to replace a child.
It's also used by zdb to determine how big of a child to make to test
replacement.

vdev_draid_min_asize() says that the child’s asize has to be at least
1/Nth of the entire draid’s asize, which is the same logic as raidz.
However, this contradicts the code in vdev_draid_open(), which
calculates the draid’s asize based on a reduced child size:

  An additional 32MB of scratch space is reserved at the end of each
  child for use by the dRAID expansion feature

So the problem is that you can replace a draid disk with one that’s
vdev_draid_min_asize(), but it actually needs to be larger to accommodate
the additional 32MB.  The replacement is allowed and everything works at
first (since the reserved space is at the end, and we don’t try to use
it yet), but when you try to close and reopen the pool,
vdev_draid_open() calculates a smaller asize for the draid, because of
the smaller leaf, which is not allowed.

I think the confusion is that vdev_draid_min_asize() is correctly
returning the amount of required *allocatable* space in a leaf, but the
actual *size* of the leaf needs to be at least 32MB more than that.
ztest_vdev_attach_detach() assumes that it can attach that size of
device, and it actually can (the kernel/libzpool accepts it), but it
then later causes zdb to not be able to open the pool.

This commit changes vdev_draid_min_asize() to return the required size
of the leaf, not the size that draid will make available to the metaslab
allocator.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11459
Closes #12221
2021-06-13 10:48:53 -07:00
Paul Zuchowski afa7b34845 Do not hash unlinked inodes
In zfs_znode_alloc we always hash inodes.  If the
znode is unlinked, we do not need to hash it.  This
fixes the problem where zfs_suspend_fs is doing zrele
(iput) in an async fashion, and zfs_resume_fs unlinked
drain processing will try to hash an inode that could
still be hashed, resulting in a panic.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alan Somers <asomers@gmail.com>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #9741
Closes #11223
Closes #11648
Closes #12210
2021-06-11 17:00:33 -07:00
наб 10bcc4da6c scripts/commitcheck.sh: fix false positive for signed commits
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12105
2021-06-11 09:10:25 -07:00
наб feb04e6680 Forbid basename(3) and dirname(3)
There are at least two interpretations of basename(3),
in addition to both functions being allowed to /both/ return a static
buffer (unsuitable in multi-threaded environments) /and/ raze the input
(which encourages overallocations, at best)

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12105
2021-06-11 09:10:21 -07:00
наб 64dfdaba37 libzutil: import: filter out unsuitable files earlier
Only accept the right type of file, if available, and reject too-small
files before opening them on Linux

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12105
2021-06-11 09:10:13 -07:00
наб bf80fb53f5 linux/libzutil: zpool_open_func: don't dup name, extract untouchables
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12105
2021-06-11 09:10:09 -07:00
наб 0854d4c186 libzutil: add zfs_{base,dir}name()
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12105
2021-06-11 09:10:05 -07:00
наб 3aa81a6635 linux/libzutil: use ARRAY_SIZE instead of constant for search paths
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12105
2021-06-11 09:10:00 -07:00
Rich Ercolani 1a345d645a Added uncompress requirement
Having an old enough version of "file" and no "uncompress" program
installed can cause rpmbuild as root to crash and mangle rpmdb.

So let's add a build dependency for RPM-based systems.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes: #12071
Closes: #12168
2021-06-11 09:38:23 -06:00
Brian Behlendorf 9d639d8799 ZTS: Add zfs_clone_livelist_dedup.ksh to Makefile.am
Commit 86b5f4c12 added a new zfs_clone_livelist_dedup.ksh test case
but didn't include it in the Makefile.am.  This results in the test
not being included in the dist tarball so it's never run by the CI.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes: #12224
2021-06-11 09:21:36 -06:00
Alexander Motin ffdf019cb3 Re-embed multilist_t storage
This commit partially reverts changes to multilists in PR 7968
(multi-threaded spa-sync()) and adds some cache line alignments to
separate read-only multilists and heavily modified refcount's to different
cache lines.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-by: iXsystems, Inc.
Closes #12158
2021-06-10 10:42:31 -06:00
наб eec5ba113e dracut: 90zfs: respect zfs_force=1 on systemd systems
On systemd systems provide an environment generator in order
to respect the zfs_force=1 kernel command line option.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11403
Closes #12195
2021-06-10 09:26:37 -07:00
Alexander Motin 371f88d96f Remove pool io kstats (#12212)
This mostly reverts "3537 want pool io kstats" commit of 8 years ago.

From one side this code using pool-wide locks became pretty bad for
performance, creating significant lock contention in I/O pipeline.
From another, there are more efficient ways now to obtain detailed
statistics, while this statistics is illumos-specific and much less
usable on Linux and FreeBSD, reported only via procfs/sysctls.

This commit does not remove KSTAT_TYPE_IO implementation, that may
be removed later together with already unused KSTAT_TYPE_INTR and
KSTAT_TYPE_TIMER.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12212
2021-06-10 08:27:33 -07:00
Rich Ercolani 860051f1d1 Added error for writing to /dev/ on Linux
Starting in Linux 5.10, trying to write to /dev/{null,zero} errors out.
Prefer to inform people when this happens rather than hoping they guess
what's wrong.

Reviewed-by: Antonio Russo <aerusso@aerusso.net>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes:  #11991
2021-06-09 18:57:57 -06:00
наб 327c904615 lib{efi,avl,share,tpool,zfs_core,zfsbootenv,zutil}: -fvisibility=hidden
No symbols affected in libavl
No symbols affected by libtpool, but pre-ANSI declarations got purged
No symbols affected by libzfs_core
No symbols affected by libzfs_bootenv

libefi got cleaned, gained efi_debug documentation in efi_partition.h,
and removes one undocumented and unused symbol from libzfs_core:
  D default_vtoc_map

libnvpair saw removal of these symbols:
  D nv_alloc_nosleep_def
  D nv_alloc_sleep
  D nv_alloc_sleep_def
  D nv_fixed_ops_def
  D nvlist_hashtable_init_size
  D nvpair_max_recursion

libshare saw removal of these symbols from libzfs:
  T libshare_nfs_init
  T libshare_smb_init
  T register_fstype
  B smb_shares

libzutil saw removal of these internal symbols from libzfs_core:
  T label_paths
  T slice_cache_compare
  T zpool_find_import_blkid
  T zpool_open_func
  T zutil_alloc
  T zutil_strdup

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12191
2021-06-09 17:04:32 -07:00
наб d406a695c6 libefi: remove efi_auto_sense()
It's present (but undocumented) in the illumos gate and used exclusively
by rmformat(1) (which I recommend as a nice blast from the past),
and also the math assumes 512B sectors and is therefore wrong

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12191
2021-06-09 17:03:42 -07:00
наб 4b7ed6a286 zgenhostid.8: revisit
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12212
2021-06-09 14:36:03 -07:00
наб 1b37cc1abe Consistentify miscellaneous style on remaining manpages
Most notably this fixes the vdev_id(8) non-.Xrs in vdev_id.conf.5

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12212
2021-06-09 14:35:53 -07:00
наб 2badb3457a Move properties, parameters, events, and concepts around manual sections
The pages moved as follows:
  zpool-features.{5 => 7}
  spl{-module-parameters.5 => .4}
  zfs{-module-parameters.5 => .4}
  zfs-events.5 => into zpool-events.8
  zfsconcepts.{8 => 7}
  zfsprops.{8 => 7}
  zpoolconcepts.{8 => 7}
  zpoolprops.{8 => 7}

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Co-authored-by: Daniel Ebdrup Jensen <debdrup@FreeBSD.org>
Closes #12149
Closes #12212
2021-06-09 14:35:30 -07:00
наб b0f3e8a6eb man: use one Makefile, use OpenZFS for .Os
The prevailing style is to use either nothing, or the originating
organisational umbrella (here: OpenZFS), and these aren't Linux manpages

This also deduplicates the substitution code, and makes adding/removing
sexions simpler in future

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12212
2021-06-09 14:34:47 -07:00
Brian Behlendorf 88af959b24 Fix minor shellcheck 0.7.2 warnings
The first warning of a misspelling is a false positive, so we annotate
the script accordingly.  As for the x-prefix warnings update the check
to use the conventional '[ -z <string> ]' syntax.

all-syslog.sh:46:47: warning: Possible misspelling: ZEVENT_ZIO_OBJECT
    may not be assigned, but ZEVENT_ZIO_OBJSET is. [SC2153]
make_gitrev.sh:53:6: note: Avoid x-prefix in comparisons as it no
    longer serves a purpose [SC2268]
man-dates.sh:10:7: note: Avoid x-prefix in comparisons as it no
    longer serves a purpose [SC2268]

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12208
2021-06-09 12:21:24 -07:00
Rich Ercolani 08cd071735 Correct a flaw in the Python 3 version checking
It turns out the ax_python_devel.m4 version check assumes that
("3.X+1.0" >= "3.X.0") is True in Python, which is not when X+1
is 10 or above and X is not. (Also presumably X+1=100 and ...)

So let's remake the check to behave consistently, using the
"packaging" or (if absent) the "distlib" modules.

(Also, update the Github workflows to use the new packages.)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes: #12073
2021-06-08 18:20:16 -06:00
наб 1fcfc21cd8 zed.d/history_event-zfs-list-cacher.sh.in: parallelise, simplify
This:
  (a) improves the error log message,
  (b) locks per pool instead of globally,
  (c) locks the actual output file instead of /var/lock/zfs-list,
      which would otherwise linger there forever (well, still will,
      but you can remove it and it won't come back), and
  (d) preserves attributes of the output file
      instead of reverting them to 0:0 644

It is imperative that the previous commit
("zed-functions.sh: zed_lock(): don't truncate lock")
be included in any series that contains this one

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12042
2021-06-08 16:09:39 -07:00
наб e20c9330d7 zed.d/all-debug.sh: simplify
By locking the log file itself, we can omit arduous rebinding and
explicit umask setting, but, perhaps more importantly, avoid permanently
littering /var/lock/ with zed.debug.log.lock we will never delete

It is imperative that the previous commit
("zed-functions.sh: zed_lock(): don't truncate lock")
be included in any series that contains this one

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12042
2021-06-08 16:09:33 -07:00
наб b828cf1d01 zed-functions.sh: zed_lock(): don't truncate lock
By appending instead of truncating, we can lock on any file (with write
permissions) instead of only dedicated lock files, since the locking
process itself no longer alters the file in any way

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12042
2021-06-08 16:09:22 -07:00
Alan Somers 75b4cbf625 libzfs: On FreeBSD, use MNT_NOWAIT with getfsstat
`getfsstat(2)` is used to retrieve the list of mounted file systems,
which libzfs uses when fetching properties like mountpoint, atime,
setuid, etc.  The `mode` parameter may be `MNT_NOWAIT`, which uses
information in the VFS's cache, or `MNT_WAIT`, which effectively does a
`statfs` on every single mounted file system in order to fetch the most
up-to-date information.  As far as I can tell, the only fields that
libzfs cares about are the filesystem's name, mountpoint, fstypename,
and mount flags.  Those things are always updated on mount and unmount,
so they will always be accurate in the VFS's mount cache except in two
circumstances:

1) When a file system is busy unmounting
2) When a ZFS file system changes the value of a mount-overridable
   property like atime or setuid, but doesn't remount the file system.
   Right now that only happens when the property is changed by an
   unprivileged user who has delegated authority to change the property
   but not to mount the dataset.  But perhaps libzfs could choose to do
   it for other reasons in the future.

Switching to `MNT_NOWAIT` will greatly improve speed with no downside,
as long as we explicitly update the mount cache whenever we change a
mount-overridable property.

For comparison, Illumos gets this information using the native
`getmntany` and `getmntent` functions, which also use cached
information.  The illumos function that would refresh the cache,
`resetmnttab`, is never called by libzfs.

And on GNU/Linux, `getmntany` and `getmntent` don't even communicate
with the kernel directly.  They simply parse the file they are given,
which is usually /etc/mtab or /proc/mounts.  Perhaps the implementation
of /proc/mounts is synchronous, ala MNT_WAIT; I don't know.

Sponsored-by:	Axcient
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alan Somers <asomers@gmail.com>
Closes: #12091
2021-06-08 07:36:43 -06:00
наб 9685f363c3 tests/file_check: remove unused variable
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12187
2021-06-07 20:59:01 -07:00
наб f423411c44 module/zfs: vdev_removal: spa_vdev_remove_thread: remove unused variable
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12187
2021-06-07 20:58:56 -07:00
наб 6bab6ee838 module/zfs: vdev_indirect: vdev_indirect_repair: remove unused variable
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12187
2021-06-07 20:58:51 -07:00
наб f70f3e2fbc module/zfs: dbuf: dbuf_read_impl: remove unused variable
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12187
2021-06-07 20:58:47 -07:00
наб f719d3b160 module/zfs: arc: arc_hdr_realloc_crypt: remove unused variables
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12187
2021-06-07 20:58:42 -07:00
наб 4ff49c5a06 libzfs: zfs_send: remove unused variable
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12187
2021-06-07 20:58:28 -07:00
наб 4f1009face zdb: zdb_decompress_block: don't needlessly set buf
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12187
2021-06-07 20:58:23 -07:00
наб 77f3b67520 libzutil: zpool_find_config: remove unused variable
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12187
2021-06-07 20:58:10 -07:00
наб 2d815d955e Modernise/fix/rewrite unlinted manpages
zpool-destroy.8: flatten, fix description
zfs-wait.8: flatten, fix description, use list for events
zpool-reguid.8: flatten, fix description
zpool-history.8: flatten, fix description
zpool-export.8: flatten, fix description, remove -f "unmount" reference
  AFAICT no such command exists even in Illumos (as of today, anyway),
  and we definitely don't call it
zpool-labelclear.8: flatten, fix description
zpool-features.5: modernise
spl-module-parameters.5: modernise
zfs-mount-generator.8: rewrite
zfs-module-parameters.5: modernise

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12169
2021-06-07 13:41:54 -06:00
Rich Ercolani afb96fa6ee Force --enable-debug on FreeBSD if INVARIANTS is set
There's already logic to force INVARIANTS on for building if it's
present in the running kernel; however, not having DEBUG enabled
when DEBUG and INVARIANTS are can cause strange panics.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12185
Closes #12163
2021-06-07 13:29:27 -06:00
Serapheim Dimitropoulos 86b5f4c121 Livelist logic should handle dedup blkptrs
Update the logic to handle the dedup-case of consecutive
FREEs in the livelist code. The logic still ensures that
all the FREE entries are matched up with a respective
ALLOC by keeping a refcount for each FREE blkptr that we
encounter and ensuring that this refcount gets to zero
by the time we are done processing the livelist.

zdb -y no longer panics when encountering double frees

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #11480
Closes #12177
2021-06-07 13:09:07 -06:00
Alexander Motin ea400129c3 More aggsum optimizations
- Avoid atomic_add() when updating as_lower_bound/as_upper_bound.
Previous code was excessively strong on 64bit systems while not
strong enough on 32bit ones.  Instead introduce and use real
atomic_load() and atomic_store() operations, just an assignments
on 64bit machines, but using proper atomics on 32bit ones to avoid
torn reads/writes.

 - Reduce number of buckets on large systems.  Extra buckets not as
much improve add speed, as hurt reads.  Unlike wmsum for aggsum
reads are still important.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12145
2021-06-07 09:02:47 -07:00
наб e5e76bd643 libzfs: write_inuse_diffs_one: format strerror() with "%s"
Fixes 50353dbd ("Let zfs diff be more  permissive") which accidentally
introduced a build warning.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12197
2021-06-04 16:04:37 -07:00
наб 2ba7a9e5cc i-t: don't try to import from empty cache
Chases 7c64ee9e77
 ("zfs-import-{cache,scan}: change condition to FileNotEmpty")

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12108
2021-06-04 14:01:08 -07:00
наб a0242eceff dracut: 90zfs: zfs-load-key: wait for key to appear for up to 10 seconds
Also reduce password retries to 3 to match i-t

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12065
Closes #12108
2021-06-04 14:01:08 -07:00
наб b2c68bea50 Use %%/* instead of awk -F/ {print $1} to strip datasets
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12108
2021-06-04 14:01:08 -07:00
наб c38bc221b2 dracut: 90zfs: zfs-load-key: don't load unencrypted bootfs' keylocation
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11800
Closes #12108
2021-06-04 14:01:08 -07:00
наб cfc8dd1983 dracut: 90zfs: module-setup: try /lib*/libgcc_s.so*, relax /u/l/gcc path
SUSE stores the library at /lib64/libgcc_s.so.1 (/lib/libgcc_s.so.1 for
i686 glibc), which is in the search path

Also relax the /usr/lib path to catch systems similar to SUSE
(/usr/lib64/gcc/x86_64-suse-linux/10/libgcc_s.so) but without
the top-level lib64

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11750
Closes #12108
2021-06-04 14:01:08 -07:00
Rich Ercolani 50353dbd05 Let zfs diff be more permissive
In the current world, `zfs diff` will die on certain kinds of errors
that come up on ordinary, not-mangled filesystems - like EINVAL,
which can come from a file with multiple hardlinks having the one
whose name is referenced deleted.

Since it should always be safe to continue, let's relax about all
error codes - still print something for most, but don't immediately
abort when we encounter them.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12072
2021-06-04 15:00:39 -06:00
jharmening 8dddb25d2c FreeBSD: incorporate changes to the VFS_QUOTACTL(9) KPI
VFS_QUOTACTL(9) has been updated to allow each filesystem to indicate
whether it has changed the busy state of the mount.  The filesystem
may still assume that its .vfs_quotactl entrypoint is always called
with the mount busied, but only needs to unbusy the mount (and clear
*mp_busy) if it does something that actually requires the mount to be
unbusied.  It no longer needs to blindly copy-paste the UFS protocol
for calling vfs_unbusy(9) for the Q_QUOTAOFF and Q_QUOTAON commands.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Jason Harmening <jason.harmening@gmail.com>
Closes #12052
2021-06-04 14:11:08 -06:00
Ryan Moeller 1f8e5b6cb8 Fix error check in nvlist_print_json_string
Move check for errors from mbrtowc() into the loop.  The error values
are not actually negative, so we don't break out of the loop when they
are encountered.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12175
Closes #12176
2021-06-04 13:53:44 -06:00
наб f84fe3fc87 Lint most manpages
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12129
2021-06-04 12:48:31 -07:00
наб 4a98300feb mancheck: accept lints, accept lint overrides
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12129
2021-06-04 12:48:26 -07:00
Brian Behlendorf 7837845822 Linux: Set spl_kmem_cache_slab_limit when page size !4K
For small objects the kernel's slab implementation is very fast and
space efficient. However, as the allocation size increases to
require multiple pages performance suffers. The SPL kmem cache
allocator was designed to better handle these large allocation
sizes. Therefore, on Linux the kmem_cache_* compatibility wrappers
prefer to use the kernel's slab allocator for small objects and
the custom SPL kmem cache allocator for larger objects.

This logic was effectively disabled for all architectures using
a non-4K page size which caused all kmem caches to only use the
SPL implementation. Functionally this is fine, but the SPL code
which calculates the target number of objects per-slab does not
take in to account that __vmalloc() always returns page-aligned
memory. This can result in a massive amount of wasted space when
allocating tiny objects on a platform using large pages (64k).

To resolve this issue we set the spl_kmem_cache_slab_limit cutoff
to 16K for all architectures. 

This particular change does not attempt to update the logic used
to calculate the optimal number of pages per slab. This remains
an issue which should be addressed in a future change.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12152
Closes #11429
Closes #11574
Closes #12150
2021-06-03 14:37:45 -06:00
наб 739cfb965b libzfs: convert to -fvisibility=hidden
Also mark all printf-like funxions in libzfs_impl.h as printf-like
and add --no-show-locs to storeabi, in hopes diffs will make more sense
in future

This removes these symbols from libzfs:
  D nfs_only
  T SHA256Init
  T SHA2Final
  T SHA2Init
  T SHA2Update
  T SHA384Init
  T SHA512Init
  D share_all_proto
  D smb_only
  T zfs_is_shared_proto
  W zpool_mount_datasets
  W zpool_unmount_datasets

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12048
2021-06-03 13:17:55 -07:00
наб e00aae4be2 libefi: efi_get_devname: don't allocate procfs path
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12048
2021-06-03 13:17:49 -07:00
наб eefaa55f64 libzfs: don't distribute libzfs_impl.h
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12048
2021-06-03 13:17:35 -07:00
наб 94f942c658 libspl: staticify buf and pagesize, rename aok to libspl_assert_ok
Exporting names this short can easily cause nasty collisions with user code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12050
2021-06-03 11:04:13 -06:00
Colm f97142c748 A couple of small style cleanups
In `zpool_load_compat()`:

  * initialize `l_features[]` with a loop rather than a static
    initializer.

  * don't redefine system constants; use private names instead

Rationale here:

When an array is initialized using a static {foo}, only the specified
members are initialized to the provided values, the rest are
initialized to zero. While B_FALSE is of course zero, it feels
unsafe to rely on this being true forever, so I'm inclined to sacrifice
a few microseconds of runtime here and initialize using a loop.

When looking for the correct combination of system constants to use
(in open() and mmap()), I prefer to use private constants rather than
redefining system ones; due to the small chance that the system
ones might be referenced later in the file. So rather than defining
O_PATH and MAP_POPULATE, I use distinct constant names.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Colm Buckley <colm@tuatha.org>
Closes #12156
2021-06-03 09:13:42 -06:00
наб f645d4416f zfs-module-parameters.5: remove nonexistent parameters
zfs_arc_overflow_shift was never a parameter:
ca0bf58d65 ("Illumos 5497 - lock
contention on arcs_mtx") is the only result in
git log -Soverflow_shift, and it wasn't exposed then, nor is it now

zfs_read_chunk_size was renamed to zfs_vnops_read_chunk_size in
e53d678d4a ("Share zfs_fsync, zfs_read,
zfs_write, et al between Linux and FreeBSD")

zio_decompress_fail_fraction was never a parameter: it was added in
c3bd3fb4ac ("OpenZFS 9403 - assertion
failed in arc_buf_destroy()") as a developer aid for setting in zdb, but
it's a dangerous test tunable and has no place in public documentation,
(not to mention that it obviously doesn't work):
> Although this did uncover a few low priority issues, this
  unfortuantely also causes ztest to ASSERT in many locations where the
  code is working correctly since it is designed to fail on IO errors.
  Developers can manually set this variable with the '-o' option to find
  and debug issues.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12157
2021-06-02 12:53:22 -07:00
наб ace760a0b4 spl-module-parameters.5: remove spl_kmem_cache_{expire,obj_per_slab_min}
Both were removed in 4fbdb10c7b ("remove
kmem_cache module parameter KMC_EXPIRE_AGE")

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12157
2021-06-02 12:52:32 -07:00
Rich Ercolani 6c7c7201d9 Quick fixes for two ZTS failures
On FreeBSD 14, these two tests started erroring out like the
objects they're attempting to examine don't exist.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12165
2021-06-01 15:34:19 -06:00
Rich Ercolani 8bd41ebd54 Added another missed case to arc_summary3
It turns out that sometimes, evidently only when run inside the
ZTS handler, arc_summary3 | head > /dev/null will die with ENOTCONN,
and ruin the test run.

Added handling for that.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12160
2021-06-01 15:20:50 -06:00
Ryan Moeller e1af0d0d73 libzfs_core: Fix some style violations
Made function names start on a new line. Added a blank line between
functions. This helps when grepping for functions.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #12137
2021-06-01 15:13:26 -06:00
grembo 65d9212aee FreeBSD boot code reminder after zpool upgrade
There used to be a warning after upgrading a zpool in FreeBSD, so users
won't forget to update the boot loader that pool is booted from.

This change brings this warning back, but only if the bootfs property
is set on the pool, which should be sufficient for the vast majority of
FreeBSD installations. People running something custom are most likely
aware of what to do after an upgrade in their specific environment.

Functionality is implemented in an OS specific helper function.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Co-authored-by: Michael Gmelin <grembo@FreeBSD.org>
Signed-off-by: Michael Gmelin <grembo@FreeBSD.org>
Closes #12099
Closes #12104
2021-06-01 15:03:49 -06:00
Rich Ercolani 3f81aba766 Remove iov_iter_advance() for iter_write
The additional iter advance is incorrect, as copy_from_iter() has
already done the right thing.  This will result in the following
warning being printed to the console as of the 5.12 kernel.

    Attempted to advance past end of bvec iter

This change should have been included with #11378 when a
similar change was made on the read side.

Suggested-by: @siebenmann
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Issue #11378
Closes #12041
Closes #12155
2021-06-01 11:58:08 -07:00
наб f7d7ee0583 Turn checkbashisms into a make target
make_gitrev.sh actually breaks checkbashisms' parser,
which /insists/ that the end-of-line " is actually a string start

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12101
2021-06-01 11:38:54 -07:00
наб c3ef9f7528 Turn shellcheck into a normal make target. Fix new files it caught
This checks every file it checked (and a few more),
but explicitly instead of "if it works it works" best-effort
(which wasn't that good anyway)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #10512
Closes #12101
2021-06-01 11:38:49 -07:00
наб d3858ab788 udev/rules.d: .gitignore: glob all rules
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12101
2021-06-01 11:38:45 -07:00
наб 8d5f211fc8 i-t: don't suggest zpool-import with altroot to /root
This *will fail* when remounted by the real root

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12148
2021-05-29 21:33:53 -07:00
наб 7b5bf4758b i-t: let rootdelay= set $ZFS_INITRD_PRE_MOUNTROOT_SLEEP
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Ref: https://github.com/openzfs/zfs/issues/11420#issuecomment-850338673
Closes #11663
Closes #12148
2021-05-29 21:32:48 -07:00
наб d484a7255b zstream: force-install zstreamdump link
Accidentally introduced by commit dd00925e8d.

Force-install the zstreamdump link, this is a supported configuration
and the install should not fail if it needs to overwrite an existing
file.

Also cd to work around some funny platforms as noted in AC_PROG_LN_S doc

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12143
2021-05-29 20:37:05 -07:00
наб 102c91b4f8 Widen mancheck to all of man and test-runner
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-05-29 20:22:37 -07:00
наб 4910903c07 test-runner.1: modernise
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-05-29 20:22:32 -07:00
наб d44449025c zfs-events.5: modernise
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-05-29 20:22:26 -07:00
наб c3fb6653b5 vdev_id.conf.5: modernise
Also yeet pci_slot since it doesn't seem to exist?

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-05-29 20:22:21 -07:00
наб 56454b0391 man: use Nm/Cm/Fl consistently
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-05-29 20:22:19 -07:00
наб 1f3cbcfcc5 zed.8: modernise
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-05-29 20:22:14 -07:00
наб 3bcbb6af16 cstyle.1: modernise
Also remove note about the OS/Net consolidation, now the illumos gate

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-05-29 20:22:07 -07:00
наб 76b8f7cf53 vdev_id.8: modernise, note scsi topology
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-05-29 20:22:01 -07:00
наб 5aace42ce7 zhack.1: modernise
The spacing on zhack    feature stat    pool is a bit iffy(?)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-05-29 20:22:00 -07:00
наб 2c9c5bc859 zpool_influxdb.8: modernise
Also rip out the section about potentially including in the OpenZFS
distribution and simplify -e description

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-05-29 20:21:59 -07:00
наб 3d00fbf9a0 zinject.8: modernise
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-05-29 20:21:58 -07:00
наб 3365d2cf71 raidz_test.1: modernise
Also re-add articles left out by the slav who wrote this

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-05-29 20:21:56 -07:00
наб defa5a0ae5 zpoolprops.8: fix spacing in ashift
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-05-29 20:21:55 -07:00
наб 2b42a1e57e fsck.zfs.8: modernise
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-05-29 20:21:54 -07:00
наб 639871363a arcstat.1: modernise
Also slim down the description a tad

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-05-29 20:21:53 -07:00
наб 8134178f3b ztest.1: modernise
I fixed a few typos, but avoided changing anything beyond that;
the sould of the document should be preserved

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-05-29 20:21:52 -07:00
наб b1821d06c3 zgenhostid.8: use single-line indent macro for single-line examples
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12125
2021-05-29 20:21:46 -07:00
наб 757df52928 libzfs: add zfs_get_underlying_type. Stop including libzfs_impl.h in cmd
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12116
2021-05-29 14:26:38 -07:00
наб e618e4a4ff include: move SPA_MINBLOCKSHIFT and zio_encrypt to sys/fs/zfs.h
These are used by userspace, so should live in a public header

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12116
2021-05-29 14:26:32 -07:00
наб f00f469052 libzfs: format safety
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12116
2021-05-29 14:26:25 -07:00
наб c4e5a07fd0 libzfs: expose zfs_mount_delegation_check
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12116
2021-05-29 14:25:48 -07:00
Manoj Joseph 45516b4a0a long options for ztest
This change introduces long options for ztest. It builds the usage
message as well as the long_options array from a single table. It also
adds #defines for the default values.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Manoj Joseph <manoj.joseph@delphix.com>
Closes #12117
2021-05-28 16:06:07 -06:00
Paul Dagnelie 5d59178e98 Don't direct to freenode in issue template
While Libera doesn't yet have a webchat client, we should at least
direct them to the right network. Once a webchat client is available,
we can direct them to it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #12127
2021-05-28 15:08:41 -06:00
Armin Wehrfritz 6465e590dc RPM: Explicitly set the required min/max kernel version for the DKMS package
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Armin Wehrfritz <dkxls23@gmail.com>
Closes #12124
2021-05-27 23:06:45 -07:00
Rich Ercolani d66b817f0a Minor fix to configure on s390x
configure on s390x has a key check fail with an error about
a variable being used uninitialized. So let's initialize it.

Reviewed-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12126
2021-05-27 22:39:53 -07:00
Alexander Motin 86706441a8 Introduce write-mostly sums
wmsum counters are a reduced version of aggsum counters, optimized for
write-mostly scenarios.  They do not provide optimized read functions,
but instead allow much cheaper add function.  The primary usage is
infrequently read statistic counters, not requiring exact precision.

The Linux implementation is directly mapped into percpu_counter KPI.
The FreeBSD implementation is directly mapped into counter(9) KPI.
In user-space due to lack of better implementation mapped to aggsum.

Unfortunately neither Linux percpu_counter nor FreeBSD counter(9)
provide sufficient functionality to completelly replace aggsum, so
it still remains to be used for several hot counters.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12114
2021-05-27 14:27:29 -06:00
Alexander Motin 2041d6eecd Improve scrub maxinflight_bytes math.
Previously, ZFS scaled maxinflight_bytes based on total number of
disks in the pool.  A 3-wide mirror was receiving a queue depth of 3
disks, which it should not, since it reads from all the disks inside.
For wide raidz the situation was slightly better, but still a 3-wide
raidz1 received a depth of 3 disks instead of 2.

The new code counts only unique data disks, i.e. 1 disk for mirrors
and non-parity disks for raidz/draid.  For draid the math is still
imperfect, since vdev_get_nparity() returns number of parity disks
per group, not per vdev, but still some better than it was.

This should slightly reduce scrub influence on payload for some pool
topologies by avoiding excessive queuing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored-By:	iXsystems, Inc.
Closing #12046
2021-05-27 10:11:39 -06:00
Rich Ercolani ba646e3e89 Bend zpl_set_acl to permit the new userns* parameter
Just like #12087, the set_acl signature changed with all the bolted-on
*userns parameters, which disabled set_acl usage, and caused #12076.

Turn zpl_set_acl into zpl_set_acl and zpl_set_acl_impl, and add a
new configure test for the new version.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12076
Closes #12093
2021-05-27 08:55:49 -07:00
наб 5611bdc8f0 etc/systemd/zfs-mount-generator: output tweaks
git-diff--w-dirty, but:
  * zfs-load-key-$DSET.service -> zfs-load-key@$DSET.service
  * flattened set -eu into other /bin/sh flags
  * simpler (for 1 2 3 vs while [ counter ]; counter+=1) prompt loop
  * exec $ZFS where applicable

Reviewed-by: Antonio Russo <aerusso@aerusso.net>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: InsanePrawn <insane.prawny@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #11915
Closes #11917
2021-05-27 08:49:26 -07:00
наб 5a1fb060fd etc/systemd/zfs-mount-generator: rewrite in C
A plain rewrite of the shell version, and generates identical
units, save for replacing some empty lines with nothing, having fewer
meaningless spaces in After=s and different spacing in the lock scripts,
for a clean git diff -w

This is a gain of anywhere from 0m0.336s vs 0m0.022s (15.27x)
to 0m0.202s vs 0m0.006s (33.67x), depending on the hardware,
a.k.a. from "absolutely unusable" to "perfectly fine"

This also properly deals with canmount=noauto units across multiple
pools

See PR for detailed timings (of an early version) and diffs

Reviewed-by: Antonio Russo <aerusso@aerusso.net>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: InsanePrawn <insane.prawny@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #11915
Closes #11917
2021-05-27 08:45:51 -07:00
Rich Ercolani 0bb736ce0b Reinstate the old zpool read label logic as a fallback
In case of AIO failure, we should probably fallback to the old
behavior and still work.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Alan Somers <asomers@gmail.com>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12032
Closes #12040
2021-05-26 22:07:31 -07:00
наб 20bd864edc mount.zfs.8: match to reality; zfsprops.8: add missing temporary options
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12111
2021-05-26 21:44:56 -07:00
наб eae3598ae4 mount.zfs.8: modernise
No changes to the text itself

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12111
2021-05-26 21:44:50 -07:00
наб fb9baa9b20 zfsprops.8: remove nbmand-not-used-on-Linux and pointer to mount(8)
Linux man-pages' mount(8) points at fcntl(2), as does mount(2),
and support for it is little-used, deprecated, and configurable
since 4.5.

As far as I can tell, FreeBSD doesn't support nbmand at all ‒
mandatory locks are mostly dead

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12111
2021-05-26 21:44:29 -07:00
наб 69cbd0a360 Various Linux kABI cosmetics
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12103
2021-05-26 15:26:06 -07:00
наб 202498c958 linux: don't fall through to 3-arg vfs_getattr
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12103
2021-05-26 15:25:34 -07:00
наб a1ca39655b Forbid strtok(3)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12094
2021-05-26 14:51:30 -07:00
наб a0d7e27a13 zdb: remove strtok
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12094
2021-05-26 14:51:18 -07:00
наб 1ce6d70c52 zpool: print_zpool_script_list: remove strtok
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12094
2021-05-26 14:51:03 -07:00
наб a281f7690d zpool: import: use realloc for realloc, remove strtok
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12094
2021-05-26 14:50:59 -07:00
наб 31f4c8cb19 linux/libzutil: zfs_path_order: remove strtok
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12094
2021-05-26 14:50:52 -07:00
наб 2a8c606082 libzutil: zfs_resolve_shortname: remove strtok
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12094
2021-05-26 14:50:46 -07:00
наб 91d6780671 libzutil: zfs_strcmp_shortname: remove strtok
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12094
2021-05-26 14:50:41 -07:00
наб 2fdd61a30b libzutil: zfs_strcmp_pathname: don't allocate, remove strtok
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12094
2021-05-26 14:50:35 -07:00
наб a0cb347cea libzfs_core: fini: don't check for refcount twice
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12094
2021-05-26 14:50:08 -07:00
Alexander Motin c71a2bb52f FreeBSD: Update dataset_kstats for zvols in dev mode
Previous commit added accounting for geom mode, but not for dev.
In geom mode we actually have GEOM statistics, while in dev mode
additional accounting actually makes more sense.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #12097
2021-05-26 12:14:26 -06:00
наб 671ea40f62 freebsd/libzfs: import execvPe() from FreeBSD 13
It allocates less and properly deals with argv={NULL}

With minor cosmetic changes to match cstyle, remove whitespace damage,
and restore direct string printing

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12051
2021-05-26 11:03:47 -06:00
Rich Ercolani f172c3088f Correct flaws in arc_summary[23] and their test.
The change correctly handles BrokenPipeError and improves the
associated tests.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12037
Closes #12036
2021-05-25 20:02:01 -06:00
Alexander Motin 211cee4fcf FreeBSD: avoid memory allocation in arc_prune_async
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Closes #12049
2021-05-25 19:38:34 -06:00
Rich Ercolani 90c0524535 Add note for printing all dbgmsg entries on FreeBSD
I looked for a bit, and couldn't find any documentation on
how to print all logged dbgmsg entries, just messages since
the DTrace probe started, until @allanjude kindly pointed me
toward the sysctl.

So let's add that note where the DTrace probe is mentioned for
FreeBSD, so other people can find it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12113
2021-05-25 18:08:27 -07:00
vermavipinkumar dce1bf99ec Propagate vdev state due to invalid label corruption
Propagate vdev child state to parents on invalid label
Add VDEV_AUX_BAD_LABEL to print_import_config()

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Co-authored-by: Srikanth N S <srikanth.nagasubbaraoseetharaman@hpe.com>
Signed-off-by: Vipin Kumar Verma <vipin.verma@hpe.com>
Closes #12088
2021-05-25 12:32:07 -06:00
Toomas Soome 1a1302f8c4 zdb: dump_history needs space
One space is missing from zdb -h output causing strings to be concatenated. (fixing #11940)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes  #12098
2021-05-25 11:33:18 -06:00
Christian Schwarz 0989d798fa ZTS: remove verify_slog_support helper
verify_slog_support no longer applies to ZFS since slog support
is always available.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #12092
2021-05-24 14:57:29 -06:00
Alexander Motin f8646c871a FreeBSD: Retry OCF ENOMEM errors.
ZFS does not expect transient errors from crypto.  For read they are
counted as checksum errors, while for write end up in panic.  To not
panic on random low memory conditions retry ENOMEM errors in the OCF
wrapper function.

While there remove unneeded timeout and priority from msleep().

External-issue: https://reviews.freebsd.org/D30339
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #12077
2021-05-24 14:42:45 -06:00
наб 5d1a32a542 linux/libshare: smb: don't leak share name in smb_disable_share_one()
Fixes: 645fb9cc21 "Implemented sharing datasets via SMB using libshare"

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12015
2021-05-21 10:16:18 -07:00
наб dd00925e8d zstreamdump: replace with link to zstream
zstreamdump(8) was in quite a bad state,
and the wrapper didn't work if invoked without /sbin in $PATH

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12015
2021-05-21 10:16:14 -07:00
наб 93ef500388 Don't abuse vfork()
According to POSIX.1, "vfork() has the same effect as fork(2),
except that the behavior is undefined if the process created by vfork()
either modifies any data other than a variable of type pid_t
used to store the return value from vfork(), [...],
or calls any other function before successfully calling _exit(2)
or one of the exec(3) family of functions."

These do all three, and work by pure chance
(or maybe they don't, but we blisfully don't know).
Either way: bad idea to call vfork() from C,
unless you're the standard library, and POSIX.1-2008 removes it entirely

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12015
2021-05-21 10:16:06 -07:00
наб 5da6353987 libzfs: run_process: don't leak fd on reopen failure
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12082
2021-05-21 09:49:05 -07:00
наб 7c20ceebdd libzfs: run_process: reuse line, don't leak it
line will grow as wide as it needs (glibc starts off at 120),
we can store a narrower view; this also fixes leaks in a few scenarios

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12082
2021-05-21 09:48:59 -07:00
наб 30dadd5c04 libzfs: run_process: set O_NONBLOCK on lines pipe
Without this, we can deadlock: the child is stuck writing to the pipe,
and we are stuck waiting on the child

With this, we the child fills up the pipe (a few hundred kBish)
and starts getting EAGAINs, which allows it to either crash
or ignore them

libzfs_run_process_get_stdout*() is used only by zpool -c scripts,
which output short runs of K=V pairs, so the likelihood of losing
legitimate data there is relatively low

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12082
2021-05-21 09:47:53 -07:00
наб e72383825b raidz_test: use only async-signal-safe functions in signal handler
execl*() before glibc 2.24 could allocate, but only if called with at
least 1024 arguments, which five isn't

errno modification is also fine, so long as we restore it at the end

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12086
2021-05-20 16:37:38 -07:00
Rich Ercolani 0b1b66b473 Update tmpfile() existence detection
Linux changed the tmpfile() signature again in torvalds/linux@6521f89,
which in turn broke our HAVE_TMPFILE detection in configure.

Update that macro to include the new case, and change the signature of
zpl_tmpfile as appropriate.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes: #12060
Closes: #12087
2021-05-20 16:02:36 -07:00
Brian Behlendorf 8fb577ae6d Fix dRAID sequential resilver silent damage handling
This change addresses two distinct scenarios which are possible
when performing a sequential resilver to a dRAID pool with vdevs
that contain silent unknown damage. Which in this circumstance
took the form of the devices being intentionally overwritten with
zeros. However, it could also result from a device returning incorrect
data while a sequential resilver was in progress.

Scenario 1) A sequential resilver is performed while all of the
dRAID vdevs are ONLINE and there is silent damage present on the
vdev being resilvered. In this case, nothing will be repaired
by vdev_raidz_io_done_reconstruct_known_missing() because
rc->rc_error isn't set on any of the raid columns. To address
this vdev_draid_io_start_read() has been updated to always mark
the resilvering column as ESTALE for sequential resilver IO.

Scenario 2) Multiple columns contain silent damage for the same
block and a sequential resilver is performed. In this case it's
impossible to generate the correct data from parity unless all of
the damaged columns are being sequentially resilvered (and thus
only good data is used to generate parity). This is as expected
and there's nothing which can be done about it. However, we need
to be careful not to make to situation worse. Since we can't
verify the data is actually good without a checksum, we must
only repair the devices which are being sequentially resilvered.
Otherwise, an incorrect repair to a device which previously
contained good data could effectively lock in the damage and
make reconstruction impossible. A check for this was added to
vdev_raidz_io_done_verified() along with a new test case.

Lastly, this change updates the redundancy_draid_spare1 and
redundancy_draid_spare3 test cases to be more representative
of normal dRAID replacement operation.  Specifically, what we
care about is that the scrub run after a sequential resilver
does not find additional blocks which need repair.  This would
indicate the sequential resilver failed to rebuild a section of
one of the devices. Note also the tests were switched to using
the verify_pool() function which still checks for checksum errors.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12061
2021-05-20 15:05:26 -07:00
Lauri Tirkkonen 6ac2d7f76f zfs-allow.8: mention 'bookmark' permission
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Lauri Tirkkonen <lauri@hacktheplanet.fi>
Closes #12064
2021-05-20 09:03:03 -07:00
наб adc851d9f8 d/zfsutils.zfs.init derivatives: shellcheck, fix header
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12042
2021-05-20 08:55:47 -07:00
наб 9627bdc697 contrib/bash_completion.d: fix obvious shellcheck problems
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12042
2021-05-20 08:55:38 -07:00
наб 359b6cca0f zgenhostid: use argument path directly
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12042
2021-05-20 08:55:31 -07:00
наб 6fc3099248 Trim excess shellcheck annotations. Widen to all non-Korn scripts
Before, make shellcheck checked
  scripts/{commitcheck,make_gitrev,man-dates,paxcheck,zfs-helpers,zfs,
           zfs-tests,zimport,zloop}.sh
  cmd/zed/zed.d/{{all-debug,all-syslog,data-notify,generic-notify,
                 resilver_finish-start-scrub,scrub_finish-notify,
                 statechange-led,statechange-notify,trim_finish-notify,
                 zed-functions}.sh,history_event-zfs-list-cacher.sh.in}
  cmd/zpool/zpool.d/{dm-deps,iostat,lsblk,media,ses,smart,upath}
now it also checks
  contrib/dracut/{02zfsexpandknowledge/module-setup,
                  90zfs/{export-zfs,parse-zfs,zfs-needshutdown,
                         zfs-load-key,zfs-lib,module-setup,
                         mount-zfs,zfs-generator}}.sh.in
  cmd/zed/zed.d/{pool_import-led,vdev_attach-led,
                 resilver_finish-notify,vdev_clear-led}.sh
  contrib/initramfs/{zfsunlock,hooks/zfs.in,scripts/local-top/zfs}
  tests/zfs-tests/tests/perf/scripts/prefetch_io.sh
  scripts/common.sh.in
  contrib/bpftrace/zfs-trace.sh
  autogen.sh

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12042
2021-05-20 08:55:23 -07:00
наб 2ca77988a5 Fix SC2181 ("[ $?") outside tests/
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12042
2021-05-20 08:54:47 -07:00
Rich Ercolani 1d106ab57a Simple change to fix building in recent environments
Renamed _fini too for symmetry.

Suggested-by: @ensch
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12059
Closes: #11987
Closes: #12056
2021-05-19 20:46:42 -07:00
Alexander Motin 7457b024ba Scale worker threads and taskqs with number of CPUs
While use of dynamic taskqs allows to reduce number of idle threads,
hardcoded 8 taskqs of each kind is a big overkill for small systems,
complicating CPU scheduling, increasing I/O reorder, etc, while
providing no real locking benefits, just not needed there.

On another side, 12*8 worker threads per kind are able to overload
almost any system nowadays.  For example, pool of several fast SSDs
with SHA256 checksum makes system barely responsive during scrub, or
with dedup enabled barely responsive during large file deletion.

To address both problems this patch introduces ZTI_SCALE macro, alike
to ZTI_BATCH, but with multiple taskqs, depending on number of CPUs,
to be used in places where lock scalability is needed, while request
ordering is not so much.  The code is made to create new taskq for
~6 worker threads (less for small systems, but more for very large)
up to 80% of CPU cores (previous 75% was not good for rounding down).
Both number of threads and threads per taskq are now tunable in case
somebody really wants to use all of system power for ZFS.

While obviously some benchmarks show small peak performance reduction
(not so big really, especially on systems with SMT, where use of the
second threads does not give as much performance as the first ones),
they also show dramatic latency reduction and much more smooth user-
space operation in case of high CPU usage by ZFS.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #11966
2021-05-14 09:13:53 -07:00
Brian Behlendorf 6a13add559 ZTS: Increase redundancy test timeout
The redundancy_draid.ksh and redundancy_raidz.ksh tests were updated
by commit 93c8e91fe to additionally verify self-healing.  This
additional check increased the run time which can now occasionally
exceed the default maximum timeout in the CI environment.  To prevent
this from causing failures increase the default timeout for the
redundancy test cases.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12043
2021-05-14 09:11:56 -07:00
наб 5236eafdcd i-t: rewrite hooks
This produces a leaner image, doesn't fail if zdb doesn't exist,
properly handles hostnameless systems, doesn't mention crypto modules
for no reason, doesn't add useless empty executable in hopes an
eight-year-old PR is merged, uses i-t builtins for all copies

Also optimize the checkbashisms filter to spawn one (or a few) awks
instead of one per regular file and remove initramfs/hooks therefrom due
to a command -v false positive

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12017
2021-05-13 21:47:53 -07:00
Paul Zuchowski fce29d6aa4 Fix dmu_recv_stream test for resumable
Use dsl_dataset_has_resume_receive_state()
not dsl_dataset_is_zapified() to check if
stream is resumable.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #12034
2021-05-13 21:46:14 -07:00
Ryan Moeller 210231ede0 FreeBSD: Implement xattr=sa
FreeBSD historically has not cared about the xattr property; it was
always treated as xattr=on.  With xattr=on, xattrs are stored as files
in a hidden xattr directory.  With xattr=sa, xattrs are stored as
system attributes and get cached in nvlists during xattr operations.
This makes SA xattrs simpler and more efficient to manipulate.  FreeBSD
needs to implement the SA xattr operations for feature parity with
Linux and to ensure that SA xattrs are accessible when migrated or
replicated from Linux.

Following the example set by Linux, refactor our existing extattr vnops
to split off the parts handling dir style xattrs, and add the
corresponding SA handling parts.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11997
2021-05-13 15:14:12 -07:00
Ryan Moeller d86debf576 FreeBSD: Use SET_ERROR to trace xattr name errors
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11997
2021-05-13 15:14:01 -07:00
Ryan Moeller 099ca8186b FreeBSD: Don't force xattr mount option
The kernel will use the xattr property by default when not overridden
by a mount option.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11997
2021-05-13 15:13:20 -07:00
Scott Colby da124ad8ec zed: Add Pushover notifier
Add zed_notify_pushover to zed-functions.sh, along with the necessary
configuration variables in zed.rc.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Scott Colby <scott@scolby.com>
Closes #12012
2021-05-13 10:02:24 -07:00
Brian Behlendorf 6217656da3 Revert "Fix raw sends on encrypted datasets when copying back snapshots"
Commit d1d4769 takes into account the encryption key version to
decide if the local_mac could be zeroed out. However, this could lead
to failure mounting encrypted datasets created with intermediate
versions of ZFS encryption available in master between major releases.
In order to prevent this situation revert d1d4769 pending a more
comprehensive fix which addresses the mount failure case.

Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #11294
Issue #12025
Issue #12300
Closes #12033
2021-05-13 10:00:17 -07:00
наб 618a65cd7a Widen mancheck target to all pages, fix them
mandoc: ./man/man8/zfs-mount-generator.8.in:188:2:
        ERROR: skipping end of block that is not open: RE
mandoc: ./man/man8/zfs_ids_to_path.8:38:2:
        ERROR: skipping unknown macro: .LP
mandoc: ./man/man8/zfs_ids_to_path.8:48:2:
        ERROR: inserting missing end of block: Sh breaks Bl
mandoc: ./man/man8/zfs-wait.8:69:2:
        ERROR: skipping end of block that is not open: El
mandoc: ./man/man8/zfs-program.8:460:2:
        ERROR: inserting missing end of block: It breaks Bd
mandoc: ./man/man8/zfs-mount-generator.8:188:2:
        ERROR: skipping end of block that is not open: RE
mandoc: ./man/man8/zstream.8:43:2:
        ERROR: skipping unknown macro: .LP
mandoc: ./man/man8/zstream.8:107:2:
        ERROR: inserting missing end of block: Sh breaks Bl
mandoc: ./man/man8/zstream.8:107:2:
        ERROR: inserting missing end of block: Sh breaks Bl
make: *** [Makefile:1529: mancheck] Error 1

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #12017
2021-05-12 21:32:48 -07:00
наб 37086897b0 libzfs: add keylocation=https://, backed by fetch(3) or libcurl
Add support for http and https to the keylocation properly to
allow encryption keys to be fetched from the specified URL.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #9543
Closes #9947 
Closes #11956
2021-05-12 21:21:35 -07:00
Brian Behlendorf 7d07d1be39 ZTS: Add known exceptions
The following seven tests been observed to occasionally fail during
CI testing.  This commit adds them to the list of known somewhat
flaky test cases.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12023
2021-05-11 19:55:12 -07:00
Coleman Kane 48c7b0e444 linux 5.13 compat: bdevops->revalidate_disk() removed
Linux kernel commit 0f00b82e5413571ed225ddbccad6882d7ea60bc7 removes the
revalidate_disk() handler from struct block_device_operations. This
caused a regression, and this commit eliminates the call to it and the
assignment in the block_device_operations static handler assignment
code, when configure identifies that the kernel doesn't support that
API handler.

Reviewed-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11967 
Closes #11977
2021-05-11 19:53:02 -07:00
Ryan Moeller 4704be2879 Remove unimplemented virus scanning hooks
Reviewed-by: Adam Moss <c@yotes.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11972
2021-05-10 22:02:25 -07:00
наб 38c6d6cedd module/zfs: remove zfs_zevent_console and zfs_zevent_cols
zfs_zevent_console committed multiple printk()s per line without
properly continuing them ‒ a single event could easily be fragmented
across over thirty lines, making it useless for direct application

zfs_zevent_cols exists purely to wrap the output from zfs_zevent_console

The niche this was supposed to fill can be better served by something
akin to the all-syslog ZEDLET

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #7082 
Closes #11996
2021-05-10 11:00:15 -07:00
наб 2babd20045 libzfs: zfs_asprintf(): don't return undefined pointer
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11993
2021-05-08 09:37:40 -07:00
наб 87b671f3ac libzfsbootenv: lzbe_set_boot_device(): don't free undefined pointer
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11993
2021-05-08 09:27:53 -07:00
наб c493943404 zfs_get_enclosure_sysfs_path(): don't free undefined pointer
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11993
2021-05-08 09:25:59 -07:00
наб 8dfb9e57c7 zfs_get_enclosure_sysfs_path(): don't leak dev path
Also always free tmp2 at the end

Before:
nabijaczleweli@tarta:~/uwu$ valgrind --leak-check=full ./blergh
==8947== Memcheck, a memory error detector
==8947== Using Valgrind-3.14.0 and LibVEX
==8947== Command: ./blergh
==8947==
(null)
==8947==
==8947== HEAP SUMMARY:
==8947==     in use at exit: 23 bytes in 1 blocks
==8947==   total heap usage: 3 allocs, 2 frees, 1,147 bytes allocated
==8947==
==8947== 23 bytes in 1 blocks are definitely lost in loss record 1 of 1
==8947==    at 0x483577F: malloc (vg_replace_malloc.c:299)
==8947==    by 0x48D74B7: vasprintf (vasprintf.c:73)
==8947==    by 0x48B7833: asprintf (asprintf.c:35)
==8947==    by 0x401258: zfs_get_enclosure_sysfs_path
                         (zutil_device_path_os.c:191)
==8947==    by 0x401482: main (blergh.c:107)
==8947==
==8947== LEAK SUMMARY:
==8947==    definitely lost: 23 bytes in 1 blocks
==8947==    indirectly lost: 0 bytes in 0 blocks
==8947==      possibly lost: 0 bytes in 0 blocks
==8947==    still reachable: 0 bytes in 0 blocks
==8947==         suppressed: 0 bytes in 0 blocks
==8947==
==8947== For counts of detected and suppressed errors, rerun with: -v
==8947== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

nabijaczleweli@tarta:~/uwu$ sed -n 191p zutil_device_path_os.c
        tmpsize = asprintf(&tmp1, "/sys/block/%s/device", dev_name);

After:
nabijaczleweli@tarta:~/uwu$ valgrind --leak-check=full ./blergh
==9512== Memcheck, a memory error detector
==9512== Using Valgrind-3.14.0 and LibVEX
==9512== Command: ./blergh
==9512==
(null)
==9512==
==9512== HEAP SUMMARY:
==9512==     in use at exit: 0 bytes in 0 blocks
==9512==   total heap usage: 3 allocs, 3 frees, 1,147 bytes allocated
==9512==
==9512== All heap blocks were freed -- no leaks are possible
==9512==
==9512== For counts of detected and suppressed errors, rerun with: -v
==9512== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11993
2021-05-08 09:24:57 -07:00
наб ca46fa602b zpool: vdev_run_cmd(): don't free undefined pointers
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11993
2021-05-08 09:21:59 -07:00
наб 68ebbd9a93 libzfs: zpool_load_compat(): don't free undefined pointers
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11993
2021-05-08 09:19:20 -07:00
наб 8bc357ba92 libzfs: zpool_load_compat(): open feature file cloexec
As a bonus, this also passes the open flags into the open flags instead
of the mode (it worked by accident because O_RDONLY is 0),
correctly detects a failed map,
and prefaults the entire file since we're always writing to every page

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11993
2021-05-08 09:16:26 -07:00
illiliti 36e8abee95 copy-builtin: posix conformance
This commits contains changes to allow running `copy-builtin` without
bash + some minor improvements.

changed shebang to /bin/sh
added -f option to `set` to globally disable unneeded globbing
replaced all `echo` commands within add_after() with `printf`
alternative to avoid possible issues with options (-neE)
dropped non-portable superfluous `readlink` command
replaced superfluous `true` command with `:` builtin alternative
replaced non-portable `--recursive` option of `cp` command with `-R`
alternative
dropped non-portable `local` keyword

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: illiliti <illiliti@protonmail.com>
Closes #12004
2021-05-08 08:58:26 -07:00
Brian Behlendorf 93c8e91fe7 Fix dRAID self-healing short columns
When dRAID performs a normal read operation only the data columns
in the raid map are read from disk.  This is enough information to
calculate the checksum, verify it, and return the needed data to the
application.  It's only in the event of a checksum failure that the
additional parity and any empty columns must be read since they are
required for parity reconstruction.

Reading these additional columns is handled by vdev_raidz_read_all()
which calls vdev_draid_map_alloc_empty() to expand the raid_map_t
and submit IOs for the missing columns.  This all works correctly,
but it fails to account for any "short" columns.  These are data
columns which are padded with a empty skip sector at the end.
Since that empty sector is not needed for a normal read it's not
read when columns is first read from disk.  However, like the parity
and empty columns the skip sector is needed to perform reconstruction.

The fix is to mark any "short" columns as never being read by clearing
the rc_tried flag when expanding the raid_map_t.  This will cause
the entire column to re-read from disk in the event of a checksum
failure allowing the self-healing functionality to repair the block.

Note that this only effects the self-healing feature because when
scrubbing a pool the parity, data, and empty columns are all read
initially to verify their contents.  Furthermore, only blocks which
contain "short" columns would be effected, and only when the memory
backing the skip sector wasn't already zeroed out.

This change extends the existing redundancy_raidz.ksh test case to
verify self-healing (as well as resilver and scrub).  Then applies
the same test case to dRAID with a slightly modified version of
the test script called redundancy_draid.ksh.  The unused variable
combrec was also removed from both test cases.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12010
2021-05-08 08:57:25 -07:00
наб a27ab6d43b dracut/90/module-setup: mainly shellcheck cleanup
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #11956
2021-05-07 17:20:42 -07:00
наб 1966e959ca Replace ZoL with OpenZFS where applicable
Afterward, git grep ZoL matches:
  * README.md:  * [ZoL Site](https://zfsonlinux.org)
  - Correct
  * etc/default/zfs.in:# ZoL userland configuration.
  - Changing this would induce a needless upgrade-check,
    if the user has modified the configuration;
    this can be updated the next time the defaults change
  * module/zfs/dmu_send.c:   * ZoL < 0.7 does not handle [...]
  - Before 0.7 is ZoL, so fair enough

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #11956
2021-05-07 17:20:37 -07:00
Ryan Moeller 8c0991e813 FreeBSD: Remove !FreeBSD ifdef'd code
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11994
2021-05-07 15:13:44 -07:00
Ryan Moeller 0dd7da9d7a Clean up use of zfs_log_create in zfs_dir
zfs_log_create returns void, so there is no reason to cast its return
value to void at the call site.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11994
2021-05-07 15:13:10 -07:00
наб 3bd6b0e05a zed: protect against wait4()/fork() races to the global PID table
This can be very easily triggered by adding a sleep(1) before
the wait4() on a PID-starved system: the reaper thread would wait
for a child before its entry appeared, letting old entries accumulate:

  Invoking "all-debug.sh" eid=3021 pid=391
  Finished "(null)" eid=0 pid=391 time=0.002432s exit=0
  Invoking "all-syslog.sh" eid=3021 pid=336
  Finished "(null)" eid=0 pid=336 time=0.002432s exit=0
  Invoking "history_event-zfs-list-cacher.sh" eid=3021 pid=347
  Invoking "all-debug.sh" eid=3022 pid=349
  Finished "history_event-zfs-list-cacher.sh" eid=3021 pid=347
                                              time=0.001669s exit=0
  Finished "(null)" eid=0 pid=349 time=0.002404s exit=0
  Invoking "all-syslog.sh" eid=3022 pid=370
  Finished "(null)" eid=0 pid=370 time=0.002427s exit=0
  Invoking "history_event-zfs-list-cacher.sh" eid=3022 pid=391
  avl_find(tree, new_node, &where) == NULL
  ASSERT at ../../module/avl/avl.c:641:avl_add()
  Thread 1 "zed" received signal SIGABRT, Aborted.

By employing this wider lock, we atomise [wait, remove] and [fork, add]:
slowing down the reaper thread now just causes some zombies
to accumulate until it can get to them

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11963
Closes #11965
2021-05-07 15:10:16 -07:00
Alyssa Ross c074a7de13 Return required size when encode_fh size too small
Quoting <linux/exportfs.h>:

> encode_fh() should return the fileid_type on success and on error
> returns 255 (if the space needed to encode fh is greater than
> @max_len*4 bytes). On error @max_len contains the minimum size (in 4
> byte unit) needed to encode the file handle.

ZFS was not setting max_len in the case where the handle was too
small.  As a result of this, the `t_name_to_handle_at.c' example in
name_to_handle_at(2) did not work on ZFS.

zfsctl_fid() will itself set max_len if called with a fid that is too
small, so if we give zfs_fid() that behavior as well, the fix is quite
easy: if the handle is too small, just use a zero-size fid instead of
the handle.

Tested by running t_name_to_handle_at on a normal file, a directory, a
.zfs directory, and a snapshot.

Thanks-to: Puck Meerburg <puck@puckipedia.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Closes #11995
2021-05-07 15:08:16 -07:00
Alexander Motin 4fb9e5638b Simplify/fix dnode_move() for dn_zfetch
Previous code tried to keep prefetch streams while moving dnode.  But
it was at least not updating per-stream zs_fetchback pointers, causing
use-after-free on next access.  Instead of that I see much easier and
cleaner to just drop old prefetch state and start new from scratch.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #11936
Closes #11998
2021-05-07 15:07:03 -07:00
Matthew Ahrens 610cb4fb8c undocumented libzfs API changes broke "zfs list"
While OpenZFS does permit breaking changes to the libzfs API, we should
avoid these changes when reasonably possible, and take steps to mitigate
the impact to consumers when changes are necessary.

Commit e4288a8397 made a libzfs API change that is especially
difficult for consumers because there is no change to the function
signatures, only to their behavior.  Therefore, consumers can't notice
that there was a change at compile time.  Also, the API change was
incompletely and incorrectly documented.

The commit message mentions `zfs_get_prop()` [sic], but all callers of
`get_numeric_property()` are impacted: `zfs_prop_get()`,
`zfs_prop_get_numeric()`, and `zfs_prop_get_int()`.

`zfs_prop_get_int()` always calls `get_numeric_property(src=NULL)`, so
it assumes that the filesystem is not mounted.  This means that e.g.
`zfs_prop_get_int(ZFS_PROP_MOUNTED)` always returns 0.

The documentation says that to preserve the previous behavior, callers
should initialize `*src=ZPROP_SRC_NONE`, and some callers were changed
to do that.  However, the existing behavior is actually preserved by
initializing `*src=ZPROP_SRC_ALL`, not `NONE`.

The code comment above `zfs_prop_get()` says, "src: ... NULL will be
treated as ZPROP_SRC_ALL.".  However, the code actually treats NULL as
ZPROP_SRC_NONE.  i.e. `zfs_prop_get(src=NULL)` assumes that the
filesystem is not mounted.

There are several existing calls which use `src=NULL` which are impacted
by the API change, most noticeably those used by `zfs list`, which now
assumes that filesystems are not mounted.  For example,
`zfs list -o name,mounted` previously indicated whether a filesystem was
mounted or not, but now it always (incorrectly) indicates that the
filesystem is not mounted (`MOUNTED: no`).  Similarly, properties that
are set at mount time are ignored.  E.g. `zfs list -o name,atime` may
display an incorrect value if it was set at mount time.

To address these problems, this commit reverts commit e4288a8397:
"zfs get: don't lookup mount options when using "-s local""

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11999
2021-05-06 11:24:56 -07:00
Ryan Moeller b1e44cdcea FreeBSD: Initialize/destroy zp->z_lock
zp->z_lock is used in shared code for protecting projid and scantime.
We don't exercise these paths much if at all on FreeBSD, so have been
lucky enough not to have issues with the uninitialized locks so far.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #12003
2021-05-06 09:45:16 -07:00
Rich Ercolani 1f77b58789 Updated zfs_dbgmsg_enable documentation to be more accurate
Changed the default specified for zfs_dbgmsg_enable, added
clarification of interaction with zfs_flags.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #11984
Closes #11986
2021-05-05 21:16:51 -07:00
Ryan Moeller c903a756ac Miscellaneous code cleanup
Remove some extra whitespace.

Use pointer-typed asserts in Linux's znode cache destructor for more
info when debugging.

Simplify a couple of conversions from inode to znode when we already
have the znode.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11974
2021-04-30 16:39:07 -07:00
Ryan Moeller e4efb70950 FreeBSD: Clean up ASSERT/VERIFY use in module
Convert use of ASSERT() to ASSERT0(), ASSERT3U(), ASSERT3S(), 
ASSERT3P(), and likewise for VERIFY().  In some cases it ended up 
making more sense to change the code, such as VERIFY on nvlist 
operations that I have converted to use fnvlist instead.  In one 
place I changed an internal struct member from int to boolean_t to 
match its use.  Some asserts that combined multiple checks with && 
in a single assert have been split to separate asserts, to make it 
apparent which check fails.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11971
2021-04-30 16:36:10 -07:00
наб ec4f330816 zed.d/zed-functions.sh: fix zed_guid_to_pool() on dash
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11935
Closes #11954
2021-04-30 15:04:41 -07:00
наб 208675a09b zed.d/history_event-zfs-list-cacher.sh: no grep for snapshot detection
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11935
2021-04-30 15:04:38 -07:00
наб 8dca000040 zed.d/*-notify.sh: use mktemp instead of generating temp path manually
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11935
2021-04-30 15:04:33 -07:00
наб 71def603cd zed.d/pool_import-led.sh: fix for current zpool scripts
Also minor clean-up with folding state_to_val() into a case,
unrolling the lesser-available seq into numbers,
ignoring vdev states we don't care about,
and documentation comments

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11934
Closes #11935
2021-04-30 15:04:28 -07:00
наб 1a7d7182ac libzutil: fix dm_get_underlying_path() return if not a DM device
For example, this would happily return "/dev/(null)" for /dev/sda1

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11935
2021-04-30 15:04:19 -07:00
Ryan Moeller ccb46cab50 ZTS: Fix xattr_002_neg passing too soon
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11970
2021-04-30 07:37:02 -07:00
Ryan Moeller 801c76149b FreeBSD: Prune some unneeded definitions
IS_XATTRDIR is never used.
v_count is only used in two places, one immediately followed by the
use of the real name, v_usecount.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #11973
2021-04-30 07:34:53 -07:00
наб 6f4e132fec ZTS: cli_root/zfs_load-key: add separate key files
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue: #11956
Closes #11976
2021-04-30 07:31:22 -07:00
Toomas Soome 17b83525f5 zdb: dump_history can be improved
We only recognize some history records, instead, use
same logic as in print_history_records() in zpool_main.c.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #11940
2021-04-29 16:44:07 -07:00
Alan Somers e4288a8397 zfs get: don't lookup mount options when using "-s local"
Looking up mount options can be very expensive on servers with many
mounted file systems.  When doing "zfs get" with any "-s" option that
does not include "temporary", the mount list will never be used.  This
commit optimizes for that case.

This is a breaking commit for libzfs!  Callers of zfs_get_prop are now
required to initialize src.  To preserve existing behavior, they should
initialize it to ZPROP_SRC_NONE.

Sponsored by: Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11955
2021-04-29 14:19:44 -07:00
Arshad Hussain bc9c7265ae vdev_id: variable not getting expanded under map_slot()
Under function map_slot() variable passed as args
were not getting properly substituted or expanded.
This patch fixes the substitution issue.

Reviewed-by: Niklas Edmundsson <nikke@acc.umu.se>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Closes #11951 
Closes #11959
2021-04-29 13:58:49 -07:00
Nathaniel Wesley Filardo 056a658dee vdev_mirror: don't scrub/resilver devices that can't be read
This ensures that we don't accumulate checksum errors against offline or
unavailable devices but, more importantly, means that we don't
needlessly create DTL entries for offline devices that are already
up-to-date.

Consider a 3-way mirror, with disk A always online (and so always with
an empty DTL) and B and C only occasionally online.  When A & B resilver
with C offline, B's DTL will effectively be appended to C's due to these
spurious ZIOs even as the resilver empties B's DTL:

  * These ZIOs land in vdev_mirror_scrub_done() and flag an error

  * That flagged error causes vdev_mirror_io_done() to see
    unexpected_errors, so it issues a ZIO_TYPE_WRITE repair ZIO, which
    inherits ZIO_FLAG_SCAN_THREAD because zio_vdev_child_io() includes
    that flag in ZIO_VDEV_CHILD_FLAGS.

  * That ZIO fails, too, and eventually zio_done() gets its hands on it
    and calls vdev_stat_update().

  * vdev_stat_update() sees the error and this zio...

    * is not speculative,
    * is not due to EIO (but rather ENXIO, since the device is closed)
    * has an ->io_vd != NULL (specifically, the offline leaf device)
    * is a write
    * is for a txg != 0 (but rather the read block's physical birth txg)
    * has ZIO_FLAG_SCAN_THREAD asserted

  * So: vdev_stat_update() calls vdev_dtl_dirty() on the offline vdev.

Then, when A & C resilver with B offline, that story gets replayed and
C's DTL will be appended to B's.

In fact, one does not need this permanently-broken-mirror scenario to
induce badness: breaking a mirror with no DTLs and then scrubbing will
create DTLs for all offline devices.  These DTLs will persist until the
entire mirror is reassembled for the duration of the *resilver*, which,
incidentally, will not consider the devices with good data to be sources
of good data in the case of a read failure.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Nathaniel Wesley Filardo <nwfilardo@gmail.com>
Closes #11930
2021-04-27 17:48:11 -07:00
Brian Behlendorf 919714554b zfs.spec.in: remove post ldconfig scriptlets
In Fedora 28 the packaging guidelines were changed such that ldconfig
should no longer be called in either the %post or %postun scriptlets.
Instead the new compatibility macros %ldconfig_post, %ldconfig_postun,
and %ldocnfig_scriptlets should be used.

Since we only currently support Fedora 31 and newer, we could drop
%post or %postun scriptlets entirely according to the guidelines.
However, since we also use the same spec file for CentOS / RHEL
it's convenient to call the macros which are available starting
with CentOS / RHEL 8.  For CentOS / RHEL 7 we must still call
ldconfig in the traditional way.

https://fedoraproject.org/wiki/Changes/Removing_ldconfig_scriptlets

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11931
2021-04-27 17:45:40 -07:00
Toomas Soome 09131144b7 zdb: ASSERT issues when DEBUG is not defined
If zdb is not built with DEBUG mode, the ASSERT macros will be
eliminated.

This will leave vim defined, but not used (gcc warning) and
checkpoint spacemap validation loop will do nothing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #11932
2021-04-27 08:33:37 -07:00
Brian Behlendorf b1f7341203 ZTS: Add known exceptions
Both the zpool_initialize_import_export and checkpoint_discard_busy
test cases a known to occasionally fail.  Add them to the list of
known possible failures and reference the appropriate issue on the
tracker.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11949
2021-04-27 08:27:03 -07:00
Martin Matuška a69356cf26 Drop "All rights reserved" from files by trasz@FreeBSD.org
This obeys the change in freebsd/freebsd-src@bce7ee9d4

External-issue: https://reviews.freebsd.org/D26980
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #11947
2021-04-27 08:25:48 -07:00
Prawn b0269cd8ce receive: don't fail inheriting (-x) properties on wrong dataset type
Receiving datasets while blanket inheriting properties like zfs 
receive -x mountpoint can generally be desirable, e.g. to avoid 
unexpected mounts on backup hosts.

Currently this will fail to receive zvols due to the mountpoint 
property being applicable to filesystems only.  This limitation 
currently requires operators to special-case their minds and tools 
for zvols.

This change gets rid of this limitation for inherit (-x) by
Spiting up the dataset type handling: Warnings for inheriting (-x), 
errors for overriding (-o).

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #11416
Closes #11840
Closes #11864
2021-04-26 17:23:51 -07:00
Mateusz Guzik f172f759b9 FreeBSD: damage control racing .. lookups in face of mkdir/rmdir
External-issue: https://reviews.freebsd.org/D29769
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11926
2021-04-26 12:44:40 -07:00
Romain Dolbeau 0375465536 Fix AVX512BW Fletcher code on AVX512-but-not-BW machines
Introduce a specific valid function for avx512f+avx512bw (instead 
of checking only for avx512f).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Signed-off-by: Romain Dolbeau <romain@dolbeau.org>
Closes #11937
Closes #11938
2021-04-26 12:42:42 -07:00
наб f9bece92e2 zed: protect against wait4()/fork() races to the launched process tree
As soon as wait4() returns, fork() can immediately return with the same
PID, and race to lock _launched_processes_lock, then try to add the new
(duplicate) PID to _launched_processes, which asserts

By locking before wait4(), we ensure, that, given that same
unfortunate scheduling, _launched_processes_lock cannot be locked by the
spawner before we pop the process in the reaper, and only afterward will
it be added

This moves where the reaper idles when there are children from the
wait4() to the pause(), locking for the duration of that single syscall
in both the no-children and running-children cases; the impact of this
is one to two syscalls (depending on _launched_processes_lock state)
per loop

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11924
Closes #11928
2021-04-22 17:49:21 -07:00
Daniel Stevenson 60ffc1c460 Fixed incorrect man page reference in zfsprops(8)
The special_small_blocks section directed readers to zpool(8) for
documentation on special allocation classes, while they are actually
documented in zpoolconcepts(8).

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Daniel Stevenson <daniel@dstev.net>
Closes #11918
2021-04-20 10:16:00 -07:00
наб 44287ffa0d etc/systemd/zfs-mount-generator: don't fail if no cached pools
If $FSLIST exists but is empty, the generator fails with
  sort: cannot read: '/etc/zfs/zfs-list.cache/*':
  No such file or directory

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11915
2021-04-19 10:52:44 -07:00
наб dc3a56d38b libshare: nfs: commonify nfs_enable_share()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11886
2021-04-19 09:07:09 -07:00
наб 62866fc96c freebsd/libshare: nfs: make nfs_is_shared() thread-safe
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11886
2021-04-19 09:07:04 -07:00
наб 8357bbff1f libshare: nfs: commonify nfs_{init,fini}_tmpfile(), nfs_disable_share()
Also open the temp file cloexec

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11886
2021-04-19 09:06:57 -07:00
наб a3d387b31d libshare: nfs: commonify nfs_exports_[un]lock(), FILE_HEADER
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11886
2021-04-19 09:06:52 -07:00
наб 0f4d83117a libshare: nfs: don't leak nfs_lock_fd when lock fails
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11886
2021-04-19 09:06:31 -07:00
наб fef8bd41fc libspl: implement atomics in terms of atomics
This replaces the generic libspl atomic.c atomics implementation
with one based on builtin gcc atomics.  This functionality was added
as an experimental feature in gcc 4.4.  Today even CentOS 7 ships
with gcc 4.8 as the default compiler we can make this the default.

Furthermore, the builtin atomics are as good or better than our
hand-rolled implementation so it's reasonable to drop that custom code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11904
2021-04-18 22:13:24 -07:00
Brian Behlendorf 50d9ff93df ZTS: Improve redundancy test scripts
- Add additional logging to provide more information about why the
  test failed.  This including logging more of the individual commands
  and the contents and differences of the record files on failure.

- Updated get_vdevs() to properly exclude all top-level vdevs
  including raidz3 and draid[1-3].

- Replaced gnudd with dd.  This is the only remaining place in the
  test suite gnudd is used and it shouldn't be needed.

- The refill_test_env function expects the pool as the first argument
  but never sets the pool variable.

- Only fill the test pools to 50% of capacity instead of 75% to help
  speed up the tests.

- Fix replace_missing_devs() calculation, MINDEVSIZE should be
  MINVDEVSIZE.

- Fix damage_devs() so it overwrites almost all of the device so
  we're guaranteed to damage filesystem blocks.

- redundancy_stripe.ksh should not use log_mustnot to check if the
  pool is healthy since the return value may be misinterpreted.
  Just perform a normal conditional check and log the failure.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11906
2021-04-18 21:58:36 -07:00
Attila Fülöp 7c9702e2a7 ICP: Silence objtool "stack pointer realignment" warnings
Objtool requires the use of a DRAP register while aligning the
stack. Since a DRAP register is a gcc concept and we are
notoriously low on registers in the crypto code, it's not worth
the effort to mimic gcc generated stack realignment.

We simply silence the warning by adding the offending object files
to OBJECT_FILES_NON_STANDARD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #6950
Closes #11914
2021-04-17 13:11:18 -07:00
наб a31ac10185 libzfs: refresh property cache after inheriting userprop
This matches what happens when inheriting a system property

Consider the following program:
	int main() {
		void *zhp = libzfs_init();
		void *dataset = zfs_open(zhp, "zest/__test", 1);

		printf("before:");
		dump_nvlist(zfs_get_user_props(dataset), 2);
		printf("\n");

		zfs_prop_inherit(dataset, "xyz.nabijaczleweli:test", 0);
		printf("after:");
		dump_nvlist(zfs_get_user_props(dataset), 2);
		printf("\n");

		zfs_refresh_properties(dataset);
		printf("refreshed:");
		dump_nvlist(zfs_get_user_props(dataset), 2);
		printf("\n");
	}

And the output before:
	# zfs set xyz.nabijaczleweli:test=hehe zest/__test
	# ./a.out
	before:  xyz.nabijaczleweli:test:
	      value: 'hehe'
	      source: 'zest/__test'

	after:  xyz.nabijaczleweli:test:
	      value: 'hehe'
	      source: 'zest/__test'

	refreshed:

As compared to the output after:
	# zfs set xyz.nabijaczleweli:test=hehe zest/__test
	# ./a.out
	before:  xyz.nabijaczleweli:test:
	      value: 'hehe'
	      source: 'zest/__test'

	after:
	refreshed:

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11064
Closes #11911
2021-04-17 12:39:54 -07:00
наб bfe8b9fff3 libzfs: don't mark prompt+raw as retriable
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11911
Closes #11031
2021-04-17 12:39:28 -07:00
Mateusz Guzik 309c32c954 Combine zio caches if possible
This deduplicates 2 sets of caches which use the same allocation size.

Memory savings fluctuate a lot, one sample result is FreeBSD running
"make buildworld" saving ~180MB RAM in reduced page count associated
with zio caches.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11877
2021-04-17 12:36:04 -07:00
наб c682b67827 contrib/dracut: 90: zfs-{rollback,snapshot}-bootfs: use @sbindir@
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11898
2021-04-16 15:25:12 -07:00
наб ac541438a2 contrib/i-t: properly mount root's children with spaces
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11898
2021-04-16 15:24:55 -07:00
наб d8ced6613d contrib/dracut: 90: mount essential datasets under root
This partly mirrors what the i-t script does (though that mounts all
children, recursively) ‒ /etc, /usr, /lib*, and /bin are all essential,
if present, to successfully invoke the real init, which will then mount
everything else it might need in the right order

The following extreme-case set-up boots w/o issues now:
  /               zoot            zfs  rw,relatime,xattr,noacl
  ├─/etc          zoot/etc        zfs  rw,relatime,xattr,noacl
  ├─/usr          zoot/usr        zfs  rw,relatime,xattr,noacl
  │ └─/usr/local  zoot/usr/local  zfs  rw,relatime,xattr,noacl
  ├─/var          zoot/var        zfs  rw,relatime,xattr,noacl
  │ ├─/var/lib    zoot/var/lib    zfs  rw,relatime,xattr,noacl
  │ ├─/var/log    zoot/var/log    zfs  rw,relatime,xattr,posixacl
  │ ├─/var/cache  zoot/var/cache  zfs  rw,relatime,xattr,noacl
  │ └─/var/tmp    zoot/var/tmp    zfs  rw,relatime,xattr,noacl
  ├─/home         zoot/home       zfs  rw,relatime,xattr,noacl
  │ └─/home/nab   zoot/home/nab   zfs  rw,relatime,xattr,noacl
  ├─/boot         zoot/boot       zfs  rw,relatime,xattr,noacl
  ├─/root         zoot/home/root  zfs  rw,relatime,xattr,noacl
  ├─/opt          zoot/opt        zfs  rw,relatime,xattr,noacl
  └─/srv          zoot/srv        zfs  rw,relatime,xattr,noacl

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11898
2021-04-16 15:24:22 -07:00
наб 51bc60e582 contrib/dracut: 90: generator: only log to kmsg if debug set on cmdline
"debug" is also used by systemd itself, and there's really no reason for
the generator to write this much garbage by default

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11898
2021-04-16 15:24:06 -07:00
наб 626a792bb3 contrib/dracut: 02: don't spill device names across multiple lines
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11898
2021-04-16 15:23:35 -07:00
Attila Fülöp a9c93ac533 ICP: Add missing stack frame info to SHA asm files
Since the assembly routines calculating SHA checksums don't use
a standard stack layout, CFI directives are needed to unroll the
stack.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #11733
2021-04-16 15:11:26 -07:00
Paul Zuchowski f2286383d0 Fix crash in zio_done error reporting
Fix NULL pointer dereference when reporting
checksum error for gang block in zio_done.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #11872
Closes #11896
2021-04-16 11:00:53 -07:00
Ryan Moeller d92af6fc8d zfs-send(8): Restore sorting of flags
Before #11710 the flags in zfs-send(8) were sorted.
Restore order and bump the date.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11905
2021-04-15 17:43:07 -07:00
наб 23b6f17abb linux/spl: proc: use global table_{min,max} values instead of local ones
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11879
2021-04-15 14:55:50 -07:00
наб a0978d15aa linux/libspl: gethostid: read from /proc/sys/kernel/spl/hostid, simplify
Fixes get_system_hostid() if it was set via the aforementioned sysctl
and simplifies the code a bit.  The kernel and user-space must agree,
after all.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11879
2021-04-15 14:55:47 -07:00
наб 7de4c88b39 linux/spl: base proc_dohostid() on proc_dostring()
This fixes /proc/sys/kernel/spl/hostid on kernels with mainline commit
32927393dc1ccd60fb2bdc05b9e8e88753761469 ("sysctl: pass kernel pointers
to ->proc_handler") ‒ 5.7-rc1 and up

The access_ok() check in copy_to_user() in proc_copyout_string() would
always fail, so all userspace reads and writes would fail with EINVAL

proc_dostring() strips only the final new-line,
but simple_strtoul() doesn't actually need a back-trimmed string ‒
writing "012345678   \n" is still allowed, as is "012345678zupsko", &c.

This alters what happens when an invalid value is written ‒
previously it'd get set to what-ever simple_strtoul() returned
(probably 0, thereby resetting it to default), now it does nothing

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11878
Closes #11879
2021-04-15 14:55:43 -07:00
наб 2d14207c98 libspl: lift common bits of getexecname()
Merge the actual implementations of getexecname() and slightly clean
up the FreeBSD one.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11879
2021-04-15 14:55:40 -07:00
наб 375bdb2b20 module/zfs/zvol.c: purge unused zvol_volmode_cb_arg
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11879
2021-04-15 14:55:37 -07:00
Jitendra Patidar 08795ab8d3 ZFS traverse_visitbp optimization to limit prefetch
Traversal code, traverse_visitbp() does visit blocks recursively.
Indirect (Non L0) Block of size 128k could contain, 1024 block pointers
of 128 bytes. In case of full traverse OR incremental traverse, where
all blocks were modified, it could traverse large number of blocks
pointed by indirect. Traversal code does issue prefetch of blocks
traversed below indirect. This could result into large number of
async reads queued on vdev queue. So, account for prefetch issued for
blocks pointed by indirect and limit max prefetch in one go.

Module Param:
zfs_traverse_indirect_prefetch_limit: Limit of prefetch while traversing
an indirect block.

Local counters:
prefetched: Local counter to account for number prefetch done.
pidx: Index for which next prefetch to be issued.
ptidx: Index at which next prefetch to be triggered.

Keep "ptidx" somewhere in the middle of blocks prefetched, so that
blocks prefetch read gets the enough time window before their demand
read is issued.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Jitendra Patidar <jitendra.patidar@nutanix.com>
Closes #11802 
Closes #11803
2021-04-15 13:49:27 -07:00
наб 86418090d7 ZTS: add zed_fd_spill to verify the fds ZEDLETs inherit
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11891
2021-04-15 13:46:05 -07:00
наб aa6a14c0d5 zed: set O_CLOEXEC on persistent fds, remove closefrom() from pre-exec
Also don't dup /dev/null over stdio if daemonised

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11891
2021-04-15 13:46:02 -07:00
Paul Dagnelie 414f7249dc Add SIGSTOP and SIGTSTP handling to issig
This change adds SIGSTOP and SIGTSTP handling to the issig function; 
this mirrors its behavior on Solaris. This way, long running kernel 
tasks can be stopped with the appropriate signals. Note that doing 
so with ctrl-z on the command line doesn't return control of the tty 
to the shell, because tty handling is done separately from stopping 
the process. That can be future work, if people feel that it is a 
necessary addition.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Issue #810 
Issue #10843 
Closes #11801
2021-04-15 13:34:35 -07:00
Brian Behlendorf c95c5176b6 Fix 'make checkbashisms` warnings
The awk command used by the checkbashisms target incorrectly
adds the escape character before the ! and # characters.  This
results in the following warnings because these characters do not
need to be escaped.

    awk: cmd. line:1: warning: regexp escape sequence
        `\!' is not a known regexp operator
    awk: cmd. line:1: warning: regexp escape sequence
        `\#' is not a known regexp operator

Remove the unneeded escape character before ! and #.

Valid escape sequences are:

    https://www.gnu.org/software/gawk/manual/html_node/Escape-Sequences.html

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11902
2021-04-15 08:53:08 -07:00
Yuri Pankov 96904d879c Fix vdev health padding in zpool list -v
Do not (incorrectly, right instead left) pad health string itself,
it will be taken care of when printing property value below.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Yuri Pankov <yuripv@FreeBSD.org>
Closes #11899
2021-04-14 09:02:16 -07:00
Brian Behlendorf 3e660534ea Obsolete earlier packages due to version bump
Follow up to d5ef91af which adds a missing 'obsoletes' for the
libzfs-devel package.

Add a comment to the zfs.spec file as a reminder that previous
versions of the package should be marked as obsolete.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11844 
Closes #11895
2021-04-13 16:33:06 -07:00
наб d197a150b4 libzfs: get rid of unused libzfs_handle::libzfs_{storeerr,chassis_id}
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11868
2021-04-13 14:15:06 -07:00
наб 533527725b libzfs: get rid of libzfs_handle::libzfs_mnttab
All users did a freopen() on it. Even some non-users did!
This is point-less ‒ just open the mtab when needed

If I understand Solaris' getextmntent(3C) correctly, the non-user
freopen()s are very likely an odd, twisted vestigial tail of that ‒
but it's got a completely different calling convention and caching
semantics than any platform we support

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11868
2021-04-13 14:14:44 -07:00
наб 74e48f470e linux/libspl: getextmntent(): don't leak mnttab FILE*
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11868
2021-04-13 14:14:44 -07:00
наб b3530c4262 libzfs: zfs_mount_at(): load key for encryption root if MS_CRYPT
zfs_crypto_load_key() only works on encryption roots,
and zfs mount -la would fail if it encounters a datasets that
is sorted before their encroots.

To trigger:
  truncate -s 40G /tmp/test
  dd if=/dev/urandom of=/tmp/k bs=128 count=1 status=none
  zpool create -O encryption=on -O keylocation=file:///tmp/k \
               -O keyformat=passphrase test /tmp/test
  zfs create -o mountpoint=/a test/a
  zfs create -o mountpoint=/b test/b
  zfs umount test
  zfs unload-key test
  zfs mount -la

The final mount errored out with:
  Key load error: Keys must be loaded for
    encryption root of 'test/a' (test).
  Key load error: Keys must be loaded for
    encryption root of 'test/b' (test).

And only /test was mounted

This technically breaks the libzfs API, but the previous behavior was
decidedly a bug.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11870 
Closes #11875
2021-04-12 21:26:55 -07:00
Mateusz Guzik 93f81eb721 FreeBSD: use vnlru_free_vfsops if available
Fixes issues when zfs is used along with other filesystems.

External-issue: https://cgit.freebsd.org/src/commit/?id=e9272225e6bed840b00eef1c817b188c172338ee
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11881
2021-04-12 11:01:46 -07:00
Mateusz Guzik 5ad86e973c FreeBSD: add missing seqc write begin/end around zfs_acl_chown_setattr
It happens to trip over an assert but does not matter for correctness at
this time. Done for future proofing.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11884
2021-04-12 10:59:57 -07:00
Mateusz Guzik d8c09f3fcc FreeBSD: add support for lockless symlink lookup
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11883
2021-04-12 10:59:22 -07:00
наб 53e9540e41 .gitmodules: link to openzfs github repository
Update the test images link to reference the openzfs github repository.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #11868
2021-04-12 09:37:23 -07:00
Prawn ee6615e07a cmd/zfs receive: allow dry-run (-n) to check property args
zfs recv -n does not report some errors it could.  The code to bail 
out of the receive if in dry-run mode came a little early, skipping 
validation of cmdprops (recv -x and -o) among others.  Move the
check down to enable these additional checks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #11862
2021-04-12 09:35:55 -07:00
наб 722b7f9a4c libuutil: purge unused functions
Remove vestigial uu_open_tmp().  The problems with this implementation
are many, but the primary one is the TMPPATHFMT macro, which is
unused, and always has been.

Searching around for any users leads only to earlier imports of the
same, identical file, i.a. into an apple repository (which does patch
gethrtime() into it and gives us a copyright date of 2007),
and a MidnightBSD one from 2008.

Searching illumos-gate, uu_open_tmp appears, in current HEAD, three
times: in the header, libuutil's mapfile ABI, and the implementation.

This slowly grows up to eight occurrences as one moves back to the root
"OpenSolaris Launch" commit: the header, implementation, twice in
libuutil's spec ABI, twice (with multilib and non-multilib paths) in
libuutil.so's i386 and SPARC binary db ABIs.

That's 2005, and this file was abandonware even then, it's dead code.

The situation is similar for the uu_dprintf() family of functions and
uu_dump().  Nothing in accessibly recorded history has ever used them.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11873
2021-04-12 09:32:43 -07:00
Colm e086db1656 Improvements to the 'compatibility' property
Several improvements to the operation of the 'compatibility' property:

1) Improved handling of unrecognized features:
Change the way unrecognized features in compatibility files are handled.

 * invalid features in files under /usr/share/zfs/compatibility.d
   only get a warning (as these may refer to future features not yet in
   the library),
 * invalid features in files under /etc/zfs/compatibility.d
   get an error (as these are presumed to refer to the current system).

2) Improved error reporting from zpool_load_compat.
Note: slight ABI change to zpool_load_compat for better error reporting.

3) compatibility=legacy inhibits all 'zpool upgrade' operations.

4) Detect when features are enabled outside current compatibility set
   * zpool set compatibility=foo <-- print a warning
   * zpool set feature@xxx=enabled <-- error
   * zpool status <-- indicate this state

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Colm Buckley <colm@tuatha.org>
Closes #11861
2021-04-12 09:08:56 -07:00
Brian Behlendorf 888700bc6b ZTS: fix removal_condense_export test case
It's been observed in the CI that the required 25% of obsolete bytes
in the mapping can be to high a threshold for this test resulting in
condensing never being triggered and a test failure.  To prevent these
failures make the existing zfs_condense_indirect_obsolete_pct tuning
available so the obsolete percentage can be reduced from 25% to 5%
during this test.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11869
2021-04-11 21:49:13 -07:00
Brian Behlendorf 4ef03d077c Update libzfs.abi for zfs_send() change
Commit 099fa7e4 intentionally modified the libzfs ABI.  However, it
failed to include an update for the libzfs.abi file.  This commit
resolves the `make checkabi` warning due to that omission.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11710
2021-04-11 17:06:54 -07:00
pstef 458f82319a Balance parentheses in parameter descriptions
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Piotr Paweł Stefaniak <pstef@freebsd.org>
Closes #11882
2021-04-11 16:35:07 -07:00
Brian Behlendorf ea3cd8e420 ZTS: Add known exceptions
The fault/auto_spare_shared, l2arc/persist_l2arc_007_pos, and
alloc_class/alloc_class_013_pos test cases are not entirely reliable
and may occasionally fail resulting in a false positive in the CI.
Add these tests to known list of possible failures until they can
be made 100% reliable.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11890
2021-04-11 15:55:38 -07:00
наб 10b575d04c lib/: set O_CLOEXEC on all fds
As found by
  git grep -E '(open|setmntent|pipe2?)\(' |
    grep -vE '((zfs|zpool)_|fd|dl|lzc_re|pidfile_|g_)open\('

FreeBSD's pidfile_open() says nothing about the flags of the files it
opens, but we can't do anything about it anyway; the implementation does
open all files with O_CLOEXEC

Consider this output with zpool.d/media appended with
"pid=$$; (ls -l /proc/$pid/fd > /dev/tty)":
  $ /sbin/zpool iostat -vc media
  lrwx------ 0 -> /dev/pts/0
  l-wx------ 1 -> 'pipe:[3278500]'
  l-wx------ 2 -> /dev/null
  lrwx------ 3 -> /dev/zfs
  lr-x------ 4 -> /proc/31895/mounts
  lrwx------ 5 -> /dev/zfs
  lr-x------ 10 -> /usr/lib/zfs-linux/zpool.d/media
vs
  $ ./zpool iostat -vc vendor,upath,iostat,media
  lrwx------ 0 -> /dev/pts/0
  l-wx------ 1 -> 'pipe:[3279887]'
  l-wx------ 2 -> /dev/null
  lr-x------ 10 -> /usr/lib/zfs-linux/zpool.d/media

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11866
2021-04-11 15:45:59 -07:00
наб 92ffd87aaf libzfs{,_core}: set O_CLOEXEC on persistent (ZFS_DEV and MNTTAB) fds
These were fd 3, 4, and 5 by the time zfs change-key hit
execute_key_fob()

glibc appends "e" to setmntent() mode, but musl's just returns fopen()

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11866
2021-04-11 15:45:31 -07:00
наб 0fc401a7ef libzfs: zfs_crypto_create() requires a new key by definition: set newkey
This changes the password prompt for new encryption roots from
  Enter passphrase:
  Re-enter passphrase:
to
  Enter new passphrase:
  Re-enter new passphrase:
which makes more sense and is more consistent with "new passphrase"
now always meaning "come up with something" and plain "passphrase"
"remember that thing"

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11866
2021-04-11 15:44:54 -07:00
наб e568853f96 libzfs_crypto.c: remove unused key_locator enum
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11866
2021-04-11 15:43:15 -07:00
наб 0395cf92e0 zfprops(8): fix spacing in jailed= arguments
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11866
2021-04-11 15:42:24 -07:00
наб 0241dfac47 zfs-[un]jail(8): fix "zfs-jail [un]jail" leftovers
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11866
2021-04-11 15:41:55 -07:00
наб e0779d1e20 zed: untangle _zed_conf_parse_path()
Dunno, maybe it's just me, but the previous style was /really/ confusing

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11860
2021-04-11 15:27:34 -07:00
наб 346c85b722 zed: don't malloc() global zed_conf instance, optimise zed_conf layout
It's all of 40 bytes with 4-byte pointers and 64 with 8-byte ones
(previously 44 and 88, respectively) ‒
there's no reason it can't live on the stack

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11860
2021-04-11 15:27:30 -07:00
наб 46d50eaf56 zed: remove zed_conf::{min,max}_events and ZED_{MIN,MAX}_EVENTS
No users, fields marked "reserved for future use", macros defined to 0

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11860
2021-04-11 15:27:25 -07:00
наб d09bd19629 zed: remove zed_conf::syslog_facility
No users, nobody sets it, main() hard-codes LOG_DAEMON, which is the
only correct value for this

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11860
2021-04-11 15:27:13 -07:00
наб 88bf37d91a zed: _zed_conf_display_help(): be consistent about what got_err means
Users passed in EXIT_SUCCESS and EXIT_FAILURE, despite it being a bool

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11860
2021-04-11 15:25:39 -07:00
наб d622f16b6b zed: untangle -h option listing
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11860
2021-04-11 15:25:12 -07:00
наб 83cc6bbf79 zed: print out licence string as one big chunk
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11860
2021-04-11 15:23:59 -07:00
pablofsf 099fa7e475 Allow zfs to send replication streams with missing snapshots
A tentative implementation and discussion was done in #5285.
According to it a send --skip-missing|-s flag has been added.
In a replication stream, when there are snapshots missing in
the hierarchy, if -s is provided print a warning and ignore
dataset (and its children) instead of throwing an error

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pablo Correa Gómez <ablocorrea@hotmail.com>
Closes #11710
2021-04-11 12:05:35 -07:00
Olaf Faaland 2ec0b0dd71 kmod-zfs should obsolete kmod-spl as well as spl-kmod
Without this Obsoletes, using packages built --with-spec=redhat, an
upgrade from zfs-0.7 to zfs-2.x does not cause the kmod-spl-0.7 package
to be removed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #11865
2021-04-11 12:02:26 -07:00
наб d08dc34515 zvol_wait: properly handle zvol_volmode sysctl being 3/none
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11859
2021-04-11 11:58:36 -07:00
наб 4640baab69 zfs_ids_to_path: print correct wrong values
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11859
2021-04-11 11:58:16 -07:00
наб ecbf7c6707 zfs_ids_to_path: the -v comes after the executable name
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11859
2021-04-11 11:57:56 -07:00
наб e09318829b contrib/bpftrace: exec bpftrace, remove useless cat
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11859
2021-04-11 11:57:34 -07:00
наб 0f2915602e arc_summary3: just read /s/m/{mod}/version instead of spawning cat
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11859
2021-04-11 11:57:14 -07:00
наб 519aec83f5 zvol_wait: fix for zvols with spaces in name, optimise
list_zvols() would happily, for zvols with spaces in their names,
assign the second half to volmode, &c., so use a normal read
and set IFS to a tab instead of using 4 separate AWK processes(?)

Similarly, in filter_out_deleted_zvols(), run zfs(8) once and use the
output directly instead of spawning a zfs(8) process per zvol

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11859
2021-04-11 11:56:53 -07:00
наб ea4541e4c6 zstreamdump: exec zstream dump
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11859
2021-04-11 11:55:58 -07:00
Ryan Moeller a631283b74 Move zfsdev_state_{init,destroy} to common code
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #11833
2021-04-08 21:17:43 -07:00
Ryan Moeller 1dff545278 Eliminate zfsdev_get_state_impl
After 3937ab20f zfsdev_get_state_impl can become zfsdev_get_state.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #11833
2021-04-08 21:17:18 -07:00
TerraTech 161ed825ca zpl_inode.c: Fix SMACK interoperability
SMACK needs to have the ZFS dentry security field setup before
SMACK's d_instantiate() hook is called as it requires functioning
'__vfs_getxattr()' calls to properly set the labels.

Fxes:
1) file instantiation properly setting the object label to the
   subject's label
2) proper file labeling in a transmutable directory

Functions Updated:
1) zpl_create()
2) zpl_mknod()
3) zpl_mkdir()
4) zpl_symlink()

External-issue: https://github.com/cschaufler/smack-next/issues/1
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: TerraTech <TerraTech@users.noreply.github.com>
Closes #11646 
Closes #11839
2021-04-08 21:15:29 -07:00
Ryan Moeller 5d508d92d2 ZTS: Improve cleanup in removal_with_export
Kill the removal operation on every platform, not just Linux.
The test has been fixed and is now stable on FreeBSD.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11856
2021-04-08 21:10:28 -07:00
Rich Ercolani 46fb478326 Added check for broken alien version
Added a check for alien 8.95.{1,2,3}, which is known to fail to 
generate debs 100% of the time, and instead print out a message 
informing the developer that it's known to be broken and linking 
them to more information.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #11848 
Closes #11850
2021-04-08 14:37:51 -07:00
Brian Behlendorf 600a1dc54c Use dsl_scan_setup_check() to setup a scrub
When a rebuild completes it will automatically schedule a follow up
scrub to verify all of the block checksums.  Before setting up the
scrub execute the counterpart dsl_scan_setup_check() function to
confirm the scrub can be started.  Prior to this change we'd only
check vdev_rebuild_active() which isn't as comprehensive, and using
the check function keeps all of this logic in one place.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11849
2021-04-08 14:33:15 -07:00
Tino Reichardt 9c3b926b0e Fix double sha1/sha1.o line in module/icp/Makefile.in
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #11852
2021-04-08 13:25:24 -07:00
Ryan Moeller 383401589e ZTS: Tests using zhack may fail on FreeBSD
As described in #11854, zhack is occasionally segfaulting on FreeBSD. 
Debugging this is proving to be tricky. To avoid false positives in
the CI add entries for the tests that use zhack in zts-report to 
accept that they may occasionally fail on FreeBSD.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Issue #11854
Closes #11855
2021-04-08 13:21:53 -07:00
Ryan Moeller e778b0485b Ratelimit deadman zevents as with delay zevents
Just as delay zevents can flood the zevent pipe when a vdev becomes
unresponsive, so do the deadman zevents.

Ratelimit deadman zevents according to the same tunable as for delay
zevents.

Enable deadman tests on FreeBSD and add a test for deadman event
ratelimiting. 

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11786
2021-04-07 16:23:57 -07:00
наб c9c6537731 zed: only go up to current limit in close_from() fallback
Consider the following strace log:
  prlimit64(0, RLIMIT_NOFILE,
            NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0
  dup2(0, 30)                         = 30
  dup2(0, 300)                        = 300
  dup2(0, 3000)                       = -1 EBADF (Bad file descriptor)
  dup2(0, 30000)                      = -1 EBADF (Bad file descriptor)
  dup2(0, 300000)                     = -1 EBADF (Bad file descriptor)
  prlimit64(0, RLIMIT_NOFILE,
            {rlim_cur=1024*1024, rlim_max=1024*1024}, NULL) = 0
  dup2(0, 30)                         = 30
  dup2(0, 300)                        = 300
  dup2(0, 3000)                       = 3000
  dup2(0, 30000)                      = 30000
  dup2(0, 300000)                     = 300000

Even a privileged process needs to bump its rlimit before being able
to use fds higher than rlim_cur.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834
2021-04-07 14:52:51 -07:00
наб 509a2dcf7d zed.8: the Diagnosis Engine is implemented
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834
2021-04-07 14:52:42 -07:00
наб 54f6daea7a zed: replace zed_file_write_n() with write(2), purge it
We set SA_RESTART early on, which will prevent EINTRs (indeed, to the
point of needing to clear it in the reaper, since it interferes with
pause(2)), which is the only error zed_file_write_n() actually handled
(plus, the pid write is no bigger than 12 bytes anyway)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834
2021-04-07 14:52:30 -07:00
наб 3d62acf0ad zed: merge all _NOT_IMPLEMENTED_ events
These events should currently never be generated.

Also untag _zed_event_add_nvpair() from merge with
zpool_do_events_nvprint() ‒ they serve different purposes (machine,
usually script vs human consumption) and format the output differently
as it stands

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834
2021-04-07 14:51:34 -07:00
наб 0d0720eb52 zed: remove unused zed_file_read_n()
Same deal as zed_file_close_on_exec()

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834
2021-04-07 14:51:24 -07:00
наб 1a05182ba0 zed: bump zfs_zevent_len_max if we miss any events
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834
2021-04-07 14:51:15 -07:00
наб c52612ba03 zed.8: don't pretend an unprivileged user could change the script owner
And add a note on /why/ ZEDLETs need to be owned by root

Quoth chown(2), Linux man-pages project:
  Only a privileged process (Linux: one with the CAP_CHOWN capability)
  may change the owner of a file.

Quoth chown(2), FreeBSD:
     [EPERM]  The operation would change the ownership,
              but the effective user ID is not the super-user.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834
2021-04-07 14:51:06 -07:00
наб ed519ad495 zed: purge all mentions of a configuration file
There simply isn't a need for one, since the flags the daemon takes
are all short (mostly just toggles) and administrative in nature,
and are therefore better served by the age-old tradition of sourcing an
environment file and preparing the cmdline in the init-specific handler
itself, if needed at all

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834
2021-04-07 14:50:52 -07:00
наб 64c03a0a27 zed: implement close_from() in terms of /proc/self/fd, if available
/dev/fd on Darwin

Consider the following strace output:
  prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=1024*1024}) = 0

Yes, that is well over a million file descriptors!

This reduces the ZED start-up time from "at least a second" to
"instantaneous", and, under strace, from "don't even try" to "usable"
by simple virtue of doing five syscalls instead of over a million;
in most cases the main loop does nothing

Recent Linuxes (5.8+) have close_range(2) for this, but that's an
overoptimisation (and libcs don't have wrappers for it yet)

This is also run by the ZEDLET pre-exec. Compare:
  Finished "all-syslog.sh" eid=13 pid=6717 time=1.027100s exit=0
  Finished "history_event-zfs-list-cacher.sh" eid=13 pid=6718 time=1.046923s exit=0
to
  Finished "all-syslog.sh" eid=12 pid=4834 time=0.001836s exit=0
  Finished "history_event-zfs-list-cacher.sh" eid=12 pid=4835 time=0.001346s exit=0
lol

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834
2021-04-07 14:50:38 -07:00
наб 3bc3eef9c3 zed: print combined system/user time after ZEDLET death
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11834
2021-04-07 14:50:03 -07:00
Marcin Skarbek 7367be0da9 Add kmodtool fix to detect different System.map location
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Marcin Skarbek <git@skarbek.name>
Closes #7807 
Closes #11836
2021-04-07 10:17:39 -07:00
Olaf Faaland cfd4a25fce fix misplaced quotes in kmod-preamble
rpm/redhat/zfs-kmod.spec.in has a typo in the shell code that
creates the kmod-preamble file.  This typo results in the
preamble file having the wrong name,

./SOURCES/kmod-preamblenObsoletes

and missing the Obsoletes clause that has become part of the name.

Because the filename is incorrect, the built package does not have
"obsoletes" or "conflicts" set.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #11851
2021-04-07 10:10:34 -07:00
Brian Behlendorf d5ef91af02 Obsolete earlier packages due to version bump
In order for package managers such as dnf to upgrade cleanly after
the package SONAME bump the obsolete package names must be known.
Update the new packages to correctly obsolete the old ones.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11844
Closes #11847
2021-04-07 10:09:21 -07:00
наб aa5a4eb5d0 i-t: don't brokenly set the scheduler for root pool vdev's disks
This effectively reverts
  4fc411f7a3 (part of #6807) and
  f6fbe25664 (#9042) ‒
the code itself and latter PR cite symmetry with whole-disk-vdev
behaviour (presumably because rootfs vdevs are rarely whole disks),
but the code is broken for NVME devices (indeed, it'd strip the
controller number instead of the (potential) partition number, turning
"nvme0n1p1" into "nvmen1p1", which would then subsequently fail the
sysfs existence check); it could be fixed to handle those (and any
others) rather easily by dereferencing /sys/class/block/$devname,
but this isn't the place for setting this ‒ as noted in the commit that
removed setting the scheduler by default
(9e17e6f254) ‒ use an udev rule

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11838
2021-04-06 18:29:31 -07:00
наб 55419e0a72 i-t: fix root=zfs:AUTO
IFS= would break loops in import_pool(), which would fault
any automatic import

Additionally $ZFS_BOOTFS from cmdline would interfere with find_rootfs()

If many pools were present, same thing could happen across multiple
find_rootfs() runs, so bail out early and clean up in error path

Suggested-by: @nachtgeist
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11278
Closes #11838
2021-04-06 18:28:38 -07:00
matt-fidd a03b288cf0 zfs get -p only outputs 3 columns if "clones" property is empty
get_clones_string currently returns an empty string for filesystem
snapshots which have no clones. This breaks parsable `zfs get` output as
only three columns are output, instead of 4.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Fiddaman <github@m.fiddaman.uk>
Co-authored-by: matt <matt@fiddaman.net>
Closes #11837
2021-04-06 16:05:54 -07:00
Matthew Ahrens bbcec73783 kmem_alloc(KM_SLEEP) should use kvmalloc()
`kmem_alloc(size>PAGESIZE, KM_SLEEP)` is backed by `kmalloc()`, which
finds contiguous physical memory.  If there isn't enough contiguous
physical memory available (e.g. due to physical page fragmentation), the
OOM killer will be invoked to make more memory available.  This is not
ideal because processes may be killed when there is still plenty of free
memory (it just happens to be in individual pages, not contiguous runs
of pages).  We have observed this when allocating the ~13KB `zfs_cmd_t`,
for example in `zfsdev_ioctl()`.

This commit changes the behavior of
`kmem_alloc(size>PAGESIZE, KM_SLEEP)` when there are insufficient
contiguous free pages.  In this case we will find individual pages and
stitch them together using virtual memory.  This is accomplished by
using `kvmalloc()`, which implements the described behavior by trying
`kmalloc(__GFP_NORETRY)` and falling back on `vmalloc()`.

The behavior of `kmem_alloc(KM_NOSLEEP)` is not changed; it continues to
use `kmalloc(GPF_ATOMIC | __GFP_NORETRY)`.  This is because `vmalloc()`
may sleep.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11461
2021-04-06 12:44:54 -07:00
наб 57a1214e3a zpool-features.5: remove "booting not possible with this feature"s
The exact limitations on what features are supported when booting
vary considerably depending on the environment.  In order to minimize
confusion avoid categorical statements which assume GRUB2 is being 
used.  The supported GRUB2 features are covered earlier in this man 
page for easy reference.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11842
2021-04-06 12:39:54 -07:00
George Melikov d35708b187 man: fix wrong .Xr macros usages
In addition, html doc will have working hyperlinks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11845
2021-04-06 12:27:40 -07:00
наб 61b50107a5 libzutil: zfs_isnumber(): return false if input empty
zpool list, which is the only user, would mistakenly try to parse the
empty string as the interval in this case:

  $ zpool list "a"
  cannot open 'a': no such pool
  $ zpool list ""
  interval cannot be zero
  usage: <usage string follows>
which is now symmetric with zpool get:
  $ zpool list ""
  cannot open '': name must begin with a letter

Avoid breaking the  "interval cannot be zero" string.
There simply isn't a need for this, and it's user-facing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11841 
Closes #11843
2021-04-06 12:25:53 -07:00
Brian Behlendorf ec580225d2 ZTS: pool_checkpoint improvements
The pool_checkpoint tests may incorrectly fail because several of
them invoke zdb for an imported pool.  In this scenario it's not
unexpected for zdb to fail if the pool is modified.  To resolve
this these zdb checks are now done after the pool has been exported.

Additionally, the default cleanup functions assumed the pool would
be imported when they were run.  If this was not the case they're
exit early and fail to cleanup all of the test state causing
subsequent tests to fail.  Add a check to only destroy the pool
when it is imported.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11832
2021-04-03 08:33:22 -07:00
Andrea Gelmini bf169e9f15 Fix various typos
Correct an assortment of typos throughout the code base.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #11774
2021-04-02 18:52:15 -07:00
наб 943df59ed9 bash_completion.d: always call zfs/zpool binaries directly
/dev/zfs is 0:0 666 on most systems, so the [ -w /dev/zfs ] check always
succeeds, but if zfs isn't in $PATH (e.g. when completing from
"/sbin/zfs list" on a regular account) this can lead to error spew like

  nabijaczleweli@szarotka:~$ /sbin/zfs list bash: zfs: command not found
  @ bash: zfs: command not found

We only do read-only commands, and quite general ones at that,
so there's no need to elevate one way or another.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11828
2021-04-02 16:34:58 -07:00
Brian Behlendorf c0af3c7b2c Add RELEASES.md file
Document the project's policy regarding publishing and maintaining
official OpenZFS releases.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11821
2021-04-02 16:33:40 -07:00
наб 73218f41b4 zed: allow limiting concurrent jobs
200ms time-out is relatively long, but if we already hit the cap,
then we'll likely be able to spawn multiple new jobs when we wake up

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11807
2021-04-02 16:30:53 -07:00
наб 02a0fa1999 zed: remove unused zed_file_close_on_exec()
The FIXME comment was there since the initial implementation in 2014,
there are no users

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11807
2021-04-02 16:30:34 -07:00
наб ca2ce9c50b zed: use separate reaper thread and collect ZEDLETs asynchronously
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11807
2021-04-02 16:30:08 -07:00
наб 3ef80eefff zed: set names for all threads
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11807
2021-04-02 16:29:35 -07:00
Ryan Moeller 583e320546 ZTS: inheritance/inherit_001_pos is flaky
Add inheritance/inherit_001_pos to the maybe fails on FreeBSD list.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11830
2021-04-02 11:11:52 -07:00
Ryan Moeller dce3176349 Avoid taking global lock to destroy zfsdev state
We have exclusive access to our zfsdev state object in this section
until it is invalidated by setting zs_minor to -1, so we can destroy
the state without taking a lock if we do the invalidation last, after
a member to ensure correct ordering.

While here, strengthen the assertions that zs_minor is valid when we
enter.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #11751
2021-04-02 11:09:05 -07:00
Ryan Moeller 02aaf11fc7 FreeBSD: Fix stable/12 after AT_BENEATH removal
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11827
2021-04-02 11:06:44 -07:00
Brian Behlendorf fe6babced2 Bump libzfs.so and libzpool.so versions
Bump the library versions as advised by the libtool guidelines.

https://www.gnu.org/software/libtool/manual/html_node/Updating-version-info.html

Two new functions were added but no existing functions were changed,
so we increase the version and the age (version:revision:age).

Added functions (2):
- boolean_t zpool_is_draid_spare(const char *);
- zpool_compat_status_t zpool_load_compat(const char *,
      boolean_t *, char *, char *);

Additionally bump the libzpool.so version information.  This library
is for internal use but we still want to update the version to track
major changes to the interfaces.

The libzfsbootenv, libuutil, libnvpair and libzfs_core libraries
have not been updated.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11817
2021-04-01 16:53:05 -07:00
Ryan Moeller c05eec32a7 Allow pool names that look like Solaris disk names
Nothing bad happens if a prefix of your pool name matches a disk name.
This is a bit of a silly restriction at this point.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #11781 
Closes #11813
2021-04-01 08:49:41 -07:00
Ryan Moeller 032a213e2e Don't scale zfs_zevent_len_max by CPU count
The lower bound for this scaling to too low and the upper bound is too
high.  Use a fixed default length of 512 instead, which is a reasonable
value on any system.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11822
2021-04-01 08:45:04 -07:00
Ryan Moeller 3ba10f9a6a Atomically check and set dropped zevent count
ratelimit_dropped isn't protected by a lock and is expected to
be updated atomically.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11822
2021-04-01 08:43:01 -07:00
Brian Behlendorf 9f9e1b5425 CI: Increase free space in workflow
Recently we've been running out of free space in the ubuntu 20.04
environment resulting in test failures.  This appears to be caused
by a change in the default available free space and not because of
any change in OpenZFS. Try and avoid this failure by applying a
suggested workaround which removes some unnecessary files.

https://github.com/actions/virtual-environments/issues/2840

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11826
2021-04-01 08:39:27 -07:00
Brian Atkinson 19c9247d57 Fixing m4 iops rename check
The configure check for iops->rename wanting flags was missing the
AC_MSG_CHECKING() so it would just print yes without saying what was
being checked.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #11825
2021-04-01 08:37:41 -07:00
наб 0c2eb3f540 fsck.zfs: implement 4/8 exit codes as suggested in manpage
Update the fsck.zfs helper to bubble up some already-known-about 
errors if they are detected in the pool.

health=degraded => 4/"Filesystem errors left uncorrected"
health=faulted && dataset in /etc/fstab => 8/"Operational error"
pool not found => 8/"Operational error"
everything else => 0

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11806
2021-03-31 10:49:56 -07:00
Mike Swanson 67859aedd1 Add compatibility file sets (ZoL 0.6.1, 0.6.4, OpenZFS 2.1)
ZoL 0.6.1 introduced feature flags with the three features that all
implementations at the time were guaranteed to have.  0.6.4 introduced
a few more until 0.6.5 added two after that.  OpenZFS 2.1 added the
dRAID feature.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mike Swanson <mikeonthecomputer@gmail.com>
Closes #11818
2021-03-31 09:40:25 -07:00
Brian Behlendorf 9ac82cab2d Update META
Increase the version to 2.1.99 to indicate the master branch is
newer than the 2.1.x release.  This ensures packages built from
master branch are considered to be newer than the last release.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2021-03-30 10:50:55 -07:00
Brian Behlendorf 3522f57b6a Tag 2.1.0-rc1
New features:
- Distributed Spare (dRAID) Feature
- Added "compatibility" property for zpool feature sets
- Added zpool_influxdb command to collect zpool statistics

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2021-03-29 16:39:52 -07:00
наб 38280c3526 zed: reap child after killing on time-out
When a child process is killed waitpid() must be called on the
pid the reap the zombie process.

Update BUGS section to reflect reality by replacing "zedlets
aren't time limited with "zedlets can be interrupted".

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11769 
Closes #11798
2021-03-26 14:21:00 -07:00
Matthew Ahrens 2b56a63457 Use a helper function to clarify gang block size
For gang blocks, `DVA_GET_ASIZE()` is the total space allocated for the
gang DVA including its children BP's.  The space allocated at each DVA's
vdev/offset is `vdev_psize_to_asize(vd, SPA_GANGBLOCKSIZE)`.

This commit makes this relationship more clear by using a helper
function, `vdev_gang_header_asize()`, for the space allocated at the
gang block's vdev/offset.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11744
2021-03-26 11:19:35 -07:00
Matthew Ahrens b85f47efd0 When specifying raidz vdev name, parity count should match
When specifying the name of a RAIDZ vdev on the command line, it can be
specified as raidz-<vdevID> or raidzP-<vdevID>.
e.g. `zpool clear poolname raidz-0` or `zpool clear poolname raidz2-0`

If the parity is specified in the vdev name, it should match the actual
parity of that RAIDZ vdev, otherwise the command should fail.  This
commit makes it so.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Stuart Maybee <stuart.maybee@comcast.net>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11742
2021-03-26 11:12:22 -07:00
Luis Henriques 2037edbdaa Fix error code on __zpl_ioctl_setflags()
Other (all?) Linux filesystems seem to return -EPERM instead of -EACCESS
when trying to set FS_APPEND_FL or FS_IMMUTABLE_FL without the
CAP_LINUX_IMMUTABLE capability.  This was detected by generic/545 test
in the fstest suite.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Luis Henriques <henrix@camandro.org>
Closes #11791
2021-03-26 10:46:45 -07:00
Jessica Clarke ef977fce66 Support running FreeBSD buildworld on Arm-based macOS hosts
Arm-based Macs are like FreeBSD and provide a full 64-bit stat from the
start, so have no stat64 variants. Thus, define stat64 and fstat64 as
aliases for the normal versions.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Jessica Clarke <jrtc27@jrtc27.com>
Closes #11771
2021-03-26 10:45:12 -07:00
Andrea Gelmini 8a915ba1f6 Removed duplicated includes
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #11775
2021-03-22 12:34:58 -07:00
Andrea Gelmini be1e69f31c Fix typo in Python method name
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #11776
2021-03-22 12:32:38 -07:00
Alexander Motin 891568c990 Split dmu_zfetch() speculation and execution parts
To make better predictions on parallel workloads dmu_zfetch() should
be called as early as possible to reduce possible request reordering.
In particular, it should be called before dmu_buf_hold_array_by_dnode()
calls dbuf_hold(), which may sleep waiting for indirect blocks, waking
up multiple threads same time on completion, that can significantly
reorder the requests, making the stream look like random.  But we
should not issue prefetch requests before the on-demand ones, since
they may get to the disks first despite the I/O scheduler, increasing
on-demand request latency.

This patch splits dmu_zfetch() into two functions: dmu_zfetch_prepare()
and dmu_zfetch_run().  The first can be executed as early as needed.
It only updates statistics and makes predictions without issuing any
I/Os.  The I/O issuance is handled by dmu_zfetch_run(), which can be
called later when all on-demand I/Os are already issued.  It even
tracks the activity of other concurrent threads, issuing the prefetch
only when _all_ on-demand requests are issued.

For many years it was a big problem for storage servers, handling
deeper request queues from their clients, having to either serialize
consequential reads to make ZFS prefetcher usable, or execute the
incoming requests as-is and get almost no prefetch from ZFS, relying
only on deep enough prefetch by the clients.  Benefits of those ways
varied, but neither was perfect.  With this patch deeper queue
sequential read benchmarks with CrystalDiskMark from Windows via
iSCSI to FreeBSD target show me much better throughput with almost
100% prefetcher hit rate, comparing to almost zero before.

While there, I also removed per-stream zs_lock as useless, completely
covered by parent zf_lock.  Also I reused zs_blocks refcount to track
zf_stream linkage of the stream, since I believe previous zs_fetch ==
NULL check in dmu_zfetch_stream_done() was racy.

Delete prefetch streams when they reach ends of files.  It saves up
to 1KB of RAM per file, plus reduces searches through the stream list.

Block data prefetch (speculation and indirect block prefetch is still
done since they are cheaper) if all dbufs of the stream are already
in DMU cache.  First cache miss immediately fires all the prefetch
that would be done for the stream by that time.  It saves some CPU
time if same files within DMU cache capacity are read over and over.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #11652
2021-03-19 22:56:11 -07:00
Chunwei Chen 296a4a369b Fix zfs_get_data access to files with wrong generation
If TX_WRITE is create on a file, and the file is later deleted and a new
directory is created on the same object id, it is possible that when
zil_commit happens, zfs_get_data will be called on the new directory.
This may result in panic as it tries to do range lock.

This patch fixes this issue by record the generation number during
zfs_log_write, so zfs_get_data can check if the object is valid.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #10593
Closes #11682
2021-03-19 22:53:31 -07:00
Andrew 66e6d3f128 Fix regression in POSIX mode behavior
Commit 235a85657 introduced a regression in evaluation of POSIX modes
that require group DENY entries in the internal ZFS ACL. An example
of such a POSX mode is 007. When write_implies_delete_child is set,
then ACE_WRITE_DATA is added to `wanted_dirperms` in prior to calling
zfs_zaccess_common(). This occurs is zfs_zaccess_delete().

Unfortunately, when zfs_zaccess_aces_check hits this particular DENY
ACE, zfs_groupmember() is checked to determine whether access should be
denied, and since zfs_groupmember() always returns B_TRUE on Linux and
so this check is failed, resulting ultimately in EPERM being returned.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Andrew Walker <awalker@ixsystems.com>
Closes #11760
2021-03-19 22:50:46 -07:00
Palash Gandhi c23850759f ZTS: New test for kernel panic induced by redacted send
This change adds a new test that covers a bug fix in the binary search
in the redacted send resume logic that causes a kernel panic.
The bug was fixed in https://github.com/openzfs/zfs/pull/11297.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Palash Gandhi <palash.gandhi@delphix.com>
Closes #11764
2021-03-19 22:47:50 -07:00
Martin Matuška cd5b812818 Allow setting bootfs property on pools with indirect vdevs
The FreeBSD boot loader relies on the bootfs property and is capable
of booting from removed (indirect) vdevs.

Reviewed-by Eric van Gyzen
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #11763
2021-03-19 22:46:43 -07:00
Ryan Moeller 0ab84bff55 Fix typo in zgenhostid.8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11770
2021-03-19 22:39:42 -07:00
Brian Atkinson f52124dce8 Removing old code for k(un)map_atomic
It used to be required to pass a enum km_type to kmap_atomic() and
kunmap_atomic(), however this is no longer necessary and the wrappers
zfs_k(un)map_atomic removed these. This is confusing in the ABD code as
the struct abd_iter member iter_km no longer exists and the wrapper
macros simply compile them out.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #11768
2021-03-19 22:38:44 -07:00
Serapheim Dimitropoulos 793c958f6f Initialize metaslab range trees in metaslab_init
= Motivation

We've noticed several zloop crashes within Delphix generated
due to the following sequence of events:

- A device gets expanded and new metaslabas are allocated for
  it. These metaslabs go through `metaslab_init()` but haven't
  gone through `metaslab_sync_done()` yet. This meas that the
  only range tree that's actually set is the `ms_allocatable`.
  All the others are NULL.

- A vdev_initialization is issues and `vdev_initialize_thread`
  starts processing one of these new metaslabs of the expanded
  vdev.

- As part of `vdev_initialize_calculate_progress()` we call
  into `metaslab_load()` and `metaslab_load_impl()` which
  in turn tries to dereference the metaslabs trees that
  are still NULL and therefore we crash.

The same failure can come up from the `vdev_trim` code paths.

= This Patch

We considered the following solutions to deal with this issue:

[A] Add logic to `vdev_initialize/trim` to skip those new
    metaslabs. We decided against this as it would be good
    to avoid exposing this lower-level detail to higer-level
    operations.

[B] Have `metaslab_load_impl()` return early for new metaslabs
    and thus never touch those range_trees that are NULL at
    that time. This seemed more of a work-around for the bug
    and not a clear-cut solution.

[C] Refactor our logic so all metaslabs have their range_trees
    created at the time of their creatin in `metaslab_init()`.

In this patch we decided to go with [C] because:

(1) It doesn't expose more metaslab details to higher level
    operations such as vdev initialize and trim.

(2) The current behavior of creating the range trees lazily
    in `metaslab_sync_done()` is unnecessarily complicated.

(3) Always initializing the metaslab range_trees makes other
    parts of the codebase cleaner. For example, we used to
    use `ms_freed` as the reference value for knowing whether
    all the range_trees have been initialized. Now we no
    longer need to do that check in most places (and in the
    few that we do we use the `ms_new` boolean field now
    which is more readable).

= Side Changes

Probably due to a mismerge we set `ms_loaded` to `B_TRUE` twice
in `metasloab_load_impl()`. In this patch we remove the extraneous
assignment.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #11737
2021-03-19 22:36:02 -07:00
Coleman Kane ffd6978ef5 Linux 5.12 update: bio_max_segs() replaces BIO_MAX_PAGES
The BIO_MAX_PAGES macro is being retired in favor of a bio_max_segs()
function that implements the typical MIN(x,y) logic used throughout the
kernel for bounding the allocation, and also the new implementation is
intended to be signed-safe (which the former was not).

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11765
2021-03-19 22:33:42 -07:00
Coleman Kane e2a8296131 Linux 5.12 compat: idmapped mounts
In Linux 5.12, the filesystem API was modified to support ipmapped
mounts by adding a "struct user_namespace *" parameter to a number
functions and VFS handlers. This change adds the needed autoconf
macros to detect the new interfaces and updates the code appropriately.
This change does not add support for idmapped mounts, instead it
preserves the existing behavior by passing the initial user namespace
where needed.  A subsequent commit will be required to add support
for idmapped mounted.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11712
2021-03-19 21:00:59 -07:00
Matthew Ahrens 330c6c0523 Clean up RAIDZ/DRAID ereport code
The RAIDZ and DRAID code is responsible for reporting checksum errors on
their child vdevs.  Checksum errors represent events where a disk
returned data or parity that should have been correct, but was not.  In
other words, these are instances of silent data corruption.  The
checksum errors show up in the vdev stats (and thus `zpool status`'s
CKSUM column), and in the event log (`zpool events`).

Note, this is in contrast with the more common "noisy" errors where a
disk goes offline, in which case ZFS knows that the disk is bad and
doesn't try to read it, or the device returns an error on the requested
read or write operation.

RAIDZ/DRAID generate checksum errors via three code paths:

1. When RAIDZ/DRAID reconstructs a damaged block, checksum errors are
reported on any children whose data was not used during the
reconstruction.  This is handled in `raidz_reconstruct()`.  This is the
most common type of RAIDZ/DRAID checksum error.

2. When RAIDZ/DRAID is not able to reconstruct a damaged block, that
means that the data has been lost.  The zio fails and an error is
returned to the consumer (e.g. the read(2) system call).  This would
happen if, for example, three different disks in a RAIDZ2 group are
silently damaged.  Since the damage is silent, it isn't possible to know
which three disks are damaged, so a checksum error is reported against
every child that returned data or parity for this read.  (For DRAID,
typically only one "group" of children is involved in each io.)  This
case is handled in `vdev_raidz_cksum_finish()`. This is the next most
common type of RAIDZ/DRAID checksum error.

3. If RAIDZ/DRAID is not able to reconstruct a damaged block (like in
case 2), but there happens to be additional copies of this block due to
"ditto blocks" (i.e. multiple DVA's in this blkptr_t), and one of those
copies is good, then RAIDZ/DRAID compares each sector of the data or
parity that it retrieved with the good data from the other DVA, and if
they differ then it reports a checksum error on this child.  This
differs from case 2 in that the checksum error is reported on only the
subset of children that actually have bad data or parity.  This case
happens very rarely, since normally only metadata has ditto blocks.  If
the silent damage is extensive, there will be many instances of case 2,
and the pool will likely be unrecoverable.

The code for handling case 3 is considerably more complicated than the
other cases, for two reasons:

1. It needs to run after the main raidz read logic has completed.  The
data RAIDZ read needs to be preserved until after the alternate DVA has
been read, which necessitates refcounts and callbacks managed by the
non-raidz-specific zio layer.

2. It's nontrivial to map the sections of data read by RAIDZ to the
correct data.  For example, the correct data does not include the parity
information, so the parity must be recalculated based on the correct
data, and then compared to the parity that was read from the RAIDZ
children.

Due to the complexity of case 3, the rareness of hitting it, and the
minimal benefit it provides above case 2, this commit removes the code
for case 3.  These types of errors will now be handled the same as case
2, i.e. the checksum error will be reported against all children that
returned data or parity.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11735
2021-03-19 16:22:10 -07:00
Mateusz Guzik 2f385c913f FreeBSD: make seqc asserts conditional on replay
Avoids tripping on asserts when doing pool recovery.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11739
2021-03-17 22:09:45 -07:00
Matthew Ahrens 46df6e98aa Remove unused rr_code
The `rr_code` field in `raidz_row_t` is unused.

This commit removes the field, as well as the code that's used to set
it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11736
2021-03-17 21:57:09 -07:00
Ryan Moeller ec3e4c6784 FreeBSD: Fix memory leaks in kstats
Don't handle (incorrectly) kmem_zalloc() failure.  With KM_SLEEP,
will never return NULL.

Free the data allocated for non-virtual kstats when deleting the object.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11767
2021-03-17 21:55:18 -07:00
Adam D. Moss 1daad98176 Linux: always check or verify return of igrab()
zhold() wraps igrab() on Linux, and igrab() may fail when the inode 
is in the process of being deleted.  This means zhold() must only be
called when a reference exists and therefore it cannot be deleted. 
This is the case for all existing consumers so add a VERIFY and a
comment explaining this requirement.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Adam Moss <c@yotes.com>
Closes #11704
2021-03-16 16:33:34 -07:00
Dries Michiels 5f9d61d06b Update FreeBSD versions
Update supported FreeBSD versions in documentation.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Dries Michiels <driesm.michiels@gmail.com>
Closes #11718
2021-03-16 15:03:28 -07:00
gldisater 07dff5cffe Hold and release permissions exist
The man page was missing these two permissions.
Add the missing permissions to the man page. 

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jeremy Faulkner <gldisater@gldis.ca>
Closes #11727
2021-03-16 15:01:21 -07:00
Ryan Moeller 5638803b6a ZTS: Add tests for DOS mode attributes
Create a new section of tests to run with acltype=off.

For now the only test we have is for the DOS mode READONLY attribute on
FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11734
2021-03-16 15:00:14 -07:00
Don Brady dd0b5c8559 Reference_tracking_enable should be a module param
To make use of zfs_refcount_held tunable it should be a module 
parameter in open-zfs.  Also, since the macros will auto-generate OS 
specific tunables, removed the existing zfs_refcount_held reference 
in module/os/freebsd/zfs/sysctl_os.c.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #11753
2021-03-16 14:56:17 -07:00
Ryan Moeller 9305ff2edf ZTS: Fix incorrect use of libtest in user_run by xattr_003_neg
You can't use user_run to eval ksh functions defined in libtest unless
you include libtest in the user shell.

Fix xattr_003_neg by:
* include libtest in the user shell
* *then* run get_xattr
* assert this fails
* use variables for filenames so they don't change in the user's shell
* don't log the contents of /etc/passwd
* cleanup all byproducts

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11185
2021-03-12 16:17:30 -08:00
Ryan Moeller e0b53a5dbb ZTS: Use ksh and current environment for user_run
The current user_run often does not work as expected.  Commands are run
in a different shell, with a different environment, and all output is
discarded.

Simplify user_run to retain the current environment, eliminate eval,
and feed the command string into ksh.  Enhance the logging for
user_run so we can see out and err.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11185
2021-03-12 16:17:01 -08:00
Mariusz Zaborski e464f7c7cc FreeBSD: bring back possibility to rewind the checkpoint from bootloader
Add parsing of the rewind options.

When I was upstreaming the change [1], I omitted the part where we
detect that the pool should be rewind. When the FreeBSD repo has
synced with the OpenZFS, this part of the code was removed.

[1] FreeBSD repo: 277f38abffc6a8160b5044128b5b2c620fbb970c
[2] OpenZFS repo: f2c027bd6a

External-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254152
Originally reviewed by: tsoome, allanjude
Originally reviewed by: kevans (ok from high-level overview)
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org>
Closes #11730
2021-03-12 16:12:14 -08:00
Ryan Moeller f845b2dd1c FreeBSD: Clean up zfsdev_close to match Linux
Resolve some oddities in zfsdev_close() which could result in a
panic and were not present in the equivalent function for Linux.

- Remove unused definition ZFS_MIN_MINOR
- FreeBSD: Simplify zfsdev state destruction
- Assert zs_minor is valid in zfsdev_close
- Make locking around zfsdev state match Linux

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11720
2021-03-12 16:09:15 -08:00
Mateusz Guzik e3e82dcc51 FreeBSD: switch teardown lock to rms
This deserializes otherwise non-contending operations.

The previous scheme of using 17 locks hashed by curthread runs into
conflicts very quickly. Check the pull request for sample results.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11153
2021-03-12 15:51:48 -08:00
Mateusz Guzik 5ebe425a5b Macroify teardown lock handling
This will allow platforms to implement it as they see fit, in particular
in a different manner than rrm locks.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11153
2021-03-12 15:51:39 -08:00
Mateusz Guzik 9847f77f01 FreeBSD: rename teardown inactive macros to mimick rrm convention
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11153
2021-03-12 15:51:31 -08:00
Mateusz Guzik f9acd578f0 FreeBSD: remove 2 assertions that teardown lock is not held
They are not very useful and hard to implement in the rms routine
the code is about to start using.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11153
2021-03-12 15:51:20 -08:00
Mateusz Guzik 300f68e017 FreeBSD: rework asserts in zfs_dd_lookup
1. even up ifdefs
2. drop the arguably useless teardown lock asserts -- nothing else
   checks for it

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11153
2021-03-12 15:51:07 -08:00
Mateusz Guzik 446400346d Add branch prediction to ZFS_ENTER and ZFS_VERIFY_ZP macros
They are expected to fail only in corner cases.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11153
2021-03-12 15:51:03 -08:00
George Wilson 0936981d86 zpool import cachefile improvements
Importing a pool using the cachefile is ideal to reduce the time
required to import a pool. However, if the devices associated with
a pool in the cachefile have changed, then the import would fail.
This can easily be corrected by doing a normal import which would
then read the pool configuration from the labels.

The goal of this change is make importing using a cachefile more
resilient and auto-correcting. This is accomplished by having
the cachefile import logic automatically fallback to reading the
labels of the devices similar to a normal import. The main difference
between the fallback logic and a normal import is that the cachefile
import logic will only look at the device directories that were
originally used when the cachefile was populated. Additionally,
the fallback logic will always import by guid to ensure that only
the pools in the cachefile would be imported.

External-issue: DLPX-71980
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes #11716
2021-03-12 15:42:27 -08:00
Martin Matuška b8fa03efbc Fix whitespace introduced in ecc277cff
The manual page change in ecc277c has introduced whitespace on
line ends.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #11722
2021-03-11 19:42:04 -08:00
Ryan Moeller 35aa9dc6df FreeBSD: Fix scope of deadman tunables
A few deadman tunables ended up in the wrong sysctl node.

Move them to vfs.zfs.deadman.*

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11715
2021-03-11 19:23:24 -08:00
Adam D. Moss c94d648b1c Microoptimizations for VERIFY() and friends
Add branch hints and constify the intermediate evaluations of 
left/right params in VERIFY3*().

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Adam Moss <c@yotes.com>
Closes #11708
2021-03-11 17:16:09 -08:00
Allan Jude 92e8fb6395 Add missing files to Makefile
Some .h files that were added were missed in this Makefile. Since 
they are .h files, their being missing only resulted in them 
disappeared from the dist archive.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #11705
2021-03-11 17:13:34 -08:00
George Melikov 8d534c37ac CI checkstyle: pin ubuntu version
Our checkstyle doesn't work well on Ubuntu 20.04,
temporary pin it to 18.04.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11713
2021-03-11 17:11:31 -08:00
Don Brady f5ada6538d Return finer grain errors in libzfs unmount_one
Added errno mappings to unmount_one() in libzfs.  Changed do_unmount() 
implementation to return errno errors directly like is done for 
do_mount() and others.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #11681
2021-03-08 08:46:45 -08:00
Tony Hutter 4fdbd43450 vdev_id: Create symlinks even if no /dev/mapper/
vdev_id uses the /dev/mapper/ symlinks to resolve a UUID to a dm name
(like dm-1).  However on some multipath setups, there is no /dev/mapper/
entry for the UUID at the time vdev_id is called by udev.  However,
this isn't necessarily needed, as we may be able to resolve the dm
name from the $DEVNAME that udev passes us (like DEVNAME="/dev/dm-1").

This patch tries to resolve the dm name from $DEVNAME first, before
falling back to looking in /dev/mapper/.  This fixed an issue where the
by-vdev names weren't reliably showing up on one of our nodes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #11698
2021-03-08 08:43:30 -08:00
Antonio Russo b2eebe3ae7 ZTS events_002: Improve speed and reliability
events_002 exercises the ZED, ensuring that it neither misses events,
nor reporting events twice.

On slow test hardware, some of the timeouts are insufficient to allow
the ZED to properly settle.  Conversely, on fast hardware these same
timeouts are too long, unnecessarily slowing the test run.

Instead of using a fixed timeout, wait for the expected final event
before returning.  Additionally, wait with a timeout for unexpected
events to avoid missing them if they show up late.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11703
2021-03-08 08:42:45 -08:00
Christian Schwarz 93e3658035 zvol: call zil_replaying() during replay
zil_replaying(zil, tx) has the side-effect of informing the ZIL that an
entry has been replayed in the (still open) tx.  The ZIL uses that
information to record the replay progress in the ZIL header when that
tx's txg syncs.

ZPL log entries are not idempotent and logically dependent and thus
calling zil_replaying() is necessary for correctness.

For ZVOLs the question of correctness is more nuanced: ZVOL logs only
TX_WRITE and TX_TRUNCATE, both of which are idempotent. Logical
dependencies between two records exist only if the write or discard
request had sync semantics or if the ranges affected by the records
overlap.

Thus, at a first glance, it would be correct to restart replay from
the beginning if we crash before replay completes. But this does not
address the following scenario:
Assume one log record per LWB.
The chain on disk is

    HDR -> 1:W(1, "A") -> 2:W(1, "B") -> 3:W(2, "X") -> 4:W(3, "Z")

where N:W(O, C) represents log entry number N which is a TX_WRITE of C
to offset A.
We replay 1, 2 and 3 in one txg, sync that txg, then crash.
Bit flips corrupt 2, 3, and 4.
We come up again and restart replay from the beginning because
we did not call zil_replaying() during replay.
We replay 1 again, then interpret 2's invalid checksum as the end
of the ZIL chain and call replay done.
The replayed zvol content is "AX".

If we had called zil_replaying() the HDR would have pointed to 3
and our resumed replay would not have replayed anything because
3 was corrupted, resulting in zvol content "BX".

If 3 logically depends on 2 then the replay corrupted the ZVOL_OBJ's
contents.

This patch adds the zil_replaying() calls to the replay functions.
Since the callbacks in the replay function need the zilog_t* pointer
so that they can call zil_replaying() we open the ZIL while
replaying in zvol_create_minor(). We also verify that replay has
been done when on-demand-opening the ZIL on the first modifying
bio.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11667
2021-03-07 09:49:58 -08:00
Ryan Moeller b30cd70599 ZTS: Improve cleanup in zpool tests
* Restore original kern.corefile value after the test.
* Don't leave behind a frozen pool.
* Clean up leftover vdev files.
* Make zpool_002_pos and zpool_003_pos consistent in their handling of
core files while here.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11694
2021-03-07 09:41:01 -08:00
manfromafar ecc277cff7 Clarify compressed zfs send/recv behavior
Docs for send and receive do not explain behavior when sending a 
compressed stream then receiving on a host that overrides compression 
with -o compress=value.

The data from the send stream is written as it was from the send is 
the compressed form but the compression algorithm set on the receiver 
is the overridden version which causes some confusion as to what 
algorithm was actually used.

Updated man docs to clarify behavior

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed By: Allan Jude <allanjude@freebsd.org>
Signed-off-by: manfromafar <manfromafar@outlook.com>
Closes #11690
2021-03-07 09:39:16 -08:00
Ryan Moeller 4b2e20824b Intentionally allow ZFS_READONLY in zfs_write
ZFS_READONLY represents the "DOS R/O" attribute.
When that flag is set, we should behave as if write access
were not granted by anything in the ACL.  In particular:
We _must_ allow writes after opening the file r/w, then
setting the DOS R/O attribute, and writing some more.
(Similar to how you can write after fchmod(fd, 0444).)

Restore these semantics which were lost on FreeBSD when refactoring
zfs_write.  To my knowledge Linux does not actually expose this flag,
but we'll need it to eventually so I've added the supporting checks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11693
2021-03-07 09:31:52 -08:00
Brian Behlendorf e7a06356c1 Suppress cppcheck invalidSyntax warninigs
For some reason cppcheck 1.90 is generating an invalidSyntax warning
when the BF64_SET macro is used in the zstream source.  The same
warning is not reported by cppcheck 2.3, nor is their any evident
problem with the expanded macro.  This appears to be an issue with
this version of cppcheck.  This commit annotates the source to suppress
the warning.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11700
2021-03-05 17:56:35 -08:00
Brian Behlendorf 6bbb44e157 Initialize ZIL buffers
When populating a ZIL destination buffer ensure it is always
zeroed before its contents are constructed.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tom Caputi <caputit1@tcnj.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11687
2021-03-05 14:45:13 -08:00
Jorgen Lundman 8a6d444825 Fix abd_get_offset_struct() may allocate new abd
Even when supplied with an abd to abd_get_offset_struct(), the call
to abd_get_offset_impl() can allocate a different abd. Ensure to
call abd_fini_struct() on the abd that is not used.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #11683
2021-03-05 12:22:57 -08:00
Ryan Moeller ba74de88c0 FreeBSD module --enable-debug --enable-invariants
Wire up the --enable-debug flag for configure to the FreeBSD module
build.  Add --enable-invariants.

The running FreeBSD kernel config is used to detect whether to enable
INVARIANTS if not explicitly specified with --enable-invariants or
--disable-invariants.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11678
2021-03-05 12:16:41 -08:00
Thomas Lamprecht fd1c366f82 zpool: use tab to intend continuation from removal status
Bring the output of the removal status in line with the other
"fields" that zpool status outputs, and thus allows an parser to
easier detect this as continuation of the 'remove:' output.

Before:
remove: Removal of vdev 0 copied 282G in 0h9m, completed on [...]
    776K memory used for removed device mappings

Now:
remove: Removal of vdev 0 copied 282G in 0h9m, completed on [...]
	776K memory used for removed device mappings

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Closes #11674
2021-03-05 12:15:35 -08:00
James Wah 92fb29b9f9 Don't bomb out when using keylocation=file://
Avoid following the error path when the operation in fact succeeded.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: James Wah <james@laird-wah.net>
Closes #11651
2021-03-03 08:28:49 -08:00
Christian Schwarz e439ee83c1 linux: zvol: avoid heap allocation for zvol_request_sync=1
The spl_kmem_alloc showed up in some flamegraphs in a single-threaded
4k sync write workload at 85k IOPS on an
Intel(R) Xeon(R) Silver 4215 CPU @ 2.50GHz.
Certainly not a huge win but I believe the change is clean and
easy to maintain down the road.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11666
2021-03-03 08:15:28 -08:00
Jake Howard 3242b5358e Add "zstd-fast" to help options for "compression" property
This value does work as expected, and is documented in the manpage.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jake Howard <git@theorangeone.net>
Closes #11670
2021-03-03 08:14:19 -08:00
nssrikanth bedbc13daa Cancel TRIM / initialize on FAULTED non-writeable vdevs
When a device which is actively trimming or initializing becomes
FAULTED, and therefore no longer writable, cancel the active
TRIM or initialization.  When the device is merely taken offline
with `zpool offline` then stop the operation but do not cancel it.
When the device is brought back online the operation will be
resumed if possible.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Vipin Kumar Verma <vipin.verma@hpe.com>
Signed-off-by: Srikanth N S <srikanth.nagasubbaraoseetharaman@hpe.com>
Closes #11588
2021-03-02 10:27:27 -08:00
Andriy Gapon 2e160dee97 Fix assert in FreeBSD-specific dmu_read_pages
The function has three similar pieces of code: for read-behind pages,
requested pages and read-ahead pages.  All three pieces had an
assert to ensure that the page is not mapped.  Later the assert was
relaxed to require that the page is not mapped for writing.  But that
was done in two places out of three.  This change fixes the third piece,
read-ahead.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
Closes #11654
2021-02-27 17:23:09 -08:00
Brian Behlendorf 3e73ea0c10 ZTS: zpool_trim_start_and_cancel_pos.ksh
Several of the TRIM tests were based of the initialize tests and
then adapted for TRIM.  The zpool_trim_start_and_cancel_pos.ksh
test was intended to be one such test but it was overlooked and
actually never adapted.  Update it accordingly.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11649
2021-02-27 17:19:50 -08:00
Martin Matuška 03ef8f09e1 Add missing checks for unsupported features
After 35ec517 it has become possible to import ZFS pools witn an
active org.illumos:edonr feature on FreeBSD, leading to a panic.

In addition, "zpool status" reported all pools without edonr
as upgradable and "zpool upgrade -v" reported edonr in the list
of upgradable features.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #11653
2021-02-27 17:16:02 -08:00
Coleman Kane 778fa36ee7 Linux 5.12 compat: replace bio_*_io_acct with disk_*_io_acct
The bio_*_acct functions became GPL exports, which causes the
kernel modules to refuse to compile. This replaces code with
alternate function calls to the disk_*_io_acct interfaces, which
are not GPL exports. This change was added in kernel commit
99dfc43ecbf67f12a06512918aaba61d55863efc.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11639
2021-02-24 10:06:05 -08:00
Coleman Kane d939930fcc Linux 5.12 compat: bio->bi_disk member moved
The struct bio member bi_disk was moved underneath a new member named
bi_bdev. So all attempts to reference bio->bi_disk need to now become
bio->bi_bdev->bd_disk.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11639
2021-02-24 10:04:34 -08:00
Brian Behlendorf 8e43fa12c5 Fix vdev_rebuild_thread deadlock
The metaslab_disable() call may block waiting for a txg sync.
Therefore it's important that vdev_rebuild_thread release the
SCL_CONFIG read lock it is holding before this call.  Failure
to do so can result in the txg_sync thread getting blocked
waiting for this lock which results in a deadlock.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewd-by: Srikanth N S <srikanth.nagasubbaraoseetharaman@hpe.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11647
2021-02-24 10:01:00 -08:00
Brian Behlendorf 75a089ed34 Fix overly broad locking in spa_vdev_config_exit()
Calling vdev_free() only requires the we acquire the spa config
SCL_STATE_ALL locks, not the SCL_ALL locks.  In particular, we need
need to avoid taking the SCL_CONFIG lock (included in SCL_ALL) as a
writer since this can lead to a deadlock.  The txg_sync_thread() may
block in spa_txg_history_init_io() when taking the SCL_CONFIG lock
as a reading when it detects there's a pending writer.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11585
2021-02-24 10:00:21 -08:00
Tony Hutter 3ee4e6d8b7 vdev_id: Fix partition regular expression
Given a DM device name, the old vdev_id script would extract any text
after a 'p' as the partition number.  It then appends "-part" + the
partition number to the name, giving a by-vdev name like "L0-part5".

This works fine if the DM name is like 'dm-2p5', but doesn't work if
the DM name is a multipath name like "mpatha".  In those cases it
incorrectly matches the 'p' in "mpatha", giving by-vdev names like
"L0-partatha".

This patch fixes the issue by making the partition regex match stricter.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #11637
2021-02-24 09:58:46 -08:00
Brian Behlendorf 1dfc82a14e Linux: increase max nvlist_src size
On Linux increase the maximum allowed size of the src nvlist which
can be passed to the /dev/zfs ioctl.  Originally, this was set
to a maximum of KMALLOC_MAX_SIZE (4M) because it was kmalloc'd.
Since that time it's been converted to a vmalloc so that's no
longer a hard limit, and it's desirable for `zfs send/recv` to
allow larger nvlists so more snapshots can be sent at once.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6572
Closes #11638
2021-02-24 09:57:18 -08:00
Prakash Surya f01eaed455 Add upper bound for slop space calculation
This change modifies the behavior of how we determine how much slop
space to use in the pool, such that now it has an upper limit. The
default upper limit is 128G, but is configurable via a tunable.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Prakash Surya <prakash.surya@delphix.com>
Closes #11023
2021-02-24 09:52:43 -08:00
Ryan Moeller 5156862960 Wrap bare EINVAL returns with SET_ERROR
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11636
2021-02-24 09:51:10 -08:00
Ryan Moeller 94fa1c3d96 Force symlink creation for zpool.d compat links
gmake install fails when zpool.d compat links already exist.

Force the symlinks to be recreated if already present.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11633
2021-02-24 09:49:59 -08:00
Cedric Maunoury b9c07ec71b send_iterate_snap : doall send without fromsnap
The behavior of a NULL fromsnap was inadvertently changed for a doall
send when the send/recv logic in libzfs was updated.  Restore the
previous behavior by correcting send_iterate_snap() to include all
the snapshots in the nvlist for this case. 

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Cedric Maunoury <cedric.maunoury@gmail.com>
Closes #11608
2021-02-24 09:48:58 -08:00
Adam D. Moss 9312e0fd1e Fix error message when zfs module are already unloaded
Using zfs-sh -u on linux will fail with inaccurate message when the
zfs modules are already unloaded.  Deal with the case where a module 
is already unloaded; its USE_COUNT will be the empty string

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Adam Moss <c@yotes.com>
Closes #11627
2021-02-20 20:23:10 -08:00
fbynite 11f2e9a491 vdev_ops: don't try to call vdev_op_hold or vdev_op_rele when NULL
This prevents a panic after a SLOG add/removal on the root pool followed
by a zpool scrub.

When a SLOG is removed, a hole takes its place - the vdev_ops for a hole
is vdev_hole_ops, which defines the handler functions of vdev_op_hold
and vdev_op_rele as NULL.

This bug has been reported in illumos and FreeBSD, a different trigger
in the FreeBSD report though.

Credit for this patch goes to Patrick Mooney <pmooney@pfmooney.com>

Obtained from: illumos-gate commit: c65bd18728f34725
External-issue: https://www.illumos.org/issues/12981
External-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252396
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Wing <rob.fx907@gmail.com>
Closes #11623
2021-02-20 20:19:20 -08:00
Tony Hutter 7e56b05058 Better zfs_get_enclosure_sysfs_path() enclosure support
A multpathed disk will have several 'underlying' paths to the disk.  For
example, multipath disk 'dm-0' may be made up of paths:
/dev/{sda,sdb,sdc,sdd}.  On many enclosures those underlying sysfs
paths will have a symlink back to their enclosure device entry
(like 'enclosure_device0/slot1').  This is used by the
statechange-led.sh script to set/clear the fault LED for a disk, and
by 'zpool status -c'.

However, on some enclosures, those underlying paths may not all have
symlinks back to the enclosure device.  Maybe only two out of four
of them might.

This patch updates zfs_get_enclosure_sysfs_path() to favor returning
paths that have symlinks back to their enclosure devices, rather
than just returning the first path.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #11617
2021-02-20 20:17:45 -08:00
Brian Atkinson c0801bf35a Cleaning up uio headers
Making uio_impl.h the common header interface between Linux and FreeBSD
so both OS's can share a common header file. This also helps reduce code
duplication for zfs_uio_t for each OS.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #11622
2021-02-20 20:16:50 -08:00
Christian Schwarz 52cb284f7b ztest: propagate -o to the zdb child process
I think this is the behavior that most users expect.

Future work: have a separate flag, e.g., -O, to specify separate
set_global_vars for the zdb child than for the ztest children.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11602
2021-02-19 22:45:16 -08:00
Christian Schwarz 2801e68d1b ztest: fix -o by calling set_global_var in child processes
Without set_global_var() in the child processes the -o option provides
little use.

Before this change set_global_var() was called as a side-effect of
getopt processing which only happens for the parent ztest process.

This change limits the set of options that can be set and makes them
available to the child through ztest_shared_opts_t.

Future work: support arbitrary option count and length.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11602
2021-02-19 22:45:10 -08:00
Christian Schwarz edc508ac0b libzpool: set_global_var: refactor to not modify 'arg'
Also fixes leak of the dlopen handle in the error case.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11602
2021-02-19 22:45:04 -08:00
Christian Schwarz b5fffa1d29 libzpool: set_global_var: fix endianness handling (fixes zdb -o )
Without this patch I get the error

  Setting global variables is only supported on little-endian systems

when using `zdb -o` on my amd64 machine.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11602
2021-02-19 22:44:05 -08:00
Ryan Moeller 64e0fe14ff Restore FreeBSD resource usage accounting
Add zfs_racct_* interfaces for platform-dependent read/write accounting.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11613
2021-02-19 22:34:33 -08:00
Don Brady 03e02e5b56 Checksum errors may not be counted
Fix regression seen in issue #11545 where checksum errors 
where not being counted or showing up in a zpool event.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #11609
2021-02-19 22:33:15 -08:00
Mark Johnston e7adccf7f5 FreeBSD: disable the use of hardware crypto offload drivers for now
First, the crypto request completion handler contains a bug in that it
fails to reset fs_done correctly after the request is completed.  This
is only a problem for asynchronous drivers.  Second, some hardware
drivers have input constraints which ZFS does not satisfy.  For
instance, ccp(4) apparently requires the AAD length for AES-GCM to be a
multiple of the cipher block size, and with qat(4) the AES-GCM AAD
length may not be longer than 240 bytes.  FreeBSD's generic crypto
framework doesn't have a mechanism to automatically fall back to a
software implementation if a hardware driver cannot process a request,
and ZFS does not tolerate such errors.

The plan is to implement such a fallback mechanism, but with FreeBSD
13.0 approaching we should simply disable the use hardware drivers for
now.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #11612
2021-02-18 15:51:20 -08:00
Andriy Gapon 778869fa13 Fix report_mount_progress never calling set_progress_header
That happens because of an off-by-one mistake.
share_mount_one_cb() calls report_mount_progress(current=sm_done) after
having incremented sm_done by one.  Then report_mount_progress()
increments the parameter again.  It appears that that logic became
obsolete after commit a10d50f999, parallel zfs mount.

On FreeBSD I observe that zfs mount -a -v prints, for example,
    (null): (201/248)
That happens because set_progress_header() is never called.

With this change the output becomes correct:
    Mounting ZFS filesystems: (209/248)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
Closes #11607
2021-02-18 13:53:05 -08:00
Ryan Libby bf156c966b Remove unused abd_alloc_scatter_offset_chunkcnt
Remove function that become unused after refactoring in
e2af2acce3.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Libby <rlibby@FreeBSD.org>
Closes #11614
2021-02-17 21:39:13 -08:00
Colm 658fb8020f Add "compatibility" property for zpool feature sets
Property to allow sets of features to be specified; for compatibility
with specific versions / releases / external systems. Influences
the behavior of 'zpool upgrade' and 'zpool create'. Initial man
page changes and test cases included.

Brief synopsis:

zpool create -o compatibility=off|legacy|file[,file...] pool vdev...

compatibility = off : disable compatibility mode (enable all features)
compatibility = legacy : request that no features be enabled
compatibility = file[,file...] : read features from specified files.
Only features present in *all* files will be enabled on the
resulting pool. Filenames may be absolute, or relative to
/etc/zfs/compatibility.d or /usr/share/zfs/compatibility.d (/etc
checked first).

Only affects zpool create, zpool upgrade and zpool status.

ABI changes in libzfs:

* New function "zpool_load_compat" to load and parse compat sets.
* Add "zpool_compat_status_t" typedef for compatibility parse status.
* Add ZPOOL_PROP_COMPATIBILITY to the pool properties enum
* Add ZPOOL_STATUS_COMPATIBILITY_ERR to the pool status enum

An initial set of base compatibility sets are included in
cmd/zpool/compatibility.d, and the Makefile for cmd/zpool is
modified to install these in $pkgdatadir/compatibility.d and to
create symbolic links to a reasonable set of aliases.

Reviewed-by: ericloewe
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Colm Buckley <colm@tuatha.org>
Closes #11468
2021-02-17 21:30:45 -08:00
Brian Behlendorf 35ec51796f FreeBSD: disable edonr in zfs_mod_supported_feature()
Rather than conditionally compiling out the edonr code for FreeBSD
update zfs_mod_supported_feature() to indicate this feature is
unsupported.  This ensures that all spa features are defined on
every platform, even if they are not supported.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11605 
Issue #11468
2021-02-17 08:14:51 -08:00
José Luis Salvador Rufo aef1830f93 Support uClibc for the tests compilations
There are two issues that don't allow ZFS to be compiled using uClibc.
`backtrace()`, and `program_invocation_short_name` as a `const`.
This patch adds uClibc to the conditionals in the same way there are
already for Glibc for `backtrace()`; and removes the external param
`program_invocation_short_name` because its only used here for the
whole project.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: José Luis Salvador Rufo <salvador.joseluis@gmail.com>
Closes #11600
2021-02-16 21:51:46 -08:00
Ryan Moeller 436ab35a53 Make inline ABD predicates compatible with C++
FreeBSD's zfsd fails to build after e2af2acce3 due to strict type
checking errors from the implicit conversion between bool and boolean_t
in the inline predicate definitions in abd.h.

Use conditionals to return the correct value type from these functions.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #11592
2021-02-15 10:15:50 -08:00
Brian Behlendorf c1c31a835a Linux 5.11 compat: META
Increase the Linux-Maximum version in the META file to 5.11.
All of the required compatibility patches have been merged.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11586
2021-02-10 10:11:21 -08:00
Arshad Hussain 677cdf7865 vdev_id: Support daisy-chained JBODs in multipath mode
Within function sas_handler() userspace commands like
'/usr/sbin/multipath' have been replaced with sourcing
device details from within sysfs which reduced a
significant amount of overhead and processing time.
Multiple JBOD enclosures and their order are sourced
from the bsg driver (/sys/class/enclosure) to isolate
chassis top-level expanders, which are then dynamically
indexed based on host channel of the multipath subordinate
disk member device being processed. Additionally added a
"mixed" mode for slot identification for environments where
a ZFS server system may contain SAS disk slots where there
is no expander (direct connect to HBA) while an attached
external JBOD with an expander have different slot identifier
methods.

How Has This Been Tested?
~~~~~~~~~~~~~~~~~~~~~~~~~

Testing was performed on a AMD EPYC based dual-server
high-availability multipath environment with multiple
HBAs per ZFS server and four SAS JBODs. The two primary
JBODs were multipath/cross-connected between the two
ZFS-HA servers. The secondary JBODs were daisy-chained
off of the primary JBODs using aligned SAS expander
channels (JBOD-0 expanderA--->JBOD-1 expanderA,
          JBOD-0 expanderB--->JBOD-1 expanderB, etc).
Pools were created, exported and re-imported, imported
globally with 'zpool import -a -d /dev/disk/by-vdev'.
Low level udev debug outputs were traced to isolate
and resolve errors.

Result:
~~~~~~~

Initial testing of a previous version of this change
showed how reliance on userspace utilities like
'/usr/sbin/multipath' and '/usr/bin/lsscsi' were
exacerbated by increasing numbers of disks and JBODs.
With four 60-disk SAS JBODs and 240 disks the time to
process a udevadm trigger was 3 minutes 30 seconds
during which nearly all CPU cores were above 80%
utilization. By switching reliance on userspace
utilities to sysfs in this version, the udevadm
trigger processing time was reduced to 12.2 seconds
and negligible CPU load.

This patch also fixes few shellcheck complains.

Reviewed-by: Gabriel A. Devenyi <gdevenyi@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Jeff Johnson <jeff.johnson@aeoncomputing.com>
Signed-off-by: Jeff Johnson <jeff.johnson@aeoncomputing.com>
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Closes #11526
2021-02-09 13:04:09 -08:00
khng300 fc273894d2 Rename zfs_inode_update to zfs_znode_update_vfs
zfs_znode_update_vfs is a more platform-agnostic name than
zfs_inode_update. Besides that, the function's prototype is moved to
include/sys/zfs_znode.h as the function is also used in common code.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ka Ho Ng <khng300@gmail.com>
Sponsored by: The FreeBSD Foundation
Closes #11580
2021-02-09 11:17:29 -08:00
Kleber Tarcísio 4f22619ae3 Add an assert to clarify code
The first time through the loop prevdb and prevhdl are NULL.  They 
are then both set, but only prevdb is checked.  Add an ASSERT to 
make it clear that prevhdl must be set when prevdb is.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kleber <klebertarcisio@yahoo.com.br>
Closes #10754
Closes #11575
2021-02-09 11:14:59 -08:00
Antonio Russo f8ce8aed0c Set file mode during zfs_write
3d40b65 refactored zfs_vnops.c, which shared much code verbatim between
Linux and BSD.  After a successful write, the suid/sgid bits are reset,
and the mode to be written is stored in newmode.  On Linux, this was
propagated to both the in-memory inode and znode, which is then updated
with sa_update.

3d40b65 accidentally removed the initialization of newmode, which
happened to occur on the same line as the inode update (which has been
moved out of the function).

The uninitialized newmode can be saved to disk, leading to a crash on
stat() of that file, in addition to a merely incorrect file mode.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11474 
Closes #11576
2021-02-08 09:15:05 -08:00
наб 7c64ee9e77 zfs-import-{cache,scan}: change condition to FileNotEmpty
When all pools are exported ZFS will generate an empty cache file.
This will cause the import service to fail, which is sub-optimal, 
since this means that dracut fails, and it necessary to run 
`zpool import -a` to boot, delete the file, and regenerate+reinstall 
the initrd.

This resolves the issue by treating an zero-length cache files the
same as a missing cache file.  This aligns the behavior with that
of the `zpool` command itself.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11568
2021-02-05 11:25:22 -08:00
nssrikanth 32366649d3 Fixed issue with processing of EC_dev_remove event
The pool guid and vdev guid received by zfs_agent_post_event(),
which calls zfs_retire_recv(), are normally non-zero.  However,
later in this same method they may be unconditionally reset to
zero by the code which is intended to handle  multipath, spare
and l2arc vdevs.  This will result in the EC_dev_remove not
being handled.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>\
Co-authored-by: Vipin Kumar Verma <vipin.verma@hpe.com>
Signed-off-by: Srikanth N S <srikanth.nagasubbaraoseetharaman@hpe.com>
Closes #11564
2021-02-05 08:30:50 -08:00
Brian Behlendorf d66f017c17 zfs-list.8: clarify listing snapshots
Clarify how to include snapshots in the `zpool list` output by
referencing the full name of the `listsnapshots` pool property,
and the `zpool list -t snapshot` option.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11562
Closes #11565
2021-02-04 09:56:28 -08:00
Christian Schwarz 84268b099b Document monotonicity of dmu_tx_assign() and txg_hold_open()
Expand the comments to make it clear exactly what is guaranteed
by dmu_tx_assign() and txg_hold_open().  Additionally, update
the comment which refers to txg_exit() when it should reference
txg_rele_to_sync().

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11521
2021-02-02 10:11:37 -08:00
George Melikov 9eee7fce3b zts-report.py: ignore some skipped tests in Github CI
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11554
2021-02-02 09:37:23 -08:00
George Melikov aa743acc79 CI: add ubuntu-* functional tests runner
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11554
2021-02-02 09:37:23 -08:00
George Melikov 1051499aef CI: rename zfs-tests workflow
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11554
2021-02-02 09:37:11 -08:00
Brian Behlendorf 96a6629872 Remove unused iov_iter_init_compat() wrapper
This compatibility code is no longer needed.  For it a while
iov_iter_init_compat() was used by zfs_uio_prefaultpages() but
this code should have been dropped as part of commit 83b91ae1.
Take care of that oversight and remove it.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11543
2021-01-30 10:06:14 -08:00
Matthew Ahrens 2d4bbd14fc The abd child/parent relationship does not need to be tracked
ABD's currently track their parent/child relationship.  This applies to
`abd_get_offset()` and `abd_borrow_buf()`.  However, nothing depends on
knowing this relationship, it's only used for consistency checks to
verify that we are not destroying an ABD that's still in use.  When we
are creating/destroying ABD's frequently, the performance impact of
maintaining these data structures (in particular the atomic
increment/decrement operations) can be measurable.

This commit removes this verification code on production builds, but
keeps it when ZFS_DEBUG is set.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11535
2021-01-30 10:04:42 -08:00
nssrikanth cf0a6dd3ed Added extra check to replace Faulted VDEV with Distributed Spare
In ZED zfs_retire agent added a check to handle Distributed Spare
replacement for Faulted VDEV also.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Vipin Kumar Verma <vipin.verma@hpe.com>
Signed-off-by: Mark Maybee <mark.maybee@hpe.com>
Closes #11354 
Closes #11355
2021-01-28 17:00:26 -08:00
Brian Atkinson 2993698eb3 Fixing gang ABD when adding another gang
I originally applied a fix in #11539 to fix a parent's child references
when a gang ABD is free'd. However, I did not take into account
abd_gang_add_gang(). We still need to make sure to update the child
references in this function as well. In order to resolve this I removed
decreasing the gang ABD's size in abd_free_gang() as well as moved back
the original placeent of zfs_refcount_remove_many() in abd_free().

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #11542
2021-01-28 16:54:12 -08:00
George Melikov 9f8c7e6a76 ZTS: add userspace_send_encrypted.ksh to Makefile
All tests need to be included in the Makefiles.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11541
2021-01-28 13:39:38 -08:00
Matthew Ahrens f8c0d7e1f6 fix abd_nr_pages_off for gang abd
`__vdev_disk_physio()` uses `abd_nr_pages_off()` to allocate a bio with
a sufficient number of iovec's to process this zio (i.e.
`nr_iovecs`/`bi_max_vecs`).  If there are not enough iovec's in the bio,
then additional bio's will be allocated.  However, this is a sub-optimal
code path.  In particular, it requires several abd calls (to
`abd_nr_pages_off()` and `abd_bio_map_off()`) which will have to walk
the constituents of the ABD (the pages or the gang children) because
they are looking for offsets > 0.

For gang ABD's, `abd_nr_pages_off()` returns the number of iovec's
needed for the first constituent, rather than the sum of all
constituents (within the requested range).  This always under-estimates
the required number of iovec's, which causes us to always need several
bio's.  The end result is that `__vdev_disk_physio()` is usually O(n^2)
for gang ABD's (and occasionally O(n^3), when more than 16 bio's are
needed).

This commit fixes `abd_nr_pages_off()`'s handling of gang ABD's, to
correctly determine how many iovec's are needed, by adding up the number
of iovec's for each of the gang children in the requested range.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11536
2021-01-28 09:28:20 -08:00
George Amanakis 0ae184a6ba Avoid updating the L2ARC device header unnecessarily
If we do not write any buffers to the cache device and the evict hand
has not advanced do not update the cache device header.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #11522 
Closes #11537
2021-01-28 09:20:03 -08:00
Brian Atkinson 416015ef54 Removing ABD Parent Child Reference Before Freeing ABD
Moving the call to zfs_refcount_remove_many() in abd_free() to be called
before any of the ABD free variants are called. This is necessary
because abd_free_gang() adjusts the abd_size for the gang ABD. If the
parent's child references are removed after free'ing the gang ABD the
refcount is not adjusted correctly for the parent's children.

I also removed some stray abd_put() in comments and changed
abd_free_gang_abd() -> abd_free_gang().

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #11539
2021-01-28 09:15:17 -08:00
Allan Jude 393e69241e Add zdb -r <dataset> <object-id | file> <output>
While you can use zdb -R poolname vdev:offset:[<lsize>/]<psize>[:flags] 
to extract individual DVAs from a vdev, it would be handy for be able 
copy an entire file out of the pool.

Given a file or object number, add support to copy the contents to a 
file. Useful for debugging and recovery.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #11027
2021-01-27 21:36:01 -08:00
Mark Maybee b2c5904a78 Revert special case code from pre-hashtable nvlist era
Before a hash table was added on top of the nvlist code, there were
cases where the nvlist allocation was changed from fnvlist_alloc()
to nvlist_alloc() to avoid expensive NV_UNIQUE_NAME checks. Now
this is no longer necessary. These changes should be reverted to be
consistent with other code. There are some cases where this change
will also reduce the number of iterations.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mark Maybee <mark.maybee@delphix.com>
Closes #11464
2021-01-27 21:31:51 -08:00
Paul Dagnelie 2921ad6cba Fix zrele race in zrele_async that can cause hang
There is a race condition in zfs_zrele_async when we are checking if 
we would be the one to evict an inode. This can lead to a txg sync 
deadlock.

Instead of calling into iput directly, we attempt to perform the atomic 
decrement ourselves, unless that would set the i_count value to zero. 
In that case, we dispatch a call to iput to run later, to prevent a 
deadlock from occurring.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #11527 
Closes #11530
2021-01-27 21:29:58 -08:00
George Melikov b8e6401b79 ZTS: pool_state test check for pool existence in cleanup
If there is no scsi_debug module, then this test
must be skipped, in this case cleanup routine should
be prepared for absent pool.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11534
2021-01-27 17:33:30 -08:00
Alan Somers 6b2e7203ae Fix a resource leak in uu_avl_pool_destroy
Need to destroy the pthread mutex created in uu_avl_pool_create.

https://svnweb.freebsd.org/base?view=revision&revision=262912

Obtained from: FreeBSD
Sponsored by:	Spectra Logic Corporation
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11528
2021-01-26 19:39:28 -08:00
Alan Somers cf0977ad72 Parallelize vdev_validate
The runtime of vdev_validate is dominated by the disk accesses in
vdev_label_read_config.  Speed it up by validating all vdevs in
parallel using a taskq.

Sponsored by: Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11470
2021-01-26 19:36:51 -08:00
Alan Somers 67874d5487 Read all disk labels concurrently in vdev_label_read_config
This is similar to what we already do in vdev_geom_read_config.

Sponsored by: Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11470
2021-01-26 19:36:02 -08:00
Alan Somers a0e01997ec Parallelize vdev_load
metaslab_init is the slowest part of importing a mature pool, and it
must be repeated hundreds of times for each top-level vdev.  But its
speed is dominated by a few serialized disk accesses.  That can lead to
import times of > 1 hour for pools with many top-level vdevs on spinny
disks.

Speed up the import by using a taskqueue to parallelize vdev_load across
all top-level vdevs.

This also requires adding mutex protection to
metaslab_class_t.mc_historgram.  The mc_histogram fields were
unprotected when that code was first written in "Illumos 4976-4984 -
metaslab improvements" (OpenZFS
f3a7f6610f).  The lock wasn't added until
3dfb57a35e, though it's unclear exactly
which fields it's supposed to protect.  In any case, it wasn't until
vdev_load was parallelized that any code attempted concurrent access to
those fields.

Sponsored by: Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11470
2021-01-26 19:35:59 -08:00
Alan Somers dfb44c500e Fix a man page link in zfs-program.8
zfs-program.8 has an orphan link, fix it.

https://svnweb.freebsd.org/base?view=revision&revision=360080

Obtained from: FreeBSD
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11529
2021-01-26 16:17:11 -08:00
Brian Behlendorf 0e6c493fec cppcheck: integrete cppcheck
In order for cppcheck to perform a proper analysis it needs to be
aware of how the sources are compiled (source files, include
paths/files, extra defines, etc).  All the needed information is
available from the Makefiles and can be leveraged with a generic
cppcheck Makefile target.  So let's add one.

Additional minor changes:

* Removing the cppcheck-suppressions.txt file.  With cppcheck 2.3
  and these changes it appears to no longer be needed.  Some inline
  suppressions were also removed since they appear not to be
  needed.  We can add them back if it turns out they're needed
  for older versions of cppcheck.

* Added the ax_count_cpus m4 macro to detect at configure time how
  many processors are available in order to run multiple cppcheck
  jobs.  This value is also now used as a replacement for nproc
  when executing the kernel interface checks.

* "PHONY =" line moved in to the Rules.am file which is included
  at the top of all Makefile.am's.  This is just convenient becase
  it allows us to use the += syntax to add phony targets.

* One upside of this integration worth mentioning is it now allows
  `make cppcheck` to be run in any directory to check that subtree.

* For the moment, cppcheck is not run against the FreeBSD specific
  kernel sources.  The cppcheck-FreeBSD target will need to be
  implemented and testing on FreeBSD to support this.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11508
2021-01-26 16:12:26 -08:00
Brian Behlendorf a06ba74a1e cppcheck: return value always 0
Identical condition and return expression 'rc', return value is
always 0.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11508
2021-01-26 16:12:18 -08:00
Brian Behlendorf 2cdd75bed7 cppcheck: remove redundant ASSERTs
The ASSERT that the passed pointer isn't NULL appears after the
pointer has already been dereferenced.  Remove the redundant check.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11508
2021-01-26 16:12:10 -08:00
Brian Behlendorf 6fc1ce0723 cppcheck: resolve double free
The double free reported for the realloc() failure branch is a
false positive.  It should be resolved in cppcheck 2.4 but for
the benefit of older versions we supress the warning.

    https://trac.cppcheck.net/ticket/9292

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11508
2021-01-26 16:12:02 -08:00
Brian Behlendorf 7454d2bb8d cppcheck: zpool_main.c possible null pointer dereference
Explicitly check for NULL to satisfy cppcheck that "val" can never
be NULL when passed to printf().  This looks like a false positive
since is_blank_str() can never take the false conditional branch
when passed a NULL.  But there's no harm in adding the extra check.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11508
2021-01-26 16:11:46 -08:00
Matthew Ahrens 62d4287f27 RAIDZ2/3 fails to heal silently corrupted parity w/2+ bad disks
When scrubbing, (non-sequential) resilvering, or correcting a checksum
error using RAIDZ parity, ZFS should heal any incorrect RAIDZ parity by
overwriting it.  For example, if P disks are silently corrupted (P being
the number of failures tolerated; e.g. RAIDZ2 has P=2), `zpool scrub`
should detect and heal all the bad state on these disks, including
parity.  This way if there is a subsequent failure we are fully
protected.

With RAIDZ2 or RAIDZ3, a block can have silent damage to a parity
sector, and also damage (silent or known) to a data sector.  In this
case the parity should be healed but it is not.

The problem can be noticed by scrubbing the pool twice.  Assuming there
was no damage concurrent with the scrubs, the first scrub should fix all
silent damage, and the second scrub should be "clean" (`zpool status`
should not report checksum errors on any disks).  If the bug is
encountered, then the second scrub will repair the silently-damaged
parity that the first scrub failed to repair, and these checksum errors
will be reported after the second scrub.  Since the first scrub repaired
all the damaged data, the bug can not be encountered during the second
scrub, so subsequent scrubs (more than two) are not necessary.

The root cause of the problem is some code that was inadvertently added
to `raidz_parity_verify()` by the DRAID changes.  The incorrect code
causes the parity healing to be aborted if there is damaged data
(`rc_error != 0`) or the data disk is not present (`!rc_tried`).  These
checks are not necessary, because we only call `raidz_parity_verify()`
if we have the correct data (which may have been reconstructed using
parity, and which was verified by the checksum).

This commit fixes the problem by removing the incorrect checks in
`raidz_parity_verify()`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11489 
Closes #11510
2021-01-26 16:05:05 -08:00
Will Andrews d7265b3309 ZTS: zpool_export test improvements
- refactor cleanup routines into common kshlib zpool_export_cleanup func
- don't require physical disks to test, just use files

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Will Andrews <will@firepipe.net>
Closes #11518
2021-01-26 13:14:04 -08:00
Lorenz Hüdepohl caedada66e dracut: Fix race condition between load-key and import
zfs-load-key.sh is called by the dracut-pre-mount.service unit which has
no explicit 'After' dependency on zfs-import.target. That way it can be
that the pool has not yet been imported and the zfs-load-key.sh finishes
without ever seeing the relevant pool.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Lorenz Hüdepohl <dev@stellardeath.org>
Closes #11500
2021-01-26 12:14:22 -08:00
Will Andrews f4f50a7048 spa_export_common: refactor common exit points
Create a common exit point for spa_export_common (a very long 
function), which avoids missing steps on failure.  This work
is helpful for the planned forced pool export changes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Will Andrews <will@firepipe.net>
Closes #11514
2021-01-25 15:04:11 -08:00
Will Andrews 35ac0ed1fd ZTS: improve output clarity of check_prop_source
Instead of just failing, indicate the expected and actual value and
source as a NOTE.  Tests using this failed in an earlier version of
the changeset and this information helped find the cause.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Will Andrews <will@firepipe.net>
Closes #11517
2021-01-25 14:39:58 -08:00
Will Andrews a57acbb627 ZTS: remove duplicate check_prop_source from zfs_receive
There is an identical definition in zfs_set_common.kshlib already.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Will Andrews <will@firepipe.net>
Closes #11516
2021-01-25 14:38:19 -08:00
Will Andrews de1a6cc387 logapi: cat output file instead of printing
This avoids globbing together multiple lines in the log, if you happen
to specify LOGAPI_DEBUG because you want to see it.

Signed-off-by:	Will Andrews <will@firepipe.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11515
2021-01-25 14:33:57 -08:00
Matthew Macy a4134da2b2 spl-taskq: Make sure thread tsd hash entry is cleared
Like any other thread created by thread_create() we need to call
thread_exit() to properly clean it up.  In particular, this ensures the
tsd hash for the thread is cleared.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11512
2021-01-25 11:18:28 -08:00
Alan Somers fd95af8dd4 Speed up "zpool import" in the presence of many zvols
By default, FreeBSD does not allow zpools to be backed by zvols (that
can be changed with the "vfs.zfs.vol.recursive" sysctl). When that
sysctl is set to 0, the kernel does not attempt to read zvols when
looking for vdevs. But the zpool command still does. This change brings
the zpool command into line with the kernel's behavior. It speeds "zpool
import" when an already imported pool has many zvols, or a zvol with
many snapshots.

https://svnweb.freebsd.org/base?view=revision&revision=357235
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=241083
https://reviews.freebsd.org/D22077

Obtained from: FreeBSD
Reported by: Martin Birgmeier <d8zNeCFG@aon.at>
Sponsored by: Axcient
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11502
2021-01-24 16:02:45 -08:00
наб 8887760aef zfsprops.8: fix mispluralisation in "Default values is"
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11509
2021-01-24 15:57:51 -08:00
Ryan Moeller a4acc47e4b ZTS: Use swapctl to list swap devices on FreeBSD
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11503
2021-01-24 15:56:59 -08:00
Arshad Hussain e2870fb24a vdev_id: Add error message when $CONFIG is missing
It was observed that vdev_id exists silently when
the $CONFIG file is missing.

This patch adds error message in case vdev_id is
called without default $CONFIG or '-c'. This makes
end user observe the exit message more easily.

Before Patch:
~~~~~~~~~~~~~
$ ./cmd/vdev_id/vdev_id
$

After Patch:
~~~~~~~~~~~~
$ ./cmd/vdev_id/vdev_id
Error: Config file "/etc/zfs/vdev_id.conf" not found
$

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Closes #11498
2021-01-23 15:52:29 -08:00
Colm 4a90d4d6fc Fix two minor lint errors (cppcheck)
Fix two minor errors reported by cppcheck:

In module/zfs/abd.c (abd_get_offset_impl), add non-NULL
assertion to prevent NULL dereference warning.

In module/zfs/arc.c (l2arc_write_buffers), change 'try'
variable to 'pass' to avoid C++ reserved word.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Colm Buckley <colm@tuatha.org>
Closes #11507
2021-01-23 15:49:32 -08:00
Alexander Motin 5aa69a57da Relax special_small_blocks assertion.
Follow up for commit 624222a, value asserted <= SPA_OLD_MAXBLOCKSIZE
instead of SPA_MAXBLOCKSIZE as it should be after the previous change.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #11501
2021-01-23 15:45:27 -08:00
Matthew Macy 716408f560 Add basic io_uring test
Provide a basic test coverage for io_uring I/O.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11497
2021-01-23 15:42:42 -08:00
Ryan Moeller 1c94345103 FreeBSD: upstream changes to VFS interface
Set VIRF_MOUNTPOINT flag on snapshot mountpoint.

Authored-by: Mateusz Guzik <mjg@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11458
2021-01-23 15:40:43 -08:00
Matt Macy 0e9bcd5d4f FreeBSD: fix HEAD build, conditionally remove FDSYNC defines
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11458
2021-01-23 15:39:55 -08:00
Brian Behlendorf 76e1f78d4b Only add supported features during pool creation
When creating a pool only features supported by both user and
kernel space should be enabled.  Furthermore, improve the error
messages when attempting to create, or add, a dRAID vdev when
the dRAID feature is not supported by the kernel modules.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11492
2021-01-22 09:47:06 -08:00
Matthew Ahrens aa755b3549 Set aside a metaslab for ZIL blocks
Mixing ZIL and normal allocations has several problems:

1. The ZIL allocations are allocated, written to disk, and then a few
seconds later freed.  This leaves behind holes (free segments) where the
ZIL blocks used to be, which increases fragmentation, which negatively
impacts performance.

2. When under moderate load, ZIL allocations are of 128KB.  If the pool
is fairly fragmented, there may not be many free chunks of that size.
This causes ZFS to load more metaslabs to locate free segments of 128KB
or more.  The loading happens synchronously (from zil_commit()), and can
take around a second even if the metaslab's spacemap is cached in the
ARC.  All concurrent synchronous operations on this filesystem must wait
while the metaslab is loading.  This can cause a significant performance
impact.

3. If the pool is very fragmented, there may be zero free chunks of
128KB or more.  In this case, the ZIL falls back to txg_wait_synced(),
which has an enormous performance impact.

These problems can be eliminated by using a dedicated log device
("slog"), even one with the same performance characteristics as the
normal devices.

This change sets aside one metaslab from each top-level vdev that is
preferentially used for ZIL allocations (vdev_log_mg,
spa_embedded_log_class).  From an allocation perspective, this is
similar to having a dedicated log device, and it eliminates the
above-mentioned performance problems.

Log (ZIL) blocks can be allocated from the following locations.  Each
one is tried in order until the allocation succeeds:
1. dedicated log vdevs, aka "slog" (spa_log_class)
2. embedded slog metaslabs (spa_embedded_log_class)
3. other metaslabs in normal vdevs (spa_normal_class)

The space required for the embedded slog metaslabs is usually between
0.5% and 1.0% of the pool, and comes out of the existing 3.2% of "slop"
space that is not available for user data.

On an all-ssd system with 4TB storage, 87% fragmentation, 60% capacity,
and recordsize=8k, testing shows a ~50% performance increase on random
8k sync writes.  On even more fragmented systems (which hit problem #3
above and call txg_wait_synced()), the performance improvement can be
arbitrarily large (>100x).

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11389
2021-01-21 15:12:54 -08:00
Lorenz Hüdepohl 984362a71e dracut: Support /usr/bin as 'systemctl' path
On openSUSE the initrd has systemctl in /usr/bin, check this path as
well.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Lorenz Hüdepohl <dev@stellardeath.org>
Closes #11487
2021-01-21 12:59:24 -08:00
Antonio Russo 0ae733c7a4 Install zgenhostid to sbindir
zgenhostid(8) is used to modify or create /etc/hostid.  This
administrative tool is currently installed to bindir.  System utilities
are typically placed in sbin.

Modify the installation directory for zgenhostid.  Additionally, track
this change in its use in dracut and the rpm installation.

Authored-by: наб <nabijaczleweli@nabijaczleweli.xyz>
Authored-by: Antonio Russo <aerusso@aerusso.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11485
2021-01-21 12:58:24 -08:00
Alan Somers 2d8f72d76c zpool: speed up importing large pools (#11469)
The ZFS_IOC_POOL_TRYIMPORT ioctl returns an nvlist from the kernel to a
preallocated buffer in  userland.  Userland must guess how large the
buffer should be.  If it undersizes it, it must reallocate and try
again.  That can cost a lot of time for large pools.

OpenZFS commit 28b40c8a6e set the guess at "zc.zc_nvlist_conf_size * 4"
without explanation.  On my system, that is too small.  From experiment,
x 32 is a better multiplier.  But I don't know how to calculate it
theoretically.

Sponsored by:	Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Signed-off-by:	Alan Somers <asomers@gmail.com>
Closes #11469
2021-01-21 12:55:54 -08:00
Alan Somers e50b5217e7 libzutil: optimize zpool_read_label with AIO
Read all labels in parallel instead of sequentially.

Originally committed as
https://cgit.freebsd.org/src/commit/?id=b49e9abcf44cafaf5cfad7029c9a6adbb28346e8

Obtained from: FreeBSD
Sponsored by: Spectra Logic, Axcient
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11467
2021-01-21 11:24:35 -08:00
Alan Somers ec40ce8405 libzutil: don't read extraneous data in zpool_read_label
zpool_read_label doesn't need the full labels including uberblocks.  It
only needs the vdev_phys_t.  This reduces by half the amount of data
read to check for a label, speeding up "zpool import", "zpool
labelclear", etc.

Originally committed as
https://cgit.freebsd.org/src/commit/?id=63f8025d6acab1b334373ddd33f940a69b3b54cc

Obtained from: FreeBSD
Sponsored by: Spectra Logic Corp, Axcient
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11467
2021-01-21 11:18:32 -08:00
Brian Behlendorf 83b91ae1a4 Linux 5.10 compat: restore custom uio_prefaultpages()
As part of commit 1c2358c1 the custom uio_prefaultpages() code
was removed in favor of using the generic kernel provided
iov_iter_fault_in_readable() interface.  Unfortunately, it
turns out that up until the Linux 4.7 kernel the function would
only ever fault in the first iovec of the iov_iter.  The result
being uiomove_iov() may hang waiting for the page.

This commit effectively restores the custom uio_prefaultpages()
pages code for Linux 4.9 and earlier kernels which contain the
troublesome version of iov_iter_fault_in_readable().

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11463 
Closes #11484
2021-01-21 10:43:39 -08:00
Brian Atkinson d0cd9a5cc6 Extending FreeBSD UIO Struct
In FreeBSD the struct uio was just a typedef to uio_t. In order to
extend this struct, outside of the definition for the struct uio, the
struct uio has been embedded inside of a uio_t struct.

Also renamed all the uio_* interfaces to be zfs_uio_* to make it clear
this is a ZFS interface.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #11438
2021-01-20 21:27:30 -08:00
Matthew Ahrens e2af2acce3 allow callers to allocate and provide the abd_t struct
The `abd_get_offset_*()` routines create an abd_t that references
another abd_t, and doesn't allocate any pages/buffers of its own.  In
some workloads, these routines may be called frequently, to create many
abd_t's representing small pieces of a single large abd_t.  In
particular, the upcoming RAIDZ Expansion project makes heavy use of
these routines.

This commit adds the ability for the caller to allocate and provide the
abd_t struct to a variant of `abd_get_offset_*()`.  This eliminates the
cost of allocating the abd_t and performing the accounting associated
with it (`abdstat_struct_size`).  The RAIDZ/DRAID code uses this for
the `rc_abd`, which references the zio's abd.  The upcoming RAIDZ
Expansion project will leverage this infrastructure to increase
performance of reads post-expansion by around 50%.

Additionally, some of the interfaces around creating and destroying
abd_t's are cleaned up.  Most significantly, the distinction between
`abd_put()` and `abd_free()` is eliminated; all types of abd_t's are
now disposed of with `abd_free()`.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Issue #8853 
Closes #11439
2021-01-20 11:24:37 -08:00
sterlingjensen 03f036cbcc Re-apply path sanitizer, as mount(8) still mangles it
Prior to util-linux 2.36.2, if a file or directory in the
current working directory was named 'dataset' then mount(8)
would prepend the current working directory to the dataset.

Eventually, we should be able to drop this workaround.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sterling Jensen <sterlingjensen@users.noreply.github.com>
Closes #11295 
Closes #11462
2021-01-19 11:57:31 -08:00
Antonio Russo f8c4d63a26 ZTS: avoid piping to special devices
As described in #11445, the kernel interface kernel_{read,write} no
longer act on special devices.  In the ZTS, zfs send and receive are
tested by piping to these devices, leading to spurious failures (for
positive tests) and may mask errors (for negative tests).

Until a more permanent mechanism to address this deficiency is
developed, clean up the output from the ZTS by avoiding directly piping
to or from /dev/null and /dev/zero.

For /dev/zero input, simply use a pipe: `cat </dev/zero |` .

However, for /dev/null output, the shell semantics for pipe failures
means that zfs send error codes will be masked by the successful
`| cat >/dev/null` command execution.  In that case, use a temporary
file under $TEST_BASE_DIR for output in favor.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11478
2021-01-19 11:53:35 -08:00
Ryan Moeller 60a2434b29 libzfs_sendrecv: Use fnv* to verify nvlist/nvpair*
Use verified variants of nvlist/nvpair functions where applicable.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11460
2021-01-14 09:53:09 -08:00
Ryan Moeller a9eaae065b ztest: Clean up use of ASSERT and VERIFY
Try to use more appropriate ASSERT and VERIFY variants in ztest.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11454
2021-01-12 17:21:01 -08:00
Antonio Russo 8752f7e320 ZTS: avoid race to unmount in zfs_rollback_001
The zfs_rollback_001 test modifies files in a temporary, test dataset
repeatedly.  Before each iteration, any preexisting dataset is removed,
after unmounted with umount -f, if necessary.

Add a short delay after the forced unmount, avoiding a race that can
prevent zfs destroy from succeeding, leading to a test failure.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11451
2021-01-12 17:20:02 -08:00
Ryan Moeller bea3fc71ae ztest: Use fnvlist_* instead of VERIFYing nvlist_*
Simplify ztest by using fnvlist functions to verify success.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11441
2021-01-11 09:30:33 -08:00
Matthew Ahrens 2ac90457f5 record ioctl elapsed time in zpool history
Each zfs ioctl that changes on-disk state (e.g. set property, create
snapshot, destroy filesystem) is recorded in the zpool history, and is
printed by `zpool history -i`.

For performance diagnostic purposes, it would be useful to know how long
each of these ioctls took to run.  This commit adds that functionality,
with a new `ZPOOL_HIST_ELAPSED_NS` member of the history nvlist.

Additionally, the time recorded in this history log is currently the
time that the history record is written to disk.  But in many cases (CLI
args logging and ioctl logging), this happens asynchronously,
potentially many seconds after the operation completed.  This commit
changes the timestamp to reflect when the history event was created,
rather than when it was written to disk.

Reviewed-by: Mark Maybee <mmaybee@cray.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11440
2021-01-11 09:29:25 -08:00
Matthew Ahrens dc303dcf5b assertion failed in arc_wait_for_eviction()
If the system is very low on memory (specifically,
`arc_free_memory() < arc_sys_free/2`, i.e. less than 1/16th of RAM
free), `arc_evict_state_impl()` will defer wakups.  In this case, the
arc_evict_waiter_t's remain on the list, even though `arc_evict_count`
has been incremented past their `aew_count`.

The problem is that `arc_wait_for_eviction()` assumes that if there are
waiters on the list, the count they are waiting for has not yet been
reached.  However, the deferred wakeups may violate this, causing
`ASSERT(last->aew_count > arc_evict_count)` to fail.

This commit resolves the issue by having new waiters use the greater of
`arc_evict_count` and the last `aew_count`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11285
Closes #11397
2021-01-07 20:06:32 -08:00
Matthew Macy f11b09dec3 FreeBSD: minor_t needs to be signed so that -1 is recognized as such
zfsdev_close sets zs_minor to -1 to avoid duplicate calls to
destroy. This doesn't mix well with the current u_int used.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11437
2021-01-07 10:41:27 -08:00
Brian Behlendorf 06346cc5b5 Autoconf 2.70 compatibility
Several m4 macros have been retired in autoconf 2.70.  Update the
the build system to use the new macros provided to replace them.

* Replaced AC_HELP_STRING with AS_HELP_STRING.

* Replaced AC_TRY_COMPILE with AC_COMPILE_IFELSE/AC_LANG_PROGRAM.

* Replaced AC_CANONICAL_SYSTEM with AC_CANONICAL_TARGET

* Replaced AC_PROG_LIBTOOL with LT_INIT

* $CPP is not defined in ZFS_AC_KERNEL and really shouldn't be
  directly used like this.  Replace it with an $AWK command
  to extract the kernel source version.

Reviewed-by: Eli Schwartz <eschwartz@archlinux.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #11413 
Closes #11419
2021-01-02 16:55:55 -08:00
Toomas Soome 4ba8c6b584 zfs_mount_all_mountpoints: cleanup_all should leave pool root mounted
if pool root is not mounted, then zpool umount in next test will leave
dataset mountpoint directory around and next zfs mount -a will fail
with error: cannot mount '/testpool': directory is not empty

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #11417
2021-01-02 16:54:53 -08:00
Konstantin Khorenko 064c2cf40e VZ 7 kernel compat: introduce ITER-enabled .direct_IO() via IOVECs
Virtuozzo 7 kernels starting 3.10.0-1127.18.2.vz7.163.46
have the following configuration:

  * no HAVE_VFS_RW_ITERATE
  * HAVE_VFS_DIRECT_IO_ITER_RW_OFFSET

=> let's add implementation of zpl_direct_IO() via
zpl_aio_{read,write}() in this case.

https://bugs.openvz.org/browse/OVZ-7243

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Closes #11410 
Closes #11411
2020-12-30 14:18:29 -08:00
Matthew Ahrens 808e681238 Memory leak in zdb:import_checkpointed_state()
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11396
2020-12-28 10:05:55 -08:00
Matthew Ahrens f014700a37 Memory leak in ztest_dmu_objset_own()
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11396
2020-12-28 10:05:44 -08:00
Matthew Ahrens a0316ad268 Memory leak in ztest_vdev_attach_detach()
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11396
2020-12-28 10:05:32 -08:00
Matthew Ahrens b6722b871b nvlist leaked in zpool_find_config()
In `zpool_find_config()`, the `pools` nvlist is leaked.  Part of it (a
sub-nvlist) is returned in `*configp`, but the callers also leak that.

Additionally, in `zdb.c:main()`, the `searchdirs` is leaked.

The leaks were detected by ASAN (`configure --enable-asan`).

This commit resolves the leaks.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11396
2020-12-28 10:05:31 -08:00
Toomas Soome 40ab927ae8 implicit conversion from 'boolean_t' to 'ds_hold_flags_t'
Build error on illumos with gcc 10 did reveal:

In function 'dmu_objset_refresh_ownership':
../../common/fs/zfs/dmu_objset.c:857:25: error: implicit conversion
from 'boolean_t' to 'ds_hold_flags_t' {aka 'enum ds_hold_flags'}
[-Werror=enum-conversion]
      857 |  dsl_dataset_disown(ds, decrypt, tag);
          |                         ^~~~~~~
cc1: all warnings being treated as errors

libzfs_input_check.c: In function 'zfs_ioc_input_tests':
libzfs_input_check.c:754:28: error: implicit conversion from
'enum dmu_objset_type' to 'enum lzc_dataset_type'
[-Werror=enum-conversion]
  754 |  err = lzc_create(dataset, DMU_OST_ZFS, NULL, NULL, 0);
      |                            ^~~~~~~~~~~
cc1: all warnings being treated as errors

The same issue is present in openzfs, and also the same issue about
ds_hold_flags_t, which currently defines exactly one valid value.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #11406
2020-12-27 16:31:02 -08:00
Brian Behlendorf c449d4b06d Linux 5.11 compat: blk_{un}register_region()
As of 5.11 the blk_register_region() and blk_unregister_region()
functions have been retired. This isn't a problem since add_disk()
has implicitly allocated minor numbers for a very long time.

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390
2020-12-27 16:20:46 -08:00
Brian Behlendorf 19697e4545 Linux 5.11 compat: revalidate_disk_size()
Both revalidate_disk_size() and revalidate_disk() have been removed.
Functionally this isn't a problem because we only relied on these
functions to call zvol_revalidate_disk() for us and to perform any
additional handling which might be needed for that kernel version.
When neither are available we know there's no additional handling
needed and we can directly call zvol_revalidate_disk().

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390
2020-12-27 16:20:40 -08:00
Brian Behlendorf 72ba4b2a4c Linux 5.11 compat: bdev_whole()
The bd_contains member was removed from the block_device structure.
Callers needing to determine if a vdev is a whole block device should
use the new bdev_whole() wrapper.  For older kernels we provide our
own bdev_whole() wrapper which relies on bd_contains for compatibility.

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390
2020-12-27 16:20:33 -08:00
Brian Behlendorf a970f0594e Linux 5.11 compat: bio_start_io_acct() / bio_end_io_acct()
The generic IO accounting functions have been removed in favor of the
bio_start_io_acct() and bio_end_io_acct() functions which provide a
better interface.  These new functions were introduced in the 5.8
kernels but it wasn't until the 5.11 kernel that the previous generic
IO accounting interfaces were removed.

This commit updates the blk_generic_*_io_acct() wrappers to provide
and interface similar to the updated kernel interface.  It's slightly
different because for older kernels we need to pass the request queue
as well as the bio.

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390
2020-12-27 16:20:24 -08:00
Brian Behlendorf b7281c88bc Linux 5.11 compat: lookup_bdev()
The lookup_bdev() function has been updated to require a dev_t
be passed as the second argument. This is actually pretty nice
since the major number stored in the dev_t was the only part we
were interested in. This allows to us avoid handling the bdev
entirely.  The vdev_lookup_bdev() wrapper was updated to emulate
the behavior of the new lookup_bdev() for all supported kernels.

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390
2020-12-27 16:20:08 -08:00
Brian Behlendorf c347fac586 Linux 5.11 compat: conftest
Update the ZFS_LINUX_TEST_PROGRAM macro to always set the module
license.  As of the 5.11 kernel not setting a license has been
converted from a warning to an error.

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390
2020-12-27 16:15:19 -08:00
Ryan Moeller 8fbd31d761 dbufstat: Fix warnings with Python 3.8
Replace "is" with "==" and "is not" with "!=".

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11394
2020-12-23 15:10:35 -08:00
Brian Behlendorf 4879c4d198 Linux 5.10 compat: META
Increase the Linux-Maximum version in the META file to 5.10.
All of the required compatibility patches have been merged.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11391
2020-12-23 08:55:02 -08:00
Brian Behlendorf 0c763f76b1 Remove unused check from dmu_tx_count_write()
Individual transactions may not be larger than DMU_MAX_ACCESS.
This is enforced by the assertions in dmu_tx_hold_write() and
dmu_tx_hold_write_by_dnode().  There's an additional check in
dmu_tx_count_write() however it has no effect and only sets a
local err variable.  We could enable this check, however since
it's already enforced by ASSERTs elsewhere I opted to remove it
instead.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3731 
Closes #11384
2020-12-21 20:17:13 -08:00
Christian Schwarz af9593903e zfs-kmods: install to /lib/modules instead of /usr/lib/modules
Before this patch, dracut wouldn't find zfs.ko for inclusion in
initramfs. This was caused by the packages installing in to
/lib/modules instead of /usr/lib/modules.  Correcting this allows
dracut to do the right thing, even without

    # /etc/dracut.conf
    add_drivers+=" zfs "

Notably, rpm/redhat/zfs-kmod.spec.in does not contain the definition of
the `prefix` macro that this commit removes in the generic kmod spec.
And https://rpmfusion.org/Packaging/KernelModules/Kmods2 does not
mention `prefix` at all.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11381
2020-12-21 20:14:32 -08:00
Kjeld Schouten-Lebbing 7bbe563a0e Forward questions to github discussions
Instead of creating issues with type "question"
Forward to the GitHub Discussion system.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #11383
2020-12-21 20:09:02 -08:00
Andy Fiddaman 39372fa25b Dangling reference from dmu_objset_upgrade
After porting the fix for https://github.com/openzfs/zfs/issues/5295
over to illumos, we started hitting an assertion failure when running
the testsuite:

	assertion failed: rc->rc_count == number, file: .../refcount.c

and the unexpected hold has this stack:

	dsl_dataset_long_hold+0x59 dmu_objset_upgrade+0x73
dmu_objset_id_quota_upgrade+0x15 dmu_objset_own+0x14f

The simplest reproducer for this in illumos is

    zpool create -f -O version=1 testpool c3t0d0; zpool destroy testpool

which is run as part of the zpool_create_tempname test, but I can't get
this to trigger on FreeBSD. This appears to be because of the call to
txg_wait_synced() in dmu_objset_upgrade_stop() (which was missing in
illumos), slows down dmu_objset_disown() enough to avoid the condition.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andy Fiddaman <andy@omnios.org>
Closes #11368
2020-12-21 10:13:23 -08:00
Brian Behlendorf 6152014d1a Linux 4.18.0-257.el8 compat: blk_alloc_queue()
The CentOS stream 4.18.0-257 kernel appears to have backported
the Linux 5.9 change to make_request_fn and the associated API.
To maintain weak modules compatibility the original symbol was
retained and the new interface blk_alloc_queue_rh() was added.

Unfortunately, blk_alloc_queue() was replaced in the blkdev.h
header by blk_alloc_queue_bh() so there doesn't seem to be a way
to build new kmods against the old interfces.  Even though they
appear to still be available for weak module binding.

To accommodate this a configure check is added for the new _rh()
variant of the function and used if available.  If compatibility
code gets added to the kernel for the original blk_alloc_queue()
interface this should be fine.  OpenZFS will simply continue to
prefer the new interface and only fallback to blk_alloc_queue()
when blk_alloc_queue_rh() isn't available.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11374
2020-12-21 10:11:56 -08:00
Brian Behlendorf 8947fa4495 Fix maybe uninitialized variable warning
Commit 1c2358c12 restructured this code and introduced a warning
about the variable maybe not being initialized.  This cannot happen
with the updated code but we should initialize the variable anyway
to silence the warning.

    zpl_file.c: In function ‘zpl_iter_write’:
    zpl_file.c:324:9: warning: ‘count’ may be used uninitialized
        in this function [-Wmaybe-uninitialized]

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11373
2020-12-20 09:50:13 -08:00
Brian Behlendorf 9ac535e662 Remove iov_iter_advance() from iter_read
There's no need to call iov_iter_advance() in zpl_iter_read().
This was preserved from the previous code where it wasn't needed
but also didn't cause any problems.  Now that the iter functions
also handle pipes that's no longer the case.  When fully reading a
pipe buffer iov_iter_advance() may results in the pipe buf release
function being called which will not be registered resulting in
a NULL dereference.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11375 
Closes #11378
2020-12-20 09:49:29 -08:00
Christian Schwarz 49c482fde3 dsl_pool: extend comment on DSL Pool Configuration Lock
Based on a conversation with Matt on the OpenZFS Slack.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11370
2020-12-19 18:04:05 -08:00
Michael D Labriola 1c0bbd52c3 Linux 5.10 compat: also zvol_revalidate_disk()
Commit 59b68723 added a configure check for 5.10, which removed
revalidate_disk(), and conditionally replaced it's usage with a call to
the new revalidate_disk_size() function.  However, the old function also
invoked the device's registered callback, in our case
zvol_revalidate_disk().  This commit adds a call to zvol_revalidate_disk()
in zvol_update_volsize() to make sure the code path stays the same.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Michael D Labriola <michael.d.labriola@gmail.com>
Closes #11358
2020-12-18 09:36:19 -08:00
Prawn 77f09e2932 ZED/zfs-list-cacher.sh: don't exit on ignored event type
Check for the history_event type instead.

The zfs-list-cacher.sh script currently respects the event types
excluded from syslog(!) in ZED_SYSLOG_SUBCLASS_EXCLUDE.
This makes little sense in this single-purpose script and
silently breaks when history_events are excluded from syslog,
which is the default since 13d65987a9.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #11164
Closes #11347
2020-12-18 09:34:10 -08:00
Brian Behlendorf 1c2358c12a Linux 5.10 compat: use iov_iter in uio structure
As of the 5.10 kernel the generic splice compatibility code has been
removed.  All filesystems are now responsible for registering a
->splice_read and ->splice_write callback to support this operation.

The good news is the VFS provided generic_file_splice_read() and
iter_file_splice_write() callbacks can be used provided the ->iter_read
and ->iter_write callback support pipes.  However, this is currently
not the case and only iovecs and bvecs (not pipes) are ever attached
to the uio structure.

This commit changes that by allowing full iov_iter structures to be
attached to uios.  Ever since the 4.9 kernel the iov_iter structure
has supported iovecs, kvecs, bvevs, and pipes so it's desirable to
pass the entire thing when possible.  In conjunction with this the
uio helper functions (i.e uiomove(), uiocopy(), etc) have been
updated to understand the new UIO_ITER type.

Note that using the kernel provided uio_iter interfaces allowed the
existing Linux specific uio handling code to be simplified.  When
there's no longer a need to support kernel's older than 4.9, then
it will be possible to remove the iovec and bvec members from the
uio structure and always use a uio_iter.  Until then we need to
maintain all of the existing types for older kernels.

Some additional refactoring and cleanup was included in this change:

- Added checks to configure to detect available iov_iter interfaces.
  Some are available all the way back to the 3.10 kernel and are used
  when available.  In particular, uio_prefaultpages() now always uses
  iov_iter_fault_in_readable() which is available for all supported
  kernels.

- The unused UIO_USERISPACE type has been removed.  It is no longer
  needed now that the uio_seg enum is platform specific.

- Moved zfs_uio.c from the zcommon.ko module to the Linux specific
  platform code for the zfs.ko module.  This gets it out of libzfs
  where it was never needed and keeps this Linux specific code out
  of the common sources.

- Removed unnecessary O_APPEND handling from zfs_iter_write(), this
  is redundant and O_APPEND is already handled in zfs_write();

Reviewed-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11351
2020-12-18 08:48:26 -08:00
Brian Behlendorf 2844ad60d4 ZTS: Simplify zpool_initialize_verify_initialized
Consider the test to be a success as long as the initializing pattern
is found at least once per metaslab.  This indicates that at least
part of the free space was initialized.  Ideally we'd check that the
pattern was written to all free space but that's much trickier so this
check is a reasonable compromise.

Using a here-string to feed the loop in this test causes an empty
string to still trigger the loop so we miss the `spacemaps=0` case.
Pipe into the loop instead.

While here, we can use `zpool wait -t initialize $TESTPOOL` to wait for
the pool to initialize.

Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11365
2020-12-18 08:42:59 -08:00
Matthew Ahrens 71e4ce0e52 special device removal space accounting fixes
The space in special devices is not included in spa_dspace (or
dsl_pool_adjustedsize(), or the zfs `available` property).  Therefore
there is always at least as much free space in the normal class, as
there is allocated in the special class(es).  And therefore, there is
always enough free space to remove a special device.

However, the checks for free space when removing special devices did not
take this into account.  This commit corrects that.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11329
2020-12-17 12:11:56 -08:00
Ryan Moeller 1531506d23 Avoid extra work updating ARC kstats and tunables
After e357046 it should not be necessary to periodically update ARC
kstats and tunables.  Tunable updates are applied when modified, and
kstats are updated on demand.

Update kstats in `arc_evict_cb_check()` for `ZFS_DEBUG` builds only.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11237
2020-12-17 11:16:42 -08:00
sterlingjensen fb188409f1 Use the correct return type for getopt
Use the correct return type for getopt otherwise clang complains
about tautological-constant-out-of-range-compare.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sterling Jensen <sterlingjensen@users.noreply.github.com>
Closes #11359
2020-12-17 10:19:30 -08:00
Matthew Ahrens be5c6d9653 Only examine best metaslabs on each vdev
On a system with very high fragmentation, we may need to do lots of gang
allocations (e.g. most indirect block allocations (~50KB) may need to
gang). Before failing a "normal" allocation and resorting to ganging, we
try every metaslab.  This has the impact of loading every metaslab (not
a huge deal since we now typically keep all metaslabs loaded), and also
iterating over every metaslab for every failing allocation. If there are
many metaslabs (more than the typical ~200, e.g. due to vdev expansion
or very large vdevs), the CPU cost of this iteration can be very
impactful.  This iteration is done with the mg_lock held, creating long
hold times and high lock contention for concurrent allocations,
ultimately causing long txg sync times and poor application performance.

To address this, this commit changes the behavior of "normal" (not
try_hard, not ZIL) allocations.  These will now only examine the 100
best metaslabs (as determined by their ms_weight).  If none of these
have a large enough free segment, then the allocation will fail and
we'll fall back on ganging.

To accomplish this, we will now (normally) gang before doing a
`try_hard` allocation.  Non-try_hard allocations will only examine the
100 best metaslabs of each vdev.  In summary, we will first try normal
allocation.  If that fails then we will do a gang allocation.  If that
fails then we will do a "try hard" gang allocation.  If that fails then
we will have a multi-layer gang block.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11327
2020-12-16 14:40:05 -08:00
Alexander Motin f8020c9363 Make metaslab class rotor and aliquot per-allocator.
Metaslab rotor and aliquot are used to distribute workload between
vdevs while keeping some locality for logically adjacent blocks.  Once
multiple allocators were introduced to separate allocation of different
objects it does not make much sense for different allocators to write
into different metaslabs of the same metaslab group (vdev) same time,
competing for its resources.  This change makes each allocator choose
metaslab group independently, colliding with others only sporadically.

Test including simultaneous write into 4 files with recordsize of 4KB
on a striped pool of 30 disks on a system with 40 logical cores show
reduction of vdev queue lock contention from 54 to 27% due to better
load distribution.  Unfortunately it won't help much ZVOLs yet since
only one dataset/ZVOL is synced at a time, and so for the most part
only one allocator is used, but it may improve later.

While there, to reduce the number of pointer dereferences change
per-allocator storage for metaslab classes and groups from several
separate malloc()'s to variable length arrays at the ends of the
original class and group structures.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #11288
2020-12-15 10:55:44 -08:00
gregory-lee-bartholomew e2d952cda0 DKMS: Disable weak modules
Fedora does not guarantee a stable kABI, so weak modules should be dis-
abled. See the dkms man page for a more detailed explanation of the weak
module feature.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gregory Bartholomew <gregory.lee.bartholomew@gmail.com>
Closes #9891
Closes #11128
Closes #11242
Closes #11335
2020-12-15 09:22:30 -08:00
Ryan Libby d8a09b3a04 lua: avoid gcc -Wreturn-local-addr bug
Avoid a bug with gcc's -Wreturn-local-addr warning with some
obfuscation.  In buggy versions of gcc, if a return value is an
expression that involves the address of a local variable, and even if
that address is legally converted to a non-pointer type, a warning may
be emitted and the value of the address may be replaced with zero.
Howerver, buggy versions don't emit the warning or replace the value
when simply returning a local variable of non-pointer type.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90737

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Libby <rlibby@FreeBSD.org>
Closes #11337
2020-12-15 09:20:48 -08:00
Ryan Libby 956f94010f spa: avoid type narrowing warning
Building the spa module for i386 caused gcc to emit
-Wint-to-pointer-cast "cast to pointer from integer of different size"
because spa.spa_did was uint64_t but pthread_join (via thread_join in
spa_deactivate) takes a pointer (32-bit on i386).  Define spa_did to be
pointer-size instead.  For now spa_did is in fact never non-zero and the
thread_join could instead be ifdef'd out, but changing the size of
spa_did may be more useful for the future.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Libby <rlibby@FreeBSD.org>
Closes #11336
2020-12-15 09:20:06 -08:00
Ryan Libby c7500ded3e FreeBSD libzfs: gcc requires __thread after static
Building libzfs with gcc on FreeBSD failed because gcc is picky about
the order of keywords in declarations with __thread, whereas clang is
more relaxed.

https://gcc.gnu.org/onlinedocs/gcc/Thread-Local.html

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ryan Libby <rlibby@FreeBSD.org>
Closes #11331
2020-12-14 09:28:24 -08:00
Matthew Macy 923d730329 dmu_zfetch: fix memory leak
The last change caused the read completion callback to not be called
if the IO was still in progress. This change restores allocation
of the arc buf callback, but in the callback path checks the new
acb_nobuf field to know to skip buffer allocation.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11324
2020-12-12 16:00:00 -08:00
George Amanakis c76a40bfda Fix reporting of CKSUM errors in indirect vdevs
When removing and subsequently reattaching a vdev, CKSUM errors may
occur as vdev_indirect_read_all() reads from all children of a mirror
in case of a resilver.

Fix this by checking whether a child is missing the data and setting a
flag (ic_error) which is then checked in vdev_indirect_repair() and
suppresses incrementing the checksum counter.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #11277
2020-12-11 12:15:37 -08:00
Brian Behlendorf 1ad07b01bc Remove draid.d symlink from zfs_helpers.sh
In an earlier revision of dRAID there existed an /etc/zfs/draid.d
directory.  This was removed before the final version was integrated
but a little bit was accidentally overlooked in the zfs_helpers.sh
script.  Remove this remnant. 

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11326
2020-12-11 11:00:58 -08:00
Ryan Moeller 695ac5850b arc_summary3: Handle overflowing value width
Some tunables shown by arc_summary3 have string values that may exceed
the normal line length, leaving a negative offset between the name and
value fields.  The negative space is of course not valid and Python
rightly barfs up an exception traceback.

Handle an overflowing value field width by ignoring the line length
and separating the name from the value by a single space instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11270
2020-12-11 10:29:53 -08:00
Ryan Moeller 439dc034e9 FreeBSD: Implement sysctl for fletcher4 impl
There is a tunable to select the fletcher 4 checksum implementation on
Linux but it was not present in FreeBSD.

Implement the sysctl handler for FreeBSD and use ZFS_MODULE_PARAM_CALL
to provide the tunable on both platforms.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11270
2020-12-11 10:29:01 -08:00
Matthew Ahrens ba67d82142 Improve zfs receive performance with lightweight write
The performance of `zfs receive` can be bottlenecked on the CPU consumed
by the `receive_writer` thread, especially when receiving streams with
small compressed block sizes.  Much of the CPU is spent creating and
destroying dbuf's and arc buf's, one for each `WRITE` record in the send
stream.

This commit introduces the concept of "lightweight writes", which allows
`zfs receive` to write to the DMU by providing an ABD, and instantiating
only a new type of `dbuf_dirty_record_t`.  The dbuf and arc buf for this
"dirty leaf block" are not instantiated.

Because there is no dbuf with the dirty data, this mechanism doesn't
support reading from "lightweight-dirty" blocks (they would see the
on-disk state rather than the dirty data).  Since the dedup-receive code
has been removed, `zfs receive` is write-only, so this works fine.

Because there are no arc bufs for the received data, the received data
is no longer cached in the ARC.

Testing a receive of a stream with average compressed block size of 4KB,
this commit improves performance by 50%, while also reducing CPU usage
by 50% of a CPU.  On a per-block basis, CPU consumed by receive_writer()
and dbuf_evict() is now 1/7th (14%) of what it was.

Baseline: 450MB/s, CPU in receive_writer() 40% + dbuf_evict() 35%
New: 670MB/s, CPU in receive_writer() 17% + dbuf_evict() 0%

The code is also restructured in a few ways:

Added a `dr_dnode` field to the dbuf_dirty_record_t.  This simplifies
some existing code that no longer needs `DB_DNODE_ENTER()` and related
routines.  The new field is needed by the lightweight-type dirty record.

To ensure that the `dr_dnode` field remains valid until the dirty record
is freed, we have to ensure that the `dnode_move()` doesn't relocate the
dnode_t.  To do this we keep a hold on the dnode until it's zio's have
completed.  This is already done by the user-accounting code
(`userquota_updates_task()`), this commit extends that so that it always
keeps the dnode hold until zio completion (see `dnode_rele_task()`).

`dn_dirty_txg` was previously zeroed when the dnode was synced.  This
was not necessary, since its meaning can be "when was this dnode last
dirtied".  This change simplifies the new `dnode_rele_task()` code.

Removed some dead code related to `DRR_WRITE_BYREF` (dedup receive).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11105
2020-12-11 10:26:02 -08:00
Paul Dagnelie 7d4b365ce3 Fix kernel panic induced by redacted send
In the redaction list traversal code, there is a bug in the binary search
logic when looking for the resume point. Maxbufid can be decremented to -1,
causing us to read the last possible block of the object instead of the one we
wanted. This can cause incorrect resume behavior, or possibly even a hang in
some cases. In addition, when examining non-last blocks, we can treat the
block as being the same size as the last block, causing us to miss entries in
the redaction list when determining where to resume. Finally, we were ignoring
the case where the resume point was found in the buffer being searched, and
resuming from minbufid. All these issues have been corrected, and the code has
been significantly simplified to make future issues less likely.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #11297
2020-12-11 10:22:29 -08:00
Ryan Moeller 8c5606ca0b FreeBSD: Fix format of vfs.zfs.arc_no_grow_shift
vfs.zfs.arc_no_grow_shift has an invalid type (15) and this causes
py-sysctl to format it as a bytearray when it should be an integer.

"U" is not a valid format, it should be "I" and the type should match
the variable type, int.  We can return EINVAL if the value is set below
zero.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11318
2020-12-10 15:28:56 -08:00
Ryan Moeller 513c196200 FreeBSD: Update usage of py-sysctl
py-sysctl now includes the CTLTYPE_NODE type nodes in the list returned
by sysctl.filter() on FreeBSD head.  It also provides descriptions now.

Eliminate the subprocess call to get descriptions, and filter out the
nodes so we only deal with values.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11318
2020-12-10 15:28:31 -08:00
Brian Behlendorf e5f732edbb Fix possibly uninitialized 'root_inode' variable warning
Resolve an uninitialized variable warning when compiling.

    In function ‘zfs_domount’:
    warning: ‘root_inode’ may be used uninitialized in this
        function [-Wmaybe-uninitialized]
    sb->s_root = d_make_root(root_inode);

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11306
2020-12-10 15:23:26 -08:00
Paul Dagnelie 60a4c7d2a2 Implement memory and CPU hotplug
ZFS currently doesn't react to hotplugging cpu or memory into the 
system in any way. This patch changes that by adding logic to the ARC 
that allows the system to take advantage of new memory that is added 
for caching purposes. It also adds logic to the taskq infrastructure 
to support dynamically expanding the number of threads allocated to a 
taskq.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Matthew Ahrens <matthew.ahrens@delphix.com>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #11212
2020-12-10 14:09:23 -08:00
Brian Behlendorf f483daa870 CI: add zloop workflow
Run ztest via zloop for 20 minutes, total run time is ~30 minutes.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11319
2020-12-10 10:55:53 -08:00
George Melikov 5053dfa08c CI: add zloop workflow
Run ztest via zloop for 20 minutes, total run time is ~30 minutes.

Signed-off-by: George Melikov <mail@gmelikov.ru>
2020-12-10 20:46:15 +03:00
Ryan Moeller e0716250bf FreeBSD: Do zcommon_init sooner to avoid FPU panic
There has been a panic affecting some system configurations where the
thread FPU context is disturbed during the fletcher 4 benchmarks,
leading to a panic at boot.

module_init() registers zcommon_init to run in the last subsystem
(SI_SUB_LAST).  Running it as soon as interrupts have been configured
(SI_SUB_INT_CONFIG_HOOKS) makes sure we have finished the benchmarks
before we start doing other things.

While it's not clear *how* the FPU context was being disturbed, this
does seem to avoid it.

Add a module_init_early() macro to run zcommon_init() at this earlier
point on FreeBSD.  On Linux this is defined as module_init().

Authored by: Konstantin Belousov <kib@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11302
2020-12-09 21:29:00 -08:00
Attila Fülöp b9916b4064 ZTS: three small follow up fixes for #11167
Follow up fix for 0cb40fa3. Remove unused variables, don't source
unused libs and add missed cleanup.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #11311
2020-12-09 21:27:12 -08:00
Érico Nogueira Rolim 957f9681eb mount_zfs: print strerror instead of errno for error reporting
Tracking down an error message with the errno value can be difficult,
using strerror makes the error message clearer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Érico Rolim <erico.erc@gmail.com>
Closes #11303
2020-12-09 21:24:59 -08:00
sterlingjensen 1e4667af32 Drop path prefix workaround
Canonicalization, the source of the trouble, was disabled in 9000a9f.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sterling Jensen <sterlingjensen@users.noreply.github.com>
Closes #11295
2020-12-09 21:24:26 -08:00
Orivej Desh ab4fb9b74e Delete rw_semaphore.wait_lock configure check
Last use of wait_lock was removed in "Linux 5.3 compat: retire
rw_tryupgrade()" (e7a99dab2b).

Fixes the issue reported in
https://github.com/openzfs/zfs/issues/11097#issuecomment-714532367

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Orivej Desh <orivej@gmx.fr>
Closes #11309
2020-12-09 21:22:54 -08:00
Matthew Macy 1e4732cbda Decouple arc_read_done callback from arc buf instantiation
Add ARC_FLAG_NO_BUF to indicate that a buffer need not be
instantiated.  This fixes a ~20% performance regression on
cached reads due to zfetch changes.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11220 
Closes #11232
2020-12-09 15:05:06 -08:00
Brian Behlendorf edb20ff3ba Fix optional "force" arg handing in zfs_ioc_pool_sync()
The fnvlist_lookup_boolean_value() function should not be used
to check the force argument since it's optional.  It may not be
provided or may have been created with the wrong flags.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11281
Closes #11284
2020-12-09 14:52:45 -08:00
George Melikov 1a735e763a CI: add new zfs-tests-sanity workflow
Run zfs-tests with sanity.run for brief results.  Timeouts
are rare, so minimize false positives by increasing the
default from 60 to 180 seconds.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11304
2020-12-08 09:53:45 -08:00
George Melikov 8e8fdce682 ZTS: zpool_trim tests throttle trim process
Otherwise trim may finish before progress checks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11296
2020-12-07 10:06:10 -08:00
Brian Behlendorf 83b698dc42 Reduce fletcher4 and raidz benchmark times
During module load time all of the available fetcher4 and raidz
implementations are benchmarked for a fixed amount of time to
determine the fastest available.  Manual testing has shown that this
time can be significantly reduced with negligible effect on the final
results.

This commit changes the benchmark time to 1ms which can reduce the
module load time by over a second on x86_64.  On an x86_64 system
with sse3, ssse3, and avx2 instructions the benchmark times are:

    Fletcher4    603ms   -> 15ms
    RAIDZ        1,322ms -> 64ms

Reviewed-by: Matthew Macy <mmacy@freebsd.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11282
2020-12-06 09:57:20 -08:00
Alexander Motin 8136b9d73b Avoid some spa_has_pending_synctask() calls.
Since 8c4fb36a24 (PR #7795) spa_has_pending_synctask() started to
take two more locks per write inside txg_all_lists_empty().  I am
surprised those pool-wide locks are not contended, but still their
operations are visible in CPU profiles under contended vdev lock.

This commit slightly changes vdev_queue_max_async_writes() flow to
not call the function if we are going to return max_active any way
due to high amount of dirty data.  It allows to save some CPU time
exactly when the pool is busy.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-By: Tom Caputi <caputit1@tcnj.edu>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #11280
2020-12-06 09:55:02 -08:00
Alexander Motin 6366ef2240 Bring consistency to ABD chunk count types.
With both abd_size and abd_nents being uint_t it makes no sense for
abd_chunkcnt_for_bytes() to return size_t.  Random mix of different
types used to count chunks looks bad and makes compiler more difficult
to optimize the code.

In particular on FreeBSD this change allows compiler to completely
optimize out abd_verify_scatter() when built without debug, removing
pointless 64-bit division and even more pointless empty loop.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #11279
2020-12-06 09:53:40 -08:00
Brian Behlendorf eed2bfe06a Enable ABI checks for the checkstyle workflow
Extend the CI checkstyle workflow to perform the library ABI
checks in the master branch.  The intent is not to prevent any
ABI changes but to detect them immediately so when they're
made it's done intentionally.

When the changing the ABI the `make storeabi` target can be
used to generate a new .abi file which can be included with
the commit.  This depends on the libabigail utility which is
available from the majority of distribution package managers.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11287
2020-12-06 09:50:47 -08:00
Brian Behlendorf 0484e8722f ZTS: adjust zpool_import_012_pos timeout
When running in the CI the zpool_import_012_pos test case occasionally
takes longer than the maximum 600 seconds.  When this happens the test
case is considered to have failed but always completes a few minutes
latter.  Since the logs suggest nothing has actually failed this commit
increases timeout and removes the exception.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11286
2020-12-06 09:48:36 -08:00
Brian Behlendorf 81638c999d ZTS: Update zfs_share_concurrent_shares.ksh
Occasionally an out of memory error is hit by this test case
when mounting the filesystems.  Try and reduce the likelihood
of this occurring by reducing the thread count from 100 to 50.
It also has the advantage of slightly speeding up the test.

    cannot mount 'testpool/testfs3/79': Cannot allocate memory
        filesystem successfully created, but not mounted

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11283
2020-12-06 09:47:33 -08:00
George Amanakis d1d47691c2 Fix raw sends on encrypted datasets when copying back snapshots
When sending raw encrypted datasets the user space accounting is present
when it's not expected to be. This leads to the subsequent mount failure
due a checksum error when verifying the local mac.
Fix this by clearing the OBJSET_FLAG_USERACCOUNTING_COMPLETE and reset
the local mac. This allows the user accounting to be correctly updated
on first mount using the normal upgrade process.

Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-By: Tom Caputi <caputit1@tcnj.edu>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10523 
Closes #11221
2020-12-04 14:34:29 -08:00
Attila Fülöp 0cb40fa389 zpool: Dryrun fails to list some devices
`zpool create -n` fails to list cache and spare vdevs.
`zpool add -n` fails to list spare devices.
`zpool split -n` fails to list `special` and `dedup` labels.
`zpool add -n` and `zpool split -n` shouldn't list hole devices.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #11122
Closes #11167
2020-12-04 14:04:39 -08:00
Ryan Moeller 4b6e2a5a33 Add -u option to 'zfs create'
Add -u option to 'zfs create' that prevents file system from being
automatically mounted. This is similar to the 'zfs receive -u'.

Authored by: pjd <pjd@FreeBSD.org>
FreeBSD-commit: freebsd/freebsd@35c58230e2

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Ported-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11254
2020-12-04 14:01:42 -08:00
Brian Behlendorf 8f158ae6ad Add sanity.run file
This run file contains a subset of functional tests which exercise
as much functionality as possible while still executing relatively
quickly.  The included tests should take no more than a few seconds
each to run at most.  This provides a convenient way to sanity test a
change before committing to a full test run which takes several hours.

    $ ./scripts/zfs-tests.sh -r sanity
    ...
    Results Summary
    PASS	 813

    Running Time:	00:14:42
    Percent passed:	100.0%

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11271
2020-12-03 10:49:39 -08:00
melak 766e06695f Fix trivial typo in zfs-diff.8
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tamas TEVESZ <ice@extreme.hu>
Closes #11268 
Closes #11272
2020-12-03 10:18:26 -08:00
Alexander Motin dcf7044522 Fix for "Reduce latency effects of non-interactive I/O"
It was found that setting min_active tunables for non-interactive I/Os
makes them stuck.  It is caused by zfs_vdev_nia_delay, that can never
be reached if we never issue any I/Os due to min_active set to zero.

Fix this by issuing at least one non-interactive I/O at a time when
there are no interactive I/Os.  When there are interactive I/Os, zero
min_active allows to completely block any non-interactive I/O.  It may
min_active starvation in some scenarios, but who we are to deny foot
shooting?

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #11261
2020-12-03 10:02:39 -08:00
qzdanis 9109b89cd7 Add compatibility for busybox mktemp
Busybox's mktemp requires at least six X's in the template, causing
the current sed --in-place check to fail because the file does not
exist. This change adds additional X's to mktemp templates that do
not already have at least six X's in them.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Quentin Zdanis <zdanisq@gmail.com>
Closes #11269
2020-12-03 10:01:16 -08:00
Ryan Moeller 0aacde2e9a FreeBSD: notify userspace when a vdev is removed
This is needed for zfsd to autoreplace vdevs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11260
2020-12-02 10:20:02 -08:00
Finix1979 ec50cd24ba Avoid unneccessary zio allocation and wait
In function dmu_buf_hold_array_by_dnode, the usage of zio is only for 
the reading operation. Only create the zio and wait it in the reading 
scenario as a performance optimization.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Finix Yan <yancw@info2soft.com>
Closes #11251 
Closes #11256
2020-12-02 09:28:55 -08:00
Andrew Sun 95a78a035a Make zpool status "remove:" label print in bold
When ZFS_COLOR is set, zpool status shows row headings in bold,
except for the "remove:" heading. This is a quick fix that makes
it print in bold too.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrew Sun <me@andrewsun.com>
Closes #11255
2020-12-01 15:22:51 -08:00
George Melikov aa2778d100 CI: simplify checkstyle runner
Remove excess steps.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11262
2020-12-01 12:15:55 -08:00
Pavel Snajdr 52c8537513 zpool_influxdb: move to libexec dir
Move the zpool_influxdb command to /usr/libexec/zfs,
and include the /usr/libexec/zfs path in the system search
directory when running the test suite.

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #11156
Closes #11160
Closes #11224
2020-11-28 11:15:57 -08:00
Brian Behlendorf b2a54a28b5 Verify zfs module loaded before starting services
Extend the change made in ae12b02 to verify the zfs kernel
modules are loaded to the rest of the OpenZFS services.  If
the modules aren't loaded the neither the share, volume, or
and zed services can be started.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11243
2020-11-28 11:11:18 -08:00
Đoàn Trần Công Danh 16692e6ba0 dracut: use /bin/sh instead of bash as the intepreter
Despite that dracut has a hard dependency on bash,
its modules doesn't, dracut only has a hard dependency on bash for
module-setup (on a fully usable machine). Inside initramfs, dracut
allows users choose from a list of handful other shells, e.g. bash,
busybox, dash, mkfsh.

In fact, my local machine's initramfs is being built with dash,
and it's functional for a very long time.

Before 64025fa3a (Silence 'make checkbashisms', 2020-08-20), we also
allows our users to have that right, too.

Let's fix the problem 'make checkbashisms' reported and allows our users
to have that right, again.

For 'plymouth' case, let's simply run the command inside the if instead
of checking for the existence of command before running it, because the
status is also failture if plymouth is unavailable.

While we're at it, let's remove an unnecessary fork for grep in
zfs-generator.sh.in and its following complicated 'if elif fi' with
a simple 'case ... esac'.

To support this change, also exclude 90zfs from "make checkbashisms"
because the current CI infrastructure ships an old version of
"checkbashisms", which complains about "command -v", while the current
latest "checkbashisms" thinks it's fine. In the near future, we can
revert that change to "Makefile.am" when CI infrastructure is updated.

Reviewed-by: Gabriel A. Devenyi <gdevenyi@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Closes #11244
2020-11-28 11:02:08 -08:00
Brian Behlendorf 04a82e043d Remove incorrect assertion
Commit 85703f6 added a new ASSERT to zfs_write() as part of the
cleanup which isn't correct in the case where multiple processes
are concurrently extending a file.  The `zp->z_size` is updated
atomically while holding a range lock on only a portion of the
file.  Therefore, it's possible for the file size to increase
after a same check is performed earlier in the loop causing this
ASSERT to fail.  The code itself handles this case correctly so
only the invalid ASSERT needs to be removed.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11235
2020-11-24 09:28:42 -08:00
Alexander Motin 6f5aac3ca0 Reduce latency effects of non-interactive I/O
Investigating influence of scrub (especially sequential) on random read
latency I've noticed that on some HDDs single 4KB read may take up to 4
seconds!  Deeper investigation shown that many HDDs heavily prioritize
sequential reads even when those are submitted with queue depth of 1.

This patch addresses the latency from two sides:
 - by using _min_active queue depths for non-interactive requests while
   the interactive request(s) are active and few requests after;
 - by throttling it further if no interactive requests has completed
   while configured amount of non-interactive did.

While there, I've also modified vdev_queue_class_to_issue() to give
more chances to schedule at least _min_active requests to the lowest
priorities.  It should reduce starvation if several non-interactive
processes are running same time with some interactive and I think should
make possible setting of zfs_vdev_max_active to as low as 1.

I've benchmarked this change with 4KB random reads from ZVOL with 16KB
block size on newly written non-fragmented pool.  On fragmented pool I
also saw improvements, but not so dramatic.  Below are log2 histograms
of the random read latency in milliseconds for different devices:

4 2x mirror vdevs of SATA HDD WDC WD20EFRX-68EUZN0 before:
0, 0, 2,  1,  12,  21,  19,  18, 10, 15, 17, 21
after:
0, 0, 0, 24, 101, 195, 419, 250, 47,  4,  0,  0
, that means maximum latency reduction from 2s to 500ms.

4 2x mirror vdevs of SATA HDD WDC WD80EFZX-68UW8N0 before:
0, 0,  2,  31,  38,  28,  18,  12, 17, 20, 24, 10, 3
after:
0, 0, 55, 247, 455, 470, 412, 181, 36,  0,  0,  0, 0
, i.e. from 4s to 250ms.

1 SAS HDD SEAGATE ST14000NM0048 before:
0,  0,  29,   70, 107,   45,  27, 1, 0, 0, 1, 4, 19
after:
1, 29, 681, 1261, 676, 1633,  67, 1, 0, 0, 0, 0,  0
, i.e. from 4s to 125ms.

1 SAS SSD SEAGATE XS3840TE70014 before (microseconds):
0, 0, 0, 0, 0, 0, 0, 0,  70, 18343, 82548, 618
after:
0, 0, 0, 0, 0, 0, 0, 0, 283, 92351, 34844,  90

I've also measured scrub time during the test and on idle pools.  On
idle fragmented pool I've measured scrub getting few percent faster
due to use of QD3 instead of QD2 before.  On idle non-fragmented pool
I've measured no difference.  On busy non-fragmented pool I've measured
scrub time increase about 1.5-1.7x, while IOPS increase reached 5-9x.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #11166
2020-11-24 09:26:42 -08:00
Brian Behlendorf f67bebbc34 Obsolete earlier packages due to version bump
In order for package managers such as dnf to upgrade cleanly after
the package SONAME bump the obsolete package names must be known.
Update the new packages to correctly obsolete the old ones.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11230 
Closes #11233
2020-11-24 09:24:24 -08:00
Matthew Macy cd44f5be37 FreeBSD: decouple ZFS_DEBUG from kernel debug settings
Reviewed-by: Martelli Nikola @martellini
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11213
2020-11-24 09:16:46 -08:00
Brian Behlendorf 0657326f9c Update dRAID short feature description
The documentation describes dRAID as a distributed spare, not
parity, RAID implementation.  Update the short feature description
to match the rest of the documentation.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11229
2020-11-23 14:49:17 -08:00
Antonio Russo d45267183f libzfsbootenv: do not depend on libnvpair
We do not build libnvpair.pc.  Moreover, it is automatically pulled in
by libzfs.pc, so no additional specific dependency is required.

Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11227
2020-11-22 15:16:42 -08:00
cragw dc6d39a85e pam_zfs_key: accommodate different dataset naming scheme
Name of dataset for user home directory may vary from the expected
$homes_prefix/$username, if different naming scheme is being used.

We can use property mountpoint to specify the dataset for $username
as long as its value is identical to passwd's pw_dir.

For example:
    NAME                       PROPERTY     VALUE
    rpool/home/myuser_123456   mountpoint   /home/myuser

Reviewed-by: Felix Dörre <felix@dogcraft.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Crag Wang <crag0715@gmail.com>
Closes #11165
2020-11-22 09:32:34 -08:00
Brian Behlendorf f1ece319fd Include the ABI with dist tarball
The ABI should be included when generating the `make dist` tarball
since it's required by the `make checkabi` target.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11225
2020-11-21 10:44:52 -08:00
Brian Behlendorf 4d0ba94113 Correct missing zil_claim() DTL updates
Commit a1d477c2 accidentally disabled DTL updates for the zil_claim()
case described at the end of vdev_stat_update() by unconditionally
disabling all DTL updates when loading.  This was done to avoid
a deadlock on the vd_dtl_lock when loading the DTLs from disk.

    vdev_dtl_contains <--- Takes vd->vd_dtl_lock
    vdev_mirror_child_missing
    vdev_mirror_io_start
    zio_vdev_io_start
    __zio_execute
    arc_read
    dbuf_issue_final_prefetch
    dbuf_prefetch_impl
    dbuf_prefetch
    dmu_prefetch
    space_map_iterate
    space_map_load_length
    space_map_load
    vdev_dtl_load <--- Takes vd->vd_dtl_lock
    vdev_load
    spa_ld_load_vdev_metadata
    spa_tryimport

The missing DTL updates can be restored by moving the space_map_load()
call outside the vd_dtl_lock.  A private range tree is populated by
reading the space map and then merged in to the DTL_MISSING tree
under the lock.

Furthermore, the SPA_LOAD_NONE check in vdev_dtl_contains() leads to an
additional problem.  Any resilvering which occurs before SPA_LOAD_NONE
is set will incorrectly determine that there's nothing to repair.  This
can result in full redundancy not being restored for some blocks.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11218
2020-11-20 13:14:45 -08:00
Antonio Russo 8434017a27 Track SONAME version bump in packaging
RPM and DEB packages are named after the SONAME version of the library
they contain.  After bumping this version, the packaging should be
renamed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11219
2020-11-19 16:25:24 -08:00
наб 567e4d0dfa dracut/mount-zfs.sh: quote expansion on zpool test
Bring over some of the improvements from dracut/zfs-load-key.sh,
shellcheck is slightly quieter as well

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11198
2020-11-19 16:20:57 -08:00
наб 2d9f82d891 dracut/zfs-load-key.sh: simplify import loop, quote variable assignments
The loop now has a less confusing condition and properly uses
systemctl(1) is-failed's return code instead of that entire mess

The assignments could turn into "var=val program" if encryptionroot
or keylocation had whitespace in them

As a bonus, this (mostly) silences shellcheck

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11198
2020-11-19 16:20:42 -08:00
Ryan Moeller 85703f616d Reduce confusion in zfs_write
Is this block when abuf != NULL ever reached? Yes, it is.

Add asserts and comments to prove that when we get here, we have a full
block write at an aligned offset extending past EOF.

Simplify by removing the check that tx_bytes == max_blksz, since we can
assert that it is always true.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11191
2020-11-18 15:06:59 -08:00
Matthew Macy 0ca45cb310 Fix problems in zvol_set_volmode_impl
- Don't leave fstrans set when passed a snapshot
- Don't remove minor if volmode already matches new value
- (FreeBSD) Wait for GEOM ops to complete before trying
  remove (at create time GEOM will be "tasting" in parallel)
- (FreeBSD) Don't leak zvol_state_lock on open if zv == NULL
- (FreeBSD) Don't try to unlock zv->zv_state lock if zv == NULL

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11199
2020-11-17 09:50:52 -08:00
Brian Behlendorf 82611cdfe5 Add ABI snapshot
Add a snapshot of the current ABI using libabigail-1.7-2.  The
included ABI passes `make checkabi` for CentOS 7, Fedora 33,
Debian 10, and Ubuntu 20.04.  This covers a fairly wide range
of glibc, gcc, and libabigail versions plus other changes which
are platform specific.

Reviewed-by: Antonio Russo <aerusso@aerusso.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11144
2020-11-17 09:21:39 -08:00
Antonio Russo 14c34c3d49 Library ABI tracking with abigail
Provide two make targets: checkabi and storeabi.

storeabi uses libabigail to generate a reference copy of the ABI for the
public libraries.

checkabi compares such a reference to the compiled version, failing if
they are not compatible.  No ABI is generated for libzpool.so, it is
only used by ztest and zdb and not external consumers.

Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11144
2020-11-17 09:18:52 -08:00
наб e6c59cd171 zpool: correctly align columns with -p
zpool_expand_proplist() now ignores pl_fixed if its new literal
argument is true.  The rest is a consequence of needing to pass
that down.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiao?=~Dska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11202
2020-11-16 09:26:20 -08:00
наб fd654e412e zpool(8): fix pool-wi[sd]e typo
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11202
2020-11-16 09:26:16 -08:00
loli10K 4072f465bc Fix 'zfs userspace' for received datasets in encrypted root
For encrypted receives, where user accounting is initially disabled on
creation, both 'zfs userspace' and 'zfs groupspace' fails with
EOPNOTSUPP: this is because dmu_objset_id_quota_upgrade_cb() forgets to
set OBJSET_FLAG_USERACCOUNTING_COMPLETE on the objset flags after a
successful dmu_objset_space_upgrade().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9501 
Closes #9596
2020-11-16 09:10:29 -08:00
George Amanakis 2c210f6818 Fix ASSERT logic in l2arc_evict()
In case of cache device removal it is possible that at the end of
l2arc_evict() we have l2ad_hand = l2ad_evict. This can lead to the
following panic in case of a debug build:

VERIFY3(dev->l2ad_hand < dev->l2ad_evict) failed (321920512 < 321920512)
Call Trace:
 dump_stack+0x66/0x90
 spl_panic+0xef/0x117 [spl]
 l2arc_remove_vdev+0x11d/0x290 [zfs]
 spa_load_l2cache+0x275/0x5b0 [zfs]
 spa_vdev_remove+0x4a5/0x6e0 [zfs]
 zfs_ioc_vdev_remove+0x59/0xa0 [zfs]
 zfsdev_ioctl_common+0x5b3/0x630 [zfs]
 zfsdev_ioctl+0x53/0xe0 [zfs]
 do_vfs_ioctl+0x42e/0x6b0
 ksys_ioctl+0x5e/0x90
 do_syscall_64+0x5b/0x1a0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

In case of cache device removal it also possible that l2ad_hand +
distance > l2ad_end since we do not iterate l2arc_evict() and l2ad_hand
is not reset. This has no functional consequence however as the cache
device is about to be removed.

Fix this by omitting the ASSERT in case of device removal.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #11205
2020-11-16 09:08:11 -08:00
Érico Rolim 24c12b48a1 config/dracut/90zfs: handle cases where hostid(1) returns all zeros
On systems with musl libc, hostid(1) always prints "00000000", which
will cause improper behavior when the 90zfs module is configured in a
dracut initramfs. Work around this by copying the host /etc/hostid if
the file exists, and otherwise only write /etc/hostid if hostid(1)
returns something meaningful. This avoids zgenhostid creating a random
/etc/hostid for the initramfs, which could lead to errors when trying to
import the pool if spl_hostid isn't defined in the kernel command line.

Furthermore, tag the /etc/hostid file as hostonly, since it is system
specific and shouldn't be taken into account when trying to use an
initramfs generated in one system to boot into a different system.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Co-authored-by: Andrew J. Hesford <ajh@sideband.org>
Signed-off-by: Érico Rolim <erico.erc@gmail.com>
Closes #11174
Closes #11189
2020-11-14 17:21:54 -08:00
Érico Rolim 9c4b6dbb31 zgenhostid: accept hostid arguments equal to zero.
A common usage pattern for zgenhostid, including in the ZFS dracut
module, is running it as:

  zgenhostid $(hostid)

However, zgenhostid only accepted hostid arguments greater than 0, which
meant that, when the output of hostid(1) was "00000000", zgenhostid
would error out, even though 0 is a possible return value for the
gethostid(3) function used by hostid(1):

- On current musl libc, gethostid(3) is a stub that always returns 0.
- On glibc, gethostid(3) will return 0 if /etc/hostid exists but is
  smaller than 4 bytes.

In these cases, it makes more sense for zgenhostid to treat a value of 0
as other parts of the zfs codebase do, meaning that a hostid value
couldn't be determined; therefore, it should attempt to generate a
random value to write into /etc/hostid.

The manpage and usage output have been updated to reflect this.

Whitespace has also been fixed in the usage output.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Reviewed-by: Andrew J. Hesford <ajh@sideband.org>
Signed-off-by: Érico Rolim <erico.erc@gmail.com>
Closes #11174
Closes #11189
2020-11-14 17:20:54 -08:00
Brian Behlendorf 4352edaafb Linux: Fix ZFS_ENTER/ZFS_EXIT/ZFS_VERFY_ZP usage
The ZFS_ENTER/ZFS_EXIT/ZFS_VERFY_ZP macros should not be used
in the Linux zpl_*.c source files.  They return a positive error
value which is correct for the common code, but not for the Linux
specific kernel code which expects a negative return value.  The
ZPL_ENTER/ZPL_EXIT/ZPL_VERFY_ZP macros should be used instead.

Furthermore, the ZPL_EXIT macro has been updated to not call the
zfs_exit_fs() function.  This prevents a possible deadlock which
can occur when a snapshot is automatically unmounted because the
zpl_show_devname() must never wait on in progress automatic
snapshot unmounts.

Reviewed-by: Adam Moss <c@yotes.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11169 
Closes #11201
2020-11-14 10:19:00 -08:00
Matthew Ahrens d66aab7c08 Assertion failure when logging large output of channel program
The output of ZFS channel programs is logged on-disk in the zpool
history, and printed by `zpool history -i`.  Channel programs can use
10MB of memory by default, and up to 100MB by using the `zfs program -m`
flag.  Therefore their output can be up to some fraction of 100MB.

In addition to being somewhat wasteful of the limited space reserved for
the pool history (which for large pools is 1GB), in extreme cases this
can result in a failure of `ASSERT(length <= DMU_MAX_ACCESS);` in
`dmu_buf_hold_array_by_dnode()`.

This commit limits the output size that will be logged to 1MB.  Larger
outputs will not be logged, instead a entry will be logged indicating
the size of the omitted output.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11194
2020-11-14 10:17:16 -08:00
Ryan Moeller 7e3617de35 Return EFAULT at the end of zfs_write() when set
FreeBSD's VFS expects EFAULT from zfs_write() if we didn't complete
the full write so it can retry the operation.  Add some missing
SET_ERRORs in zfs_write().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11193
2020-11-14 10:16:26 -08:00
Brian Behlendorf b2255edcc0 Distributed Spare (dRAID) Feature
This patch adds a new top-level vdev type called dRAID, which stands
for Distributed parity RAID.  This pool configuration allows all dRAID
vdevs to participate when rebuilding to a distributed hot spare device.
This can substantially reduce the total time required to restore full
parity to pool with a failed device.

A dRAID pool can be created using the new top-level `draid` type.
Like `raidz`, the desired redundancy is specified after the type:
`draid[1,2,3]`.  No additional information is required to create the
pool and reasonable default values will be chosen based on the number
of child vdevs in the dRAID vdev.

    zpool create <pool> draid[1,2,3] <vdevs...>

Unlike raidz, additional optional dRAID configuration values can be
provided as part of the draid type as colon separated values. This
allows administrators to fully specify a layout for either performance
or capacity reasons.  The supported options include:

    zpool create <pool> \
        draid[<parity>][:<data>d][:<children>c][:<spares>s] \
        <vdevs...>

    - draid[parity]       - Parity level (default 1)
    - draid[:<data>d]     - Data devices per group (default 8)
    - draid[:<children>c] - Expected number of child vdevs
    - draid[:<spares>s]   - Distributed hot spares (default 0)

Abbreviated example `zpool status` output for a 68 disk dRAID pool
with two distributed spares using special allocation classes.

```
  pool: tank
 state: ONLINE
config:

    NAME                  STATE     READ WRITE CKSUM
    slag7                 ONLINE       0     0     0
      draid2:8d:68c:2s-0  ONLINE       0     0     0
        L0                ONLINE       0     0     0
        L1                ONLINE       0     0     0
        ...
        U25               ONLINE       0     0     0
        U26               ONLINE       0     0     0
        spare-53          ONLINE       0     0     0
          U27             ONLINE       0     0     0
          draid2-0-0      ONLINE       0     0     0
        U28               ONLINE       0     0     0
        U29               ONLINE       0     0     0
        ...
        U42               ONLINE       0     0     0
        U43               ONLINE       0     0     0
    special
      mirror-1            ONLINE       0     0     0
        L5                ONLINE       0     0     0
        U5                ONLINE       0     0     0
      mirror-2            ONLINE       0     0     0
        L6                ONLINE       0     0     0
        U6                ONLINE       0     0     0
    spares
      draid2-0-0          INUSE     currently in use
      draid2-0-1          AVAIL
```

When adding test coverage for the new dRAID vdev type the following
options were added to the ztest command.  These options are leverages
by zloop.sh to test a wide range of dRAID configurations.

    -K draid|raidz|random - kind of RAID to test
    -D <value>            - dRAID data drives per group
    -S <value>            - dRAID distributed hot spares
    -R <value>            - RAID parity (raidz or dRAID)

The zpool_create, zpool_import, redundancy, replacement and fault
test groups have all been updated provide test coverage for the
dRAID feature.

Co-authored-by: Isaac Huang <he.huang@intel.com>
Co-authored-by: Mark Maybee <mmaybee@cray.com>
Co-authored-by: Don Brady <don.brady@delphix.com>
Co-authored-by: Matthew Ahrens <mahrens@delphix.com>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mmaybee@cray.com>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10102
2020-11-13 13:51:51 -08:00
Matthew Ahrens a724db0374 Channel program may spuriously fail with "memory limit exhausted"
ZFS channel programs (invoked by `zfs program`) are executed in a LUA
sandbox with a limit on the amount of memory they can consume.  The
limit is 10MB by default, and can be raised to 100MB with the `-m` flag.
If the memory limit is exceeded, the LUA program exits and the command
fails with a message like `Channel program execution failed: Memory
limit exhausted.`

The LUA sandbox allocates memory with `vmem_alloc(KM_NOSLEEP)`, which
will fail if the requested memory is not immediately available.  In this
case, the program fails with the same message, `Memory limit exhausted`.
However, in this case the specified memory limit has not been reached,
and the memory may only be temporarily unavailable.

This commit changes the LUA memory allocator `zcp_lua_alloc()` to use
`vmem_alloc(KM_SLEEP)`, so that we won't spuriously fail when memory is
temporarily low.  Instead, we rely on the system to be able to free up
memory (e.g. by evicting from the ARC), and we assume that even at the
highest memory limit of 100MB, the channel program will not truly
exhaust the system's memory.

External-issue: DLPX-71924
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11190
2020-11-11 17:16:15 -08:00
Brian Behlendorf c08d442e45 Linux: Fix mount/unmount when dataset name has a space
The custom zpl_show_devname() helper should translate spaces in
to the octal escape sequence \040.  The getmntent(2) function
is aware of this convention and properly translates the escape
character back to a space when reading the fsname.

Without this change the `zfs mount` and `zfs unmount` commands
incorrectly detect when a dataset with a name containing spaces
is mounted.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11182 
Closes #11187
2020-11-11 17:14:24 -08:00
Mateusz Guzik 18ca574f0a G/C data_alloc_arena
It is a leftover from illumos always set to NULL and introducing a
spurious difference between zio_buf and zio_data_buf.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11188
2020-11-11 17:11:32 -08:00
Tony Perkins 9bd14b8724 Start snapdir_iterate traversals to begin wtih the value of zero.
The microzap hash can sometimes be zero for single digit snapnames.
The zap cursor can then have a serialized value of two (for . and ..),
and skip the first entry in the avl tree for the .zfs/snapshot directory
listing, and therefore does not return all snapshots.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Cedric Berger <cedric@precidata.com>
Signed-off-by: Tony Perkins <tperkins@datto.com>
Closes #11039
2020-11-11 17:06:16 -08:00
Adrian Chadd 3fcd737478 Fix compiling on FreeBSD + gcc - don't assume illmnos bits
This looks like it was once from the illumnos compat code.
FreeBSD doesn't have cmn_err as a compiler format attribute, so
it definitely errors out.

It doesn't show up on LLVM because it doesn't trigger at all.

Add in the format flags but keep them behind #if 0 for now;
there are too many format issues that trigger when one does
format checking in the shared code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: adrian chadd <adrian@freebsd.org>
Closes #11068
Closes #11069
2020-11-10 15:54:12 -08:00
Adrian Chadd 79a357c2a1 Fix pointer-is-uint64_t-sized assumption in the ioctl path
This shows up when compiling freebsd-head on amd64 using gcc-6.4.
The lib32 compat build ends up tripping over this assumption.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: adrian chadd <adrian@freebsd.org>
Closes #11068
Closes #11069
2020-11-10 15:53:13 -08:00
sterlingjensen a4ae4998cb Fix memleak in cmd/mount_zfs.c
Convert dynamic allocation to static buffer, simplify parse_dataset
function return path. Add tests specific to the mount helper.

Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sterling Jensen <sterlingjensen@users.noreply.github.com>
Closes #11098
2020-11-10 15:50:44 -08:00
наб b60ae3a5dc zpoolprops.8: clarify vdev expansion rules
Remove reference to EFI(?), explain that the new space
is beyond the GPT for whole-disk vdevs, and add section noting how it
behaves with partition vdevs in terms of how the user is most likely to
encounter it ‒ the previous phrasing was confusing
and seemed to indicate that "zpool online -e" will be able to claim

  GPT[whatever, ZFS, free space, whatever]

into

  GPT[whatever, ZFS, whatever]
but that's not the case, as it'll only be able to do so after manually
resizing the ZFS partition to include the free space beforehand, i.e.:
  GPT[whatever, ZFS, free space, whatever]
  GPT[whatever, [ZFS + free space], potentially left-overs, whatever]
  # zpool online -e
  GPT[whatever, ZFS, whatever]

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11158
2020-11-10 12:48:26 -08:00
Mateusz Guzik 1a0b4f566c G/C struct znode -> z_moved
The field is yet another leftover from unsupported zfs_znode_move.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11186
2020-11-10 12:42:47 -08:00
Pavel Zakharov 0f66201dbc initramfs: zfsunlock hook breaks /usr/bin
The copy_exec() function expects that the full path of the target 
file is passed rather than just the directory, and will take care 
of creating the underlying directories if they don't exist.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Closes #11162
2020-11-10 11:12:07 -08:00
Ryan Moeller 7b42f09049 FreeBSD: Simplify zvol_geom_open and zvol_cdev_open
We can consolidate the unlocking procedure into one place by starting
with drop_suspend set to B_FALSE and moving the open count check up.

While here, a little code cleanup. Match the out labels between
zvol_geom_open and zvol_cdev_open, and add a missing period in some
comments.

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11175
2020-11-10 11:08:10 -08:00
Ryan Moeller 2186ed33f1 FreeBSD: Avoid spurious EINTR in zvol_cdev_open
zvol_first_open can fail with EINTR if spa_namespace_lock is not held
and cannot be taken without waiting.

Apply the same logic that was done for zvol_geom_open to take
spa_namespace_lock if not already held on first open in zvol_cdev_open.

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11175
2020-11-10 11:07:25 -08:00
Ryan Moeller d1dd72a2c5 Simplify offset and length limit in zfs_write
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11176
2020-11-10 10:58:59 -08:00
Ryan Moeller 9a764716fc Const some unchanging variables in zfs_write
Show that these values will not be changing later.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11176
2020-11-10 10:58:59 -08:00
Ryan Moeller 2074dfd0e9 FreeBSD: Move uio_prefaultpages def to uio.h
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11176
2020-11-10 10:58:59 -08:00
Ryan Moeller 8a9634e2f3 Remove redundant oid parameter to update_pages
The oid comes from the znode we are already passing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11176
2020-11-10 10:54:30 -08:00
Ryan Moeller eec6646ea9 Factor uid, gid, and projid out of loop in zfs_write
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11176
2020-11-10 10:53:19 -08:00
Alexander Motin daabddaac1 Fix dmu_tx_dirty_throttle after arc_c reduction
After initial arc_c was reduced to arc_c_min it became possible that
on datasets with primarycache=metadata or none dirty data make up most
of ARC capacity and easily more than configured 50% of initial arc_c,
that causes forced txg commits by arc_tempreserve_space() and periodic
very long write delays.

This patch makes arc_tempreserve_space() to use arc_c only after ARC
warmed up once and arc_c really means something, but use arc_c_max
before that.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #11178
2020-11-10 10:39:26 -08:00
Matthew Macy 570d7038d0 Fix dnode refcount tracking
Fix a couple of places where the wrong tag is passed
to dnode_{hold, rele}

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11184
2020-11-10 10:37:10 -08:00
Ryan Moeller 52e585a822 ZTS: Add L1 corruption test
Add a new test case which corrupts all level 1 block in a file.
Then verifies that corruption is detected and repaired.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11141
2020-11-05 17:16:25 -08:00
Ryan Moeller cce66dfa8e ZTS: Output all block copies in list_file_blocks
The second part of list_file_blocks transforms the object description
output by zdb -ddddd $ds $objnum into a stream of lines of the form
"level path offset length" for the indirect blocks in the given file.
The current code only works for the first copy of L0 blocks.  L1 and
L2 indirect blocks have more than one copy on disk.

Add one more -d to the zdb command so we get all block copies and
rewrite the transformation to match more than L0 and output all DVAs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11141
2020-11-05 17:16:21 -08:00
Ryan Moeller 94deb47872 ZTS: Fix list_file_blocks for mirror vdevs, level > 0
The first part of list_file_blocks transforms the pool configuration
output by zdb -C $pool into shell code to set up a shell variable,
VDEV_MAP, that maps from vdev id to the underlying vdev path. This
variable is a simple indexed array. However, the vdev id in a DVA is
only the id of the top level vdev.

When the pool is mirrored, the top level vdev is a mirror and its
children are the mirrored devices. So, what we need is to map from
the top level vdev id to a list of the underlying vdev paths.
ist_file_blocks does not need to work for raidz vdevs, so we can
disregard that case.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11141
2020-11-05 17:16:16 -08:00
Mariusz Zaborski ae37ceadaa FreeBSD: Prevent a NULL reference in zvol_cdev_open
Check if the ZVOL has been written before calling zil_async_to_sync.
The ZIL will be opened on the first write, not earlier.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org>
Closes #11152
2020-11-05 17:02:19 -08:00
khng300 a4246bce50 FreeBSD: Prevent NULL pointer dereference of resid
spa_config_load() passes NULL into resid when doing zfs_file_read().
This would trip over when vfs.zfs.autoimport_disable=0.

Sponsored by: The FreeBSD Foundation
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Ka Ho Ng <khng@freebsdfoundation.org>
Closes #11149
2020-11-04 16:50:08 -08:00
Antonio Russo 71ae6a9d23 Synchronize library ABI levels
Bump library SOVERSION under Linux to match FreeBSD's.

Additionally, this bump properly accounts for the ABI changes relative
to ZoL 0.8.5 for the Linux build.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Issue #11144
2020-11-03 09:24:43 -08:00
Ryan Moeller 181b2adc2a FreeBSD: zvol_os: Use SET_ERROR more judiciously
SET_ERROR is useful to trace errors, so use it where the errors occur
rather than factored out to the end of a function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11146
2020-11-03 09:21:09 -08:00
Brian Behlendorf 2d7843401a ZTS: zdb_block_size_histogram increase variance
The expected variance for this test case was originally set at 10%
based on local testing.  Additional testing via the CI has show it
can be as large as 11%.  Increase the expected maximum to 12% to
prevent this test from incorrectly failing.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11148
2020-11-03 09:20:34 -08:00
Brian Behlendorf 123d2803bf ZTS: Wait on all events in events_001_pos.ksh
The events_001_pos.ksh test case can fail because it's possible,
and correct, for the config_sync event to be posted after the last
"expected" event.  To accommodate this the run_and_verify() function
has been updated to wait for all non-history events, not just the
last event.  This does not increase the run time of the test as
long as all the events do get generated.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11147
2020-11-03 09:19:45 -08:00
Coleman Kane 59b6872327 Linux 5.10 compat: revalidate_disk_size() added
A new function was added named revalidate_disk_size() and the old
revalidate_disk() appears to have been deprecated. As the only ZFS
code that calls this function is zvol_update_volsize, swapping the
old function call out for the new one should be all that is required.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11085
2020-11-02 22:01:19 +00:00
Coleman Kane ae15f1c1d8 Linux 5.10 compat: check_disk_change() removed
Kernel 5.10 removed check_disk_change() in favor of callers using
the faster bdev_check_media_change() instead, and explicitly forcing
bdev revalidation when they desire that behavior. To preserve prior
behavior, I have wrapped this into a zfs_check_media_change() macro
that calls an inline function for the new API that mimics the old
behavior when check_disk_change() doesn't exist, and just calls
check_disk_change() if it exists.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11085
2020-11-02 22:01:19 +00:00
Coleman Kane 838a249012 Linux 5.10 compat: percpu_ref added data member
Kernel commit 2b0d3d3e4fcfb brought in some changes to the struct
percpu_ref structure that moves most of its fields into a member
struct named "data" of type struct percpu_ref_data. This includes
the "count" member which is updated by vdev_blkg_tryget(), so update
this function to chase the API change, and detect it via configure.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11085
2020-11-02 22:01:19 +00:00
Brian Behlendorf 8c7d604c62 Linux 5.10 compat: frame.h renamed objtool.h
In Linux 5.10 the linux/frame.h header was renamed linux/objtool.h.
Add a configure check to detect and use the correctly named header.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11085
2020-11-02 22:01:10 +00:00
Sebastian Gottschall 7eefaf0ca0 Optimize locking checks in mempool allocator
Avoid checking the whole array of objects each time by removing the self
organized memory reaping. this can be managed by the global memory reap
callback which is called every 60 seconds. this will reduce the use if
locking operations significant.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Closes #11126
2020-11-02 12:10:07 -08:00
Christian Schwarz ab8c935ea6 zfs_vnops: make zfs_get_data OS-independent
Move zfs_get_data() in to platform-independent code. The only
platform-specific aspect of it is the way we release an inode 
(Linux) / vnode_t (FreeBSD). I am not aware of a platform that
could be supported by ZFS that couldn't implement zfs_rele_async 
itself. It's sibling zvol_get_data already is platform-independent.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #10979
2020-11-02 12:07:07 -08:00
Mateusz Guzik 09eb36ce3d Introduce CPU_SEQID_UNSTABLE
Current CPU_SEQID users don't care about possibly changing CPU ID, but
enclose it within kpreempt disable/enable in order to fend off warnings
from Linux's CONFIG_DEBUG_PREEMPT.

There is no need to do it. The expected way to get CPU ID while allowing
for migration is to use raw_smp_processor_id.

In order to make this future-proof this patch keeps CPU_SEQID as is and
introduces CPU_SEQID_UNSTABLE instead, to make it clear that consumers
explicitly want this behavior.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11142
2020-11-02 11:51:12 -08:00
Matthew Macy 8583540c6e Consolidate zfs_holey and zfs_access
The zfs_holey() and zfs_access() functions can be made common
to both FreeBSD and Linux.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11125
2020-10-31 09:40:08 -07:00
Ryan Moeller 2f94e8f09e Remove duplicate cond_resched() definition
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11131
2020-10-31 09:37:56 -07:00
Ryan Moeller 65a343bbd3 zvol_os: Fix handling of zvol private data
zvol private data is supposed to be nulled by zvol_clear_private before
zvol_free is called as an indicator that the zvol is going away.

Implement zvol_clear_private for volmode=dev.

Assert that zvol_clear_private has been called before zvol_free.

Check that zvol_clear_private has not been called when updating
volsize.  If it has, fail with ENXIO.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11117
2020-10-30 15:34:49 -07:00
Ryan Moeller 277884ab42 zvol_os: Don't leak doi in cdev error path
Make sure to free doi in zvol_create_minor impl when make_dev_s fails.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11117
2020-10-30 15:34:43 -07:00
Ryan Moeller 9a0ef216e5 zvol_os: Properly ignore error in volmode lookup
We fall back to a default volmode and continue when looking up a zvol's
volmode property fails.  After this we should set the error to 0 to
ensure we take the success paths in the out section.

While here, make sure we only log that the zvol was created on success.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11117
2020-10-30 15:34:36 -07:00
Ryan Moeller 1a6a75ac07 zvol_os: Code cleanup in zvol_create_minor_impl
Nonfunctional changes for readability and consistency.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11117
2020-10-30 15:34:30 -07:00
Ryan Moeller 260f6a28af zvol_os: Keep better track of open count in close
zvol_geom_close gets a count of the number of close operations to do.

Make sure we're always using this count to check if this will be the
last close operation performed on the zvol.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11117
2020-10-30 15:34:23 -07:00
Ryan Moeller 0b32d81783 zvol_os: Tidy up asserts
Using more specific assert variants gives better messages on failure.

No functional change.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11117
2020-10-30 15:34:15 -07:00
Mateusz Guzik c4ede65bdf zstd: track allocator statistics
Note that this only tracks sizes as requested by the caller.
Actual allocated space will almost always be bigger (e.g., rounded up to
the next power of 2 or page size). Additionally the allocated buffer may
be holding other areas hostage. Nonetheless, this is a starting point
for tracking memory usage in zstd.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11129
2020-10-30 15:26:10 -07:00
Attila Fülöp e8beeaa111 ICP: gcm: Allocate hash subkey table separately
While evaluating other assembler implementations it turns out that
the precomputed hash subkey tables vary in size, from 8*16 bytes
(avx2/avx512) up to 48*16 bytes (avx512-vaes), depending on the
implementation.

To be able to handle the size differences later, allocate
`gcm_Htable` dynamically rather then having a fixed size array, and
adapt consumers.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #11102
2020-10-30 15:24:21 -07:00
Attila Fülöp d9655c5b37 Add some missing cfi frame info in aesni-gcm-x86_64.S
While preparing #9749 some .cfi_{start,end}proc directives
were missed. Add the missing ones.

See upstream https://github.com/openssl/openssl/commit/275a048f

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #11101
2020-10-30 15:23:18 -07:00
Mateusz Guzik 115216cc92 FreeBSD: catch up with 1300124 version bump
- use cache_vop_mkdir
- cache_rename -> cache_vop_rename

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11136
2020-10-30 15:22:04 -07:00
Ryan Moeller d1e4ded7bc FreeBSD: Fix 12.2-STABLE after AT_BENEATH MFC
AT_BENEATH was merged to stable/12, where kern_unlinkat takes a 
non-const path.  DECONST the path passed to kern_unlinkat in the 
case where AT_BENEATH is defined.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11139
2020-10-30 15:19:02 -07:00
Matthew Macy 5fa356ea44 Remove UIO_ZEROCOPY functions structures
The original xuio zero copy functionality has always been unused 
on Linux and FreeBSD.  Remove this disabled code to avoid any
confusion and improve readability.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11124
2020-10-30 10:00:33 -07:00
Alexander Motin 1199c3e8fb Yield periodically when rebuilding L2ARC
L2ARC devices of several terabytes filled with 4KB blocks may take 15
minutes to rebuild.  Due to the way L2ARC log reading is implemented
it is quite likely that for all that time rebuild thread will never
sleep.  At least on FreeBSD kernel threads have absolute priority and
can not be preempted by threads with lower priorities.  If some thread
is also bound to that specific CPU it may not get any CPU time for all
the 15 minutes.

Reviewed-by: Cedric Berger <cedric@precidata.com>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #11116
2020-10-30 08:57:54 -07:00
Ryan Moeller 76d04993a6 Update references to nonexistent man pages in code
Refer to the correct section or alternative for FreeBSD and Linux.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11132
2020-10-30 08:55:59 -07:00
Alexander Motin e3a6ac8d06 FreeBSD: Remove BIO_ORDERED flag from BIO_FLUSH
ZFS always waits for the write completion before flushing the cache.
That is why it does not require explicit ordering fences around it,
which are pretty difficult to implement for NVMe, since one has no
internal concept of strict request ordering.

This was already removed from FreeBSD once, but got resurrected
by mistake during OpenZFS merge.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #11130
2020-10-30 08:50:57 -07:00
Tony Hutter f829227f49 ZTS: Fix xattr_004_pos failure, don't use tmpfs
Previously, xattr_004_pos would create files with xattrs on both
tmpfs and ext2, and then copy them to zfs to verify that their
xattrs were preserved.  However tmpfs doesn't support xattrs.

This was never noticed until Fedora 33.  In Fedora 32 and older,
/tmp was on the root partition (like ext4), whereas on Fedora 33
/tmp is actually tmpfs.  That caused this test to fail on Fedora 33.

This fix updates the test to only create the file on ext2, not tmpfs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #11133
2020-10-30 08:47:42 -07:00
Mateusz Guzik 973ba682f5 Linux: g/c leftover fence in zfs_znode_alloc
The port removed provisions for zfs_znode_move but the cleanup missed
this bit. To quote the original:

[snip]
    list_insert_tail(&zfsvfs->z_all_znodes, zp);
    membar_producer();
    /*
     * Everything else must be valid before assigning z_zfsvfs makes the
     * znode eligible for zfs_znode_move().
     */
    zp->z_zfsvfs = zfsvfs;
[/snip]

In the current code it is immediately followed by unlock which issues
the same fence, thus plays no role in correctness.

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11115
2020-10-29 09:54:20 -07:00
Mateusz Guzik 082ff328f2 FreeBSD: g/c unused zfs_znode_move support
The allocator does not provide the functionality to begin with.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11114
2020-10-29 09:52:50 -07:00
Brian Behlendorf 4ce728d028 Use known license string for zlua
The Linux kernel MODULE_LICENSE macro only recognizes a handful of
license strings and "MIT" is not one of the them.  Update the macro
to use "Dual MIT/GPL" which is recognized and what the kernel expects
MIT licensed modules to use.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11112
Closes #11113
2020-10-27 09:43:36 -07:00
Ryan Moeller 5c810ac499 FreeBSD: Skip RAW kstat sysctls by default
These kstats are often expensive to compute so we want to avoid them
unless specifically requested.

The following kstats are affected by this change:

kstat.zfs.${pool}.multihost
kstat.zfs.${pool}.misc.state
kstat.zfs.${pool}.txgs
kstat.zfs.misc.fletcher_4_bench
kstat.zfs.misc.vdev_raidz_bench
kstat.zfs.misc.dbufs
kstat.zfs.misc.dbgmsg

In FreeBSD 13, sysctl(8) has been updated to still list the
names/description/type of skipped sysctls so they are still
discoverable.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11099
2020-10-26 14:34:28 -07:00
Mateusz Guzik 01a65c5861 FreeBSD: catch up with 1300123 version bump
- removed thread argument from VOP_INACTIVE
- removed cred argument from VOP_VPTOCNP

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11104
2020-10-26 14:32:17 -07:00
Cy Schubert 3928ec5339 Restore identification of VDEVs using non-native block size
NAME         STATE     READ WRITE CKSUM
dsk02        ONLINE       0     0     0
  mirror-0   ONLINE       0     0     0
    ada1s4a  ONLINE       0     0     0
    ada2s4a  ONLINE       0     0     0  block size: 512B configured, 4096B native

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Toomas Soome <tsoome@me.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed off by: Cy Schubert <cy@FreeBSD.org>
Closes #11088
2020-10-22 12:15:17 -07:00
xtouqh 1e36af8c7b Properly format NAME subsection of zfs/zpool subcommands
Use proper names (i.e. zfs-allow and zpool-add) in NAME subsections
of zfs/zpool subcommands instead of current "pretty-printed" ones as
makewhatis utilities (or some implementations of it, namely the one
from mandoc suite used in FreeBSD) look not only at the document title
but also in NAME subsection, adding zfs(8)/zpool(8) to search results
which is not correct. (Common sense and other utilities splitting
subcommands in multiple man pages, e.g. git, do the same.)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: xtouqh <xtouqh@hotmail.com>
Closes #11086
2020-10-22 11:28:10 -07:00
Ryan Moeller eb02a4c6fb Add missing zfs_arc_evict_batch_limit tunable
It's even documented already.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11094
2020-10-22 10:18:26 -07:00
Ryan Moeller 2aaab887bb arcstat: Add -a and -p options from FreeNAS
Added -a option to automatically print all valid statistics.
Added -p option to suppress scaling of printed data.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Authored by: Nick Principe <32284693+powernap@users.noreply.github.com>
Ported-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11090
2020-10-21 14:09:14 -07:00
Matthew Macy e53d678d4a Share zfs_fsync, zfs_read, zfs_write, et al between Linux and FreeBSD
The zfs_fsync, zfs_read, and zfs_write function are almost identical
between Linux and FreeBSD.  With a little refactoring they can be
moved to the common code which is what is done by this commit.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11078
2020-10-21 14:08:06 -07:00
Adam D. Moss 666aa69f32 Non-l2arc pool reads shouldn't be l2arc misses
The current l2_misses accounting behavior treats all reads to pools 
without a configured l2arc as an l2arc miss, IFF there is at least 
one other pool on the system which does have an l2arc configured.

This makes it extremely hard to tune for an improved l2arc hit/miss 
ratio because this ratio will be modulated by reads from pools which 
do not (and should not) have l2arc devices; its upper limit will 
depend on the ratio of reads from l2arc'd pools and non-l2arc'd pools.

This PR prevents ARC reads affecting l2arc stats (n.b. l2_misses is 
the only relevant one) where the target spa doesn't have an l2arc.

Includes new test - l2arc_l2miss_pos.ksh

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Adam Moss <c@yotes.com>
Closes #10921
2020-10-20 11:39:52 -07:00
Kyle Evans 241c62bdd7 Makefile.bsd: remove directory that no longer exists
This was removed in a reorganization of directories preparing for the
merge of FreeBSD support, 006e9a4088 by mmacy. While llvm is perfectly
happy with the nonexistent -I directory, the gcc6 and gcc9 we can elect
to use as cross-toolchains both trip over it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Kyle Evans <kevans@FreeBSD.org>
Closes #11077
2020-10-20 11:34:59 -07:00
Matthew Macy ff2f54246d FreeBSD: delete unreferenced file
zfs_onexit_os.c was not deleted when it was removed from the build

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11079
2020-10-20 08:53:16 -07:00
Ryan Moeller 777b8ccc35 Fix commitcheck on FreeBSD
Convert from bash to sh, avoid Perl regexes and \s, prune unused
functions.

Reviewed-by: Mateusz Piotrowski <0mp@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #11070
2020-10-20 08:35:53 -07:00
Don Brady 13d65987a9 zed syslog entries drop important info
ZED will log zevents summaries to the syslog, however the log entries 
tend to drop event details that can be useful for diagnosis. This is 
especially true for ereport events, like io, checksum, and delay.

Update the all-syslog.sh script to log additional event information.

Add an optional config option, ZED_SYSLOG_DISPLAY_GUIDS, to zed.rc
for choosing GUIDs over names for pool and vdev.

Change the default ZED_SYSLOG_SUBCLASS_EXCLUDE to exclude history_event 
events. These events tend to be frequent, convey no meaningful info, 
and are already logged in the zpool history.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #10967
2020-10-19 11:01:00 -07:00
Ryan Moeller ab6a0e236e Ignore zpool_influxdb binary
This was requested but forgotten in #10786.

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #11071
2020-10-16 13:21:28 -07:00
Mateusz Guzik 41e2b3de13 FreeBSD: add missing fplookup_vexec handler to special vop vectors
Otherwise lookup can fail with EOPNOTSUPP or panic.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11066
2020-10-15 14:49:06 -07:00
Mateusz Guzik 34cda44af6 FreeBSD: g/c unused vop vector zfsctl_ops_shares_dir
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11066
2020-10-15 14:48:20 -07:00
Don Brady dff71c7936 Ignore special vdev ashift for spa ashift min/max
The removal of a vdev in the normal class would fail if there was a 
special or deup vdev that had a different ashift than the vdevs in 
the normal class.

Moved the initialization of spa_min_ashift / spa_max_ashift from 
vdev_open so that it occurs after the vdev allocation bias was 
initialized (i.e. after vdev_load).

Caveat -- In order to remove a special/dedup vdev it must have the 
same ashift as the normal pool vdevs.  This could perhaps be lifted 
in the future (i.e. for the case where there is ample space in any 
surviving special class vdevs)

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #9363
Closes #9364
Closes #11053
2020-10-15 14:45:16 -07:00
Christian Schwarz 15a4ca4620 Fix crash caused by invalid snapshot names in redactnvl
This is a follow up fix for commit 0fdd6106bb.  The VERIFY is
only true when we haven't hit an error code path.  See added
test case for a reproducer.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11048
2020-10-14 14:04:19 -07:00
Paul Dagnelie 6a60ef80e2 Fix incorrect deletion order in range_tree_add_impl gap case
After a side-effectful call like add or remove, references to range 
segs stored in btrees can no longer be used safely.  We move the 
remove call to just before the reinsertion call so that the seg 
remains valid for as long as we need it.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #11044 
Closes #11056
2020-10-14 08:59:54 -07:00
Mateusz Guzik 47a7e99939 FreeBSD: fix panic due to tqid overflow
The 32-bit counter eventually wraps to 0 which is a sentinel for invalid
id.

Make it 64-bit on LP64 platforms and 0-check otherwise.

Note: Linux counterpart uses id stored per queue instead of a global.
I did not check going that way is feasible with the goal being the
minimal fix doing the job.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11059
2020-10-14 08:57:03 -07:00
Ryan Moeller 485b50bb9e Cross-platform acltype
The acltype property is currently hidden on FreeBSD and does not
reflect the NFSv4 style ZFS ACLs used on the platform.  This makes it
difficult to observe that a pool imported from FreeBSD on Linux has a
different type of ACL that is being ignored, and vice versa.

Add an nfsv4 acltype and expose the property on FreeBSD.

Make the default acltype nfsv4 on FreeBSD.

Setting acltype to an unhanded style is treated the same as setting
it to off.  The ACLs will not be removed, but they will be ignored.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10520
2020-10-13 21:25:48 -07:00
Warner Losh b302185a92 FreeBSD: make adjustments for the standalone environment
In FreeBSD, there are three compile environments that are supported:
user land, the kernel and the bootloader / standalone. Adjust the
headers to compile in the standalone environment. Limit kernel-only
items from view when _STANDALONE is defined.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Warner Losh <imp@FreeBSD.org>
Closes #10998
2020-10-13 21:05:49 -07:00
Matthew Macy 57dc5d42b1 dmu_zfetch: don't leak unreferenced stream when zfetch is freed
Currently streams are only freed when:
  - They have no referencing zfetch and and their I/O references
    go to zero.
  - They are more than 2s old and a new I/O request comes in on
    the same zfetch.

This means that we will leak unreferenced streams when their zfetch
structure is freed.

This change checks the reference count on a stream at zfetch free
time. If it is zero we free it immediately. If it has remaining
references we allow the prefetch callback to free it at I/O
completion time.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11052
2020-10-13 21:03:36 -07:00
Warner Losh 6ba2e72b78 aarch64: Use proper guards for NEON instructions
The zstd code assumes that if you are on aarch64, you have NEON
instructions. This is not necessarily true. In a boot loader, where
you might not have the VFP properly initialized, these instructions
may not be available. It's also an error to include arm_neon.h when
the NEON insturctions aren't enabled. Change the guards for using the
NEON instructions from __aarch64__ to __ARM_NEON which is the standard
symbol for knowing if they are available.

__ARM_NEON is the proper symbol, defined in ARM C Language Extensions
Release 2.1 (https://developer.arm.com/documentation/ihi0053/d/). Some
sources suggest __ARM_NEON__, but that's the obsolete spelling from
prior versions of the standard.

Updated based on zstd pull request https://github.com/facebook/zstd/pull/2356

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Warner Losh <imp@bsdimp.com>
Closes #11055
2020-10-13 21:01:40 -07:00
Adam D. Moss 92286311f8 Add zfs.sh module unload error message
If modules fail to unload because of outstanding users, don't 
consider this a success.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Adam Moss <c@yotes.com>
Closes #11042
2020-10-13 16:51:54 -07:00
Christian Schwarz 701f656b97 dmu.h: remove stale declaration dmu_objset_snapshot_tmp
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11047
2020-10-13 16:46:00 -07:00
Mateusz Guzik 2ce14cdf68 FreeBSD: use cache_rename if available
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11045
2020-10-13 16:41:26 -07:00
Mathieu Velten b3a216fba0 blkg_tryget config test: initialize struct
Missing struct initialization in a config test results in the
interface being incorrectly detected.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Signed-off-by: Mathieu Velten <matmaul@gmail.com>
Closes #10713 
Closes #11049
2020-10-13 16:36:36 -07:00
Kjeld Schouten-Lebbing 4b59c195ff Increase Supported Linux Kernel to 5.9
This increases the Linux kernel version to 5.9 from 5.8
as most compatibility fixes should already be included.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #11050
2020-10-13 09:51:13 -07:00
Ryan Moeller e191b60ddc FreeBSD: Improve libzfs_error_init messages
It is a common mistake to have failed to autoload the module due to
permission issues when running a ZFS command as a user.  "Operation
not permitted" is an unhelpfully vague error message.

Use a thread-local message buffer to format a nicer error message.
We can infer that loading the kernel module failed if the module is
not loaded.  This can be extended with heuristics for other errors
in the future.

While looking at this stuff, remove an unused thread-local message
buffer found in libspl and remove some inaccurate verbiage from the
comment on libzfs_load_module.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11033
2020-10-13 09:38:40 -07:00
Ryan Moeller 7dfc56d866 Expose zfetch_max_idistance tunable
FreeBSD had this value tunable before the switch to the new OpenZFS.
The tunable name has changed, breaking legacy compat.

Restore legacy compat for this tunable, properly expose the tunable
with the new name on all platforms, and document it in
zfs-module-parameters(5).

While here, clean up the documentation for zfetch_max_distance a bit.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11038
2020-10-13 09:32:34 -07:00
Christian Schwarz 61868bb14d zil_parse: make callback parameters const
Code cleanup, a follow up commit to 4d55ea81.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Co-authored-by: Ryan Moeller <ryan@freqlabs.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11020
2020-10-09 09:34:54 -07:00
Richard Elling e9527d44e6 Add zpool_influxdb command
A zpool_influxdb command is introduced to ease the collection
of zpool statistics into the InfluxDB time-series database.
Examples are given on how to integrate with the telegraf
statistics aggregator, a companion to influxdb.

Finally, a grafana dashboard template is included to show
how pool latency distributions can be visualized in a
ZFS + telegraf + influxdb  + grafana environment.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Elling <Richard.Elling@RichardElling.com>
Closes #10786
2020-10-09 09:29:21 -07:00
Ryan Moeller b7ab7ae241 Linux: Initialize zp in zfs_setattr_dir
The value of zp is used without having been initialized under some
conditions.  Initialize the pointer to NULL.

Add a regression test case using chown in acl/posix.  However, this is
not enough because the setup sets xattr=sa, which means zfs_setattr_dir
will not be called.  Create a second group of acl tests in acl/posix-sa
duplicating the acl/posix tests with symlinks, and remove xattr=sa from
the original acl/posix tests.  This provides more coverage for the
default xattr=on code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10043
Closes #11025
2020-10-09 09:27:14 -07:00
Brian Behlendorf d0249a4bd0 Replace ZFS on Linux references with OpenZFS
This change updates the documentation to refer to the project
as OpenZFS instead ZFS on Linux.  Web links have been updated
to refer to https://github.com/openzfs/zfs.  The extraneous
zfsonlinux.org web links in the ZED and SPL sources have been
dropped.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11007
2020-10-08 20:10:13 -07:00
Jacob Adams 07f5d4d663 Fix Linux modules uninstall
A missing semicolon between kmoddir variable declaration and the
uninstall for loop caused modules_uninstall-Linux to fail with:

    Syntax error: "do" unexpected

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Jacob Adams <jacob@tookmund.com>
Closes #11032
2020-10-08 20:07:10 -07:00
Ryan Moeller 84cfe5d4f8 ZTS: Fix path to /dev/null in nopwrite_recsize
Don't direct stdout and stderr of dd to $TEST_BASE_DIR/null,
direct it to /dev/null.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11026
2020-10-08 16:39:23 -07:00
Chuck Tuffli a8fc1b8743 Fix ubsan: shift exponent is too large
When running libzpool with the Undefined Behavior Sanitizer (ubsan)
enabled, a zpool create causes a run-time error:

    module/zfs/vdev_label.c:600:14: runtime error: shift exponent 64 is
    too large for 64-bit type 'long long unsigned int'`

in vdev_config_generate()

Fix is to convert vdev_removal_max_span to its base-2 logarithm, using
highbit64(), and then compare the "shifts".

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Chuck Tuffli <ctuffli@gmail.com>
Closes #9744
Closes #11024
2020-10-08 16:37:27 -07:00
Christian Schwarz 36482bf607 libzfs_sendrecv: zfs_send: remove unused pipefd and tid variables
fixup of 196bee4

On gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), the code removed
caused `-Wmaybe-uninitialized` errors.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #11021
2020-10-08 09:43:51 -07:00
Ryan Moeller 73989f4b9e Make dbufstat work on FreeBSD
With procfs_list kstats implemented for FreeBSD, dbufs are now exposed
as kstat.zfs.misc.dbufs.

On FreeBSD, dbufstats can use the sysctl instead of procfs when no
input file has been given.

Enable the dbufstats tests on FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #11008
2020-10-08 09:40:23 -07:00
Ryan Moeller 82b81a2acd FreeBSD: Sort and dedup includes in kmod_core
Code cleanup. Sort includes, remove duplicates, and drop
some extra blank lines in kmod_core.c.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #11000
2020-10-08 09:37:56 -07:00
George Melikov b0c9239f17 docs: update README's installation link
OpenZFS is a cross-OS project now.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11022
2020-10-08 09:33:53 -07:00
George Amanakis a76e4e6761 Make L2ARC tests more robust
Instead of relying on arbitrary timers after pool export/import or cache
device off/online rely on arcstats. This makes the L2ARC tests more
robust. Also cleanup some functions related to persistent L2ARC.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10983
2020-10-05 15:29:05 -07:00
Toomas Soome 4e84f67a96 zdb should not output binary data on terminal
The zdb is interpreting byte array as textual string in dump_zap,
but there are also binary arrays and we should not output binary
data on terminal.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
External-issue: https://www.illumos.org/issues/12012
External-issue: https://www.illumos.org/issues/11713
Closes #11006
2020-10-05 14:05:28 -07:00
Ryan Moeller 79f0935fab FreeBSD: Sort out kernel FPU headers for 12.1-REL
We were missing an include for kernel FPU functions, breaking the build
on FreeBSD 12.1-RELEASE.  This was apparently being pulled in from
elsewhere on stable/12 and head.

Sorted the other includes in these files while here.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11005
2020-10-02 17:48:45 -07:00
Alan Somers a132c2b413 Fix EIO after resuming receive of new dataset over an existing one
When resuming an interrupted ZFS send stream that creates a new dataset
with the same name as an existing dataset, if the existing dataset is
accessed after the failed receive, then after the subsequent successful
receive it will return EIO. This happens because nothing mounts the new
dataset, leaving the old, no longer valid dataset still mounted.

This commit fixes zfs receive to always unmount and remount the
destination, regardless of whether the stream is a new stream or a
resumed stream.

Sponsored by: Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alan Somers <asomers@gmail.com>
External-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=249579
Closes #10995
Closes #10999
2020-10-02 17:47:09 -07:00
Ryan Moeller 4d55ea811d Throw const on some strings
In C, const indicates to the reader that mutation will not occur.
It can also serve as a hint about ownership.

Add const in a few places where it makes sense.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #10997
2020-10-02 17:44:10 -07:00
John Poduska 5b525165e9 Mismatched nvlist names in zfs_keys_send_space
This causes "zfs send -vt ..." to fail with:

    cannot resume send: Unknown error 1030

It turns out that some of the name/value pairs in the verification
list for zfs_ioc_send_space(), zfs_keys_send_space, had the wrong
name, so the ioctl got kicked out in zfs_check_input_nvpairs().
Update the names accordingly.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: John Poduska <jpoduska@datto.com>
Closes #10978
2020-10-02 17:40:46 -07:00
Brian Behlendorf 266d1121c3 Fix buggy procfs_list_seq_next warning
The kernel seq_read() helper function expects ->next() to update
the passed position even there are no more entries.  Failure to
do so results in the following warning being logged.

    seq_file: buggy .next function procfs_list_seq_next [spl]
    did not update position index

Functionally there is no issue with the way procfs_list_seq_next()
is implemented and the warning is harmless.  However, we want to
silence this some what scary incorrect warning.  This commit
updates the Linux procfs code to advance the position even for
the last entry.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10984 
Closes #10996
2020-09-30 13:27:51 -07:00
Ryan Moeller d688beb191 FreeBSD: Fix legacy compat for platform IOCs
The request number is out of bounds of the platform table.

Subtract the starting offset to get the correct subscript.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10994
2020-09-30 13:25:50 -07:00
Matthew Macy 1cb8202b1b Eliminate gratuitous bzeroing in dbuf_stats_hash_table_data
`dbuf_stats_hash_table_data` can take much longer than it needs to
by repeatedly bzeroing its buffer when in fact the buffer only needs
to be NULL terminated.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10993
2020-09-30 13:24:38 -07:00
Sebastian Gottschall 8a171ccd92 do a cyclic seek for unused memory objects in pool
In non regular use cases allocated memory might stay persistent in memory
pool. This small patch checks every minute if there are old objects which
can be released from memory pool.

Right now with regular use, the pool is checked for old objects on each
allocation attempt from this pool. so basically polling by its use. Now
consider what happens if someone writes a lot of files and stops use of
the volume or even unmounts it. So the code will no longer check if
objects can be released from the pool. Already allocated objects will
still stay in pool cache. this is no big issue for common use. But
someone discovered this issue while doing tests. personally i know this
behavior and I'm aware of it. Its no big issue. just a enhancement

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Closes #10938 
Closes #10969
2020-09-30 13:22:34 -07:00
Ryan Moeller c0bd2e0fe2 Drop references when skipping dmu_send due to EXDEV
When an invalid incremental send is requested where the "to" ds is
before the "from" ds, make sure to drop the reference to the pool
and the dataset before returning the error.

Add an assert on FreeBSD to make sure we don't hold any locks after
returning from an ioctl.

Add some test coverage.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10919
2020-09-30 13:19:49 -07:00
Kjeld Schouten-Lebbing 68cdafdbb8 Add intel_QAT patches
Add community compatibility patches for Intel QAT
Due to incompatibility with higher kernel versions.

Also includes basic instructions.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #10961 
Closes #10962
2020-09-30 13:17:30 -07:00
Brian Behlendorf 5aa3e3d3be Use known license string for zzstd
The Linux kernel MODULE_LICENSE macro only recognizes a handful of
license strings and "BSD" is not one of the them.  Update the macro
to use "Dual BSD/GPL" which is recognized and what the kernel expects
BSD licensed module to use.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10982
Closes #10992
2020-09-28 18:43:27 -07:00
Brian Behlendorf 4c5159cc5c Fix CONFIG_DEBUG_LOCK_ALLOC configure check
This check was accidentally broken when the kABI checks were updated
to run in parallel, commit 608f874.  The check must be for the
config_debug_lock_alloc_license name to determine if the symbol
is license compatible.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10991
2020-09-28 16:42:54 -07:00
Brian Behlendorf 96951e0327 Fix objtool configure check
The m4 objtool configure check can incorrectly fail because of a
missing header in the test.  This appears to be the result of a
recent kernel change and was observed on the Fedora 5.8.11-200
kernel.

  In file included from /home/fedora/zfs/build/objtool/objtool.c:75:
  ./arch/x86/include/asm/frame.h:100:57: error: 'struct pt_regs'
      declared inside parameter list will not be visible outside
      of this definition or declaration [-Werror]

The consequence of this is that the "stack_frame_non_standard"
check is never run and HAVE_STACK_FRAME_NON_STANDARD is set
incorrectly which results in a build failure.  This change adds
the appropriate header to the "objtool" check so it now behaves
as intended.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10990
2020-09-28 16:40:50 -07:00
grodik 53245fe337 Note that keys must be loaded for 'zpool remove'
The error returned by `zpool remove` when the encryption keys aren't
loaded isn't very helpful.  Furthermore, the man pages make no
mention that the keys need to be loaded. This change doesn't resolve
the error message but it does update the man page to mention this
requirement.

Authored-by: grodik <pat@litke.dev>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10939
Closes #10948
2020-09-28 16:11:57 -07:00
Kjeld Schouten-Lebbing 595b7e99d3 Document branching structure
This change documents the currently used branching structure.
It has been cut down to not include any controversial changes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #10976
2020-09-28 13:23:49 -07:00
Matthew Macy af20b97078 zfetch: Don't issue new streams when old have not completed
The current dmu_zfetch code implicitly assumes that I/Os complete
within min_sec_reap seconds. With async dmu and a readonly workload
(and thus no exponential backoff in operations from the "write
throttle") such as L2ARC rebuild it is possible to saturate the drives
with I/O requests. These are then effectively compounded with prefetch
requests.

This change reference counts streams and prevents them from being
recycled after their min_sec_reap timeout if they still have
outstanding I/Os.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10900
2020-09-27 17:08:38 -07:00
Allan Jude cf2667759f zfs userspace: use zfs_path_to_zhandle so argument can be a path
Change zfs userspace subcommand to use zfs_path_to_zhandle() so that
the provided dataset can be a path (/usr) or a dataset (rpool/usr).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #8915
2020-09-25 14:37:10 -07:00
Adam D. Moss acfd2d4641 Add DB_RF_NOPREFETCH to dbuf_read()s in dnode.c
Prefetching of dnodes in dbuf_read() can cause significant mutex 
contention for some workloads and isn't very helpful.  This is  
because we already get 32 dnodes for each block read, and when 
iterating over a directory we prefetch the dnodes in the directory.
Disable this prefetching to prevent the lock contention.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Submitted-by: Adam Moss <c@yotes.com>
Submitted-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Adam Moss <c@yotes.com>
Closes #10877 
Closes #10953
2020-09-25 13:49:22 -07:00
Brian Behlendorf 2e407941a2 Fix PREEMPTION=y and BLK_CGROUP=y config on arm64
With PREEMPTION=y and BLK_CGROUP=y preempt_schedule_notrace() is being
used on arm64 which is a GPL-only function and hence the build of the
DKMS kernel module fails.

Fix that by redefining preempt_schedule_notrace() to preempt_schedule()
which should be safe as long as tracing is not used.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Juerg Haefliger <juergh@canonical.com>
Closes #8545 
Closes #9948 
Closes #10416 
Closes #10973
2020-09-25 13:28:35 -07:00
Mateusz Guzik f6bb7c029c FreeBSD: update cache_purgevfs usage after 1300117 version bump
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Nick Wolff <darkfiberiru@gmail.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #10970
2020-09-25 13:23:43 -07:00
Ryan Moeller fa1912e80f FreeBSD: Code cleanup in zio_crypt
Address some unused value and control flow issues flagged by Coverity.

Unreachable code is pruned and unused values are avoided.
Some scattered sections are reordered for coherence.

We can assume kmem_alloc(n, KM_SLEEP) doesn't fail, so there is no need
to check if it returned NULL.  The allocated memory doesn't need to be
zeroed, other than the last iovec (the MAC).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10884
2020-09-25 13:12:35 -07:00
Ryan Moeller 863e38453e Prune dead branch reported by Coverity
wkey is NULL at every `goto error;`.
dcp is never NULL.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10884
2020-09-25 13:11:53 -07:00
George Wilson bd76bcb36d zpool command complains about /etc/exports.d
If the /etc/exports.d directory does not exist, then we should only
create it when we're performing an action which already requires root
privileges.

This commit moves the directory creation to the enable/disable code
path which ensures that we have the appropriate privileges.

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes #10785
Closes #10934
2020-09-25 13:09:40 -07:00
Christian Schwarz a5c77dc4d5 zfs_log_write: simplify data copying code for WR_COPIED records
lr_write_t records that are WR_COPIED have the record data directly
appended to them (see lr_write_t type definition).

The data is copied from the debuf using dmu_read_by_dnode.

This function was called, only for WR_COPIED records, as part of a
short-circuiting if-statement's if-expression.

I found this side-effectful call to dmu_read_by_dnode pretty
hard to spot.
This patch improves readability by moving the call to its own line.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #10956
2020-09-25 13:06:34 -07:00
Matthew Macy 7b8363d7f0 FreeBSD: Add support for procfs_list
The procfs_list interface is required by several kstats. Implement
this functionality for FreeBSD to provide access to these kstats.
                           
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10890
2020-09-23 16:43:51 -07:00
Matthew Macy 3dad29fb4b FreeBSD: Don't save user FPU context in kernel threads
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10899
2020-09-23 11:09:48 -07:00
Kjeld Schouten-Lebbing eb267f08cf Update issue templates, commitcheck and Contributing.md
- Removes OpenZFS ports from commit check
- Removes OpenZFS ports from CONTRIBUTING.md
- Adds mailings lists and IRC to issue template selector
- Remove blank issue option from issue creator

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #10965
2020-09-23 09:53:26 -07:00
Paul Dagnelie 20dfe8cd3b Don't set numobjs to UINT64_MAX or near it
Resolves an issue with `zfs send` streams from 0.8.4 which
prevents them from being received by versions < 0.7.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #10911 
Closes #10916
2020-09-22 16:16:07 -07:00
наб e865e7809e contrib/initramfs: fix shellcheck and checkbashisms errors with shebang
Reviewed-by: Gabriel A. Devenyi <gdevenyi@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #10908 
Closes #10917
2020-09-22 16:10:09 -07:00
George Amanakis c6f5e9d92f Restore clearing of L2CACHE flag in arc_read_done()
Commit 45152dc removed clearing of L2CACHE flag in arc_read_done() and
moved related code in l2arc_write_eligible(). After careful code
inspection arc_read_done() is not bypassed in the case of prefetches.
Thus restore the old behavior.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: adam moss <c@yotes.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10951
2020-09-22 16:08:05 -07:00
Mark Johnston 0daa0320e9 Fix a logic bug in the FreeBSD getpages VOP
In commit cd32b4f5b7 ("Fix a deadlock in the FreeBSD getpages VOP") I
introduced a bug while porting the patch originally committed to
FreeBSD: the rangelock pointer may be NULL if the try operation failed,
so we must avoid calling zfs_rangelock_unlock() in that case.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reported-by: Steve Wills <swills@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #10519 
Closes #10960
2020-09-22 16:05:52 -07:00
Ryan Moeller 5f8a9e6a02 FreeBSD: Reduce stack usage of Lua
Use the same reduced buffer size for lauxlib that is used on Linux.

Fixes panic on HEAD in lua gsub test designed to exhaust stack space.

With this we can remove the special case to reserve more stack space
on FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kyle Evans <kevans@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10959
2020-09-22 16:03:11 -07:00
Mark Johnston 6bdb09510b Annontate FreeBSD sysctls with CTLFLAG_MPSAFE
Without this, the sysctl system calls will acquire a global lock before
invoking the handler.  This is noticeable in some situations when
running top(1).  The global lock is mostly vestigal but continues to see
some use and so contention is still a problem; until the default sense
of the MPSAFE flag changes, we have to annotate each and every handler.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #10836
2020-09-21 09:49:50 -07:00
Mark Johnston c50f3c902f Fix switch statement indentation in the FreeBSD kstat code
This is in preparation for some functional changes.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #10950
2020-09-21 09:49:12 -07:00
George Amanakis 37b00fb039 Update documentation of l2arc_mfuonly
with regard to evicted_l2_eligibile_mru. Even if l2arc_mfuonly is
enabled, this is not reflected in evicted_l2_eligible_mru as this
information is useful for deciding whether to toggle l2arc_mfuonly
depending on the current workload.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10945
2020-09-21 09:26:24 -07:00
George Wilson c494aa7f57 vdev_ashift should only be set once
== Motivation and Context

The new vdev ashift optimization prevents the removal of devices when
a zfs configuration is comprised of disks which have different logical
and physical block sizes. This is caused because we set 'spa_min_ashift'
in vdev_open and then later call 'vdev_ashift_optimize'. This would
result in an inconsistency between spa's ashift calculations and that
of the top-level vdev.

In addition, the optimization logical ignores the overridden ashift
value that would be provided by '-o ashift=<val>'.

== Description

This change reworks the vdev ashift optimization so that it's only
set the first time the device is configured. It still allows the
physical and logical ahsift values to be set every time the device
is opened but those values are only consulted on first open.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Cedric Berger <cedric@precidata.com>
Signed-off-by: George Wilson <gwilson@delphix.com>
External-Issue: DLPX-71831
Closes #10932
2020-09-18 12:13:47 -07:00
Allan Jude 908d43d0a9 libzfs: Don't leak buf if nvlist is too large
Resolves FreeBSD Coverity defect:
CID 1432398:  Resource leaks  (RESOURCE_LEAK)

libzfs: don't leak hdl if there is an error reading env var

Resolves FreeBSD Coverity defect:
CID 1432395:  Resource leaks  (RESOURCE_LEAK)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Closes #10882
2020-09-18 10:23:29 -07:00
George Wilson 8e82ffba7b pool may become suspended during device expansion
When expanding a device zfs needs to rescan the partition table to
get the correct size. This can only happen when we're in the kernel
and requires the device to be closed. As part of the rescan, udev is
notified and the device links are removed and recreated. This leave a
window where the vdev code may try to reopen the device before udev
has recreated the link. If that happens, then the pool may end up in
a suspended state.

To correct this, we leverage the BLKPG_RESIZE_PARTITION ioctl which
allows the partition information to be modified even while it's in use.
This ioctl also does not remove the device link associated with the zfs
data partition so it eliminates the race condition that can occur in
the kernel.

Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes #10897
2020-09-17 20:03:10 -07:00
Matthew Ahrens a57f954226 zdb leak detection fails with in-progress device removal
When a device removal is in progress, there are 2 locations for the data
that's already been moved: the original location, on the device that's
being removed; and the new location, which is pointed to by the indirect
mapping.  When doing leak detection, zdb needs to know about both
locations.  To determine what's already been copied, we load the
spacemaps of the removing vdev, omit the blocks that are yet to be
copied, and then use the vdev's remap op to find the new location.

The problem is with an optimization to the spacemap-loading code in zdb.
When processing the log spacemaps, we ignore entries that are not
relevant because they are past the point that's been copied.  However,
entries which span the point that's been copied (i.e. they are partly
relevant and partly irrelevant) are processed normally.  This can lead
to an illegal spacemap operation, for example if offsets up to 100KB
have been copied, and the spacemap log has the following entries:

	ALLOC 50KB-150KB (partly relevant)
	FREE 50KB-100KB (entirely relevant)
	FREE 100KB-150KB (entirely irrlevant - ignored)
	ALLOC 50KB-150KB (partly relevant)

Because the entirely irrelevant entry was ignored, its space remains in
the spacemap.  When the last entry is processed, we attempt to add it to
the spacemap, but it partially overlaps with the 100-150KB entry that
was left over.

This problem was discovered by ztest/zloop.

One solution would be to also ignore the irrelevant parts of
partially-irrelevant entries (i.e. when processing the ALLOC 50-150, to
only add 50-100 to the spacemap).  However, this commit implements a
simpler solution, which is to remove this optimization entirely.  I.e.
to process the entire spacemap log, without regard for the point that's
been copied.  After reconstructing the entire allocatable range tree,
there's already code to remove the parts that have not yet been copied.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-71820
Closes #10920
2020-09-17 10:55:30 -07:00
Ryan Moeller 3c7566cb0d FreeBSD: Do not copy vp into f_data for DTYPE_VNODE files
https://reviews.freebsd.org/D26346

Do not copy vp into f_data for DTYPE_VNODE files.  The vnode pointer is
already stored in f_vnode.  Use that so f_data can be reused.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10929
2020-09-17 10:54:14 -07:00
John Poduska 5bed68bdc4 Need a long hold in zpl_mount_impl
In zpl_mount_impl, there is:
    dmu_objset_hold	; returns with pool & ds held
    dsl_pool_rele

    sget

    dsl_dataset_rele

As spelled out in the "DSL Pool Configuration Lock" in dsl_pool.c,
this requires a long hold.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: John Poduska <jpoduska@datto.com>
Closes #10936
2020-09-17 10:53:02 -07:00
Toomas Soome 741b20ce0c libzfsbootenv: lzbe_nvlist_set needs to store bootenv version VB_NVLIST
A small bug did slip into initial libzfsbootenv; while storing nvlist
in nvlist, we should make sure the bootenv is using VB_NVLIST format.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #10937
2020-09-17 10:51:09 -07:00
Ryan Moeller 7ead2be3d2 Rename acltype=posixacl to acltype=posix
Prefer acltype=off|posix, retaining the old names as aliases.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10918
2020-09-16 12:26:06 -07:00
Georgy Yakovlev 9cc177baa0 cmd/zgenhostid: replace with simple c implementation
It was discovered that dracut scripts and zgenhostid
always generate little-endian /etc/hostid.

This commit provides simple endianess-aware binary
and updates the scripts to use it.

New features include:
 -f flag to force overwrite.
 -o flag to write to different file (for dracut)
 accepting both 0x01234567 and 01234567 values as input

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Closes #10887 
Closes #10925
2020-09-16 12:25:12 -07:00
Pavel Snajdr 9569c31161 Fix stack frame size: dnode_dirty_l1range()
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #10879
2020-09-15 15:55:55 -07:00
Pavel Snajdr a1c5578ce0 dmu_redact_snap: fix possible memleak
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #10879
2020-09-15 15:55:45 -07:00
Pavel Snajdr 8c0b16e6e9 Fix stack frame size: dmu_redact_snap()
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #10879
2020-09-15 15:55:35 -07:00
Pavel Snajdr c95625769d Fix stack frame size: spa_livelist_delete_cb()
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #10879
2020-09-15 15:55:03 -07:00
наб 5797d7cc4c zpoolprops.8: fix raidz par[i]ty typo
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #10923
2020-09-15 15:43:42 -07:00
Toomas Soome 1db9e6e4e4 zfs label bootenv should store data as nvlist
nvlist does allow us to support different data types and systems.

To encapsulate user data to/from nvlist, the libzfsbootenv library is
provided.

Reviewed-by: Arvind Sankar <nivedita@alum.mit.edu>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #10774
2020-09-15 15:42:27 -07:00
Ryan Moeller 37325e4749 Linux: Prevent destruction while showing mount devname
Use ZFS_ENTER and ZFS_EXIT to protect datasets while their mount
devname is being retrieved.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10892
Closes #10927
2020-09-15 15:40:03 -07:00
George Amanakis 085321621e Add L2ARC arcstats for MFU/MRU buffers and buffer content type
Currently the ARC state (MFU/MRU) of cached L2ARC buffer and their
content type is unknown. Knowing this information may prove beneficial
in adjusting the L2ARC caching policy.

This commit adds L2ARC arcstats that display the aligned size
(in bytes) of L2ARC buffers according to their content type
(data/metadata) and according to their ARC state (MRU/MFU or
prefetch). It also expands the existing evict_l2_eligible arcstat to
differentiate between MFU and MRU buffers.

L2ARC caches buffers from the MRU and MFU lists of ARC. Upon caching a
buffer, its ARC state (MRU/MFU) is stored in the L2 header
(b_arcs_state). The l2_m{f,r}u_asize arcstats reflect the aligned size
(in bytes) of L2ARC buffers according to their ARC state (based on
b_arcs_state). We also account for the case where an L2ARC and ARC
cached MRU or MRU_ghost buffer transitions to MFU. The l2_prefetch_asize
reflects the alinged size (in bytes) of L2ARC buffers that were cached
while they had the prefetch flag set in ARC. This is dynamically updated
as the prefetch flag of L2ARC buffers changes.

When buffers are evicted from ARC, if they are determined to be L2ARC
eligible then their logical size is recorded in
evict_l2_eligible_m{r,f}u arcstats according to their ARC state upon
eviction.

Persistent L2ARC:
When committing an L2ARC buffer to a log block (L2ARC metadata) its
b_arcs_state and prefetch flag is also stored. If the buffer changes
its arcstate or prefetch flag this is reflected in the above arcstats.
However, the L2ARC metadata cannot currently be updated to reflect this
change.
Example: L2ARC caches an MRU buffer. L2ARC metadata and arcstats count
this as an MRU buffer. The buffer transitions to MFU. The arcstats are
updated to reflect this. Upon pool re-import or on/offlining the L2ARC
device the arcstats are cleared and the buffer will now be counted as an
MRU buffer, as the L2ARC metadata were not updated.

Bug fix:
- If l2arc_noprefetch is set, arc_read_done clears the L2CACHE flag of
  an ARC buffer. However, prefetches may be issued in a way that
  arc_read_done() is bypassed. Instead, move the related code in
  l2arc_write_eligible() to account for those cases too.

Also add a test and update manpages for l2arc_mfuonly module parameter,
and update the manpages and code comments for l2arc_noprefetch.
Move persist_l2arc tests to l2arc.

Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10743
2020-09-14 10:10:44 -07:00
Harald van Dijk d0cea309e7 config/zfs-build.m4: never define _initramfs in RPM_DEFINE_UTIL
The zfs-initramfs package has never worked as no RPM-based distribution
uses initramfs-tools, which is listed as a dependency of zfs-initramfs.

This would not ordinarily be a problem, as it is only enabled when
/usr/share/initramfs-tools is present, which should not normally be the
case on RPM-based distributions. However, other packages may install
unused files there even if initramfs-tools is not used, so remove this
auto-detection for the rpm-utils target.

This does not fully remove the logic for the zfs-initramfs package. This
splits it out into a separate rpm-utils-initramfs target so that the
Debian builds can still use it.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Harald van Dijk <harald@gigawatt.nl>
Closes #10898
2020-09-12 08:22:07 -07:00
Matthew Ahrens 9b77c57d5a libzutil depends on libnvpair
libzutil depends on libnvpair, but this dependency is undeclared in the
build system.  Therefore it isn't possible to make a new command that
depends on libzutil, but does not (directly) depend on libnvpair.

This commit makes this dependency explicit.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reivewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10915
2020-09-12 08:19:48 -07:00
Mateusz Guzik 8e7fe49b25 FreeBSD: convert teardown inactive lock to a read-mostly sleepable lock
The lock is taken all the time and as a regular read-write lock
avoidably serves as a mount point-wide contention point.

This forward ports FreeBSD revision r357322.

To quote aforementioned commit:

Sample result doing an incremental -j 40 build:
before: 173.30s user 458.97s system 2595% cpu 24.358 total
after:  168.58s user 254.92s system 2211% cpu 19.147 total

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #10896
2020-09-09 10:15:52 -07:00
xdch47 c2c7ca0d6d Force the use of '.' as decimal separator.
This solves issues occurring with a different decimal operator and
keeps the command line interface consistent for all locales .
E.g. `zfs set quota=0.5T`

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Felix Neumärker <xdch47@posteo.de>
Closes #10878
2020-09-09 10:14:04 -07:00
Olaf Faaland a74259cea0 Initialize mmp_last_write when the mmp thread starts
A great deal of time may go by between when mmp_init() is called and
the MMP thread starts, particularly if there are bad devices, because
there is I/O checking configs etc.  If this time is too long,

    (gethrtime() - mmp_last_write) > mmp_fail_ns

at the time the MMP thread starts.  If MMP is configured to suspend
the pool, the pool will be suspended immediately.

This can be seen in issue #10838

The value of mmp_last_write doesn't matter before the mmp thread
starts.  To give the MMP thread time to issue and land MMP writes,
initialize mmp_last_write when the MMP thread starts.

Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #10873
2020-09-09 10:12:54 -07:00
Ryan Moeller cba6d86638 FreeBSD: drop dependency on cryptodev module
We only need the kernel interfaces in crypto, not the device node in
cryptodev.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10901
2020-09-09 10:10:32 -07:00
George Amanakis feb3a7eef1 Introduce ZFS module parameter l2arc_mfuonly
In certain workloads it may be beneficial to reduce wear of L2ARC
devices by not caching MRU metadata and data into L2ARC. This commit
introduces a new tunable l2arc_mfuonly for this purpose.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10710
2020-09-08 11:44:37 -07:00
Ryan Moeller ebc4b52369 Avoid possibility of division by zero
When hz > 1000, msec / (1000 / hz) results in division by zero.

I found somewhere in FreeBSD using howmany(msec * hz, 1000) to convert
ms to ticks, avoiding the potential for a zero in the divisor.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #10894
2020-09-08 11:39:16 -07:00
Toomas Soome 189272f78a dnode_special_open() error: unchecked function return 'zrl_tryenter'
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #10876
2020-09-08 11:36:52 -07:00
Peter Dave Hello 5a877a894e Add a missing option prefix - in zfs-tests.sh usage()
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Peter Dave Hello <hsu@peterdavehello.org>
Closes #10893
2020-09-08 09:04:36 -07:00
Fabio Buso 5266cf4826 Display pbkdf2iters property as plain number
The pbkdf2iters property is an iteration counter
and should be displayed as plain number rather
than in binary unit.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Fabio Buso <buso.fabio@gmail.com>
Closes #10871
2020-09-08 08:49:55 -07:00
alaviss 75bf636cd0 libshare: Add missing headers for nfs.c
On musl libc, zfs failed to compile due to the missing <fcntl.h>
include, which is required for `open()` per POSIX.

This commit add the missing <fcntl.h> include.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Hiếu Lê <leorize+oss@disroot.org>
Closes #10880
2020-09-04 12:03:57 -07:00
Matthew Macy 7432d29760 FreeBSD: reduce priority of ZIO_TASKQ_ISSUE writes by a larger value
On FreeBSD, if priorities divided by four (RQ_PPQ) are equal then
a difference between them is insignificant. In other words,
incrementing pri by only one as on Linux is insufficient.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10872
2020-09-04 11:13:27 -07:00
Ryan Moeller ef55446a9c Spruce up pkg-config files for libzfs/libzfs_core
Several of the listed library dependencies are not relevant on FreeBSD.
Have ./configure save libraries that are found via pkg-config as
${LIB}_PC and use the configured automake variables instead of hard
coded names so we only get what was actually needed.

While here, update the URL to point at the OpenZFS Github repo.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10869
2020-09-04 11:11:18 -07:00
Ryan Moeller 3ec88ab205 man: Cross-reference zfs-load-key(8) for ENCRYPTION mention
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Submitted-by: Harry Schmalzbauer
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10866
2020-09-04 10:53:59 -07:00
Ryan Moeller 782b1c1249 man: Add zfs rename -r to zfs-rename(8) SYNOPSIS
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10866
2020-09-04 10:53:22 -07:00
Brian Behlendorf dce63135ae Sequential scrub and resilver updated comments
Commit d4a72f2 which introduced multi-phase scrubs and resilvers
continued the work presented by Nexenta at the 2016 ZFS developer
summit.  Update the source to reflect their contribution.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2020-09-04 10:51:51 -07:00
Don Brady 4f07282786 Avoid posting duplicate zpool events
Duplicate io and checksum ereport events can misrepresent that 
things are worse than they seem. Ideally the zpool events and the 
corresponding vdev stat error counts in a zpool status should be 
for unique errors -- not the same error being counted over and over. 
This can be demonstrated in a simple example. With a single bad 
block in a datafile and just 5 reads of the file we end up with a 
degraded vdev, even though there is only one unique error in the pool.

The proposed solution to the above issue, is to eliminate duplicates 
when posting events and when updating vdev error stats. We now save 
recent error events of interest when posting events so that we can 
easily check for duplicates when posting an error. 

Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #10861
2020-09-04 10:34:28 -07:00
Matthew Ahrens 3808032489 nowait synctask must succeed
If a `zfs_space_check_t` other than `ZFS_SPACE_CHECK_NONE` is used with
`dsl_sync_task_nowait()`, the sync task may fail due to ENOSPC.
However, there is no way to notice or communicate this failure, so it's
extremely difficult to use this functionality correctly, and in fact
almost all callers use `ZFS_SPACE_CHECK_NONE`.

This commit removes the `zfs_space_check_t` argument from
`dsl_sync_task_nowait()`, and always uses `ZFS_SPACE_CHECK_NONE`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10855
2020-09-04 10:29:39 -07:00
Ryan Moeller cd80273909 Retain thread name when resuming a zthr
When created, a zthr is given a name to identify it by.  This name is
lost when a cancelled zthr is resumed.

Retain the name of a zthr so it can be used when resuming.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10881
2020-09-03 20:09:52 -07:00
Alexander Richardson f3064162ba Fixes for running FreeBSD buildworld on Linux/macOS hosts
Adding an #ifdef __FreeBSD__ to a FreeBSD-specific header may seem odd,
but these headers are used on non-FreeBSD systems during the bootstrap
tools phase.
Originally submitted downstream as https://reviews.freebsd.org/D26193

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alex Richardson <Alexander.Richardson@cl.cam.ac.uk>
Closes #10863
2020-09-03 20:06:03 -07:00
Matthew Macy ac6e5fb202 Replace cv_{timed}wait_sig with cv_{timed}wait_idle where appropriate
There are a number of places where cv_?_sig is used simply for
accounting purposes but the surrounding code has no ability to
cope with actually receiving a signal. On FreeBSD it is possible
to send signals to individual kernel threads so this could
enable undesirable behavior.

This patch adds routines on Linux that will do the same idle
accounting as _sig without making the task interruptible. On
FreeBSD cv_*_idle  are all aliases for cv_*

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10843
2020-09-03 20:04:09 -07:00
Garrett Fields f30703f6fc Remove 'ZFS on Linux' references from PR Template
As mentioned in the #OpenZFS IRC channel (thanks "Toomas Soome"):
The OpenZFS PR Template still mentions "ZFS on Linux".
This changes that reference and updates the URLs.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Garrett Fields <ghfields@gmail.com>
Closes #10868
2020-09-03 16:31:05 -07:00
Spencer Kinny 6fffc88bf3 Links in Source Files
Added comments in following files
with links to Illumos manual pages:

./module/avl/avl.c
./module/nvpair/nvpair.c
./module/os/linux/spl/spl-kstat.c
./module/os/freebsd/spl/spl_kstat.c

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Spencer Kinny <spencerkinny1995@gmail.com>
Closes #5113 
Closes #10859
2020-09-02 09:42:12 -07:00
Toomas Soome 0db1b84a6d zvol: unsigned off can not be less than zero
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #10867
2020-09-02 09:30:29 -07:00
Alexander Richardson 417e646722 Fix -Werror,-Wmacro-redefined in limits.h
Those macros are also defined by the compiler-provided float.h which
will be included later on (at least in the FreeBSD buildworld case) and
triggers these -Werror warnings. Including <float.h> first and only
defining the macros when DBL_DIG/FLT_DIG is missing fixes this problem.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alex Richardson <Alexander.Richardson@cl.cam.ac.uk>
Closes #10864
2020-09-01 16:22:09 -07:00
Ryan Moeller 964791acdc Make spa_stats.c tunables visible on FreeBSD
Use ZFS_MODULE_PARAM for cross-platform tunables in spa_stats.c, and
add update tunables.cfg in tests for the newly supported ones.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10858
2020-09-01 16:19:19 -07:00
Matthew Macy e84e49218f FreeBSD: Fix up after spa_stats.c move
Moving spa_stats added the additional burden of supporting
KSTAT_TYPE_IO.

spa_state_addr will always return a valid value regardless of
the value of 'n'. On FreeBSD this will cause an infinite loop
as it relies on the raw ops addr routine to indicate that there
is no more data.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10860
2020-09-01 16:16:56 -07:00
Ryan Moeller 7b4e27232d Add 'zfs rename -u' to rename without remounting
Allow to rename file systems without remounting if it is possible.
It is possible for file systems with 'mountpoint' property set to
'legacy' or 'none' - we don't have to change mount directory for them.
Currently such file systems are unmounted on rename and not even
mounted back.

This introduces layering violation, as we need to update
'f_mntfromname' field in statfs structure related to mountpoint (for
the dataset we are renaming and all its children).

In my opinion it is worth it, as it allow to update FreeBSD in even
cleaner way - in ZFS-only configuration root file system is ZFS file
system with 'mountpoint' property set to 'legacy'. If root dataset is
named system/rootfs, we can snapshot it (system/rootfs@upgrade), clone
it (system/oldrootfs), update FreeBSD and if it doesn't boot we can
boot back from system/oldrootfs and rename it back to system/rootfs
while it is mounted as /. Before it was not possible, because
unmounting / was not possible.

Authored by: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported by: Matt Macy <mmacy@freebsd.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10839
2020-09-01 16:14:16 -07:00
Ryan Moeller 88d19d7cc2 FreeBSD: Remove unused SECLABEL code
SECLABEL is undefined on FreeBSD and should be pruned.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #10847
2020-08-31 19:52:46 -07:00
Ryan Moeller 46b7d53baf libspl: Provide platform-specific zone implementations
FreeBSD has the concept of jails, a precursor to Solaris's zones, which
can be mapped to the required zones interface with relative ease.  The
previous ZFS implementation in FreeBSD did so, and we should continue
to provide an appropriate implementation in OpenZFS as well.

Move lib/libspl/zone.c into platform code and adopt the correct
implementation for FreeBSD.

While here, prune unused code.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #10851
2020-08-31 19:43:30 -07:00
Ryan Moeller eff621071f FreeBSD: Simplify INGLOBALZONE
FreeBSD's previous ZFS implemented INGLOBALZONE(thread) as
(!jailed((thread)->td_ucred)) and passed curthread to INGLOBALZONE.

We pass curproc instead of curthread, so we can achieve the same effect
with (!jailed((proc)->p_ucred)).  The implementation is trivial enough
to fit on a single line in a define.  We don't really need a whole
separate function for something that's already macros all the way down.

Eliminate in_globalzone.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #10851
2020-08-31 19:43:08 -07:00
Ryan Moeller 2f65c7a608 FreeBSD: Define crgetzoneid appropriately
The previous ZFS implementation on FreeBSD had ifdefs to use jailed()
instead of crgetzoneid() in dsl_dir.c, however we can simply provide an
appropriate definition of crgetzoneid for the same effect.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #10851
2020-08-31 19:42:26 -07:00
Toomas Soome 1144586b57 zio_ereport_post() and zio_ereport_start() return values are ignored
use (void) to silence analyzers.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #10857
2020-08-31 19:35:11 -07:00
Spencer Kinny abe4fbfd01 Typo Correction
Corrected the typo in zfs/cmd/zfs/zfs_main.c
line number 404 pbkfd2iters to pbkdf2iters

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Spencer Kinny <spencerkinny1995@gmail.com>
Closes #10850
2020-08-30 14:14:32 -07:00
Matthew Macy 7bb18b94c7 Move spa_stats.c to common code
Initially it was considered simplest to stub out all
of the functions on FreeBSD. Now that FreeBSD supports
KSTAT_TYPE_RAW at least some of the functionality should
be made available.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10842
2020-08-30 14:12:46 -07:00
Matthew Macy cd23154a8c FreeBSD: Fix spurious failure in zvol_geom_open
In zvol_geom_open on first open we need to guarantee
that the namespace lock is held to avoid spurious
failures in zvol_first_open.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10841
2020-08-30 14:11:33 -07:00
Kjeld Schouten-Lebbing 0e6a7aaac0 Auto close "Status: Feedback requested" after a month
This commit closes issues labeled with:
"Status: Feedback requested" after 1 month, if the
label is not removed or the author has not responded

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #10807 
Closes #10808
2020-08-30 14:09:54 -07:00
Matthew Macy d436de6389 FreeBSD: add support for KSTAT_TYPE_RAW
A few kstats use KSTAT_TYPE_RAW to provide a string generated on
demand.  Implementing these as sysctls was punted until now.

Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10836
2020-08-29 20:59:50 -07:00
Brian Behlendorf 3e29e1971b Linux 5.9 compat: NR_SLAB_RECLAIMABLE
Commit dcdc12e added compatibility code to treat NR_SLAB_RECLAIMABLE_B
as if it were the same as NR_SLAB_RECLAIMABLE.  However, the new value
is in bytes while the old value was in pages which means they are not
interchangeable.

The only place the reclaimable slab size is used is as a component of
the calculation done by arc_free_memory().  This function returns the
amount of memory the ARC considers to be free or reclaimable at little
cost.  Rather than switch to a new interface to get this value it has
been removed it from the calculation.  It is normally a minor component
compared to the number of inactive or free pages, and removing it
aligns the behavior with the FreeBSD version of arc_free_memory().

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10834
2020-08-29 20:57:45 -07:00
Richard Laager 62663fb7ec Fix another dependency loop
zfs-load-key-DATASET.service was gaining an
After=systemd-journald.socket due to its stdout/stderr going to the
journal (which is the default).  systemd-journald.socket has an After
(via RequiresMountsFor=/run/systemd/journal) on -.mount.  If the root
filesystem is encrypted, -.mount gets an After
zfs-load-key-DATASET.service.

By setting stdout and stderr to null on the key load services, we avoid
this loop.

Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: InsanePrawn <insane.prawny@gmail.com>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #10356
Closes #10388
2020-08-28 10:17:14 -07:00
Richard Laager ec41cafee1 Fix a dependency loop
When generating units with zfs-mount-generator, if the pool is already
imported, zfs-import.target is not needed.  This avoids a dependency
loop on root-on-ZFS systems:
  systemd-random-seed.service After (via RequiresMountsFor)
  var-lib.mount After
  zfs-import.target After
  zfs-import-{cache,scan}.service After
  cryptsetup.service After
  systemd-random-seed.service

Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: InsanePrawn <insane.prawny@gmail.com>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #10388
2020-08-28 10:16:13 -07:00
Georgy Yakovlev 7ddcfe7c00 config/zfs-build.m4: add --with-vendor flag
This will allow an override of auto-detection of distribution, which
is based on checking presence of /etc/*-release files.

Build systems makes a lot of file location assumptions based on
detected distribution.

Some distributions (like gentoo) may prefer explicitly
setting --with-vendor=gentoo to avoid auto-detection.

Since auto-detection checks all files in order, current script may
misdetect even on gentoo system if /etc/redhat-release file is present

Default behavior is unchanged and default is --with-vendor=check

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Closes #10835
2020-08-28 09:43:44 -07:00
Alexander Richardson 2b07c5aa3e Fix definition of BLKGETSIZE64 on FreeBSD
The matching ioctl is DIOCGMEDIASIZE.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Alex Richardson <Alexander.Richardson@cl.cam.ac.uk>
Closes #10818
2020-08-27 16:09:26 -07:00
Georgy Yakovlev 735ba76104 module/zstd: pass -U__BMI__
If kernel is compiled with -march=znver1 or -march=znver2 zstd module 
compilation will fail due to SSE register return with SSE disabled.
What's interesting, is that -march=skylake also implies -mbmi which 
defines __BMI__ but compilation succeeds.  It is probably due to 
different BMI implementations on AMD and INTEL processors and the 
way compiler uses instructions.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Closes #10758 
Closes #10829
2020-08-27 15:50:13 -07:00
John-Mark Gurney 770269ef3a Add the Xr's to the SEE ALSO as well
There are a ton of zfs-* and zpool-* man pages. This adds them to 
the SEE ALSO section so that people can more quickly look through 
what all the options are, now that the pages have been split.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: John-Mark Gurney <jmg@funkthat.com>
Closes #10589
2020-08-26 22:29:00 -07:00
Patrick Mooney 8d42c98d95 dnode_sync is careless with range tree
Because dnode_sync_free_range() must drop dn_mtx during its processing,
using it as a callback to range_tree_vacate() is not safe.  No other
operations (besides destroy) are allowed once range_tree_vacate() has
begun, and dropping dn_mtx would leave a window open for another thread
to observe that invalid (and unsafe) state via dnode_block_freed().

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Patrick Mooney <pmooney@oxide.computer>
Closes #10708 
Closes #10823
2020-08-26 21:48:29 -07:00
Cédric Berger 600def792e Fix NEWS file
Points to https://github.com/openzfs/zfs/releases

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Cédric Berger <cedric@precidata.com>
Closes #10824
2020-08-26 21:44:41 -07:00
Ryan Moeller a2f944a140 zpool: Change base URL for ZFS messages to openzfs-docs
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10820
2020-08-26 21:43:06 -07:00
Brian Behlendorf 03f5d2fd6a Remove duplicate dnode.h include
The zfs/sa.c source file accidentally includes sys/dnode.h twice.
Remove the second occurrence.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10816 
Closes #10819
2020-08-26 21:41:09 -07:00
Paul Dagnelie 4aa3b3bd47 Always track temporary fses and snapshots for accounting
The root cause of the issue is that we only occasionally do as the 
comments in the code suggest and actually ignore the %recv dataset when 
it comes to filesystem limit tracking. Specifically, the only time we 
ignore it is when initializing the filesystem and snapshot limit values; 
when creating a new %recv dataset or deleting one, we always update 
the bookkeeping. This causes a problem if you init the fs count on a 
filesystem that already has a %recv dataset, since the bookmarking 
will be decremented but not incremented. This is resolved in this 
patch by simply always tracking the %recv dataset as a child.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #10791
2020-08-26 21:38:27 -07:00
Kjeld Schouten-Lebbing ad52de77c2 Fix broken bug report form
By accident previous PR broke the bug report form.
This commit fixes it
(and is actually tested completely to work)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #10821
2020-08-26 10:49:51 -07:00
Toomas Soome ec0c480c14 Remove pragma ident lines
The #pragma ident is a historical relic and not needed any more, this
pragma is actually unknown for common compilers and is only causing
trouble.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes #10810
2020-08-26 10:35:50 -07:00
Matthew Macy 2dbad44710 FreeBSD: disable neon usage
The neon support code does not build on FreeBSD,
ifdef out references to fix linker issues on arm64.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10809
2020-08-26 09:54:37 -07:00
George Melikov d6f90c78ab Github CI: Enable checkbashism
Run checkbashisms on checkstyle too.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #10811
2020-08-26 09:52:28 -07:00
Kjeld Schouten-Lebbing 59ad15d3c5 StaleBot Tweaks
- Add Status: Triage Needed to bug reports

Currently "Type: Defect" is auto added.
Adding a triage tag, makes sure all issues are reviewed by a maintainer
It also opens up some options to priorities defects in the near future.

- Prevent future StaleBot Spam

StaleBot will limit itself to 6 actions per hour
This should prevent future floods of StaleBot activity
(aka Spam)

- StaleBot: Ignore issues that are being worked on

Ignore the following Issues:
- tagged: "Status: Work in Progress"
- Having a maintainer assigned
- Being part of a project
- Having a milestone tag

- Rename Ignore "Type: Understood" to "Bot: Not Stale"

This Commits changes the general ignore tag for StaleBot from:
 "Type: Understood"
to
"Bot: Not Stale"

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #10813
2020-08-26 09:49:58 -07:00
Alexander Motin 523e1295fe Introduce limit on size of L2ARC headers
Since L2ARC buffers are not evicted on memory pressure, too large
amount of headers on system with irrationally large L2ARC can render
it slow or even unusable.  This change limits L2ARC writes and
rebuild if unevictable L2ARC-only headers reach dangerous level.

While there, call arc_adapt() on L2ARC rebuild, so that it could
properly grow arc_c, reflecting potentially significant ARC size
increase and avoiding slow growth with hopeless eviction attempts
later when "overflow" is detected.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reported-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #10765
2020-08-25 14:33:36 -07:00
Brian Behlendorf 47a3f3fc01 Tag 2.0.0-rc1
New features:
- Unified code base for Linux and FreeBSD
- Redacted 'zfs send/recv'
- Persistent L2ARC
- Sequential resilvering
- ZSTD Compression
- Log spacemaps
- Fast clone deletion
- Sectional zfs/zpool man pages
- Added 'zpool wait' subcommand
- Improved 'zfs share' scalability
- Improved AES-GCM encryption performance

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2020-08-25 13:11:25 -07:00
Allan Jude 7a6c12fd6a Don't assert on nvlists larger than SPA_MAXBLOCKSIZE
Originally we asserted that all reads are less than SPA_MAXBLOCKSIZE
However, nvlists are not ZFS records, and are not limited to
SPA_MAXBLOCKSIZE.

Add a new environment variable, ZFS_SENDRECV_MAX_NVLIST, to allow the
user to specify the maximum size of the nvlist that can be sent or
received.
Default value: 4 * SPA_MAXBLOCKSIZE (64 MB)

Modify libzfs send routines to return a useful error if the send stream
will generate an nvlist that is beyond the maximum size.

Modify libzfs recv routines to add an explicit error message if the
nvlist is too large, rather than abort()ing.

Move the change the assert() to only trigger on data records

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #9616
2020-08-25 11:04:20 -07:00
sterlingjensen 87688b686b Mark lua setjmp/longjmp for powerpc weak
Linux already defines setjmp/longjmp for powerpc, which leads to
duplicate symbols in a statically linked build.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sterlng Jensen <sterlingjensen@users.noreply.github.com>
Closes #10795
2020-08-25 10:32:49 -07:00
Brian Behlendorf 94dac3e880 Export dmu_offset_next() symbol
Export the dmu_offset_next() symbol for use by Lustre.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10796
2020-08-25 08:34:41 -07:00
Ryan Moeller b596585fd9 man: Canonicalize .TH usage
* Use all caps for document title.
* Remove section name as it can be inferred from the section number.
* Name "OpenZFS" as the document source.
* Bump modification date.

While here, fixed trailing whitespace reported by igor.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10792
2020-08-24 21:25:28 -07:00
youzhongyang b900799768 Fix inability to destroy snapshot used over NFS
The cache of struct svc_export and struct svc_expkey by nfsd and
rpc.mountd for the snapshot holds references to the mount point.
We need to flush them out before unmounting, otherwise umount
would fail with EBUSY.

Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #6000 
Closes #10783
2020-08-24 17:33:02 -07:00
Sebastian Gottschall 184df27eef Avoid symbol collision with in-kernel zstdlib
For Linux, when zfs is compiled as an in kernel static variant
and the in kernel zstd library is compiled statically into the kernel
a symbol collision will occur.  This wrapper header renames all
of the relevant zstd functions to avoid this problem.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Closes #10775
2020-08-24 12:20:41 -07:00
Kjeld Schouten-Lebbing 04c37b6851 Add Stale-bot
This file configures the following stale-bot:
https://github.com/apps/stale

It is set to mark issues as "Stale" after 365 days
It is also set to auto-close the issue 90 days after.

Please be aware that this issue also requires-
The listed stale-bot being added to the repo.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #10778
2020-08-24 12:04:38 -07:00
Chris McDonough c3b03d0701 Appease GCC sprintf warnings found on Fedora 32/GCC 10.0.1
Increase the size of DDT_NAMELEN and MNT_LINE_MAX to appease GCC
snprintf truncation warnings.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris McDonough <chrism@plope.com>
Closes #10712
Closes #10766
2020-08-24 10:32:59 -07:00
Ryan Moeller b0e75805ee ZTS: Improve block_device_wait on FreeBSD
FreeBSD doesn't have an equivalent to udevadm settle, so we have been
resorting to a three second sleep to wait for device changes to take
effect.  This is far from ideal.

We are mainly waiting for volmode=geom zvols to appear in /dev, so as
a hack, reading the geom config will have the desired effect of
quiescing the geom state.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10768
2020-08-24 08:50:15 -07:00
Chris McDonough 07ce8961e5 Improve documentation of zpool import -d/-c vs -s
Specify that, by default, zpool import uses the libblkid
cache on Linux and geom on FreeBSD, and only scans when
-d/-s is provided.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Chris McDonough <chrism@plope.com>
Closes #7656
Closes #10771
2020-08-23 21:18:30 -07:00
George Melikov a541f7d485 CI checkstyle: add linter + rename job + install latest flake8
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #10784
2020-08-23 21:15:25 -07:00
Tony Nguyen c686c6fe75 ZFS performance tests should clean up NFS mount
This change umounts client's NFS mount after each test so we can avoid
two sporadic issues:
1) client NFS stale mount and
2) zpool export and zpool destroy failed due to dataset busy

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Nguyen <tony.nguyen@delphix.com>
Closes #10767
2020-08-23 15:14:22 -07:00
Kjeld Schouten-Lebbing 68f2288620 Add seperate issue for questions
A big portion of issues are of "Type: Question".

This PR adds a separate issue template for those.
It also automatically adds the "Type: Question" tag.

in addition it adds "Type: Defect" to all bug reports by default

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #10779
2020-08-23 15:13:07 -07:00
Ryan Moeller 7b15b8d18c libzstd: Don't warn about stack frame size in userspace
With the current way CFLAGS are modified in libzstd, CFLAGS passed on
the make command line will cause the CFLAGS in the Makefile for zstd.c
to be discarded, but not AM_CFLAGS.  This causes a smaller frame size
limit to be used, and the build fails.

We don't need to worry about stack frame sizes in userspace.  Drop the
extra flags.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #10773
2020-08-23 11:13:34 -07:00
Andrew a741b386d3 Prevent zfs_acl_chmod() if aclmode restricted and ACL inherited
In absence of inheriting entry for owner@, group@, or everyone@,
zfs_acl_chmod() is called to set these. This can cause confusion for Samba
admins who do not expect these entries to appear on newly created files and
directories once they have been stripped from from the parent directory.

When aclmode is set to "restricted", chmod is prevented on non-trivial ACLs.
It is not a stretch to assume that in this case the administrator does not want
ZFS to add the missing special entries. Add check for this aclmode, and if an
inherited entry is present skip zfs_acl_chmod().

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrew Walker <awalker@ixsystems.com>
Closes #10748
2020-08-22 21:49:25 -07:00
Kjeld Schouten-Lebbing ab4a78c744 Update issue template
Github has started using a new issue templating structure.
This commit moves the current template and adds one additional one.

- Moves issue template to new issue-template folder
- Adds feature request template
- removes the following warning when viewing issue template

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #10759
2020-08-22 21:41:01 -07:00
Ryan Moeller 72eedb69fd ZTS: Remove leftover variable names
These were overlooked when use of `local` was removed to satisfy
checkbashisms.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10762
2020-08-22 11:05:59 -07:00
Chris McDonough a5b1b60e9b Remove vestigial settings related to initramfs
Remove ZFS_POOL_IMPORT, ZFS_INITRD_PRE_MOUNTROOT_SLEEP,
ZFS_INITRD_POST_MODPROBE_SLEEP, and ZFS_INITRD_ADDITIONAL_DATASETS
features from etc/defaults/zfs.in.  These features no longer work.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Chris McDonough <chrism@plope.com>
Closes #9126
Closes #10757
2020-08-22 11:04:49 -07:00
Clint Armstrong 1ddd7cdb92 Make formatting of dedup values string consistent
All other prop values return options separated by ` | `,
dedup values do not, they are separated by `, `. This change
makes the dedup value formatting consistent with other properties.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Clint Armstrong <clint@clintarmstrong.net>
Closes #10761
2020-08-22 10:58:07 -07:00
Ryan Moeller 6fe3498ca3 Import vdev ashift optimization from FreeBSD
Many modern devices use physical allocation units that are much
larger than the minimum logical allocation size accessible by
external commands. Two prevalent examples of this are 512e disk
drives (512b logical sector, 4K physical sector) and flash devices
(512b logical sector, 4K or larger allocation block size, and 128k
or larger erase block size). Operations that modify less than the
physical sector size result in a costly read-modify-write or garbage
collection sequence on these devices.

Simply exporting the true physical sector of the device to ZFS would
yield optimal performance, but has two serious drawbacks:

 1. Existing pools created with devices that have different logical
    and physical block sizes, but were configured to use the logical
    block size (e.g. because the OS version used for pool construction
    reported the logical block size instead of the physical block
    size) will suddenly find that the vdev allocation size has
    increased. This can be easily tolerated for active members of
    the array, but ZFS would prevent replacement of a vdev with
    another identical device because it now appears that the smaller
    allocation size required by the pool is not supported by the new
    device.

 2. The device's physical block size may be too large to be supported
    by ZFS. The optimal allocation size for the vdev may be quite
    large. For example, a RAID controller may export a vdev that
    requires read-modify-write cycles unless accessed using 64k
    aligned/sized requests. ZFS currently has an 8k minimum block
    size limit.

Reporting both the logical and physical allocation sizes for vdevs
solves these problems. A device may be used so long as the logical
block size is compatible with the configuration. By comparing the
logical and physical block sizes, new configurations can be optimized
and administrators can be notified of any existing pools that are
sub-optimal.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Matthew Macy <mmacy@freebsd.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10619
2020-08-21 12:53:17 -07:00
Ryan Moeller 6706552ea6 Remove hard coded "Linux" OS from manpages
The recommended practice for `.Os` on FreeBSD is to not specify any
arguments.  The correct OS name is used automatically.

Oddly enough, on the Linux distro I tested this on (CentOS 7), the man
pager defaulted to displaying "BSD" as the OS rather than "Linux".  To
accommodate this, tack " Linux" back on in an install hook on Linux.
This is much simpler than removing it for FreeBSD when vendored in the
base system.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10760
2020-08-21 11:55:47 -07:00
Brian Behlendorf 64025fa3a1 Silence 'make checkbashisms'
Commit d2bce6d03 added the 'make checkbashisms' target but did not
resolve all of the bashisms in the scripts.  This commit doesn't
resolve them all either but it does fix up a few, and it excludes
the others so 'make checkstyle' no longer prints warnings.  It's
a small step in the right direction.

* Dracut is Linux specific and itself depends on bash.  Therefore
  all dracut support scripts can be bash specific, update their
  shebang accordingly.

* zed-functions.sh, zfs-import, zfs-mount, zfs-zed, smart
  paxcheck.sh, make_gitrev.sh - these scripts were excuded from
  the check until they can be updated and properly tested.

* zfsunlock - only whole values for sleep are allowed.

* vdev_id - removed unneeded locals; use && instead of -a.

* dkms.mkconf, dkms.postbuil - use || instead of -o.

Reviewed-by: InsanePrawn <insane.prawny@gmail.com>
Reviewed-by:  Gabriel A. Devenyi <gdevenyi@gmail.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10755
2020-08-20 13:45:47 -07:00
Don Brady 7bba1d404c 'zfs share -a' should clean noauto exports
This is a follow on to PR #10688 where `zfs share -a` allows the 
sharing of canmount=noauto datasets if they are mounted.  However, 
when a dataset with canmount=noauto is not mounted, the command 
should also purge any existing entries from the exports file. 
Otherwise, after a reboot, the nfs server attempts to export the 
underlying mountpath, not the dataset. This can lead to a hard hang 
for existing client mounts.

Instead of just skipping the adding of an export if not mounted 
and canmount=noauto, have it also remove an existing export of the 
dataset so that, after a reboot, we don't export an unmounted dataset.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #10747
2020-08-20 13:12:12 -07:00
Matthew Ahrens 3dc18995bd Fix indentation in dnode_free_range()
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10744
2020-08-20 11:45:20 -07:00
Matthew Macy 1c2725a157 FreeBSD: 11.x arc_stats compatibility
Removing other_size from arc_stats breaks top in 11.x jails
running on HEAD.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10745
2020-08-20 10:55:02 -07:00
Michael Niewöhner 10b3c7f5e4 Add zstd support to zfs
This PR adds two new compression types, based on ZStandard:

- zstd: A basic ZStandard compression algorithm Available compression.
  Levels for zstd are zstd-1 through zstd-19, where the compression
  increases with every level, but speed decreases.

- zstd-fast: A faster version of the ZStandard compression algorithm
  zstd-fast is basically a "negative" level of zstd. The compression
  decreases with every level, but speed increases.

  Available compression levels for zstd-fast:
   - zstd-fast-1 through zstd-fast-10
   - zstd-fast-20 through zstd-fast-100 (in increments of 10)
   - zstd-fast-500 and zstd-fast-1000

For more information check the man page.

Implementation details:

Rather than treat each level of zstd as a different algorithm (as was
done historically with gzip), the block pointer `enum zio_compress`
value is simply zstd for all levels, including zstd-fast, since they all
use the same decompression function.

The compress= property (a 64bit unsigned integer) uses the lower 7 bits
to store the compression algorithm (matching the number of bits used in
a block pointer, as the 8th bit was borrowed for embedded block
pointers).  The upper bits are used to store the compression level.

It is necessary to be able to determine what compression level was used
when later reading a block back, so the concept used in LZ4, where the
first 32bits of the on-disk value are the size of the compressed data
(since the allocation is rounded up to the nearest ashift), was
extended, and we store the version of ZSTD and the level as well as the
compressed size. This value is returned when decompressing a block, so
that if the block needs to be recompressed (L2ARC, nop-write, etc), that
the same parameters will be used to result in the matching checksum.

All of the internal ZFS code ( `arc_buf_hdr_t`, `objset_t`,
`zio_prop_t`, etc.) uses the separated _compress and _complevel
variables.  Only the properties ZAP contains the combined/bit-shifted
value. The combined value is split when the compression_changed_cb()
callback is called, and sets both objset members (os_compress and
os_complevel).

The userspace tools all use the combined/bit-shifted value.

Additional notes:

zdb can now also decode the ZSTD compression header (flag -Z) and
inspect the size, version and compression level saved in that header.
For each record, if it is ZSTD compressed, the parameters of the decoded
compression header get printed.

ZSTD is included with all current tests and new tests are added
as-needed.

Per-dataset feature flags now get activated when the property is set.
If a compression algorithm requires a feature flag, zfs activates the
feature when the property is set, rather than waiting for the first
block to be born.  This is currently only used by zstd but can be
extended as needed.

Portions-Sponsored-By: The FreeBSD Foundation
Co-authored-by: Allan Jude <allanjude@freebsd.org>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Co-authored-by: Michael Niewöhner <foss@mniewoehner.de>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes #6247
Closes #9024
Closes #10277
Closes #10278
2020-08-20 10:30:06 -07:00
Michael Niewöhner dc544aba15 Import ZStandard v1.4.5
ZStandard is a modern, high performance, general compression algorithm.
It provides similar or better compression levels to GZIP, but with much
better performance. ZStandard provides a large selection of compression
levels to allow a storage administrator to select the preferred
performance/compression trade-off.

This commit imports the unmodified ZStandard single-file library which
will be used by ZFS.

The implementation of this new library is done with future updates of
zstd in mind. For this reason we integrated the code in a way, that does
not require modifications to the library. For more details, see
`module/zstd/README.md`.

The library is excluded from codecov calculation and cppcheck as
unaltered dependencies do not need full codecov or cppcheck.

Co-authored-by: Allan Jude <allanjude@freebsd.org>
Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Co-authored-by: Michael Niewöhner <foss@mniewoehner.de>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
2020-08-20 10:30:06 -07:00
Pavel Snajdr 772c69d230 Linux 5.7 compat: Include linux/sched.h in spl/sys/mutex.h
struct task_struct is needed for lockdep_off() in mutex.h

This has popped up after e616cb8daadf (in linux-5.7-rc7).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #10741
2020-08-19 21:37:38 -07:00
Mariusz Zaborski f2c027bd6a FreeBSD: Add option to rewind checkpoint while importing root pool
This option is used by FreeBSD boot loader.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org>
Closes #10738
2020-08-19 17:19:42 -07:00
Brian Behlendorf 5266a0728a ZED: Do not offline a missing device if no spare is available
Due to commit d48091d a removed device is now explicitly offlined by
the ZED if no spare is available, rather than the letting ZFS detect
it as UNAVAIL. This broke auto-replacing of whole-disk devices, as
described in issue #10577.  In short, when a new device is reinserted
in the same slot, the ZED will try to ONLINE it without letting ZFS
recreate the necessary partition table.

This change simply avoids setting the device OFFLINE when removed if
no spare is available (or if spare_on_remove is false).  This change
has been left minimal to allow it to be backported to 0.8.x release.
The auto_offline_001_pos ZTS test has been updated accordingly.

Some follow up work is planned to update the ZED so it transitions
the vdev to a REMOVED state.  This is a state which has always
existed but there is no current interface the ZED can use to
accomplish this.  Therefore it's being left to a follow up PR.

Reviewed-by: Gionatan Danti <g.danti@assyoma.it>
Co-authored-by: Gionatan Danti <g.danti@assyoma.it>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10577
Closes #10730
2020-08-18 22:13:17 -07:00
Brian Behlendorf cfd59f904b Fix ARC aggsum access after arc_state_fini()
Commit 85ec5cbae updated abd_update_scatter_stats() such that it
calls arc_space_consume() and arc_space_return() when updating the
scatter stats.  This requires that the global aggsum value for the
ARC be initialized.  Normally this is not an issue, however during
module unload the l2arc_do_free_on_write() function was called in
l2arc_cleanup() after arc_state_fini() destroyed the aggsum values.
We can resolve this issue by performing l2arc_do_free_on_write()
slightly earlier in arc_fini().

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10739
2020-08-18 22:11:34 -07:00
Ryan Moeller 4f7fb135bd libzfs_core: Initialize fail_ioc_cmd to ZFS_IOC_LAST
FreeBSD numbers `ZFS_IOC_*` starting at 0, so pick a different
sentinel value to avoid unintentionally messing with
`ZFS_IOC_POOL_CREATE` ioctls.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #10729
2020-08-18 18:07:43 -07:00
Matthew Macy 716b53d0a1 FreeBSD: Fix UNIX permissions checking
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10727
2020-08-18 09:57:07 -07:00
Matthew Macy 5e7eaf8fbd Add define to enable autotrim to default to on
In FreeBSD trim has defaulted to on for several
years. In order to minimize POLA violations on
import it's important to maintain this default
when importing vendored openzfs in to FreeBSD
base.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10719
2020-08-18 09:52:30 -07:00
Ryan Moeller 009cc8e884 Make zc_nvlist_src_size limit tunable
We limit the size of nvlists passed to the kernel so a user cannot make
the kernel do an unreasonably large allocation.  On FreeBSD this limit
was 128 kiB, which turns out to be a bit too small when doing some
operations involving a large number of datasets or snapshots, for
example replication.

Make this limit tunable, with a platform-specific auto default.
Linux keeps its limit at KMALLOC_MAX_SIZE. FreeBSD uses 1/4 of the
system limit on user wired memory, which allows it to scale depending
on system configuration.

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Issue #6572 
Closes #10706
2020-08-18 09:33:55 -07:00
George Melikov 663a070c92 Remove unused zpool_is_bootable
Otherwise compiler errors with:

```
libzfs_pool.c:449:1: error: 'zpool_is_bootable'
 defined but not used [-Werror=unused-function]
```

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #10734
2020-08-18 09:30:12 -07:00
Richard Laager eaa25f1a8e Remove GRUB restrictions
The GRUB restrictions are based around the pool's bootfs property.
Given the current situation where GRUB is not staying current with
OpenZFS pool features, having either a non-ZFS /boot or a separate
pool with limited features are pretty much the only long-term answers
for GRUB support.  Only the second case matters in this context.  For
the restrictions to be useful, the bootfs property would have to be set
on the boot pool, because that is where we need the restrictions, as
that is the pool that GRUB reads from. The documentation for bootfs
describes it as pointing to the root pool. That's also how it's used in
the initramfs. ZFS does not allow setting bootfs to point to a dataset
in another pool. (If it did, it'd be difficult-to-impossible to enforce
these restrictions cross-pool). Accordingly, bootfs is pretty much
useless for GRUB scenarios moving forward.

Even for users who have only one pool, the existing restrictions for
GRUB are incomplete. They don't prevent you from enabling the
unsupported checksums, for example. For that reason, I have ripped out
all the GRUB restrictions.

A little longer-term, I think extending the proposed features=portable
system to define a features=grub is a much more useful approach. The
user could set that on the boot pool at creation, and things would
Just Work.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #8627
2020-08-17 23:12:39 -07:00
Brian Behlendorf d60c0dbdf3 ZTS: ztest may cause mmp tests failures
The mmp_exported_import and mmp_inactive_import tests depend on
ztest simulating an active pool.  If ztest unexpectedly terminates
due to an unrelated issue the test case will fail.  Since ztest is
not yet 100% reliable I've added these tests to the maybe exception
list.  They can be removed when the issues with ztest are resolved
or if the test cases are updated to handle these unexpected failures.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10726
2020-08-17 22:31:18 -07:00
Matthew Ahrens 85ec5cbae2 Include scatter_chunk_waste in arc_size
The ARC caches data in scatter ABD's, which are collections of pages,
which are typically 4K.  Therefore, the space used to cache each block
is rounded up to a multiple of 4K.  The ABD subsystem tracks this wasted
memory in the `scatter_chunk_waste` kstat.  However, the ARC's `size` is
not aware of the memory used by this round-up, it only accounts for the
size that it requested from the ABD subsystem.

Therefore, the ARC is effectively using more memory than it is aware of,
due to the `scatter_chunk_waste`.  This impacts observability, e.g.
`arcstat` will show that the ARC is using less memory than it
effectively is.  It also impacts how the ARC responds to memory
pressure.  As the amount of `scatter_chunk_waste` changes, it appears to
the ARC as memory pressure, so it needs to resize `arc_c`.

If the sector size (`1<<ashift`) is the same as the page size (or
larger), there won't be any waste.  If the (compressed) block size is
relatively large compared to the page size, the amount of
`scatter_chunk_waste` will be small, so the problematic effects are
minimal.

However, if using 512B sectors (`ashift=9`), and the (compressed) block
size is small (e.g. `compression=on` with the default `volblocksize=8k`
or a decreased `recordsize`), the amount of `scatter_chunk_waste` can be
very large.  On a production system, with `arc_size` at a constant 50%
of memory, `scatter_chunk_waste` has been been observed to be 10-30% of
memory.

This commit adds `scatter_chunk_waste` to `arc_size`, and adds a new
`waste` field to `arcstat`.  As a result, the ARC's memory usage is more
observable, and `arc_c` does not need to be adjusted as frequently.

Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10701
2020-08-17 20:04:04 -07:00
Matthew Ahrens 994de7e4b7 Remove KMC_KMEM and KMC_VMEM
`KMC_KMEM` and `KMC_VMEM` are now unused since all SPL-implemented
caches are `KMC_KVMEM`.

KMC_KMEM: Given the default value of `spl_kmem_cache_kmem_limit`, we
don't use kmalloc to back the SPL caches, instead we use kvmalloc
(KMC_KVMEM).  The flag, module parameter, /proc entries, and associated
code are removed.

KMC_VMEM: This flag is not used, and kvmalloc() is always preferable to
vmalloc().  The flag, /proc entries, and associated code are removed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10673
2020-08-17 16:04:28 -07:00
Ryan Moeller 3df0c2fa32 FreeBSD: fix the build with Clang 11
* Cast void * to uintptr_t before casting to boolean_t.

* Avoid clashing definition of __asm when not on Linux to
  prevent duplicate __volatile__. This was already done in
  some places but not all.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #10723
2020-08-17 15:40:17 -07:00
Matthew Macy cfdc432e64 FreeBSD: fix merge error in zfs_acl_ids_create
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10721
2020-08-17 15:28:03 -07:00
Serapheim Dimitropoulos b0099072df Fix typo in btree.c
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #10725
2020-08-17 15:25:37 -07:00
Matthew Macy 5f1984f2f8 FreeBSD: fallback to /boot/ to look for zpool.cache
Up until now zpool.cache has always lived in /boot on FreeBSD.
For the sake of compatibility fallback to /boot if zpool.cache
isn't found in /etc/zfs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10720
2020-08-17 14:43:47 -07:00
George Amanakis 9352d8c004 Fix reporting of L2ARC writes in arc_summary3
arc_summary3 reports L2ARC writes in bytes. However, the related
arc_stat is reported as hits. arc_summary2 report this correctly.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10717
2020-08-17 11:04:06 -07:00
Ryan Moeller 3eaf76a8d2 Fix l2arc_dev_rebuild_start thread name
`thread_create` on FreeBSD stringifies the argument passed as the
thread function to create a name for the thread. The thread name for
`l2arc_dev_rebuild_start` ended up with `(void (*)(void *))` in it.

Change the type signature so the function does not need to be cast
when creating the thread.  Rename the function to
`l2arc_dev_rebuild_thread` for clarity and consistency, as well.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10716
2020-08-17 11:02:32 -07:00
Ryan Moeller 3c3d7c8a57 FreeBSD: Create taskq threads in appropriate proc
Stepping stone toward re-enabling spa_thread on FreeBSD.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10715
2020-08-17 11:01:19 -07:00
Allan Jude fc34dfba8e Fix L2ARC reads when compressed ARC disabled
When reading compressed blocks from the L2ARC, with
compressed ARC disabled, arc_hdr_size() returns
LSIZE rather than PSIZE, but the actual read is PSIZE.
This causes l2arc_read_done() to compare the checksum
against the wrong size, resulting in checksum failure.

This manifests as an increase in the kstat l2_cksum_bad
and the read being retried from the main pool, making the
L2ARC ineffective.

Add new L2ARC tests with Compressed ARC enabled/disabled

Blocks are handled differently depending on the state of the
zfs_compressed_arc_enabled tunable.

If a block is compressed on-disk, and compressed_arc is enabled:
- the block is read from disk
- It is NOT decompressed
- It is added to the ARC in its compressed form
- l2arc_write_buffers() may write it to the L2ARC (as is)
- l2arc_read_done() compares the checksum to the BP (compressed)

However, if compressed_arc is disabled:
- the block is read from disk
- It is decompressed
- It is added to the ARC (uncompressed)
- l2arc_write_buffers() will use l2arc_apply_transforms() to
  recompress the block, before writing it to the L2ARC
- l2arc_read_done() compares the checksum to the BP (compressed)
- l2arc_read_done() will use l2arc_untransform() to uncompress it

This test writes out a test file to a pool consisting of one disk
and one cache device, then randomly reads from it. Since the arc_max
in the tests is low, this will feed the L2ARC, and result in reads
from the L2ARC.

We compare the value of the kstat l2_cksum_bad before and after
to determine if any blocks failed to survive the trip through the
L2ARC.

Sponsored-by: The FreeBSD Foundation
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Closes #10693
2020-08-13 23:31:20 -07:00
Jorgen Lundman faa296c73c Release onexit/events with any missed zfsdev_state
Linux and FreeBSD will most likely never see this issue.
On macOS when kext is unloaded, but zed is still connected, zed
will be issued ENODEV. As the cdevsw is released, the kernel
will not have zfsdev_release() called to release minor/onexit/events,
and it "leaks". This ensures it is cleaned up before unload.

Changed the for loop from zsprev, to zsnext style, for less
code duplication.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #10700
2020-08-13 15:03:23 -07:00
George Melikov f67f5832ec Github workflow: checkstyle
Use github workflow to run checkstyle
- use free (for OS projects) resources
- starts for every commit and branch
- work on forks, contributors may use it
  before creating PRs

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #10705
2020-08-13 14:59:24 -07:00
George Melikov 42d4a8e5fe cstyle.pl: echo commands for github workflow
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #10705
2020-08-13 14:58:53 -07:00
George Melikov 2925ed6c70 Remove stale .travis.yml
- It doesn't work now.
- It has to be manually edited on tests changes.
  (even on test runtime changes!)
- Travis gives too small time to run to be useful.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #10704
2020-08-13 14:55:45 -07:00
Matthew Ahrens d64c6a2eee Use zfs_dbgmsg to log metaslab_load/unload
Metaslabs are now (usually) loaded and unloaded infrequently, but when
that is not the case, it is useful to have a log of when and why these
events happened.

This commit enables the zfs_dbgmsg() in metaslab_load(), and adds a
zfs_dbgmsg() in metaslab_unload().

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10683
2020-08-12 10:10:50 -07:00
Matthew Macy e111c80247 Restore ARC MFU/MRU pressure
The arc_adapt() function tunes LRU/MLU balance according to 4 types of
cache hits (which is passed as state agrument): ghost LRU, LRU, MRU,
ghost MRU. If this function is called with wrong cache hit (state),
adaptation will be sub-optimal and performance will suffer.

Some time ago upstream received this commit:

6950 ARC should cache compressed data) in arc_read() do next
sequence (access to ghost buffer)

Before this commit, hit to any ghost list was passed arc_adapt() before
call to arc_access() which revive element in cache and change state from
ghost to real hit.

After this commit, the order of calls was reverted and arc_adapt() is
now called only with «real» hits even if hit was in one of two ghost
lists, which renders ghost lists useless and breaks the ARC algorithm.

FreeBSD fixed this problem locally in Change D19094 / Commit r348772.

This change is an adaptation of the above commit to the current arc
code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10548 
Closes #10618
2020-08-12 10:03:24 -07:00
George Wilson 53c9d1d9b5 'zfs share -a' should handle 'canmount=noauto'
The 'zfs share -a' currently skips any filesystems which
have 'canmount=noauto' set. This behavior is unexpected since the
one would expect 'zfs share -a' to share any mounted filesystem
that has the 'sharenfs' property already set.

This changes the behavior of 'zfs share -a' to allow the sharing
of 'canmount=noauto' datasets if they are mounted.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: George Wilson <gwilson@delphix.com>
External-issue: DLPX-71313
Closes #10688
2020-08-11 13:55:04 -07:00
Matthew Macy 6f763d4085 FreeBSD: Fix module autoloading when built in base
The KMOD name is "zfs" instead of "openzfs" when building in FreeBSD.

Define a ZFS_KMOD symbol as "zfs" when IN_BASE is defined, otherwise
"openzfs".

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10699
2020-08-11 13:49:50 -07:00
Coleman Kane d817c17100 Linux 5.9 compat: make_request_fn replaced with submit_bio interface
The make_request_fn and associated API was replaced recently in a
Linux 5.9 merge, to replace its functionality with a new submit_bio
member in struct block_device_operations.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #10696
2020-08-11 13:37:33 -07:00
Coleman Kane dcdc12e8ba Linux 5.9 compat: Update NR_SLAB_RECLAIMABLE to NR_SLAB_RECLAIMABLE_B
This change appears to primarily be a name change for the enum. Had
to update the test logic so that it works so long as either one of
these is present (favoring the newer one). Additionally, as this is
newer, it only shows up in node_page_item, so this commit doesn't
test zone_page_item for the same enum.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #10696
2020-08-11 13:35:20 -07:00
Coleman Kane 1823c8fe6a Linux 5.9 compat: add linux/blkdev.h include
Many of the block device operations (often functions with bdev in
the name) were moved into linux/blkdev.h from linux/fs.h. Seems
that this header is already included where needed in the code, but
in the autoconf tests it was missing causing false negatives. This
commit has those tests include linux/fs.h (old location) and now
also linux/blkdev.h (new locations).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #10696
2020-08-11 13:35:10 -07:00
Allan Jude 9777044f1c Fix typo
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Closes #10694
2020-08-11 13:16:57 -07:00
Ryan Moeller ed726fb063 Move ZVOL_DIR back to zfs.h
This was previously moved because nothing else in-tree uses it, but
evidently DilOS uses it out of tree.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Ryan Moeller <freqlabs@freebsd.org>
Closes #10361 
Closes #10685
2020-08-11 13:12:12 -07:00
Matthew Macy 0f95ddcc0c FreeBSD: update vaccess signature on most recent HEAD
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10682
2020-08-07 14:16:01 -07:00
Paul Dagnelie 12045d0278 Clarify error message when a range-tree double-add occurs
In various other pieces of logic have resulted in situations where 
we double-free space in ZFS. This in turn results in a double-add 
to the range trees. These issues have been much more difficult to 
diagnose than they should have been, because the error handling 
around this case is much weaker than around the double remove case.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #10654
2020-08-07 14:13:13 -07:00
Ryan Moeller d4e6e9597d ZTS: Remove bashisms from zfs-tests.sh
Bring zfs-tests.sh in to compliance with the other scripts
by converting it /bin/sh for to avoid a dependency on bash.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10640
2020-08-07 14:10:48 -07:00
Matthew Ahrens 0cab7970f9 Remove commented-out code
Remove dead code to make the implementation easier to understand.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Ahrens <matt@delphix.com>
Closes #10650
2020-08-05 10:28:18 -07:00
Matthew Ahrens 74e994ec63 Remove KM_NODEBUG
Remove dead code to make the implementation easier to understand.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Ahrens <matt@delphix.com>
Closes #10650
2020-08-05 10:28:13 -07:00
Matthew Ahrens c6f2b942be Remove KMC_NOMAGAZINE
Remove dead code to make the implementation easier to understand.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Ahrens <matt@delphix.com>
Closes #10650
2020-08-05 10:28:07 -07:00
Matthew Ahrens 3c09f6949a Remove KMC_QCACHE
Remove dead code to make the implementation easier to understand.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Ahrens <matt@delphix.com>
Closes #10650
2020-08-05 10:28:01 -07:00
Matthew Ahrens d519c10575 Remove KMC_NOHASH
Remove dead code to make the implementation easier to understand.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Ahrens <matt@delphix.com>
Closes #10650
2020-08-05 10:27:56 -07:00
Matthew Ahrens f68af67a0c Remove KMC_NOTOUCH
Remove dead code to make the implementation easier to understand.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Ahrens <matt@delphix.com>
Closes #10650
2020-08-05 10:27:46 -07:00
Matthew Ahrens 492db125dc Remove KMC_OFFSLAB
Remove dead code to make the implementation easier to understand.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Ahrens <matt@delphix.com>
Closes #10650
2020-08-05 10:25:37 -07:00
Matthew Ahrens d87676a9fa Fix i/o error handling of livelists and zap iteration
Pool-wide metadata is stored in the MOS (Meta Object Set).  This
metadata is stored in triplicate, in addition to any pool-level
reduncancy (e.g. RAIDZ).  However, if all 3+ copies of this metadata are
not available, we can still get EIO/ECKSUM when reading from the MOS.
If we encounter such an error in syncing context, we have typically
already committed to making a change that we now can't do because of the
corrupt/missing metadata.  We typically "handle" this with a `VERIFY()`
or `zfs_panic_recover()`.  This prevents the system from continuing on
in an undefined state, while minimizing the amount of error-handling
code.

However, there are some code paths that ignore these i/o errors, or
`ASSERT()` that they don't happen.  Since assertions are disabled on
non-debug builds, they effectively ignore them as well.  This can lead
to ZFS continuing on in an incorrect state, potentially leading to
on-disk inconsistencies.

This commit adds handling for these i/o errors on MOS metadata,
typically with a `VERIFY()`:

* Handle error return from `zap_cursor_retrieve()` in 4 places in
`dsl_deadlist.c`.

* Handle error return from `zap_contains()` in `dsl_dir_hold_obj()`.
Turns out this call isn't necessary because we can always call
`zap_lookup()`.

* Handle error return from `zap_lookup()` in `dsl_fs_ss_limit_check()`.

* Handle error return from `zap_remove()` in `dsl_dir_rename_sync()`.

* Handle error return from `zap_lookup()` in
`dsl_dir_remove_livelist()`.

* Handle error return from `dsl_process_sub_livelist()` in
`spa_livelist_delete_cb()`.

Additionally:

* Augment the internal history log message for `zfs destroy` to note
which method is used (e.g. bptree, livelist, or, synchronous) and the
mintxg.

* Correct a comment in `dbuf_init()`.

* Correct indentation in `dsl_dir_remove_livelist()`.

Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10643
2020-08-05 10:22:09 -07:00
Matthew Macy 1b376d176e FreeBSD: Add support for lockless lookup
Authored-by: mjg <mjg@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10657
2020-08-05 10:19:51 -07:00
Matthew Macy 22dcf89181 Add missed thread_exit() to vdev_{autotrim,rebuild}_thread
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10668
2020-08-05 10:17:07 -07:00
Pavel Snajdr 309d20882f Fix arc__wait__for__eviction tracepoint
3442c2a02d added new `arc_wait_for_eviction` tracepoint, which fails to
compile, when tracepoints are enabled.

The tracepoint definition begins with `DEFINE_ARC_WAIT_FOR_EVICTION_EVENT`
and is a multi-line definition, so this fixes the backslash
and parenthesis accordingly.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #10669
2020-08-04 10:04:00 -07:00
Jonathon f1de1600d1 Verify zfs module loaded before starting services
This is a minor change to the systemd service templates that verifies
the zfs kernel module is loaded by the kernel prior to attempting to
import any zpool.

The services check for the presence of /sys/module/zfs which indicates
the zfs is module is loaded. This uses the systemd built-in check
ConditionPathIsDirectory.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Thode <prometheanfire@gentoo.org>
Signed-off-by: Jonathon Fernyhough <jonathon.fernyhough@york.ac.uk>
Closes #10663
2020-08-01 17:13:15 -07:00
George Amanakis da60484db5 Fix logging in l2arc_rebuild()
In case the L2ARC rebuild was canceled, do not log to spa history
log as the pool may be in the process of being removed and a panic
may occur:

BUG: kernel NULL pointer dereference, address: 0000000000000018
RIP: 0010:spa_history_log_internal+0xb1/0x120 [zfs]
Call Trace:
 l2arc_rebuild+0x464/0x7c0 [zfs]
 l2arc_dev_rebuild_start+0x2d/0x130 [zfs]
 ? l2arc_rebuild+0x7c0/0x7c0 [zfs]
 thread_generic_wrapper+0x78/0xb0 [spl]
 kthread+0xfb/0x130
 ? IS_ERR+0x10/0x10 [spl]
 ? kthread_park+0x90/0x90
 ret_from_fork+0x35/0x40

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10659
2020-08-01 11:17:18 -07:00
Ryan Moeller b6737193ee FreeBSD: Fix zfs jail and add a test
zfs_jail was not using zfs_ioctl so failed to map the IOC number
correctly.  Use zfs_ioctl to perform the jail ioctl and add a test 
case for FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10658
2020-08-01 08:44:54 -07:00
Matthew Macy fe628bc21d Fix page fault in zfsctl_snapdir_getattr
Must acquire the z_teardown_lock before accessing the zfsvfs_t object.
I can't reproduce this panic on demand, but this looks like the
correct solution.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Authored-by: asomers <asomers@FreeBSD.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10656
2020-08-01 08:42:55 -07:00
Allan Jude 8fb79fdddb Change the error handling for invalid property values
ZFS recv should return a useful error message when an invalid index
property value is provided in the send stream properties nvlist

With a compression= property outside of the understood range:

Before:
```
receiving full stream of zof/zstd_send@send2 into testpool/recv@send2
internal error: Invalid argument
Aborted (core dumped)
```
Note: the recv completes successfully, the abort() is likely just to
make it easier to track the unexpected error code.

After:
```
receiving full stream of zof/zstd_send@send2 into testpool/recv@send2
cannot receive compression property on testpool/recv: invalid property
value received 28.9M stream in 1 seconds (28.9M/sec)
```

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #10631
2020-08-01 08:41:31 -07:00
Matthew Macy 47ed79ff60 Changes to make openzfs build within FreeBSD buildworld
A collection of header changes to enable FreeBSD to build
with vendored OpenZFS.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10635
2020-07-31 21:30:31 -07:00
Ryan Moeller 0cc3454821 Convert Linux-isms to FreeBSD-isms in platform zfs_debug.c
Change some comments copied from the Linux code to describe 
the appropriate methods on FreeBSD.

Convert some tunables to ZFS_MODULE_PARAM so they get created 
on FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10647
2020-07-31 21:25:35 -07:00
Ryan Moeller af524bf7ff ZTS: FreeBSD does have a l2arc.trim_ahead tunable
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10633
2020-07-31 21:18:32 -07:00
Matthew Ahrens 3442c2a02d Revise ARC shrinker algorithm
The ARC shrinker callback `arc_shrinker_count/_scan()` is invoked by the
kernel's shrinker mechanism when the system is running low on free
pages.  This happens via 2 code paths:

1. "direct reclaim": The system is attempting to allocate a page, but we
are low on memory.  The ARC shrinker callback is invoked from the
page-allocation code path.

2. "indirect reclaim": kswapd notices that there aren't many free pages,
so it invokes the ARC shrinker callback.

In both cases, the kernel's shrinker code requests that the ARC shrinker
callback release some of its cache, and then it measures how many pages
were released.  However, it's measurement of released pages does not
include pages that are freed via `__free_pages()`, which is how the ARC
releases memory (via `abd_free_chunks()`).  Rather, the kernel shrinker
code is looking for pages to be placed on the lists of reclaimable pages
(which is separate from actually-free pages).

Because the kernel shrinker code doesn't detect that the ARC has
released pages, it may call the ARC shrinker callback many times,
resulting in the ARC "collapsing" down to `arc_c_min`.  This has several
negative impacts:

1. ZFS doesn't use RAM to cache data effectively.

2. In the direct reclaim case, a single page allocation may wait a long
time (e.g. more than a minute) while we evict the entire ARC.

3. Even with the improvements made in 67c0f0dedc ("ARC shrinking blocks
reads/writes"), occasionally `arc_size` may stay above `arc_c` for the
entire time of the ARC collapse, thus blocking ZFS read/write operations
in `arc_get_data_impl()`.

To address these issues, this commit limits the ways that the ARC
shrinker callback can be used by the kernel shrinker code, and mitigates
the impact of arc_is_overflowing() on ZFS read/write operations.

With this commit:

1. We limit the amount of data that can be reclaimed from the ARC via
the "direct reclaim" shrinker.  This limits the amount of time it takes
to allocate a single page.

2. We do not allow the ARC to shrink via kswapd (indirect reclaim).
Instead we rely on `arc_evict_zthr` to monitor free memory and reduce
the ARC target size to keep sufficient free memory in the system.  Note
that we can't simply rely on limiting the amount that we reclaim at once
(as for the direct reclaim case), because kswapd's "boosted" logic can
invoke the callback an unlimited number of times (see
`balance_pgdat()`).

3. When `arc_is_overflowing()` and we want to allocate memory,
`arc_get_data_impl()` will wait only for a multiple of the requested
amount of data to be evicted, rather than waiting for the ARC to no
longer be overflowing.  This allows ZFS reads/writes to make progress
even while the ARC is overflowing, while also ensuring that the eviction
thread makes progress towards reducing the total amount of memory used
by the ARC.

4. The amount of memory that the ARC always tries to keep free for the
rest of the system, `arc_sys_free` is increased.

5. Now that the shrinker callback is able to provide feedback to the
kernel's shrinker code about our progress, we can safely enable
the kswapd hook. This will allow the arc to receive notifications
when memory pressure is first detected by the kernel. We also
re-enable the appropriate kstats to track these callbacks.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10600
2020-07-31 21:10:52 -07:00
Ryan Moeller 18c624302d ZTS: zvol_misc_volmode is flaky on FreeBSD
Mark this as a known issue.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10655
2020-07-31 21:05:55 -07:00
Ryan Moeller 721ed01548 ZTS: Use POSIX-compatible space character class
FreeBSD recently integrated a change which causes \s in a regex to
throw an error instead of silently being misinterpreted as an s.

Change the regex in zpool_colors.ksh to use [[:space:]].

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes #10651
2020-07-31 18:11:21 -07:00
Ryan Moeller 25499e2139 lua: Increase reserved stack space for FreeBSD in debug config
FreeBSD uses more stack space in debug configurations and can overflow
the stack while formatting the error message when the call depth limit
of 20 frames is reached.  This is readily reproduced by running the
gsub recursion test with increased kstack size.  I hit the panic with
16 pages per kstack, and noticed it go away when bumped to 17.

Reserve an additional 64 bytes on the stack when building for FreeBSD.
This is enough to avoid the panic with a deep stack while not wasting
too much space when the default stack size is used.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10634
2020-07-31 09:17:37 -07:00
Allan Jude 24f98ed383 When encountering EZFS_UNKNOWN, print the error text buffer anyway
Rather than just saying there was an internal error, provide any
context we might have to the user to help them understand the issue.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #10632
2020-07-31 09:07:37 -07:00
Allan Jude eabf270b2c Remove duplicate include of sys/zfeature.h in dmu_objset.c
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #10636
2020-07-31 09:04:45 -07:00
Allan Jude 4cf6f10714 pyzfs: Add missing entry to zfs_errno
This was causing all later errno's to have the incorrect value.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #10649
2020-07-31 09:01:41 -07:00
Matthew Ahrens 948423a3d1 zfs promote does not delete livelist of origin
When a clone is promoted, its livelist is no longer accurate, so it is
discarded.  If the clone's origin is also a clone (i.e. we are promoting
a clone of a clone), then the origin's livelist is also no longer
accurate, so it should be discarded, but the code doesn't actually do
that.

Consider a pool with:
* Filesystem A
* Clone B, a clone of A
* Clone C, a clone of B

If we promote C, it discards C's livelist.  It should discard B's
livelist, but that is not happening.  The impact is that when B is
destroyed, we use the livelist to find the blocks to free, but the
livelist is no longer correct so we end up freeing blocks that are still
in use by C.  The incorrectly-freed blocks can be reallocated causing
checksum errors.  And when C is destroyed it can double-free the
incorrectly-freed blocks.

The problem is that we remove the livelist of `origin_ds->ds_dir`, but
the origin snapshot has already been moved to the promoted dsl_dir.  So
this is actually trying to remove the livelist of the promoted dsl_dir,
which was already removed.  As explained in a comment in the beginning
of `dsl_dataset_promote_sync()`, we need to use the saved `odd` for the
origin's dsl_dir.

Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10652
2020-07-31 08:59:00 -07:00
Don Brady a15c6f3310 ZTS: minor improvements to alloc_class_009_pos functional test
* Fixed a typo that cause one of the variations to be a no-op
* Added additional coverage for adding special vdev after pool create
* Added additional coverage for using 4K sector size

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #10641
2020-07-30 09:11:05 -07:00
Ryan Moeller 2f571dbe06 Use correct prefix for share/pam-configs
Respect the configured install prefix.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10604
2020-07-30 09:09:46 -07:00
Matthew Ahrens 3a92552f75 Fix error handling of vdev_top_zap
In `vdev_load()`, we look up several entries in the `vdev_top_zap`
object.  In most cases, if we encounter an i/o error, it will be
returned to the caller.  However, when handling
`VDEV_TOP_ZAP_ALLOCATION_BIAS`, if we get an i/o error, we may continue
on, which in theory could cause us to not realize that a vdev should be
used only for `special` allocations.

In practice, if we encountered an i/o error while looking for
`VDEV_TOP_ZAP_ALLOCATION_BIAS` in the `vdev_top_zap`, we'd also get an
i/o error while looking for other entries in the same object, and thus
the zpool open/import would fail.  Therefore the impact of this problem
is negligible.

This commit adds error handling for i/o errors while accessing the
`vdev_top_zap`, so that we aren't relying on unrelated code to fail for
us.

Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10637
2020-07-29 17:04:34 -07:00
Jonathon ae12b02308 Verify zfs module loaded before starting services
This is a minor change to the systemd service templates that verifies the zfs
kernel module is loaded by the kernel prior to attempting to import any zpool.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jonathon Fernyhough <jonathon.fernyhough@york.ac.uk>
Closes #10627
2020-07-29 16:52:18 -07:00
Matthew Macy 27d96d2254 Rename refcount.h to zfs_refcount.h
Renamed to avoid conflicting with refcount.h when a different
implementation is already provided by the platform.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10620
2020-07-29 16:35:33 -07:00
Serapheim Dimitropoulos 843e9ca2e1 Introduce names for ZTHRs
When debugging issues or generally analyzing the runtime of
a system it would be nice to be able to tell the different
ZTHRs running by name rather than having to analyze their
stack.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Co-authored-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #10630
2020-07-29 09:43:33 -07:00
Matthew Macy 5678d3f593 Prefix zfs internal endian checks with _ZFS
FreeBSD defines _BIG_ENDIAN BIG_ENDIAN _LITTLE_ENDIAN
LITTLE_ENDIAN on every architecture. Trying to do
cross builds whilst hiding this from ZFS has proven
extremely cumbersome.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10621
2020-07-28 13:02:49 -07:00
Matthew Ahrens 3eabed74c0 Fix lua stack overflow on recursive call to gsub()
The `zfs program` subcommand invokes a LUA interpreter to run ZFS
"channel programs".  This interpreter runs in a constrained environment,
with defined memory limits.  The LUA stack (used for LUA functions that
call each other) is allocated in the kernel's heap, and is limited by
the `-m MEMORY-LIMIT` flag and the `zfs_lua_max_memlimit` module
parameter.  The C stack is used by certain LUA features that are
implemented in C.  The C stack is limited by `LUAI_MAXCCALLS=20`, which
limits call depth.

Some LUA C calls use more stack space than others, and `gsub()` uses an
unusually large amount.  With a programming trick, it can be invoked
recursively using the C stack (rather than the LUA stack).  This
overflows the 16KB Linux kernel stack after about 11 iterations, less
than the limit of 20.

One solution would be to decrease `LUAI_MAXCCALLS`.  This could be made
to work, but it has a few drawbacks:

1. The existing test suite does not pass with `LUAI_MAXCCALLS=10`.

2. There may be other LUA functions that use a lot of stack space, and
the stack space may change depending on compiler version and options.

This commit addresses the problem by adding a new limit on the amount of
free space (in bytes) remaining on the C stack while running the LUA
interpreter: `LUAI_MINCSTACK=4096`.  If there is less than this amount
of stack space remaining, a LUA runtime error is generated.

Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10611 
Closes #10613
2020-07-27 16:11:47 -07:00
Matthew Macy e64cc4954c Refactor ccompile.h to not include system headers
This is a step toward being able to vendor the OpenZFS code in FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10625
2020-07-25 20:09:50 -07:00
Matthew Macy 6d8da84106 Make use of ZFS_DEBUG consistent within kmod sources
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10623
2020-07-25 20:07:44 -07:00
Matthew Macy f5b189f937 FreeBSD: Fixes required to build ZFS on PowerPC
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10622
2020-07-25 11:00:23 -07:00
Ryan Moeller d364de7a89 FreeBSD: Remove accidental ARC size limiter
i386 has some additional memory reservation logic that limits the size
of the reported available memory.  This was accidentally being used on
all arches due to a missing header.

Include machine/vmparam.h in freebsd/zfs/arc_os.c to pull in the
missing UMA_MD_SMALL_ALLOC definition.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10616
2020-07-25 10:49:49 -07:00
Ryan Moeller cb18d88060 FreeBSD: Implement arc_free_memory
This is only used for the kstat, but something other than 0 is nice.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10626
2020-07-25 10:47:18 -07:00
Brian Atkinson 6fba7bfd0e Add gang ABD child to parent gang ABD
By design a gang ABD can not have another gang ABD as a child. This is
to make sure the logical offset in a gang ABD is consistent with the
individual ABDS it contains as children. If a gang ABD is added as a
child of a gang ABD we will add the individual children of the gang ABD
to the parent gang ABD. This allows for a consistent view of offsets
within the parent gang ABD.

Reviewed-by: Mark Maybee <mmaybee@cray.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #10430
2020-07-24 21:09:20 -07:00
Ryan Moeller 8348fac30c Limit dbuf cache sizes based only on ARC target size by default
Set the initial max sizes to ULONG_MAX to allow the caches to grow
with the ARC.

Recalculate the metadata cache size on demand so it can adapt, too.

Update descriptions in zfs-module-parameters(5).

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10563 
Closes #10610
2020-07-24 20:38:48 -07:00
Matthew Ahrens 4fbdb10c7b remove kmem_cache module parameter KMC_EXPIRE_AGE
By default, `spl_kmem_cache_expire` is `KMC_EXPIRE_MEM`, meaning that
objects will be removed from kmem cache magazines by
`spl_kmem_cache_reap_now()`.

There is also a module parameter to change this to `KMC_EXPIRE_AGE`,
which establishes a maximum lifetime for objects to stay in the
magazine.  This setting has rarely, if ever, been used, and is not
regularly tested.

This commit removes the code for `KMC_EXPIRE_AGE`, and associated module
parameters.

Additionally, the unused module parameter
`spl_kmem_cache_obj_per_slab_min` is removed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10608
2020-07-24 09:39:26 -07:00
tony-zfs 02fced3067 Add support to decode a resume token
Adding a new subcommand to zstream called token. This
now allows users to decode a resume token to retrieve the toname
field. This can be useful for tools that need this information.
The syntax works as follows zstream token <resume_token>.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Tony Perkins <tperkins@datto.com>
Closes #10558
2020-07-23 17:44:03 -07:00
Kyle Evans bfafe1780a Annotate unused parameters on inline definitions as such
* libspl: umem: These are obviously and intentionally unused; annotate 
  them as such to appease -Wunused-parameter builds that include this 
  header.

* sys/dmu.h: In this case, clear_on_evict_dbufp is only used for 
  ZFS_DEBUG builds, so annotate it as __maybe_unused to appease 
  -Wunused-parameter.


Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kyle Evans <kevans@FreeBSD.org>
Closes #10606
2020-07-23 17:41:48 -07:00
Ryan Moeller f7a68f99d0 FreeBSD: Remove some code duplication in sysctl_os.c
Drop unnecessary redefinition's of several arcstat values.
Put missing extern declaration of arc_no_grow_shift in arc_impl.h.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10609
2020-07-23 17:35:34 -07:00
Kyle Evans b197457cd6 libzfs: const'ify path argument to zfs_path_to_zhandle
zfs_path_to_zhandle has no need to mutate the path argument,
most notably:

- zfs_open takes path as const
- getextmntent takes path as const
- fprintf most clearly doesn't need to mutate it

It's hard to foresee any reason that libzfs could conceivably
want to mutate it in the future, either, so const'ify it.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kyle Evans <kevans@FreeBSD.org>
Closes #10605
2020-07-22 11:14:20 -07:00
Ryan Moeller db5b3926e9 OpenZFSify CONTRIBUTING.md
Update stale references to "ZFS on Linux" to "OpenZFS" in
CONTRIBUTING.md.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10602
2020-07-22 11:09:04 -07:00
Ryan Moeller 0c79b070a7 ZTS: Fix devname2devid build on FreeBSD with libudev
When libudev is installed on FreeBSD, configure finds it and sets
WANT_DEVNAME2DEVID, but it isn't found by the linker because we
didn't specify where it is.

Use LIBUDEV_LIBS so the location of the library gets added to the
linker flags for devname2devid.
Also use LIBUDEV_CFLAGS here in case some other platform needs it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10590
2020-07-22 10:49:22 -07:00
Arvind Sankar 59415fc9fb Add zfs_gitrev.h to the distributed sources
Commit 109d2c9310 ("Move zfs_gitrev.h to build directory") stopped
distributing zfs_gitrev.h, as it is a generated file. Add it back, with
some changes in behavior.

Change the logic for gitrev as follows
- if the source tree is a git repository, the behavior for build is
  unchanged. For make dist, append -dist to the git tag in the
  distributed version of zfs_gitrev.h.
- otherwise, check if the source tree contains zfs_gitrev.h, and use it
  if so, falling back to "unknown" if it doesn't exist.
- clean it only in make maintainer-clean, so we don't remove it from the
  source tree on make clean or make distclean.

This allows disted sources to track what git tag they originally came
from, with the -dist suffix indicating that the code wasn't built
directly from git and so might contain additional changes beyond the git
tag.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Eli Schwartz <eschwartz@archlinux.org>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10595
2020-07-22 10:00:40 -07:00
Arvind Sankar d32a59fe2b Restore scripts/make_gitrev.sh
Commit 109d2c9310 ("Move zfs_gitrev.h to build directory") removed
scripts/make_gitrev.sh, putting the logic into the Makefile itself.

However, at least the Arch Linux packager wants the script so that the
file can be generated without having to run configure first, for
DKMS packaging purposes.

So move the make recipe back into the script.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Eli Schwartz <eschwartz@archlinux.org>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10595
2020-07-22 09:59:36 -07:00
Matthew Ahrens 5dd92909c6 Adjust ARC terminology
The process of evicting data from the ARC is referred to as
`arc_adjust`.

This commit changes the term to `arc_evict`, which is more specific.

Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10592
2020-07-22 09:51:47 -07:00
Evan Harris 317dbea173 Updated CONTRIBUTING doc links for new issue labels
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Evan Harris <eharris@puremagic.com>
Closes #10584
2020-07-19 10:22:44 -07:00
Arvind Sankar 5ad61b5b01 Disable shebang mangling on input files
The DKMS module installs the entire source tree, including the .in files
that will later be substituted when building. This makes
brp_mangle_shebangs complain about shebang lines in the .in files.

Exclude everything under /usr/src from shebang mangling in the DKMS
package.

The KMOD package doesn't contain any of the files it excludes from
mangling, so just drop the exclusion.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: João Carlos Mendes Luís <jonny@jonny.eng.br>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10581 
Closes #10582
2020-07-19 10:19:08 -07:00
Ryan Moeller 0421f257b2 FreeBSD: Add legacy arc_min and arc_max
These tunables were renamed from vfs.zfs.arc_min and 
vfs.zfs.arc_max to vfs.zfs.arc.min and vfs.zfs.arc.max.
Add legacy compat tunables for the old names.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10579
2020-07-19 10:15:34 -07:00
Jean-Baptiste Lallement ceadc0dbbd Make unloading the key more robust
The unit was failing instead of stopping if someone manually unloaded
the key before stopping the unit (zfs unload-key is failing on an
unavailable key).
Follow a similar logic than for loading the key, checking for the key
status before unloading it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Co-authored-by: Didier Roche <didrocks@ubuntu.com>
Signed-off-by: Didier Roche <didrocks@ubuntu.com>
Closes #10477
2020-07-19 10:04:14 -07:00
Jean-Baptiste Lallement b717f9b95e BindsTo dataset keyload unit to mount associate unit
We need a stronger dependency between the mount unit and its keyload unit
when we know that the dataset is encrypted.
If the keyload unit fails, Wants= will still try to mount the dataset,
which will then fail.
It’s better to show that the failure is due to a dependency failing, the
keyload unit, by tighting up the dependency. We can do this as we know
that we generate both units in the generator and so, it’s not an
optional dependency.
BindsTo enable as well that if the keyload unit fails at any point, the
associated mountpoint will be then unmounted.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Didier Roche <didrocks@ubuntu.com>
Signed-off-by: Didier Roche <didrocks@ubuntu.com>
Closes #10477
2020-07-19 10:03:48 -07:00
Jean-Baptiste Lallement de817cc7b7 Ensure mount unit pilots when its ZFS key is loaded
Drop Before=zfs.mount dependency explicity on generated key-load .service
unit.
Indeed, the associated mount unit is After=<dataset-key-load>.service.
This is thus the mount point which controls at what point it wants to be
mounted (Before=zfs-mount.service in stock generator), but this can be
an automount point, or triggered by another service.
This additional dependency from the key load service is not needed thus.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Didier Roche <didrocks@ubuntu.com>
Signed-off-by: Didier Roche <didrocks@ubuntu.com>
Closes #10477
2020-07-19 10:03:02 -07:00
Matthew Ahrens 026e529cb3 Remove skc_reclaim, hdr_recl, kmem_cache shrinker
The SPL kmem_cache implementation provides a mechanism, `skc_reclaim`,
whereby individual caches can register a callback to be invoked when
there is memory pressure.  This mechanism is used in only one place: the
ARC registers the `hdr_recl()` reclaim function.  This function wakes up
the `arc_reap_zthr`, whose job is to call `kmem_cache_reap()` and
`arc_reduce_target_size()`.

The `skc_reclaim` callbacks are invoked only by shrinker callbacks and
`arc_reap_zthr`, and only callback only wakes up `arc_reap_zthr`.  When
called from `arc_reap_zthr`, waking `arc_reap_zthr` is a no-op.  When
called from shrinker callbacks, we are already aware of memory pressure
and responding to it.  Therefore there is little benefit to ever calling
the `hdr_recl()` `skc_reclaim` callback.

The `arc_reap_zthr` also wakes once a second, and if memory is low when
allocating an ARC buffer.  Therefore, additionally waking it from the
shrinker calbacks has little benefit.

The shrinker callbacks can be invoked very frequently, e.g. 10,000 times
per second.  Additionally, for invocation of the shrinker callback,
skc_reclaim is invoked many times.  Therefore, this mechanism consumes
significant amounts of CPU time.

The kmem_cache shrinker calls `spl_kmem_cache_reap_now()`, which,
in addition to invoking `skc_reclaim()`, does two things to attempt to
free pages for use by the system:
 1. Return free objects from the magazine layer to the slab layer
 2. Return entirely-free slabs to the page layer (i.e. free pages)

These actions apply only to caches implemented by the SPL, not those
that use the underlying kernel SLAB/SLUB caches.  The SPL caches are
used for objects >=32KB, which are primarily linear ABD's cached in the
DBUF cache.

These actions (freeing objects from the magazine layer and returning
entirely-free slabs) are also taken whenever a `kmem_cache_free()` call
finds a full magazine.  So there would typically be zero entirely-free
slabs, and the number of objects in magazines is limited (typically no
more than 64 objects per magazine, and there's one magazine per CPU).
Therefore the benefit of `spl_kmem_cache_reap_now()`, while nonzero, is
modest.

We also call `spl_kmem_cache_reap_now()` from the `arc_reap_zthr`, when
memory pressure is detected.  Therefore, calling
`spl_kmem_cache_reap_now()` from the kmem_cache shrinker is not needed.

This commit removes the `skc_reclaim` mechanism, its only callback
`hdr_recl()`, and the kmem_cache shrinker callback.

Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10576
2020-07-19 09:58:30 -07:00
Brian Behlendorf e862b7ecfc Linux 4.10 compat: has_capability()
Stock kernels older than 4.10 do not export the has_capability()
function which is required by commit e59a377.  To avoid breaking
the build on older kernels revert to the safe legacy behavior and
return EACCES when privileges cannot be checked.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10565
Closes #10573
2020-07-19 09:56:21 -07:00
Matthew Ahrens 8fbf432ae2 anon_pages are not free/evictable
`arc_free_memory()` returns the amount of memory that the ARC considers
to be free.  This includes pages that are not actually free, but can be
evicted with essentially zero cost (without doing any i/o), for example
the page cache.  The ARC can "squeeze out" any pages included in this
calculation, leaving only `arc_sys_free` (1/64th of RAM) for these
free/evictable pages.

Included in the count of free/evictable pages is
`nr_inactive_anon_pages()`, which is described as "Anonymous memory that
has not been used recently and can be swapped out".  These pages would
have to be written out to disk (swap) in order to evict them, and they
are not included in `/proc/meminfo`'s `MemAvailable`.

Therefore it is not appropriate for `nr_inactive_anon_pages()` to be
included in the free/evictable memory returned by `arc_free_memory()`,
because the ARC shouldn't (intentionally) make the system swap.

This commit removes `nr_inactive_anon_pages()` from the memory returned
by `arc_free_memory()`.  This is a step towards enabling the ARC to
manage free memory by monitoring it and reducing the ARC size as we
notice that there is insufficient free memory (in the `arc_reap_zthr`),
rather than the current method of relying on the `arc_shrinker`
callback.

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10575
2020-07-16 10:11:26 -07:00
Matthew Macy 23c871671c FreeBSD: zfs commands backward compatibility
Update the zfs commands such that they're backwards compatible with
the version of ZFS is the base FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10542
2020-07-15 21:32:50 -07:00
Brian Behlendorf f1b8153794 Update zts-report.py with additional tests
The following test cases have been observed to fail frequently
enough to be a problem when reporting CI results.  Until they can
be updated to be entirely reliable add them to the zts-report.py
script.

    alloc_class/alloc_class_011_neg
    cli_root/zpool_import/zpool_import_012_pos
    mmp/mmp_on_uberblocks
    rsend/send_partial_dataset

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10578
2020-07-15 21:28:18 -07:00
Ryan Moeller 103bc5b957 ZTS: Fix nonportable use of stat in list_file_blocks
FreeBSD stat uses -f to specify the format string rather than -c.
list_file_blocks in blkdev.shlib uses stat -c %i to get a file's
object ID for zdb.  We already have a library function to do this
portably.

Use get_objnum to get the file's object ID.

Take log_must off of the call to list_free_blocks in
corrupt_blocks_at_level, which had masked the error.  It was not good
to pipe the output of log_must into the while-loop, anyway.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alek Pinchuk <apinchuk@datto.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10572
2020-07-15 21:26:39 -07:00
Romain Dolbeau 01a4852ecb Fix early include of <linux/percpu_compat.h>
Move/add include of <linux/percpu_compat.h> to satisfy missing
requirements.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Romain Dolbeau <romain@dolbeau.org>
Closes #10568 
Closes #10569
2020-07-15 15:58:15 -07:00
Matthew Ahrens 6774931dfa Extend zdb to print inconsistencies in livelists and metaslabs
Livelists and spacemaps are data structures that are logs of allocations
and frees.  Livelists entries are block pointers (blkptr_t). Spacemaps
entries are ranges of numbers, most often used as to track
allocated/freed regions of metaslabs/vdevs.

These data structures can become self-inconsistent, for example if a
block or range can be "double allocated" (two allocation records without
an intervening free) or "double freed" (two free records without an
intervening allocation).

ZDB (as well as zfs running in the kernel) can detect these
inconsistencies when loading livelists and metaslab.  However, it
generally halts processing when the error is detected.

When analyzing an on-disk problem, we often want to know the entire set
of inconsistencies, which is not possible with the current behavior.
This commit adds a new flag, `zdb -y`, which analyzes the livelist and
metaslab data structures and displays all of their inconsistencies.
Note that this is different from the leak detection performed by
`zdb -b`, which checks for inconsistencies between the spacemaps and the
tree of block pointers, but assumes the spacemaps are self-consistent.

The specific checks added are:

Verify livelists by iterating through each sublivelists and:
- report leftover FREEs
- report double ALLOCs and double FREEs
- record leftover ALLOCs together with their TXG [see Cross Check]

Verify spacemaps by iterating over each metaslab and:
- iterate over spacemap and then the metaslab's entries in the
  spacemap log, then report any double FREEs and double ALLOCs

Verify that livelists are consistenet with spacemaps.  The space
referenced by livelists (after using the FREE's to cancel out
corresponding ALLOCs) should be allocated, according to the spacemaps.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Sara Hartse <sara.hartse@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-66031
Closes #10515
2020-07-14 17:51:05 -07:00
Arvind Sankar 38e2e9ce83 Centralize variable substitution
A bunch of places need to edit files to incorporate the configured paths
i.e. bindir, sbindir etc. Move this logic into a common file.

Create arc_summary by copying arc_summary[23] as appropriate at build
time instead of install time.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10559
2020-07-14 17:33:44 -07:00
Arvind Sankar bdb518c13a Make RPM_DEFINE_KMOD conditional on CONFIG_KERNEL
The configure variables won't be defined if CONFIG_KERNEL is disabled
and defining empty macros causes errors. The spec files do provide some
defaults if the macros are undefined.

Remove config conditionals in the tgz Makefile.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10564
2020-07-14 17:32:21 -07:00
Arvind Sankar e6c093dd94 Fix parallel make srpm
When building srpm using make -j, each of the recursive makes invoked to
build srpm-{dkms,kmod,utils} will build the dist target. This is both
unnecessary, and also has a very good chance of breaking when they race
trying to build gitrev.

Fix this by make dist a prerequisite of srpm-{dkms,kmod,utils} instead
of srpm-common, so that it will be done once before invoking the
recursive makes.

Also, gitrev is not really required for make dist, so instead of adding
it to BUILT_SOURCES, just add it as a prerequisite of the all target.

Mark the individual package targets as PHONY.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10564
2020-07-14 17:31:45 -07:00
Alexander Motin 1743c737f5 Fix LOR between dp_config_rwlock and spa_props_lock
Our QE team during automated API testing hit deadlock in ZFS, caused
by lock order reversal.  From one side dsl_sync_task_sync() locks
dp_config_rwlock as writer and calls spa_sync_props(), which waits
for spa_props_lock.  From another spa_prop_get() locks spa_props_lock
and then calls dsl_pool_config_enter(), trying to lock dp_config_rwlock
as reader.

This patch makes spa_prop_get() lock dp_config_rwlock before
spa_props_lock, making the order consistent.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #10553
2020-07-14 12:21:57 -07:00
Joao Carlos Mendes Luis 5f72109e5b Disable -Wl,-z,defs for ASAN builds
Commit af65916 added -Wl,-z,defs for the shared libraries. This
apparently does not work in some cases with --enable-asan, so only add
it for non-ASAN builds.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: João Carlos Mendes Luis <jonny@jonny.eng.br>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10557
Closes #10560
2020-07-14 12:17:44 -07:00
Brian Atkinson e4d3d77684 Fixing gang ABD child removal race condition
On linux the list debug code has been setting off a failure when
checking that the node->next->prev value is pointing back at the node.
At times this check evaluates to 0xdead. When removing a child from a
gang ABD we must acquire the child's abd_mtx to make sure that the
same ABD is not being added to another gang ABD while it is being
removed from a gang ABD. This fixes a race condition when checking
if an ABDs link is already active and part of another gang ABD before
adding it to a gang.

Added additional debug code for the gang ABD in abd_verify() to make
sure each child ABD has active links. Also check to make sure another
gang ABD is not added to a gang ABD.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #10511
2020-07-14 11:04:35 -07:00
George Wilson c15d36c674 Remove dependency on sharetab file and refactor sharing logic
== Motivation and Context

The current implementation of 'sharenfs' and 'sharesmb' relies on
the use of the sharetab file. The use of this file is os-specific
and not required by linux or freebsd. Currently the code must
maintain updates to this file which adds complexity and presents
a significant performance impact when sharing many datasets. In
addition, concurrently running 'zfs sharenfs' command results in
missing entries in the sharetab file leading to unexpected failures.

== Description

This change removes the sharetab logic from the linux and freebsd
implementation of 'sharenfs' and 'sharesmb'. It still preserves an
os-specific library which contains the logic required for sharing
NFS or SMB. The following entry points exist in the vastly simplified
libshare library:

- sa_enable_share -- shares a dataset but may not commit the change
- sa_disable_share -- unshares a dataset but may not commit the change
- sa_is_shared -- determine if a dataset is shared
- sa_commit_share -- notify NFS/SMB subsystem to commit the shares
- sa_validate_shareopts -- determine if sharing options are valid

The sa_commit_share entry point is provided as a performance enhancement
and is not required. The sa_enable_share/sa_disable_share may commit
the share as part of the implementation. Libshare provides a framework
for both NFS and SMB but some operating systems may not fully support
these protocols or all features of the protocol.

NFS Operation:
For linux, libshare updates /etc/exports.d/zfs.exports to add
and remove shares and then commits the changes by invoking
'exportfs -r'. This file, is automatically read by the kernel NFS
implementation which makes for better integration with the NFS systemd
service. For FreeBSD, libshare updates /etc/zfs/exports to add and
remove shares and then commits the changes by sending a SIGHUP to
mountd.

SMB Operation:
For linux, libshare adds and removes files in /var/lib/samba/usershares
by calling the 'net' command directly. There is no need to commit the
changes. FreeBSD does not support SMB.

== Performance Results

To test sharing performance we created a pool with an increasing number
of datasets and invoked various zfs actions that would enable and
disable sharing. The performance testing was limited to NFS sharing.
The following tests were performed on an 8 vCPU system with 128GB and
a pool comprised of 4 50GB SSDs:

Scale testing:
- Share all filesystems in parallel -- zfs sharenfs=on <dataset> &
- Unshare all filesystems in parallel -- zfs sharenfs=off <dataset> &

Functional testing:
- share each filesystem serially -- zfs share -a
- unshare each filesystem serially -- zfs unshare -a
- reset sharenfs property and unshare -- zfs inherit -r sharenfs <pool>

For 'zfs sharenfs=on' scale testing we saw an average reduction in time
of 89.43% and for 'zfs sharenfs=off' we saw an average reduction in time
of 83.36%.

Functional testing also shows a huge improvement:
- zfs share -- 97.97% reduction in time
- zfs unshare -- 96.47% reduction in time
- zfs inhert -r sharenfs -- 99.01% reduction in time

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Bryant G. Ly <bryangly@gmail.com>
Signed-off-by: George Wilson <gwilson@delphix.com>
External-Issue: DLPX-68690
Closes #1603
Closes #7692
Closes #7943
Closes #10300
2020-07-13 09:19:18 -07:00
Matthew Ahrens e59a377a8f filesystem_limit/snapshot_limit is incorrectly enforced against root
The filesystem_limit and snapshot_limit properties limit the number of
filesystems or snapshots that can be created below this dataset.
According to the manpage, "The limit is not enforced if the user is
allowed to change the limit."  Two types of users are allowed to change
the limit:

1. Those that have been delegated the `filesystem_limit` or
`snapshot_limit` permission, e.g. with
`zfs allow USER filesystem_limit DATASET`.  This works properly.

2. A user with elevated system privileges (e.g. root).  This does not
work - the root user will incorrectly get an error when trying to create
a snapshot/filesystem, if it exceeds the `_limit` property.

The problem is that `priv_policy_ns()` does not work if the `cred_t` is
not that of the current process.  This happens when
`dsl_enforce_ds_ss_limits()` is called in syncing context (as part of a
sync task's check func) to determine the permissions of the
corresponding user process.

This commit fixes the issue by passing the `task_struct` (typedef'ed as
a `proc_t`) to syncing context, and then using `has_capability()` to
determine if that process is privileged.  Note that we still need to
pass the `cred_t` to syncing context so that we can check if the user
was delegated this permission with `zfs allow`.

This problem only impacts Linux.  Wrappers are added to FreeBSD but it
continues to use `priv_check_cred()`, which works on arbitrary `cred_t`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #8226
Closes #10545
2020-07-11 17:18:02 -07:00
Ryan Moeller 217f48373f libzfs: Add error message for why creating mountpoint failed
When zfs_mount_at() fails to stat the mountpoint and can't create the
directory, we return an error with a message "failed to create
mountpoint" but there is no indication why it failed.

Add the error string from the syscall to the error aux message.

Update do_mount for Linux to return the errno instead of -1.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10550
2020-07-11 17:16:13 -07:00
Matthew Macy 3933305eac FreeBSD: Use a hash table for taskqid lookups
Previously a tqent could be recycled prematurely, update the
code to use a hash table for lookups to resolve this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10529
2020-07-11 17:13:45 -07:00
Serapheim Dimitropoulos 6f1db5f37e Unconditionally enable debugging for libzpool
We already enable -DDEBUG unconditionally (meaning regardless
of this is a debug build or a performance build) for zdb and
ztest as they are mostly used for development and debugging.

This patch enables -DDEBUG for libzpool extending the debugging
checks for zdb, ztest, and a couple of other test utilities.

In addition to passing -DDEBUG we also enable -DZFS_DEBUG so
all assertion checks work s expected. We do so not only in
libzpool but in every utility that links to it, even if the
utility doesn't directly use any functionality wrapped in
ZFS_DEBUG macro definitions. The reason is that these utilities
may still include headers that contain structs that have more
fields when ZFS_DEBUG is defined. This can be a problem as
enabling that flag for libzpool but not for zdb can lead into
random problems (e.g. segmentation faults) as zdb may be have
an incorrect view of a struct passed to it by libzpool.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #10549
2020-07-10 15:30:31 -07:00
Arvind Sankar f040a7b0f8 Fix up FIND_SYSTEM_LIBRARY to work with cross-compiling
Make FIND_SYSTEM_LIBRARY respect a configured sysroot, otherwise it
might find headers from the build machine and assume the library is
available on the host/target.

Tighten up error checking: if pkg-config or the user specified _CFLAGS
or _LIBS but we can't find the header/library, issue a fatal error.

Fix the -L flag to /usr/local/lib instead of just /usr/local.

Clean out the _CFLAGS and _LIBS if we located something that we later
find doesn't work.

Rename FIND_SYSTEM_LIBRARY into the ZFS_AC_ scope.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10538
2020-07-10 14:26:40 -07:00
Arvind Sankar 3e597dee11 Use abs_top_builddir when referencing libraries
libtool stores absolute paths in the dependency_libs component of the
.la files. If the Makefile for a dependent library refers to the
libraries by relative path, some libraries end up duplicated on the link
command line.

As an example, libzfs specifies libzfs_core, libnvpair and libuutil as
dependencies to be linked in. The .la file for libzfs_core also
specifies libnvpair, but using an absolute path, with the result that
libnvpair is present twice in the linker command line for producing
libzfs.

While the only thing this causes is to slightly slow down the linking,
we can avoid it by using absolute paths everywhere, including for
convenience libraries just for consistency.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10538
2020-07-10 14:26:32 -07:00
Arvind Sankar af65916226 Add -z defs to LDFLAGS
This will make sure the installed libraries are linked with everything
they require.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10538
2020-07-10 14:26:22 -07:00
Arvind Sankar 1537105a8c Add config.rpath for AM_GNU_GETTEXT
Commit e8864b1b28 ("config: libintl/libiconv for gettext() detection")
added an empty config.rpath with a comment that the real one doesn't
work with libtool.

However, an empty config.rpath doesn't really work: eg. on FreeBSD,
where libintl is in /usr/local/lib, configure thinks that gettext
doesn't exist and NLS should be disabled, which currently isn't
supported in the source, and hence requires manual workaround to
directly link -lintl without relying on configure. config.rpath is
essential to let it be detected either in --prefix or using
--with-libintl-prefix.

I also don't see the mentioned issue with libtool flags applied to
compilation, it seems to work fine to pass LTLIBINTL to libtool. It's
unnecessary to include LTLIBICONV as the configure test will
automatically append that to LTLIBINTL if it is necessary to link with
libiconv.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10538
2020-07-10 14:26:12 -07:00
Arvind Sankar 4d61ade1a3 Clean up lib dependencies
libzutil is currently statically linked into libzfs, libzfs_core and
libzpool. Avoid the unnecessary duplication by removing it from libzfs
and libzpool, and adding libzfs_core to libzpool.

Remove a few unnecessary dependencies:
- libuutil from libzfs_core
- libtirpc from libspl
- keep only libcrypto in libzfs, as we don't use any functions from
  libssl
- librt is only used for clock_gettime, however on modern systems that's
  in libc rather than librt. Add a configure check to see if we actually
  need librt
- libdl from raidz_test

Add a few missing dependencies:
- zlib to libefi and libzfs
- libuuid to zpool, and libuuid and libudev to zed
- libnvpair uses assertions, so add assert.c to provide aok and
  libspl_assertf

Sort the LDADD for programs so that libraries that satisfy dependencies
come at the end rather than the beginning of the linker command line.

Revamp the configure tests for libaries to use FIND_SYSTEM_LIBRARY
instead. This can take advantage of pkg-config, and it also avoids
polluting LIBS.

List all the required dependencies in the pkgconfig files, and move the
one for libzfs_core into the latter's directory. Install pkgconfig files
in $(libdir)/pkgconfig on linux and $(prefix)/libdata/pkgconfig on
FreeBSD, instead of /usr/share/pkgconfig, as the more correct location
for library .pc files.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10538
2020-07-10 14:26:00 -07:00
Arvind Sankar b6437ea41c Move libspl_assertf into .c file
Variadic functions cannot be inlined. libspl_assertf ends up being
duplicated in every file that uses it.

Fix this by moving the function into a new assert.c. Also move the
definition of aok into the new file instead of zone.c.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10538
2020-07-10 14:25:24 -07:00
George Amanakis 2054f35e56 Fix a persistent L2ARC bug in l2arc_write_done()
In case l2arc_write_done() handles a zio that was not successful check
that the list of log block pointers is not empty when restoring them
in the device header. Otherwise zero them out. In any case perform the
actual write updating the device header after the zio of
l2arc_write_buffers() completes as l2arc_write_done() may have touched
the memory holding the log block pointers in the device header.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10540 
Closes #10543
2020-07-10 14:10:03 -07:00
Gabriel A. Devenyi d2bce6d036 Add Makefile command to run checkbashisms on all /bin/sh scripts
Based on the shellcheck make target, add a target which checks
for violations of POSIX standards for shell scripts

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gabriel A. Devenyi <gdevenyi@gmail.com>
Closes #10513
2020-07-09 20:04:49 -07:00
Ryan Moeller a2ec738c75 ZTS: Make bc conditional use compatible with new BSD bc
FreeBSD recently replaced the GNU bc and dc in the base system with
BSD licensed versions.  They are supposed to be compatible with all
the features present in the GNU versions, but it turns out they are
picky about `if` statements having a corresponding `else`.  ZTS uses
`echo "if ($x > $y) 1" | bc` in a few places, which causes tests to
fail unexpectedly with the new bc.

Change the two expressions in ZTS to `if ($x > $y) 1 else 0` for
compatibility with the new BSD bc.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10551
2020-07-09 17:49:02 -07:00
Ryan Moeller 659f4008be libzfs: Make zfs_cmd_t initialization consistent, use zfs_ioctl
The clang version 8.0.1 shipped in FreeBSD 12.1-RELEASE also oddly
throws a warning that is treated as an error on the initialization of
the zc struct in zpool_nextboot.

The zpool_nextboot code from FreeBSD was not updated to use zfs_ioctl.

Switch ioctl to zfs_ioctl in and use {"\0"} to initialize the struct.
Do a consistency pass for zfs_cmd_t initialization.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10539
2020-07-09 17:47:12 -07:00
Ryan Moeller fb91f0367e Add zpool_nextboot, move zfs_jail to libzfs.h
FreeBSD has a zfsbootcfg command that wants zpool_nextboot in libzfs.

Add the function to FreeBSD's libzfs_compat.c, and while here move
the prototype for zfs_jail out of param.h in FreeBSD's SPL and into
libzfs.h under an ifdef for FreeBSD, where the prototype for
zpool_nextboot joins it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10524
2020-07-06 11:57:24 -07:00
Mark Johnston cd32b4f5b7 Fix a deadlock in the FreeBSD getpages VOP
FreeBSD has a per-page "busy" lock which is held when handling a page
fault on a mapped file.  This lock is also acquired when copying data
from the DMU to the page cache in zfs_write().  File range locks are
also acquired in both of these paths, in the opposite order with respect
to the busy lock.

In the getpages VOP, the range lock is only used to determine the extent
of optional read-ahead and read-behind operations.  To resolve the lock
order reversal, modify the getpages VOP to avoid blocking on the range
lock.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #10519
2020-07-06 11:53:57 -07:00
Mark Johnston 6e00561712 Add a "try" operation for range locks
zfs_rangelock_tryenter() bails immediately instead of waiting for the
lock to become available.  This will be used to resolve a deadlock in
the FreeBSD page-in code.  No functional change intended.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #10519
2020-07-06 11:53:31 -07:00
winglq a4b0a74c7f Fix atomic_clear_long_excl wrong return
When clearing a bit, we should check whether that bit is 0.
Note atomic_clear_long_excl is not used.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Liu Qing <winglq@gmail.com>
Closes #10526
2020-07-06 11:46:17 -07:00
Ryan Moeller 8a3d9186ba Update zfs_freebsd_need_inactive to fix mmapped writes
`zfs_freebsd_need_inactive` appears to been based on an unfinished
version of https://reviews.freebsd.org/D22130 which had a bug where
files written via mmap wouldn't actually persist.

Update the function to match the final version committed to FreeBSD.

Authored-by: Mateusz Guzik <mjg@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10527 
Closes #10528
2020-07-03 11:30:04 -07:00
Brian Behlendorf 9a49d3f3d3 Add device rebuild feature
The device_rebuild feature enables sequential reconstruction when
resilvering.  Mirror vdevs can be rebuilt in LBA order which may
more quickly restore redundancy depending on the pools average block
size, overall fragmentation and the performance characteristics
of the devices.  However, block checksums cannot be verified
as part of the rebuild thus a scrub is automatically started after
the sequential resilver completes.

The new '-s' option has been added to the `zpool attach` and
`zpool replace` command to request sequential reconstruction
instead of healing reconstruction when resilvering.

    zpool attach -s <pool> <existing vdev> <new vdev>
    zpool replace -s <pool> <old vdev> <new vdev>

The `zpool status` output has been updated to report the progress
of sequential resilvering in the same way as healing resilvering.
The one notable difference is that multiple sequential resilvers
may be in progress as long as they're operating on different
top-level vdevs.

The `zpool wait -t resilver` command was extended to wait on
sequential resilvers.  From this perspective they are no different
than healing resilvers.

Sequential resilvers cannot be supported for RAIDZ, but are
compatible with the dRAID feature being developed.

As part of this change the resilver_restart_* tests were moved
in to the functional/replacement directory.  Additionally, the
replacement tests were renamed and extended to verify both
resilvering and rebuilding.

Original-patch-by: Isaac Huang <he.huang@intel.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: John Poduska <jpoduska@datto.com>
Co-authored-by: Mark Maybee <mmaybee@cray.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10349
2020-07-03 11:05:50 -07:00
Matthew Macy 7ddb753d17 freebsd: changes necessary to coexist with dtrace in tree
Fix header conflicts when building zfs with openzfs as a vendor import.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10497
2020-07-01 09:10:08 -07:00
Harald van Dijk 22831636c8 configure fixes
a+=b is not supported by all shells. It is equivalent to a=${a}b, so
just rewrite it as that.

This also fixes commit 9ea6c3d3, which intended to only make the
definitions of _dracutdir, _udevdir, and _udevruledir conditional, but
actually ensured that _initconfdir no longer got defined if _dracutdir
was defined, and defined _udevdir to the value that should have been
used for _udevruledir.

This also fixes the fact that the checks introduced by commit 9ea6c3d3
could never work: ZFS_AC_PACKAGE was called before the configuration
options were processed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Harald van Dijk <harald@gigawatt.nl>
Closes #10518
2020-07-01 09:05:21 -07:00
Harald van Dijk 2ac6aa1176 pam: fix test usage in configure script
The standard test command does not support the == operator. Certain
shells, including bash, do support it, but in those shells it does
exactly the same thing as the standard = operator. Use that instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Harald van Dijk <harald@gigawatt.nl>
Closes #10509
2020-06-29 09:28:22 -07:00
Matthew Ahrens 3c42c9ed84 Clean up OS-specific ARC and kmem code
OS-specific code (e.g. under `module/os/linux`) does not need to share
its code structure with any other operating systems.  In particular, the
ARC and kmem code need not be similar to the code in illumos, because we
won't be syncing this OS-specific code between operating systems.  For
example, if/when illumos support is added to the common repo, we would
add a file `module/os/illumos/zfs/arc_os.c` for the illumos versions of
this code.

Therefore, we can simplify the code in the OS-specific ARC and kmem
routines.

These changes do not impact system behavior, they are purely code
cleanup.  The changes are:

Arenas are not used on Linux or FreeBSD (they are always `NULL`), so
`heap_arena`, `zio_arena`, and `zio_alloc_arena` can be removed, along
with code that uses them.

In `arc_available_memory()`:
 * `desfree` is unused, remove it
 * rename `freemem` to avoid conflict with pre-existing `#define`
 * remove checks related to arenas
 * use units of bytes, rather than converting from bytes to pages and
   then back to bytes

`SPL_KMEM_CACHE_REAP` is unused, remove it.

`skc_reap` is unused, remove it.

The `count` argument to `spl_kmem_cache_reap_now()` is unused, remove
it.

`vmem_size()` and associated type and macros are unused, remove them.

In `arc_memory_throttle()`, use a less confusing variable name to store
the result of `arc_free_memory()`.

Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10499
2020-06-29 09:01:07 -07:00
Arvind Sankar 94a2dca6a0 Mark phony targets as phony.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10506
2020-06-27 17:40:15 -07:00
Arvind Sankar 2b5f3045ae paxcheck needs to run against the builddir not the srcdir
Otherwise it does nothing on an out-of-tree build.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10506
2020-06-27 17:40:15 -07:00
Arvind Sankar c79907f901 Move cppcheck suppressions out of .github
Move this file out of .github and add it to distribution.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10506
2020-06-27 17:40:15 -07:00
Arvind Sankar a86f0be8ef Remove config/suppressed-warnings.txt
This doesn't appear to be used by the buildbot any more, let's remove
it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10506
2020-06-27 17:40:15 -07:00
Arvind Sankar 4d8e68c42f Avoid installing kernel headers on FreeBSD
The kernel headers are installed for DKMS on linux, so don't install
them unless we're building on linux.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10506
2020-06-27 17:40:14 -07:00
Brian Behlendorf 67b1362f04 Style fixes
* Fix cstyle issue in shrinker.h which exceeded 80 columns.
* Silence shellcheck warning in zpool.d/smart script.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2020-06-27 17:38:55 -07:00
Allan Jude 3bc92b9ef6 Make zstreamdump output the size of the payload for BEGIN records
This is helpful for determining the size of the nvlist of snapshots
and properties

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #10505
2020-06-27 10:29:47 -07:00
Matthew Ahrens 270ece24b6 Revise SPL wrapper for shrinker callbacks
The SPL provides a wrapper for the kernel's shrinker callbacks, which
enables the ZFS code to interface with multiple versions of the shrinker
API's from different kernel versions.  Specifically, Linux kernels 3.0 -
3.11 has a single "combined" callback, and Linux kernels 3.12 and later
have two "split" callbacks.  The SPL provides a wrapper function so that
the ZFS code only needs to implement one version of the callbacks.

Currently the SPL's wrappers are designed such that the ZFS code
implements the older, "combined" callback.  There are a few downsides to
this approach:

* The general design within ZFS is for the latest Linux kernel to be
considered the "first class" API.

* The newer, "split" callback API is easier to understand, because each
callback has one purpose.

* The current wrappers do not completely abstract out the differing
API's, so ZFS code needs `#ifdef` code to handle the differing return
values required for different kernel versions.

This commit addresses these drawbacks by having the ZFS code provide the
latest, "split" callbacks, and the SPL provides a wrapping function for
the older, "combined" API.

Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10502
2020-06-27 10:27:02 -07:00
Serapheim Dimitropoulos ec1fea4516 Use percpu_counter for obj_alloc counter of Linux-backed caches
A previous commit enabled the tracking of object allocations
in Linux-backed caches from the SPL layer for debuggability.
The commit is: 9a170fc6fe54f1e852b6c39630fe5ef2bbd97c16

Unfortunately, it also introduced minor performance regressions
that were highlighted by the ZFS perf test-suite. Within Delphix
we found that the regression would be from -1%, all the way up
to -8% for some workloads.

This commit brings performance back up to par by creating a
separate counter for those caches and making it a percpu in
order to avoid lock-contention.

The initial performance testing was done by myself, and the
final round was conducted by @tonynguien who was also the one
that discovered the regression and highlighted the culprit.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #10397
2020-06-26 18:06:50 -07:00
Matthew Ahrens 7b232e9354 arcstat: add 'avail', fix 'free'
The meaning of the `free` field is currently `zfs_arc_sys_free`, which
is the target amount of memory to leave free for the system, and is
constant after booting.

This commit changes the meaning of `free` to arc_free_memory(), the
amount of memory that the ARC considers to be free.

It also adds a new arcstat field `avail`, which tracks
`arc_available_memory()`.

Since `avail` can be negative, it also updates the arcstat script to
pretty-print negative values.

example output:

  $ arcstat -f time,miss,arcsz,c,grow,need,free,avail 1
      time  miss  arcsz     c  grow  need  free  avail
  15:03:02   39K   114G  114G     0     0  2.4G  407M
  15:03:03   42K   114G  114G     0     0  2.1G  120M
  15:03:04   40K   114G  114G     0     0  1.8G  -177M
  15:03:05   24K   113G  112G     0     0  1.7G  -269M
  15:03:06   29K   111G  110G     0     0  1.6G  -385M
  15:03:07   27K   110G  108G     0     0  1.4G  -535M
  15:03:08   13K   108G  108G     0     0  2.2G  239M
  15:03:09   33K   107G  107G     0     0  1.3G  -639M
  15:03:10   16K   105G  102G     0     0  2.6G  704M
  15:03:11  7.2K   102G  102G     0     0  5.1G  3.1G
  15:03:12   42K   103G  102G     0     0  4.8G  2.8G

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10494
2020-06-26 18:05:28 -07:00
Robert Novak bfcbec6f5d Add block histogram to zdb
The block histogram tracks the changes to psize, lsize and asize
both in the count of the number of blocks (by blocksize) and the
total length of all of the blocks for that blocksize.  It also
keeps a running total of the cumulative size of all of the blocks
up to each size to help determine the size of caching SSDs to be
added to zfs hardware deployments.

The block history counts and lengths are summarized in bins
which are powers of two. Even rows with counts of zero are printed.

This change is accessed by specifying one of two options:

zdb -bbb pool
zdb -Pbbb pool

The first version prints the table in fixed size columns.
The second prints in "parseable" output that can be placed into
a CSV file.

Fixed Column, nicenum output sample:
  block   psize                lsize                asize
   size   Count Length   Cum.  Count Length   Cum.  Count Length   Cum.
    512:  3.50K  1.75M  1.75M  3.43K  1.71M  1.71M  3.41K  1.71M  1.71M
     1K:  3.65K  3.67M  5.43M  3.43K  3.44M  5.15M  3.50K  3.51M  5.22M
     2K:  3.45K  6.92M  12.3M  3.41K  6.83M  12.0M  3.59K  7.26M  12.5M
     4K:  3.44K  13.8M  26.1M  3.43K  13.7M  25.7M  3.49K  14.1M  26.6M
     8K:  3.42K  27.3M  53.5M  3.41K  27.3M  53.0M  3.44K  27.6M  54.2M
    16K:  3.43K  54.9M   108M  3.50K  56.1M   109M  3.42K  54.7M   109M
    32K:  3.44K   110M   219M  3.41K   109M   218M  3.43K   110M   219M
    64K:  3.41K   218M   437M  3.41K   218M   437M  3.44K   221M   439M
   128K:  3.41K   437M   874M  3.70K   474M   911M  3.41K   437M   876M
   256K:  3.41K   874M  1.71G  3.41K   874M  1.74G  3.41K   874M  1.71G
   512K:  3.41K  1.71G  3.41G  3.41K  1.71G  3.45G  3.41K  1.71G  3.42G
     1M:  3.41K  3.41G  6.82G  3.41K  3.41G  6.86G  3.41K  3.41G  6.83G
     2M:      0      0  6.82G      0      0  6.86G      0      0  6.83G
     4M:      0      0  6.82G      0      0  6.86G      0      0  6.83G
     8M:      0      0  6.82G      0      0  6.86G      0      0  6.83G
    16M:      0      0  6.82G      0      0  6.86G      0      0  6.83G

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Robert E. Novak <novak5@llnl.gov>
Closes: #9158 
Closes #10315
2020-06-26 15:09:20 -07:00
Arvind Sankar 6b99fc0620 Fixes for make dist
Reduce the usage of EXTRA_DIST. If files are conditionally included in
_SOURCES, _HEADERS etc, automake is smart enough to dist all files that
could possibly be included, but this does not apply to EXTRA_DIST,
resulting in make dist depending on the configuration.

Add some files that were missing altogether in various Makefile's.

The changes to disted files in this commit (excluding deleted files):

+./cmd/zed/agents/README.md
+./etc/init.d/README.md
+./lib/libspl/os/freebsd/getexecname.c
+./lib/libspl/os/freebsd/gethostid.c
+./lib/libspl/os/freebsd/getmntany.c
+./lib/libspl/os/freebsd/mnttab.c
-./lib/libzfs/libzfs_core.pc
-./lib/libzfs/libzfs.pc
+./lib/libzfs/os/freebsd/libzfs_compat.c
+./lib/libzfs/os/freebsd/libzfs_fsshare.c
+./lib/libzfs/os/freebsd/libzfs_ioctl_compat.c
+./lib/libzfs/os/freebsd/libzfs_zmount.c
+./lib/libzutil/os/freebsd/zutil_compat.c
+./lib/libzutil/os/freebsd/zutil_device_path_os.c
+./lib/libzutil/os/freebsd/zutil_import_os.c
+./module/lua/README.zfs
+./module/os/linux/spl/README.md
+./tests/README.md
+./tests/zfs-tests/tests/functional/cli_root/zfs_clone/zfs_clone_rm_nested.ksh
+./tests/zfs-tests/tests/functional/cli_root/zfs_send/zfs_send_encrypted_unloaded.ksh
+./tests/zfs-tests/tests/functional/inheritance/README.config
+./tests/zfs-tests/tests/functional/inheritance/README.state
+./tests/zfs-tests/tests/functional/rsend/rsend_016_neg.ksh
+./tests/zfs-tests/tests/perf/fio/sequential_readwrite.fio

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10501
2020-06-26 14:20:02 -07:00
Arvind Sankar 7c902a5178 Include FreeBSD sources in module dist
Add os/freebsd and Makefile.bsd into distdir target.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10501
2020-06-26 14:19:35 -07:00
Matthew Ahrens 67c0f0dedc ARC shrinking blocks reads/writes
ZFS registers a memory hook, `__arc_shrinker_func`, which is supposed to
allow the ARC to shrink when the kernel experiences memory pressure.
The ARC shrinker changes `arc_c` via a call to
`arc_reduce_target_size()`.  Before commit 3ec34e5527, the ARC
shrinker would also evict data from the ARC to bring `arc_size` down to
the new `arc_c`.  However, that commit (seemingly inadvertently) made it
so that the ARC shrinker no longer evicts any data or waits for eviction
to complete.

Repeated calls to the ARC shrinker can reduce `arc_c` drastically, often
all the way to `arc_c_min`.  Since it doesn't wait for the actual
eviction of data from the ARC, this creates a situation where `arc_size`
is more than `arc_c` for the several seconds/minutes it takes for
`arc_adjust_zthr` to evict data from the ARC.  During this time,
arc_get_data_impl() will block, so ZFS can't process read/write requests
(e.g. from iSCSI, NFS, or read/write syscalls).

To ensure that `arc_c` doesn't shrink faster than the adjust thread can
keep up, this commit makes the ARC shrinker wait for the eviction to
complete, resulting in similar behavior to what we had before commit
3ec34e5527.

Note: commit 3ec34e5527 is `OpenZFS 9284 - arc_reclaim_thread
has 2 jobs` and was integrated in December 2018, and is part of ZoL
0.8.x but not 0.7.x.

Additionally, when the ARC size is reduced drastically, the
`arc_adjust_zthr` can be on-CPU for many seconds without blocking.  Any
threads that are bound to the same CPU that arc_adjust_zthr is running
on will not able to run for a long time.

To ensure that CPU-bound threads can make progress, this commit changes
`arc_evict_state_impl()` make a voluntary preemption call,
`cond_resched()`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-70703
Closes #10496
2020-06-26 10:42:27 -07:00
felixdoerre 221e67040f pam: implement a zfs_key pam module
Implements a pam module for automatically loading zfs encryption keys 
for home datasets. The pam module:

  - loads a zfs key and mounts the dataset when a session opens.
  - unmounts the dataset and unloads the key when the session closes.
  - when the user is logged on and changes the password, the module
    changes the encryption key.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: @jengelh <jengelh@inai.de>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Felix Dörre <felix@dogcraft.de>
Closes #9886
Closes #9903
2020-06-24 18:45:44 -07:00
Arvind Sankar 7513807320 Drop unnecessary srcdir paths
There's no need to specify the srcdir explicitly in _HEADERS and
EXTRA_DIST.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10493
2020-06-24 18:20:18 -07:00
Arvind Sankar 5ca349f95d Fix check for sed --in-place
The test added in commit
  4313a5b4c5 ("Detect if sed supports --in-place")
doesn't work at least on my system (autoconfig-2.69).

The issue is that SED has already been found and cached before this
function is evaluated, with the result that the test is completely
skipped.

...
checking for a sed that does not truncate output... /usr/bin/sed
...
checking for sed --in-place... (cached) /usr/bin/sed

The first test is executed by libtool.m4. This looks to have been around
in libtool for at least 15 years or so, not sure why this was not
encountered at the time of the original commit.

Fix this by caching the value of the ac_inplace flag rather than the
path to SED. Also use $SED and add AC_REQUIRE to ensure that we use the
sed that was located by the standard configure test.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10493
2020-06-24 18:19:59 -07:00
Arvind Sankar 9642beef2b Add missing third-party license files to dist
These license files were added in commit
  31b160f0a6 ("ICP: Improve AES-GCM performance")
but not added to the distributed files.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10493
2020-06-24 18:19:51 -07:00
Arvind Sankar 0b03254830 Fix tags targets in module/Makefile.in + cleanup
These targets look to have been copied from an automake-generated
Makefile.in, and can't work since none of the auto-generated automake
variables are defined here.

Moreover, ctags has been overridden in the top-level Makefile, so the
target is pointless anyway, and gtags is not a recursive target.

Fix cscopelist by moving it to the top-level Makefile as well, in line
with ctags and etags.

Also, add -a to ctags command as well, otherwise it won't work if more
than one xargs invocation takes place.

Add assembler files to ctags/etags, prune all dotted-dirs, and restrict
the find to files only.

Cleanup: add .PHONY to module/Makefile.in, and fix one recipe with a
missing continuation character.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10493
2020-06-24 18:19:44 -07:00
Arvind Sankar 2989d1012a Fix libspl/asm-generic/atomic for VPATH build
Currently, asm-generic/atomic.c is compiled into a .S file, with a
comment saying this is to simplify the upper-level Makefile.

However, this doesn't work properly with a VPATH build, which would
require better logic to deal with generated sources correctly.

It also doesn't seem more complex to just specify the .c/.S source file,
depending on the cpu, instead of only the source directory in
lib/libspl/Makefile.am, which eliminates the need to do the intermediate
compilation.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10493
2020-06-24 18:19:36 -07:00
Arvind Sankar 109d2c9310 Move zfs_gitrev.h to build directory
Currently an out-of-tree build does not work with read-only source
directory because zfs_gitrev.h can't be created. Move this file to the
build directory, which is more appropriate for a generated file, and
drop the dist-hook for zfs_gitrev.h. There is no need to distribute this
file since it will be regenerated as part of the compilation in any
case.

scripts/make_gitrev.sh tries to avoid updating zfs_gitrev.h if there has
been no change, however this doesn't cover the case when the source
directory is not in git: in that case zfs_gitrev.h gets overwritten even
though it's always "unknown". Simplify the logic to always write out a
new version of zfs_gitrev.h, compare against the old and overwrite only
if different. This is now simple enough to just include in the
Makefile, so drop the script.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10493
2020-06-24 18:19:28 -07:00
Arvind Sankar 33982eb24c Support out-of-tree kmod build on FreeBSD
If srcdir != builddir, pass down MAKEOBJDIR to the FreeBSD make to
support out-of-tree builds.

Also allow passing all the gmake options that FreeBSD make understands
to support useful flags like -k, -n, -q etc, and detect the number of
CPUs if -j was specified without an argument.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10493
2020-06-24 18:18:41 -07:00
Kevin P. Fleming f21de6883f Add trim_finish notify script for ZED
Allow users to configure notifications when TRIM operations are
completed on pools. Unlike resilver_finish and scrub_finish,
the trim_finish event is generated for each vdev in the pool
which was trimmed, so the script will generate a notification
for each one.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kevin P. Fleming <kevin@km6g.us>
Closes #10491
2020-06-24 16:57:13 -07:00
Ryan Moeller 9192f27c1d Add zfs_multihost_interval tunable handler for FreeBSD
This tunable required a handler to be implemented for
ZFS_MODULE_PARAM_CALL.

Add the handler so the tunable can be declared in common code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10490
2020-06-23 13:32:42 -07:00
Prawn 2451a55368 zfs -V: Print userland version even if kernel module not loaded
Running zfs -V when the modules are not loaded would currently 
result in the following output:

    zfs_version_kernel() failed: No such file or directory

Note the lack of userland version output.  Reorder the code to
ensure the userland version is printed even when the kmods
are not loaded.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #10483
2020-06-22 09:56:29 -07:00
Jorgen Lundman 68301ba20e zed additional features
This commit adds two features to zed, that macOS desires. The first
is that when you unload the kernel module, zed would enter into a
cpubusy loop calling zfs_events_next() repeatedly. We now look for
ENODEV, returned by kernel, so zed can exit gracefully.

Second feature is -I (idle) (alas -P persist was taken) is for the
deamon to;

1; if started without ZFS kernel module, stick around waiting for it.
2; if kernel module is unloaded, go back to 1.

This is due to daemons in macOS is started by launchctl, and is
expected to stick around.

Currently, the busy loop only exists when errno is ENODEV. This is
to ensure that functionality that upstream expects is not changed.
It did not care about errors before, and it still does not. (with the
exception of ENODEV).

However, it is probably better that all errors
(ERESTART notwithstanding) exits the loop, and the issues complaining
about zed taking all CPU will go away.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #10476
2020-06-22 09:53:34 -07:00
Serapheim Dimitropoulos 42d8d1d66a Remove unnecessary terminology from error-injection in ztest
Rephrase error-injection comment in ztest to be more clear.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #10482
2020-06-22 09:48:36 -07:00
Matthew Ahrens 540493ba4f Clarify comments in config/*.m4, vdev_geom.c, zfs_allow_*.ksh
Rephrase comments to be more clear.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10481
2020-06-22 09:46:37 -07:00
Brian Behlendorf 745ace3f24 Update zts-report.py with additional tests
The following test cases may still occasionally fail and are being
added to the "maybe" list for Linux until they can be updated to be
entirely reliable.

  cli_root/zfs_rename/zfs_rename_002_pos.ksh
  cli_root/zpool_reopen/zpool_reopen_003_pos.ksh
  refreserv/refreserv_raidz

These 6 tests consistently fail only on Fedora 31+, the failures
are related to the kernel rescanning the partition table on loopback
devices which is no longer reliable unless partprobe is used.  In
order to enable the Fedora bot by default they are also being added
to the list until the tests can be updated.  Any significant regression
in functionality covered by these tests will still be detected by the
FreeBSD builders.

  alloc_class/alloc_class_009_pos
  alloc_class/alloc_class_010_pos
  cli_root/zpool_expand/zpool_expand_001_pos
  cli_root/zpool_expand/zpool_expand_005_pos
  rsend/rsend_007_pos
  rsend/rsend_010_pos
  rsend/rsend_011_pos
  snapshot/rollback_003_pos

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10489
2020-06-22 09:44:49 -07:00
Ryan Moeller 1c08fa8b5b Fix copy-paste error breaking FreeBSD head
Resolve the FreeBSD head build failure.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10480
2020-06-19 15:12:34 -07:00
Andriy Gapon a8bd6dcf87 zfs allow/unallow should work with numeric uid/gid
And that should work even (especially) if there is no matching user or
group name.  The change is originally by Xin Lin <delphij@FreeBSD.org>.

Original-patch-by: Xin Li <delphij@FreeBSD.org>
Reviewed-by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed-by: Andy Stormont <astormont@racktopsystems.com>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Andriy Gapon <avg@FreeBSD.org>
Closes #9792 
Closes #10280
2020-06-19 10:38:43 -07:00
Ryan Moeller 2e6af52b2e Match new vfs_checkexp KPI in FreeBSD head
KPI changed in FreeBSD, update accordingly.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10475
2020-06-18 13:45:36 -07:00
Arvind Sankar ae7b167a98 Enable -Wmissing-prototypes/-Wstrict-prototypes
Switch on warning flags to detect mismatch between declaration and
definition.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10470
2020-06-18 12:21:53 -07:00
Arvind Sankar c0673571d0 Switch off -Wmissing-prototypes for libgcc math functions
spl-generic.c defines some of the libgcc integer library functions on
32-bit. Don't bother checking -Wmissing-prototypes since nothing should
directly call these functions from C code.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10470
2020-06-18 12:21:46 -07:00
Arvind Sankar eebba5d8f4 Make Skein_{Get,Put}64_LSB_First inline functions
Turn the generic versions into inline functions and avoid
SKEIN_PORT_CODE trickery.

Also drop the PLATFORM_MUST_ALIGN check for using the fast bcopy
variants. bcopy doesn't assume alignment, and the userspace version is
currently different because the _ALIGNMENT_REQUIRED macro is only
defined by the kernelspace headers.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10470
2020-06-18 12:21:38 -07:00
Arvind Sankar 0ce2de637b Add prototypes
Add prototypes/move prototypes to header files.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10470
2020-06-18 12:21:32 -07:00
Arvind Sankar 60356b1a21 Add include files for prototypes
Include the header with prototypes in the file that provides definitions
as well, to catch any mismatch between prototype and definition.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10470
2020-06-18 12:21:25 -07:00
Arvind Sankar c3fe42aabd Remove dead code
Delete unused functions.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10470
2020-06-18 12:21:18 -07:00
Arvind Sankar 65c7cc49bf Mark functions as static
Mark functions used only in the same translation unit as static. This
only includes functions that do not have a prototype in a header file
either.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10470
2020-06-18 12:20:38 -07:00
Arvind Sankar 1fa5c7af33 Cleanup libzpool/kernel.c
Commit
  ec21397127 ("async zvol minor node creation interferes with receive")
replaced zvol_create_minors with zvol_create_minor and
zvol_create_minors_recursive, changing the prototype at the same time.

However the stub functions in libzpool/kernel.c were defined with the
old prototype. As the definitions are empty, this doesn't cause any
runtime issues, but an LTO build shows warnings because of the
mismatched prototypes.

Commit
  a0bd735adb ("Add support for asynchronous zvol minor operations")
removed the real zvol_remove_minor, but for some reason added a stub
implementation in libzpool/kernel.c with no references. Delete this dead
code.

Commit
  196bee4cfd ("Remove deduplicated send/receive code")
removed zfs_onexit_del_cb and zfs_onexit_cb_data. Drop the stubs as
well.

Add zvol.h include to provide prototypes, and sort the include
directives.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10470
2020-06-18 12:18:49 -07:00
adilger f734301d22 linux: add basic fallocate(mode=0/2) compatibility
Implement semi-compatible functionality for mode=0 (preallocation)
and mode=FALLOC_FL_KEEP_SIZE (preallocation beyond EOF) for ZPL.

Since ZFS does COW and snapshots, preallocating blocks for a file
cannot guarantee that writes to the file will not run out of space.
Even if the first overwrite was guaranteed, it would not handle any
later overwrite of blocks due to COW, so strict compliance is futile.
Instead, make a best-effort check that at least enough free space is
currently available in the pool (with a bit of margin), then create
a sparse file of the requested size and continue on with life.

This does not handle all cases (e.g. several fallocate() calls before
writing into the files when the filesystem is nearly full), which
would require a more complex mechanism to be implemented, probably
based on a modified version of dmu_prealloc(), but is usable as-is.

A new module option zfs_fallocate_reserve_percent is used to control
the reserve margin for any single fallocate call.  By default, this
is 110% of the requested preallocation size, so an additional 10% of
available space is reserved for overhead to allow the application a
good chance of finishing the write when the fallocate() succeeds.
If the heuristics of this basic fallocate implementation are not
desirable, the old non-functional behavior of returning EOPNOTSUPP
for calls can be restored by setting zfs_fallocate_reserve_percent=0.

The parameter of zfs_statvfs() is changed to take an inode instead
of a dentry, since no dentry is available in zfs_fallocate_common().

A few tests from @behlendorf cover basic fallocate functionality.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Arshad Hussain <arshad.super@gmail.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Issue #326
Closes #10408
2020-06-18 11:22:11 -07:00
Jorgen Lundman d553fb9b9e Avoid adding new primitives in zpool wait
zpool wait brought in sem_init() and family, which is a primitive set
not previously used in Open ZFS. It also happens to be deprecated
on macOS. Replace with phtread API calls.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Gallagher <john.gallagher@delphix.com>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #10468
2020-06-18 10:44:45 -07:00
Matthew Macy 8056a75672 Disambiguate condvar API contract
On Illumos callers of cv_timedwait and cv_timedwait_hires
can't distinguish between whether or not the cv was signaled
or the call timed out. Illumos handles this (for some definition
of handles) by calling cv_signal in the return path if we were
signaled but the return value indicates instead that we timed
out. This would make sense if it were possible to query the the
cv for its net signal disposition. However, this isn't possible
and, in spite of the fact that there are places in the code that
clearly take a different and incompatible path if a timeout value
is indicated, this distinction appears to be rather subtle to most
developers. This problem is further compounded by the fact that on
Linux, calling cv_signal in the return path wouldn't even do the
right thing unless there are other waiters.

Since it is possible for the caller to independently determine how
much time is remaining but it is not possible to query if the cv
was in fact signaled, prioritizing signalling over timeout seems
like a cleaner solution. In addition, judging from usage patterns
within the code itself, it is also less error prone.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10471
2020-06-18 10:17:50 -07:00
Matthew Macy 7564073ed6 Add abd_cache_reap_now for abd_chunk_cache users
Apparently missed in the initial port integration was
the need to reap the abd_chunk_cache on FreeBSD. This
change addresses that oversight.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10474
2020-06-17 21:44:13 -07:00
Jorgen Lundman 4458157bee zfs_ioctl: saved_poolname can be truncated
As it uses kmem_strdup() and kmem_strfree() which both rely on
strlen() being the same, but saved_poolname can be truncated causing:

SPL: kernel memory allocator:
buffer freed to wrong cache
SPL: buffer was allocated from kmem_alloc_16,
SPL: caller attempting free to kmem_alloc_8.
SPL: buffer=0xffffff90acc66a38  bufctl=0x0  cache: kmem_alloc_8

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #10469
2020-06-17 14:30:03 -07:00
Alexander Motin 17ca30185a Set initial arc_c to arc_c_min instead of arc_c_max
For at least 15 years since OpenSolaris arc_c was set by default to
arc_c_max, later decreased under memory pressure.  I've noticed that
if arc_c was set high enough to cause memory pressure as considered
by ZFS, setting of arc_no_grow to TRUE in arc_reap_cb_check() makes
no effect until both arc_kmem_reap_soon() and delay(reap_retry_ms)
return.  All that time ZFS can continue increasing its effective ARC
size, causing more memory pressure, potentially up to the point when
OS low memory handler activates and reduces arc_c, requesting fast
reclamation of just allocated memory.

The problem seems to be more serious on FreeBSD and I guess Linux,
since neither of them implement/use asynchronous kmem reclamation,
so arc_kmem_reap_soon() can take more time.  On older FreeBSD 11 not
supporting multiple memory domains system with lots of RAM can get
completely unresponsive for minutes due to heavy lock congestion
between ARC reclamation and page daemon kmem reclamation threads.
With this change to more conservative arc_c value ARC stops growing
just it time and does not need later reclamation.

Also while there, since now growing arc_c is a more often situation,
use aggsum_upper_bound() instead of aggsum_compare() in arc_adapt()
to reduce lock congestion.  It is also getting in sync with code in
arc_get_data_impl().

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #10437
2020-06-17 14:27:04 -07:00
Joao Carlos Mendes Luis fccaea454c Merge bash_completions changes from upstream
The current bash_completion contrib code in openzfs is very old, and
some changes have been added since.

The original repo is at https://github.com/Aneurin/zfs-bash

I've been using the original @Aneurin code since my first deploy of ZoL.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: João Carlos Mendes Luís <jonny@jonny.eng.br>
Closes #10456
2020-06-16 12:27:23 -07:00
Jorgen Lundman cd07d7c83f drr_begin: can't forward declare untagged struct
When compiling with Clang++ it does not allow for untagged structs, so
struct ddr_begin needs to be declared before the struct that uses it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #10453
2020-06-16 11:57:04 -07:00
Ryan Moeller 86a0f49483 FreeBSD: Kernel module should depend on xdr not krpc after 1300092
Since https://reviews.freebsd.org/D24408 FreeBSD provides XDR functions
in the xdr module instead of krpc.

For FreeBSD 13, the MODULE_DEPEND should be changed to xdr

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10442 
Closes #10443
2020-06-16 11:47:04 -07:00
Jorgen Lundman d366c8fd7a Make struct vdev_disk_t be platform private
Linux defines different vdev_disk_t members to macOS, but they are
only used in vdev_disk.c so move the declaration there.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #10452
2020-06-16 11:43:33 -07:00
Matthew Ahrens ba54b180a5 Remove refences to blacklist/whitelist
These terms reinforce the incorrect notion that black is bad and white
is good.

Replace this language with more specific terms which are also more clear
and don't rely on metaphor.  Specifically:

* When vdevs are specified on the command line, they are the "selected"
vdevs.

* Entries in /dev/ which should not be considered as possible disks are
"excluded" devices.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10457
2020-06-16 11:41:45 -07:00
Brian Atkinson 0a03495e3e Fixing ABD struct allocation for FreeBSD
In the event we are allocating a gang ABD in FreeBSD we are passing 0
to abd_alloc_struct(); however, this led to an allocation of ABD scatter
with 0 chunks. This left the gang ABD allocation 24 bytes smaller than
it should have been.

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #10431
2020-06-16 10:05:22 -07:00
Ryan Moeller c13facb9c4 Fix FreeBSD condvar semantics
We should return -1 instead of negative deltas, and 0 if signaled.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10460
2020-06-16 09:59:31 -07:00
Jorgen Lundman 883a40fff4 Add convenience wrappers for common uio usage
The macOS uio struct is opaque and the API must be used, this
makes the smallest changes to the code for all platforms.

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #10412
2020-06-14 10:09:55 -07:00
Jorgen Lundman 4f73576ea1 Upstream: zil_commit_waiter() can stall forever
On macOS clock_t is unsigned, so when cv_timedwait_hires() returns -1
we loop forever. The conditional was tweaked to ignore signedness.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #10445
2020-06-14 10:08:21 -07:00
George Amanakis f2edc0078f Fix gcc10.1 truncation error
gcc10.1 complains with:

../../include/sys/dmu.h:373:24: error: ‘%s’ directive output may be
truncated writing up to 95 bytes into a region of size 75
[-Werror=format-truncation=]
  373 | #define DMU_POOL_DDT   "DDT-%s-%s-%s"
      |                        ^~~~~~~~~~~~~~
../../module/zfs/ddt.c:256:37: note: in expansion of macro
‘DMU_POOL_DDT’
  256 |  (void) snprintf(name, DDT_NAMELEN, DMU_POOL_DDT,
      |                                     ^~~~~~~~~~~~
../../include/sys/dmu.h:373:32: note: format string is defined here
  373 | #define DMU_POOL_DDT   "DDT-%s-%s-%s"
      |                                ^~
../../module/zfs/ddt.c:256:9: note: ‘snprintf’ output 7 or more bytes
(assuming 102) into a destination of size 80
  256 |  (void) snprintf(name, DDT_NAMELEN, DMU_POOL_DDT,
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  257 |      zio_checksum_table[ddt->ddt_checksum].ci_name,
      |      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  258 |      ddt_ops[type]->ddt_op_name, ddt_class_name[class]);
      |      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Increasing DTT_NAMELEN fixes it.

Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10433
2020-06-13 11:02:00 -07:00
Ryan Moeller 499dccd69b FreeBSD: Don't require zeroing new locks before init
This has not shown to be of use enough to justify the inconvenience.

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allanjude@freebsd.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10449
2020-06-13 10:58:10 -07:00
Brian Atkinson e08b993396 Removing ZERO_PAGE abd_alloc_zero_scatter
For MIPS architectures on Linux the ZERO_PAGE macro references
empty_zero_page, which is exported as a GPL symbol. The call to
ZERO_PAGE in abd_alloc_zero_scatter has been removed and a single
zero'd page is now allocated for each of the pages in abd_zero_scatter
in the kernel ABD code path.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #10428
2020-06-10 17:54:11 -07:00
Grischa Zengel 059f7c20e3 man.8: Add bookmark to list of types
While checking bash_completion I missed bookmark as type.

```
# zfs get type zpool2#b
NAME      PROPERTY  VALUE     SOURCE
zpool2#b  type      bookmark  -
```

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Grischa Zengel <github.zfsonlinux@zengel.info>
Closes #10419
2020-06-10 17:53:07 -07:00
Grischa Zengel 2bc07c6dff bash_completion: add missing attributes
There a some attributes missing which are shown in man pages:
zfs list -t type
           A comma-separated list of types to display, where type is one of filesystem, snapshot, volume, *bookmark*, or all.  For example, specifying -t snapshot displays only snapshots.
zfs get -s source
           A comma-separated list of sources to display.  Those properties coming from a source other than those in this list are ignored.  Each source must be one of the following: local, default, inherited, temporary, *received*, and none.  The default value is all sources.
zfs get -t type
           A comma-separated list of types to display, where type is one of filesystem, snapshot, volume, bookmark, or all.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Grischa Zengel <github.zfsonlinux@zengel.info>
Closes #10418
2020-06-10 17:51:15 -07:00
Matthew Ahrens f66434268c Remove unnecessary references to slavery
The horrible effects of human slavery continue to impact society.  The
casual use of the term "slave" in computer software is an unnecessary
reference to a painful human experience.

This commit removes all possible references to the term "slave".

Implementation notes:

The zpool.d/slaves script is renamed to dm-deps, which uses the same
terminology as `dmsetup deps`.

References to the `/sys/class/block/$dev/slaves` directory remain.  This
directory name is determined by the Linux kernel.  Although
`dmsetup deps` provides the same information, it unfortunately requires
elevated privileges, whereas the `/sys/...` directory is world-readable.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10435
2020-06-10 17:07:59 -07:00
Ryan Moeller feff3f69fc Fixup "Avoid the GEOM topology lock recursion when autoexpanding a pool"
The patch was applied to vdev_geom_open instead of vdev_geom_close by
mistake.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10427
2020-06-10 11:05:15 -07:00
Arvind Sankar 66786f7943 Fix VPATH builds for user config
cmd/zpool and lib/libzutil Makefile's use -I., which won't work with a
VPATH build. Replace it with -I$(srcdir) instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10379
Closes #10421
2020-06-10 09:25:37 -07:00
Arvind Sankar 71504277ae Cleanup linux module kbuild files
The linux module can be built either as an external module, or compiled
into the kernel, using copy-builtin. The source and build directories
are slightly different between the two cases, and currently, compiling
into the kernel still refers to some files from the configured ZFS
source tree, instead of the copies inside the kernel source tree. There
is also duplication between copy-builtin, which creates a Kbuild file to
build ZFS inside the kernel tree, and the top-level module/Makefile.in.

Fix this by moving the list of modules and the CFLAGS settings into a
new module/Kbuild.in, which will be used by the kernel kbuild
infrastructure, and using KBUILD_EXTMOD to distinguish the two cases
within the Makefiles, in order to choose appropriate include
directories etc.

Module CFLAGS setting is simplified by using subdir-ccflags-y (available
since 2.6.30) to set them in the top-level Kbuild instead of each
individual module. The disabling of -Wunused-but-set-variable is removed
from the lua and zfs modules. The variable that the Makefile uses is
actually not defined, so this has no effect; and the warning has long
been disabled by the kernel Makefile itself.

The target_cpu definition in module/{zfs,zcommon} is removed as it was
replaced by use of CONFIG_SPARC64 in
  commit 70835c5b75 ("Unify target_cpu handling")

os/linux/{spl,zfs} are removed from obj-m, as they are not modules in
themselves, but are included by the Makefile in the spl and zfs module
directories. The vestigial Makefiles in os and os/linux are removed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10379
Closes #10421
2020-06-10 09:24:15 -07:00
Andrea Gelmini dd4bc569b9 Fix typos
Correct various typos in the comments and tests.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #10423
2020-06-09 21:24:09 -07:00
Matthew Ahrens 7bcb7f0840 File incorrectly zeroed when receiving incremental stream that toggles -L
Background:

By increasing the recordsize property above the default of 128KB, a
filesystem may have "large" blocks.  By default, a send stream of such a
filesystem does not contain large WRITE records, instead it decreases
objects' block sizes to 128KB and splits the large blocks into 128KB
blocks, allowing the large-block filesystem to be received by a system
that does not support the `large_blocks` feature.  A send stream
generated by `zfs send -L` (or `--large-block`) preserves the large
block size on the receiving system, by using large WRITE records.

When receiving an incremental send stream for a filesystem with large
blocks, if the send stream's -L flag was toggled, a bug is encountered
in which the file's contents are incorrectly zeroed out.  The contents
of any blocks that were not modified by this send stream will be lost.
"Toggled" means that the previous send used `-L`, but this incremental
does not use `-L` (-L to no-L); or that the previous send did not use
`-L`, but this incremental does use `-L` (no-L to -L).

Changes:

This commit addresses the problem with several changes to the semantics
of zfs send/receive:

1. "-L to no-L" incrementals are rejected.  If the previous send used
`-L`, but this incremental does not use `-L`, the `zfs receive` will
fail with this error message:

    incremental send stream requires -L (--large-block), to match
    previous receive.

2. "no-L to -L" incrementals are handled correctly, preserving the
smaller (128KB) block size of any already-received files that used large
blocks on the sending system but were split by `zfs send` without the
`-L` flag.

3. A new send stream format flag is added, `SWITCH_TO_LARGE_BLOCKS`.
This feature indicates that we can correctly handle "no-L to -L"
incrementals.  This flag is currently not set on any send streams.  In
the future, we intend for incremental send streams of snapshots that
have large blocks to use `-L` by default, and these streams will also
have the `SWITCH_TO_LARGE_BLOCKS` feature set. This ensures that streams
from the default use of `zfs send` won't encounter the bug mentioned
above, because they can't be received by software with the bug.

Implementation notes:

To facilitate accessing the ZPL's generation number,
`zfs_space_delta_cb()` has been renamed to `zpl_get_file_info()` and
restructured to fill in a struct with ZPL-specific info including owner
and generation.

In the "no-L to -L" case, if this is a compressed send stream (from
`zfs send -cL`), large WRITE records that are being written to small
(128KB) blocksize files need to be decompressed so that they can be
written split up into multiple blocks.  The zio pipeline will recompress
each smaller block individually.

A new test case, `send-L_toggle`, is added, which tests the "no-L to -L"
case and verifies that we get an error for the "-L to no-L" case.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #6224 
Closes #10383
2020-06-09 10:41:01 -07:00
Igor K 6722be2823 ZTS: Fix add-o_ashift.ksh
Use option '-o' after action for compatibility

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Igor Kozhukhov <igor@dilos.org>
Closes #10426
2020-06-09 10:31:16 -07:00
George Amanakis b7654bd794 Trim L2ARC
The l2arc_evict() function is responsible for evicting buffers which
reference the next bytes of the L2ARC device to be overwritten. Teach
this function to additionally TRIM that vdev space before it is
overwritten if the device has been filled with data. This is done by
vdev_trim_simple() which trims by issuing a new type of TRIM,
TRIM_TYPE_SIMPLE.

We also implement a "Trim Ahead" feature. It is a zfs module parameter,
expressed in % of the current write size. This trims ahead of the
current write size. A minimum of 64MB will be trimmed. The default is 0
which disables TRIM on L2ARC as it can put significant stress to
underlying storage devices. To enable TRIM on L2ARC we set
l2arc_trim_ahead > 0.

We also implement TRIM of the whole cache device upon addition to a
pool, pool creation or when the header of the device is invalid upon
importing a pool or onlining a cache device. This is dependent on
l2arc_trim_ahead > 0. TRIM of the whole device is done with
TRIM_TYPE_MANUAL so that its status can be monitored by zpool status -t.
We save the TRIM state for the whole device and the time of completion
on-disk in the header, and restore these upon L2ARC rebuild so that
zpool status -t can correctly report them. Whole device TRIM is done
asynchronously so that the user can export of the pool or remove the
cache device while it is trimming (ie if it is too slow).

We do not TRIM the whole device if persistent L2ARC has been disabled by
l2arc_rebuild_enabled = 0 because we may not want to lose all cached
buffers (eg we may want to import the pool with
l2arc_rebuild_enabled = 0 only once because of memory pressure). If
persistent L2ARC has been disabled by setting the module parameter
l2arc_rebuild_blocks_min_l2size to a value greater than the size of the
cache device then the whole device is trimmed upon creation or import of
a pool if l2arc_trim_ahead > 0.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam D. Moss <c@yotes.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #9713
Closes #9789 
Closes #10224
2020-06-09 10:15:08 -07:00
Michael Niewöhner 32f26eaa70 Move GFP flags kernel compatibility code
Move the GFP flags kernel compat code from c file to kmem header.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes #10424
2020-06-08 16:33:46 -07:00
Michael Niewöhner 080102a1b6 Linux 5.8 compat: __vmalloc()
The `pgprot` argument has been removed from `__vmalloc` in Linux 5.8,
being `PAGE_KERNEL` always now [1].

Detect this during configure and define a wrapper for older kernels.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/mm/vmalloc.c?h=next-20200605&id=88dca4ca5a93d2c09e5bbc6a62fbfc3af83c4fca

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Co-authored-by: Michael Niewöhner <foss@mniewoehner.de>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes #10422
2020-06-08 16:32:02 -07:00
Pawel Jakub Dawidek 529246df96 Restore support for in-kernel ZFS ioctls
In Illumos it is possible to call ioctl functions from within the
kernel by passing the FKIOCTL flag. Neither FreeBSD nor Linux support
that, but it doesn't hurt to keep it around, as all the code is there.

Before this commit it was a dead code and zc_iflags was always zero.
Restore this functionality by allowing to pass a flag to the
zfsdev_ioctl_common() function.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #10417
2020-06-08 13:57:22 -07:00
Pawel Jakub Dawidek 77b998fa70 Remove redundant includes
By removing excessive includes it takes us a small step close to
compiling this file in userland.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #10415
2020-06-08 09:57:36 -07:00
Paul Dagnelie b2f3709c3e Don't erase final byte of envblock
When we copy the envblock's contents out, we currently treat it as 
a normal C string. However, this functionality is supposed to more
closely emulate interacting with a file. As a consequence, we were 
incorrectly truncating the contents of the envblock by replacing 
the final byte of the buffer with a null character.

Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #10405
2020-06-08 08:58:13 -07:00
Jorgen Lundman c9e319faae Replace sprintf()->snprintf() and strcpy()->strlcpy()
The strcpy() and sprintf() functions are deprecated on some platforms.
Care is needed to ensure correct size is used.  If some platforms
miss snprintf, we can add a #define to sprintf, likewise strlcpy().

The biggest change is adding a size parameter to zfs_id_to_fuidstr().

The various *_impl_get() functions are only used on linux and have
not yet been updated.

Reviewed by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #10400
2020-06-07 11:42:12 -07:00
Ryan Moeller 60265072e0 Improve compatibility with C++ consumers
C++ is a little picky about not using keywords for names, or string
constness.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10409
2020-06-06 12:54:04 -07:00
Brian Behlendorf c1f3de18a4 ztest: Fix spa_open() ENOENT failures
The pool may not be imported when the previous pass is terminated.
In which case, spa_open() will return ENOENT to indicate the pool
is not currently imported.  Refactor to code slightly to handle
this case by importing the pool and then retrying the spa_open().

The ztest_import() function was moved before ztest_run() and the
import logic split in to a small internal helper function.  The
ztest_freeze() function was also moved but no changes were made.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10407
2020-06-06 12:51:35 -07:00
alaviss 13dd63ff81 mkfile: include missing headers
Without these headers, compilation fails on musl libc with offset_t
being undeclared and MIN being implictly declared.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Hiếu Lê <leorize+oss@disroot.org>
Closes #10406
2020-06-05 17:22:10 -07:00
Brian Behlendorf 63761a8f1a zfsvfs_setup(): zap_stats_t may have undefined content when accessed (#10398)
Signed-off-by: Allan Jude <allanjude@klarasystems.com>

Co-authored-by: Allan Jude <allanjude@klarasystems.com>
2020-06-05 17:21:04 -07:00
Allan Jude 4547fc4e07 Connect dataset_kstats for FreeBSD
Expand the FreeBSD spl for kstats to support all current types

Move the dataset_kstats_t back to zvol_state_t from zfs_state_os_t
now that it is common once again

```
kstat.zfs/mypool.dataset.objset-0x10b.nunlinked: 0
kstat.zfs/mypool.dataset.objset-0x10b.nunlinks: 0
kstat.zfs/mypool.dataset.objset-0x10b.nread: 150528
kstat.zfs/mypool.dataset.objset-0x10b.reads: 48
kstat.zfs/mypool.dataset.objset-0x10b.nwritten: 134217728
kstat.zfs/mypool.dataset.objset-0x10b.writes: 1024
kstat.zfs/mypool.dataset.objset-0x10b.dataset_name: mypool/datasetname
```

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #10386
2020-06-05 17:17:02 -07:00
Paul Dagnelie 99b281f1ae Fix double mutex_init bug in send code
It was possible to cause a kernel panic in the send code by 
initializing an already-initialized mutex, if a record was created 
with type DATA, destroyed with a different type (bypassing the 
mutex_destroy call) and then re-allocated as a DATA record again.

We tweak the logic to not change the type of a record once it has 
been created, avoiding the issue.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #10374
2020-06-03 19:53:21 -07:00
George Melikov 52998c7f36 Update wiki links with new address
Use direct links to new documentation resource.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #10395
2020-06-03 19:46:31 -07:00
Allan Jude 7da304bbce zfsvfs_setup(): zap_stats_t may have undefined content when accessed
Signed-off-by: Allan Jude <allanjude@klarasystems.com>
2020-06-03 22:20:27 +00:00
Ryan Moeller c0eb5c35ef FreeBSD: Simplify zvol and fix locking
zvol_geom_bio_strategy should handle its own use of the zvol
suspend reader lock and ensure the zilog exists when needed.

A few other places using the zvol zilog should use the suspend
reader lock as well.

Simplify consumers of zvol_geom_bio_strategy, fix the locking, and
while in here, use the boolean_t constants with doread.

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10381
2020-06-03 10:45:12 -07:00
Ryan Moeller a9dcfac51c Periodically update ARC kstats
FreeBSD needs arc_adjust_zthr to run periodically for kstats to be
updated.  A comment in the code suggests this may have been the
original intent in illumos as well:

https://github.com/openzfs/zfs/blob/c946d5a91329b075fb9bda1ac703a2e85139cf1c/module/zfs/arc.c#L4697-L4700

Create the thread with a 1 second timer.

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10371
2020-06-03 09:52:38 -07:00
Jorgen Lundman 081de9a86d Restore avl_update() calls and related functions
The macOS kmem implementation uses avl_update() and related
functions.  These same function exist in the Solaris AVL code but
were removed because they were unused.  Restore them.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #10390
2020-06-03 09:49:32 -07:00
Matthew Macy 3bf3b164ee Fix crypto build on FreeBSD HEAD
Update API usage to reflect recent change.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10384
2020-05-30 12:54:57 -07:00
gregory-lee-bartholomew 9052e3d70b Add bootfs.snapshot and bootfs.rollback kernel parameters
Unlike other filesystems, snapshots and rollbacks of bootfs need to be
done from a rescue environment. This patch makes it possible to snap-
shot or rollback the bootfs simply by specifying bootfs.snapshot or
bootfs.rollback on the kernel command line. The operation will be
performed by dracut just before bootfs is mounted.

Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com> 
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gregory Bartholomew <gregory.lee.bartholomew@gmail.com>
Closes #10198
2020-05-29 21:16:08 -07:00
Brian Behlendorf 3d93161b01 ztest: Fix ztest_run_zdb() failure
It's possible for ztest to be killed while the pool is exported
which results in an empty cache file.  This is a valid state to
test, but the validation check performed by ztest_run_zdb()
depends on the pool being in the cache file.  If it's not the
following error is printed.

    zdb -bccsv -G -d -Y -U /tmp/zloop-run/zpool.cache ztest
    zdb: can't open '/tmp/zloop-run': No such file or directory

Resolve these failures by removing the dependency on the cache
file.  Functionally, we only care that the pool can be imported
and that the zdb verification passes.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10385
2020-05-29 21:14:10 -07:00
allen-4 4d829ad9c1 Update zfs-functions.in
The init.d zfs-share script does not perform the intended 
action without having a variable set for ZFS_SHARE and 
ZFS_UNSHARE

Assign default values to ZFS_SHARE and ZFS_UNSHARE. Export 
the environment variables after sourcing the configuration 
file.

Reviewed-by: Richard Yao <ryao@gentoo.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Signed-off-by: Allen Holl <allen.m.holl@gmail.com>
Closes #10341 
Closes #10382
2020-05-29 12:01:57 -07:00
наб a1ba120927 Always use "%lld" for formatting time_ts
Given the following test program:
  #include <time.h>
  #include <stdio.h>
  #include <stdint.h>
  int main() {
    printf("time_t:    %d\n", sizeof(time_t));
    printf("long:      %d\n", sizeof(long));
    printf("long long: %d\n", sizeof(long long));
  }

These are output on various x86 architectures:
  x32$   time_t:    8
  x32$   long:      4
  x32$   long long: 8

  amd64$ time_t:    8
  amd64$ long:      8
  amd64$ long long: 8

  i386$  time_t:    4
  i386$  long:      4
  i386$  long long: 8

Therefore code using "%l[du]" to format time_ts produced warnings on x32

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@gmail.com>
Closes #10357
Closes #844
2020-05-28 10:29:58 -07:00
наб 6059f3a1f6 Correctly handle the x32 ABI
__x86_64__ && _ILP32 => don't forcibly define _LP64

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@gmail.com>
Closes #10357
Closes #844
2020-05-28 10:28:20 -07:00
John Gallagher 50ff632787 Rework error handling in zpool_trim()
When a manual trim is run against an entire pool, errors about
particular devices which don't support trim are suppressed. This changes
zpool_trim() in libzfs so that it doesn't return an error when the only
errors are suppressed ones. An exception is made when none of the
devices support trim, in which case an error is reported and a non-zero
status is returned.

This also fixes how the --wait flag works in the presence of suppressed
errors. In particular, suppressed errors no longer cause zpool_trim()
to skip the wait.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: John Gallagher <john.gallagher@delphix.com>
Closes #10263 
Closes #10372
2020-05-27 17:27:28 -07:00
Georgy Yakovlev 40f96f4643 etc/zfs/Makefile.am: set initconfdir
The initconfdir variable is not defined in etc/zfs/Makefile,
so the sed code does not perform the correct replacement.

Reviewed-by: Richard Yao <ryao@gentoo.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Issue #10375 
Closes #10376
2020-05-27 17:22:19 -07:00
Ryan Moeller 4f8b2356cc ZTS: Retry export/destroy when busy in zpool_import_012
It can take a moment for the NFS server to give up the mountpoint
after unsharing a filesystem.

Use log_must_busy to retry export/destroy a few times after switching
off sharenfs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10380
2020-05-27 17:18:06 -07:00
Jorgen Lundman 70a5fc0530 Memory leak in dsl_destroy_snapshots_nvl error case
The dsl_destroy_snapshots_nvl() function has an early error out, 
and temporary nvlists were not freed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #10366
2020-05-26 16:13:41 -07:00
Brian Behlendorf d1b84da8c1 Revert "Let zfs mount all tolerate in-progress mounts"
This reverts commit a9cd8bf which introduced a segfault when running
`zfs mount -a` multiple times when there are mountpoints which are
not empty.  This segfault is now seen frequently by the CI after
the mount code was updated to directly call mount(2).

The original reason this logic was added is described in #8881.
Since then the systemd `zfs-share.target` has been updated to run
"After" the `zfs-mount.server` which should avoid this issue.

Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9560
Closes #10364
2020-05-26 16:07:50 -07:00
Marcel Schilling ce98ed25de Fix dead links http://list.zfsonlinux.org
Originally, I wanted to point to directly to
https://zfsonlinux.topicbox.com/groups/zfs-discuss
as the text refers to that specific mailing list, but George Melikov
requested to change it to the general to give users the overview.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Marcel Schilling <marcel.schilling@uni-luebeck.de>
Closes #10367
Closes #10369
2020-05-26 15:09:25 -07:00
Brian Behlendorf c946d5a913 ZTS: Fix zfs_mount.kshlib cleanup
Update cleanup_filesystem to use destroy_dataset when performing
cleanup.  This ensures the destroy is retried if the pool is busy
preventing occasional failures.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10358
2020-05-23 17:13:42 -07:00
Brian Atkinson fb822260b1 Gang ABD Type
Adding the gang ABD type, which allows for linear and scatter ABDs to
be chained together into a single ABD.

This can be used to avoid doing memory copies to/from ABDs. An example
of this can be found in vdev_queue.c in the vdev_queue_aggregate()
function.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Brian <bwa@clemson.edu>
Co-authored-by: Mark Maybee <mmaybee@cray.com>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #10069
2020-05-20 18:06:09 -07:00
felixdoerre 501a1511ae mount: use the mount syscall directly
Allow zfs datasets to be mounted on Linux without relying on the
invocation of an external processes.  This is the same behavior
which is implemented for FreeBSD.

Use of the libmount library was originally considered because it 
provides functionality to properly lock and update the /etc/mtab 
file.  However, these days /etc/mtab is typically a symlink to 
/proc/self/mounts so there's nothing to updated.  Therefore, we
call mount(2) directly and avoid any additional dependencies. 

If required the legacy behavior can be enabled by setting the 
ZFS_MOUNT_HELPER environment variable.  This may be needed in
environments where SELinux in enabled and the zfs binary does  
not have mount permission.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Felix Dörre <felix@dogcraft.de>
#10294
2020-05-20 18:02:41 -07:00
DeHackEd 57434abae6 Use boot_ncpus in place of max_ncpus in taskq_create
Due to hotplug support or BIOS bugs sometimes max_ncpus can be
an absurdly high value. I have a system with 32 cores/threads
but reports max_ncpus == 440. This many threads potentially
cripples the system during arc_prune floods for example.

boot_ncpus is the number of working CPUs when called so use
that instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: DHE <git@dehacked.net>
Closes #10282
2020-05-20 10:07:21 -07:00
Paul Dagnelie de4f06c275 Small program that converts a dataset id and an object id to a path
Small program that converts a dataset id and an object id to a path

Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #10204
2020-05-20 10:05:33 -07:00
Ryan Moeller 7b0e39030c freebsd: Correct the order of arguments to copyin() for Q_SETQUOTA
Sponsored by: DARPA
External-issue: https://reviews.freebsd.org/D24656
FreeBSD-commit: freebsd/freebsd@a431c095d3

Authored by: jhb <jhb@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10344
2020-05-19 16:45:25 -07:00
George Amanakis 7cd723e685 Fix gcc 10.1 stringop-truncation error
As we do not expect the destination of these strncpy calls to be NULL
terminated, substitute them with memcpy.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10346
2020-05-19 14:24:10 -07:00
Kyle Evans 5b090d57d4 freebsd: return EISDIR for read(2) on directories
This is arguably a change for internal consistency within OpenZFS, as the
Linux implementation will reject read(2) on directories with EISDIR. It's
not unreasonable for read(2) to do something here on FreeBSD, but we don't
currently copy out anything useful anyways so start rejecting it with the
appropriate error.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kyle Evans <kevans@FreeBSD.org>
Closes #10338
2020-05-16 10:12:01 -07:00
Ryan Moeller d2782af461 Fix ZVOL_DIR
We only use ZVOL_DIR on FreeBSD, and on FreeBSD it isn't correct.

Move the definition to the file where it is needed, and define it as
/dev/zvol/.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10337
2020-05-16 10:10:38 -07:00
ColMelvin 4d6043f2b7 RPM: Remove old versions of DKMS on upgrade
Due to a mismatch between the text and a regex looking for that text,
the `%preuninstall` script would never run the `dkms remove` command
necessary to avoid corrupting the DKMS data configuration.  Increase
regex specificity to avoid this issue.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Lindee <chris.lindee+github@gmail.com>
Closes: #9891
Closes #10327
2020-05-14 20:51:33 -07:00
Matthew Ahrens 1b9cd1a9d9 Fix error handling in receive_writer_thread()
If `receive_writer_thread()` gets an error from `receive_process_record()`,
it should be saved in `rwa->err` so that we will stop processing records,
and the main thread will notice that the receive has failed.

When an error is first encountered, this happens correctly.  However, if
there are more records to dequeue, the next time through the loop we
will reset `rwa->err` to zero, allowing us to try to process the
following record (2 after the failed record).  Depending on what types
of records remain, we may incorrectly complete the receive
"successfully", but without actually having processed all the records.

The fix is to only set `rwa->err` if we got a *non-zero* error.

This bug was introduced by #10099 "Improve zfs receive performance by
batching writes".

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10320
2020-05-14 20:48:29 -07:00
yparitcher cdcce2f019 Fix VN_OPEN_INVFS typo
The VN_OPEN_INVFS literal is in the wrong field.

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: yparitcher <y@paritcher.com>
Closes #10322
2020-05-14 20:47:14 -07:00
Brian Behlendorf 2ade659eb4 Fix abd_enter/exit_critical wrappers
Commit fc551d7 introduced the wrappers abd_enter_critical() and
abd_exit_critical() to mark critical sections.  On Linux these are
implemented with the local_irq_save() and local_irq_restore() macros
which set the 'flags' argument when saving.  By wrapping them with
a function the local variable is no longer set by the macro and is
no longer properly restored.

Convert abd_enter_critical() and abd_exit_critical() to macros to
resolve this issue and ensure the flags are properly restored.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10332
2020-05-14 20:45:16 -07:00
Jorgen Lundman eeb8fae9c7 Upstream: add missing thread_exit()
Undo FreeBSD wrapper for thread_create() added to call thread_exit.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #10314
2020-05-14 15:58:09 -07:00
Matthew Ahrens 8b240f14f9 remove unneeded member drc_err of dmu_recv_cookie_t
The member drc_err of dmu_recv_cookie_t is used only locally in
receive_read, so we can replace it with a local variable.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10319
2020-05-14 12:10:29 -07:00
Brian Behlendorf c87f958668 flake8 E741 variable name warning
Update the zts-report.py script to conform to the flake8 E741 rule.

    "Variables named I, O, and l can be very hard to read. This is
    because the letter I and the letter l are easily confused, and
    the letter O and the number 0 can be easily confused."

- https://www.flake8rules.com/rules/E741.html

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10323
2020-05-14 09:41:29 -07:00
John Wren Kennedy e1fcd940e7 ZTS: zpool_split_indirect deletes zfstest log file
The cleanup routine for this test attempts to remove some temporary
files with `rm -f $VDEV_*`, but VDEV_ is undefined. As a result, all
files in the current working directory (/var/tmp/test_results/current)
get removed instead. This includes the complete log file of all tests.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes #10324
2020-05-14 09:39:47 -07:00
John Poduska 41035a0496 Resilver restarts unnecessarily when it encounters errors
When a resilver finishes, vdev_dtl_reassess is called to hopefully
excise DTL_MISSING (amongst other things). If there are errors during
the resilver, they are tracked in DTL_SCRUB, as spelled out in the
block comment in vdev.c. DTL_SCRUB is in-core only, so it can only
be used if the pool was online for the whole resilver. This state is
tracked with the spa_scrub_started flag, which only gets set when
the scan is initialized. Unfortunately, this flag gets cleared right
before vdev_dtl_reassess gets called, so if there are any errors
during the scan, DTL_MISSING will never get excised and the resilver
will just continually restart. This fix simply moves clearing that
flag until after the call to vdev_dtl_reasses.

In addition, if a pool is imported and already has scn_errors > 0,
this change will restart the resilver immediately instead of doing
the rest of the scan and then restarting it from the beginning. On
the other hand, if scn_errors == 0 at import, then no errors have
been encountered so far, so the spa_scrub_started flag can be safely
set.

A test has been added to verify that resilver does not restart when
relevant DTL's are available.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: John Poduska <jpoduska@datto.com>
Closes #10291
2020-05-13 10:54:27 -07:00
AJ Jordan b29e31d80d Fix outdated comment header
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: AJ Jordan <alex@strugee.net>
Closes #10288
2020-05-11 16:23:16 -07:00
AJ Jordan 5a04b17717 Fix up arcstat(1) to match our version
Turns out the illumos manpage, which is what this originates from, was
written for the original Perl version of the utility which is not the
version in the OpenZFS tree. *That* version originates from a Python
rewrite that was done for FreeNAS. So fix up the manpage to match what
we actually ship (and fix a few typos in the process).

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: AJ Jordan <alex@strugee.net>
Closes #10288
2020-05-11 16:22:48 -07:00
AJ Jordan ac806a2559 Import the arcstat(1m) manpage from illumos
And move it from section 1m to section 1 for consistency.

Imported from illumos commit f34d737f.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: AJ Jordan <alex@strugee.net>
Closes #10288
2020-05-11 16:22:18 -07:00
AJ Jordan 2b21da4f76 Fix inconsistent capitalization in arcstat -v
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: AJ Jordan <alex@strugee.net>
Closes #10288
2020-05-11 16:21:08 -07:00
Richard Laager 7fcf82451c Change zfsunlock for better busybox compatibility
It turns out that there are two versions of Busybox, at least on Ubuntu
18.04.  If you have the busybox-static package installed, you get a
busybox that supports `ps a` and `head`.  If you only have
busybox-initramfs, you don't.  Either way, you have `awk`.

This change should also make this compatible with GNU ps, if you somehow
end up with that in the initramfs environment.

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Andrey Prokopenko <job@terem.fr>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #10307
2020-05-10 12:26:08 -07:00
Brian Atkinson fc551d7efb Combine OS-independent ABD Code into Common Source File
Reorganizing ABD code base so OS-independent ABD code has been placed
into a common abd.c file. OS-dependent ABD code has been left in each
OS's ABD source files, and these source files have been renamed to
abd_os.

The OS-independent ABD code is now under:
module/zfs/abd.c
With the OS-dependent code in:
module/os/linux/zfs/abd_os.c
module/os/freebsd/zfs/abd_os.c

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #10293
2020-05-10 12:23:52 -07:00
Petros Koutoupis bd95f00d4b Fixed LDADD library links in Makefiles for cross compilation builds
When building on native dev system, there are no issues but when
cross-compiling for target system, some linker errors are observed.
The only way to avoid these errors is by adjusting the Makefile.am
of those various components to add the library dependencies.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Petros Koutoupis <petros@petroskoutoupis.com>
Closes #10304
2020-05-09 10:17:08 -07:00
Brian Behlendorf d775c86dd4 ZTS: refreserv_005_pos.ksh
When recursively destroying the dataset it's possible for the
dataset volume to be open by an unrelated process, like blkid.
Use the destroy_dataset() which will retry when this occurs.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10305
2020-05-08 13:50:02 -07:00
Andrey Prokopenko 1cc635a2dd Unlock encrypted root partition over SSH
This commit add a new feature for Debian-based distributions to unlock
encrypted root partition over SSH.  This feature is very handy on
headless NAS or VPS cloud servers.  To use this feature, you will need
to install the dropbear-initramfs package.

Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-By: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Andrey Prokopenko <job@terem.fr>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #10027
2020-05-07 16:41:16 -07:00
Richard Laager 746d22ee02 Rework README.initramfs.markdown
This file is listed as being in Markdown format, but it didn't really
use much Markdown.  I have added a fair amount of formatting.

I have reordered and reworded things to improve the flow of the text.

Reviewed-By: Andrey Prokopenko <job@terem.fr>
Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-By: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #10027
2020-05-07 16:40:00 -07:00
Richard Laager 4cfd339ce4 Cleanup contrib/initramfs automake
The initramfs hook scripts depend on Makefile.  This way, if the
substitution code is changed, they should update.  This brings it in
line with etc/init.d (which was modified to match the example in the
automake docs).

The initramfs hook script cleaning now matches etc/init.d.

There was a mix of SUBDIRS recursion and custom install rules for files
in subdirectories.  This was duplicated for the "hooks" and "scripts"
subdirectories.  Now everything uses SUBDIRS.

I fixed the substitution of DEFAULT_INITCONF_DIR for hooks/zfs.

Reviewed-By: Andrey Prokopenko <job@terem.fr>
Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-By: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #10027
2020-05-07 16:39:08 -07:00
George Amanakis 657fd33bcf Improvements on persistent L2ARC
Functional changes:

We implement refcounts of log blocks and their aligned size on the
cache device along with two corresponding arcstats. The refcounts are
reflected in the header of the device and provide valuable information
as to whether log blocks are accounted for correctly. These are
dynamically adjusted as log blocks are committed/evicted. zdb also uses
this information in the device header and compares it to the
corresponding values as reported by dump_l2arc_log_blocks() which
emulates l2arc_rebuild(). If the refcounts saved in the device header
report higher values, zdb exits with an error. For this feature to work
correctly there should be no active writes on the device. This is also
employed in the tests of persistent L2ARC. We extend the structure of
the cache device header by adding the two new variables mirroring the
refcounts after the existing variables to preserve backward
compatibility in terms of persistent L2ARC.

1) a new arcstat "l2_log_blk_asize" and refcount "l2ad_lb_asize" which
   reflect the total aligned size of log blocks on the device. This is
   also reflected in the header of the cache device as "dh_lb_asize".
2) a new arcstat "l2arc_log_blk_count" and refcount "l2ad_lb_count"
   which reflect the total number of L2ARC log blocks present on cache
   devices.  It is also reflected in the header of the cache device as
   "dh_lb_count".

In l2arc_rebuild_vdev() if the amount of committed log entries in a log
block is 0 and the device header is valid we update the device header.
This will facilitate trimming of the whole device in this case when
TRIM for L2ARC is implemented.

Improve loop protection in l2arc_rebuild() by using the starting offset
of the payload of each log block instead of the starting offset of the
log block.

If the zio in l2arc_write_buffers() fails, restore the lbps array in the
header of the device to its previous state in l2arc_write_done().

If l2arc_rebuild() ends the rebuild process without restoring any L2ARC
log blocks in ARC and without any other error, this means that the lbps
array in the header is pointing to non-existent or invalid log blocks.
Reset the device header in this case.

In l2arc_rebuild() change the zfs_dbgmsg messages to
spa_history_log_internal() making them user visible with zpool history
command.

Non-functional changes:

Make the first test in persistent L2ARC use `zdb -lll` to increase
coverage in `zdb.c`.

Rename psize with asize when referring to log blocks, since
L2ARC_SET_PSIZE stores the vdev aligned size for log blocks. Also
rename dh_log_blk_entries to dh_log_entries to make it clear that
it is a mirror of l2ad_log_entries. Added comments for both changes.

Fix inaccurate comments for example in l2arc_log_blk_restore().

Add asserts at the end in l2arc_evict() and l2arc_write_buffers().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10228
2020-05-07 16:34:03 -07:00
Paul Dagnelie 108a454a46 Add support for boot environment data to be stored in the label
Modern bootloaders leverage data stored in the root filesystem to 
enable some of their powerful features. GRUB specifically has a grubenv 
file which can store large amounts of configuration data that can be 
read and written at boot time and during normal operation. This allows 
sysadmins to configure useful features like automated failover after 
failed boot attempts. Unfortunately, due to the Copy-on-Write nature 
of ZFS, the standard behavior of these tools cannot handle writing to
ZFS files safely at boot time. We need an alternative way to store 
data that allows the bootloader to make changes to the data.

This work is very similar to work that was done on Illumos to enable 
similar functionality in the FreeBSD bootloader. This patch is different 
in that the data being stored is a raw grubenv file; this file can store 
arbitrary variables and values, and the scripting provided by grub is 
powerful enough that special structures are not required to implement 
advanced behavior.

We repurpose the second padding area in each label to store the grubenv 
file, protected by an embedded checksum. We add two ioctls to get and 
set this data, and libzfs_core and libzfs functions to access them more 
easily. There are no direct command line interfaces to these functions; 
these will be added directly to the bootloader utilities.

Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #10009
2020-05-07 09:36:33 -07:00
Philip Pokorny a36bad1759 Fix column width calculation issue with certain terminal widths
If the reported terminal width is 0 or less than 42, the signed variable
width was set to a negative number that was then assigned to the
unsigned column width becoming a huge number.

Add comments and change logic to better explain what's happening.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Philip Pokorny <ppokorny@mindspring.com>
Closes #10247
2020-05-06 17:17:38 -07:00
George Amanakis 1b664952ae Enable splitting mirrors with indirect vdevs
When a top-level vdev is removed from a pool it is converted to an
indirect vdev. Until now splitting such mirrored pools was not possible
with zpool split. This patch enables handling of indirect vdevs and
splitting of those pools with zpool split.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10283
2020-05-06 10:32:28 -07:00
Ryan Moeller ddc7a2dd3b taskq: Don't leak system_delay_taskq on FreeBSD
Adds a missing taskq_destroy() call.

Reported by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10292
2020-05-05 09:36:41 -07:00
alaviss 3e5d41d807 config/kernel-inode-times: initialize timespec
Usage of this variable uninitialized triggers -Werror,-Wuninitialized
when compiled under clang for linux kernel 5.6, leading the build system
to believe that the function is not declared.

This commit initializes the variable to suppress the warning and fix the
build for kernel 5.6 with clang.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Hiếu Lê <leorize+oss@disroot.org>
Closes #10279
Closes #10281
2020-05-04 15:25:48 -07:00
Ryan Moeller 6f3e1a4828 Avoid the GEOM topology lock recursion when autoexpanding a pool
The steps to reproduce the problem:

        mdconfig -a -t swap -s 3g -u 0
        gpart create -s GPT md0
        gpart add -t freebsd-zfs -s 1g md0
        zpool create -o autoexpand=on foo md0p1
        gpart resize -i 1 -s 2g md0

Authored by: pjd <pjd@FreeBSD.org>
FreeBSD-commit: freebsd/freebsd@bccd2db598

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Ported-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10270
2020-05-04 15:10:41 -07:00
Ryan Moeller 639dfeb831 Update FreeBSD SPL atomics
Sync up with the following changes from FreeBSD:

ZFS: add emulation of atomic_swap_64 and atomic_load_64

Some 32-bit platforms do not provide 64-bit atomic operations that ZFS
requires, either in userland or at all.  We emulate those operations
for those platforms using a mutex.  That is not entirely correct and
it's very efficient.  Besides, the loads are plain loads, so torn
values are possible.

Nevertheless, the emulation seems to work for some definition of work.

This change adds atomic_swap_64, which is already used in ZFS code,
and atomic_load_64 that can be used to prevent torn reads.

Authored by: avg <avg@FreeBSD.org>
FreeBSD-commit: freebsd/freebsd@3458e5d1e6

cleanup of illumos compatibility atomics

atomic_cas_32 is implemented using atomic_fcmpset_32 on all platforms.
Ditto for atomic_cas_64 and atomic_fcmpset_64 on platforms that have
it.  The only exception is sparc64 that provides MD atomic_cas_32 and
atomic_cas_64.
This is slightly inefficient as fcmpset reports whether the operation
updated the target and that information is not needed for cas.
Nevertheless, there is less code to maintain and to add for new
platforms.  Also, the operations are done inline now as opposed to
function calls before.

atomic_add_64_nv is implemented using atomic_fetchadd_64 on platforms
that provide it.

casptr, cas32, atomic_or_8, atomic_or_8_nv are completely removed as
they have no users.

atomic_mtx that is used to emulate 64-bit atomics on platforms that
lack them is defined only on those platforms.

As a result, platform specific opensolaris_atomic.S files have lost
most of their code.  The only exception is i386 where the
compat+contrib code provides 64-bit atomics for userland use.  That
code assumes availability of cmpxchg8b instruction.  FreeBSD does not
have that assumption for i386 userland and does not provide 64-bit
atomics.  Hopefully, this can and will be fixed.

Authored by: avg <avg@FreeBSD.org>
FreeBSD-commit: freebsd/freebsd@e9642c209b

emulate illumos membar_producer with atomic_thread_fence_rel

membar_producer is supposed to be a store-store barrier.
Also, in the code that FreeBSD has ported from illumos membar_producer
is used only with regular stores to regular memory (with respect to
caching).

We do not have an MI primitive for the store-store barrier, so
atomic_thread_fence_rel is the closest we have as it provides
(load | store) -> store barrier.

Previously, membar_producer was an empty function call on all 32-bit
arm-s, 32-bit powerpc, riscv and all mips variants.  I think that it
was inadequate.
On other platforms, such as amd64, arm64, i386, powerpc64, sparc64,
membar_producer was implemented using stronger primitives than required
for a store-store barrier with respect to regular memory access.
For example, it used sfence on amd64 and lock-ed nop in i386 (despite
TSO).
On powerpc64 we now use recommended lwsync instead of eieio.
On sparc64 FreeBSD uses TSO mode.
On arm64/aarch64 we now use dmb sy instead of dmb ish.  Not sure if
this is an improvement, actually.

After this change we can drop opensolaris_atomic.S for aarch64, amd64,
powerpc64 and sparc64 as all required atomic operations have either
direct or light-weight mapping to FreeBSD native atomic operations.

Discussed with: kib
Authored by: avg <avg@FreeBSD.org>
FreeBSD-commit: freebsd/freebsd@50cdda62fc

fix up r353340, don't assume that fcmpset has strong semantics

fcmpset can have two kinds of semantics, weak and strong.
For practical purposes, strong semantics means that if fcmpset fails
then the reported current value is always different from the expected
value.  Weak semantics means that the reported current value may be the
same as the expected value even though fcmpset failed.  That's a so
called "sporadic" failure.

I originally implemented atomic_cas expecting strong semantics, but
many platforms actually have weak one.

Reported by:    pkubaj (not confirmed if same issue)
Discussed with: kib, mjg
Authored by: avg <avg@FreeBSD.org>
FreeBSD-commit: freebsd/freebsd@238787c74e

[PowerPC] [MIPS] Implement 32-bit kernel emulation of atomic64 operations

This is a lock-based emulation of 64-bit atomics for kernel use, split off
from an earlier patch by jhibbits.

This is needed to unblock future improvements that reduce the need for
locking on 64-bit platforms by using atomic updates.

The implementation allows for future integration with userland atomic64,
but as that implies going through sysarch for every use, the current
status quo of userland doing its own locking may be for the best.

Submitted by:   jhibbits (original patch), kevans (mips bits)
Reviewed by:    jhibbits, jeff, kevans
Authored by: bdragon <bdragon@FreeBSD.org>
Differential Revision:  https://reviews.freebsd.org/D22976
FreeBSD-commit: freebsd/freebsd@db39dab3a8

Remove sparc64 kernel support

Remove all sparc64 specific files
Remove all sparc64 ifdefs
Removee indireeect sparc64 ifdefs

Authored by: imp <imp@FreeBSD.org>
FreeBSD-commit: freebsd/freebsd@48b94864c5

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Ported-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10250
2020-05-04 15:07:04 -07:00
Ryan Moeller 6ed4391da9 ZTS: Count CKSUM for all vdevs in verify_pool
The verify_pool function should detect checksum errors on any vdev, but
it was only checking at the root of the pool.

Accumulate the errors for all vdevs to obtain the correct count.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10271
2020-04-30 17:50:16 -07:00
Ryan Moeller 154e48eac9 zdb: Fix ignored zfs_arc_max tuning
Running zdb -l $disk shows a warning that zfs_arc_max is being ignored.
zdb sets zfs_arc_max below zfs_arc_min, which causes the value to be
ignored by arc_tuning_update().

Set zfs_arc_min to the bare minimum in zdb, which is below zfs_arc_max.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10269
2020-04-30 17:48:58 -07:00
Paul B. Henson 0aeb0bed6f OpenZFS 6765 - zfs_zaccess_delete() comments do not accurately
reflect delete permissions for ACLs

Authored by: Kevin Crowe <kevin.crowe@nexenta.com>
Reviewed by: Gordon Ross <gwr@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Richard Lowe <richlowe@richlowe.net>
Ported-by: Paul B. Henson <henson@acm.org>

Porting Notes:
* Only comments are updated

OpenZFS-issue: https://www.illumos.org/issues/6765
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/da412744bc
Closes #10266
2020-04-30 11:24:55 -07:00
Paul B. Henson 235a856576 OpenZFS 6762 - POSIX write should imply DELETE_CHILD on directories
- and some additional considerations

Authored by: Kevin Crowe <kevin.crowe@nexenta.com>
Reviewed by: Gordon Ross <gwr@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Richard Lowe <richlowe@richlowe.net>
Ported-by: Paul B. Henson <henson@acm.org>

OpenZFS-issue: https://www.illumos.org/issues/6762
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/1eb4e906ec
Closes #10266
2020-04-30 11:24:38 -07:00
Paul B. Henson 99495ba6ab OpenZFS 8984 - fix for 6764 breaks ACL inheritance
Authored by: Dominik Hassler <hadfl@omniosce.org>
Reviewed by: Sam Zaydel <szaydel@racktopsystems.com>
Reviewed by: Paul B. Henson <henson@acm.org>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Ported-by: Paul B. Henson <henson@acm.org>

OpenZFS-issue: https://www.illumos.org/issues/8984
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/e9bacc6d1a
Closes #10266
2020-04-30 11:24:27 -07:00
Paul B. Henson 5a2f527d4b OpenZFS 6764 - zfs issues with inheritance flags during chmod(2)
with aclmode=passthrough

Authored by: Albert Lee <trisk@nexenta.com>
Reviewed by: Gordon Ross <gwr@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Richard Lowe <richlowe@richlowe.net>
Ported-by: Paul B. Henson <henson@acm.org>

OpenZFS-issue: https://www.illumos.org/issues/6764
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/de0f1ddb59
Closes #10266
2020-04-30 11:24:14 -07:00
Paul B. Henson 7bf3e1fa0f OpenZFS 3254 - add support in zfs for aclmode=restricted
Authored-by: Paul B. Henson <henson@acm.org>
Reviewed by: Albert Lee <trisk@nexenta.com>
Reviewed by: Gordon Ross <gwr@nexenta.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Richard Lowe <richlowe@richlowe.net>
Ported-by: Paul B. Henson <henson@acm.org>

OpenZFS-issue: https://www.illumos.org/issues/3254
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/71dbfc287c
Closes #10266
2020-04-30 11:23:59 -07:00
Paul B. Henson a1af567bb6 OpenZFS 742 - Resurrect the ZFS "aclmode" property OpenZFS 664 - Umask masking "deny" ACL entries OpenZFS 279 - Bug in the new ACL (post-PSARC/2010/029) semantics
Porting notes:
* Updated zfs_acl_chmod to take 'boolean_t isdir' as first parameter
  rather than 'zfsvfs_t *zfsvfs'
* zfs man pages changes mixed between zfs and new zfsprops man pages

Reviewed by: Aram Hvrneanu <aram@nexenta.com>
Reviewed by: Gordon Ross <gwr@nexenta.com>
Reviewed by: Robert Gordon <rbg@openrbg.com>
Reviewed by: Mark.Maybee@oracle.com
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Garrett D'Amore <garrett@nexenta.com>
Ported-by: Paul B. Henson <henson@acm.org>

OpenZFS-issue: https://www.illumos.org/issues/742
OpenZFS-issue: https://www.illumos.org/issues/664
OpenZFS-issue: https://www.illumos.org/issues/279
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/a3c49ce110
Closes #10266
2020-04-30 11:22:45 -07:00
Adam D. Moss d7d4678fe6 Fix regression caused by c14ca14
The 'zfs load-key' command was broken for 'keyformat=passphrase'.
Use the correct output vars when stdin is an interactive terminal.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: adam moss <c@yotes.com>
Closes #10264 
Closes #10265
2020-04-29 17:33:33 -07:00
Brian Behlendorf ab16c87e55 Add longjmp support for Thumb-2
When a Thumb-2 kernel is being used, then longjmp must be implemented
using the Thumb-2 instruction set in module/lua/setjmp/setjmp_arm.S.

Original-patch-by: @jsrlabs
Reviewed-by: @awehrfritz
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7408 
Closes #9957 
Closes #9967
2020-04-29 17:30:13 -07:00
Jason King c14ca1456e Support custom URI schemes for the keylocation property
Every platform has their own preferred methods for implementing URI 
schemes beyond the currently supported file scheme (e.g. 'https' on 
FreeBSD would likely use libfetch, while Linux distros and illumos
would probably use libcurl, etc). It would be helpful if libzfs can 
be extended to support additional schemes in a simple manner.

A table of (scheme, handler_function) pairs is added to libzfs_crypto.c, 
and the existing functions in libzfs_crypto.c so that when the key 
format is ZFS_KEYFORMAT_URI, the scheme from the URI string is 
extracted, and a matching handler it located in the aforementioned 
table (returning an error if no matching handler is found). The handler 
function is then invoked to retrieve the key material (in the format 
specified by the keyformat property) and the key is loaded or the 
handler can return an error to abort the key loading process.

Reviewed by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jason King <jason.king@joyent.com>
Closes #10218
2020-04-28 10:55:18 -07:00
Sara Hartse 89a6610ed0 Add more sanity testing for zdb input args
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: sara hartse <sara.hartse@delphix.com>
Closes #10243
2020-04-28 09:56:31 -07:00
George Amanakis fa25460538 Add missing zfs_refcount_destroy() in key_mapping_rele()
Otherwise when running with reference_tracking_enable=TRUE mounting
and unmounting an encrypted dataset panics with:

Call Trace:
 dump_stack+0x66/0x90
 slab_err+0xcd/0xf2
 ? __kmalloc+0x174/0x260
 ? __kmem_cache_shutdown+0x158/0x240
 __kmem_cache_shutdown.cold+0x1d/0x115
 shutdown_cache+0x11/0x140
 kmem_cache_destroy+0x210/0x230
 spl_kmem_cache_destroy+0x122/0x3e0 [spl]
 zfs_refcount_fini+0x11/0x20 [zfs]
 spa_fini+0x4b/0x120 [zfs]
 zfs_kmod_fini+0x6b/0xa0 [zfs]
 _fini+0xa/0x68c [zfs]
 __x64_sys_delete_module+0x19c/0x2b0
 do_syscall_64+0x5b/0x1a0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-By: Tom Caputi <tcaputi@datto.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10246
2020-04-28 09:53:45 -07:00
Ryan Moeller a8085184d6 Fix zlib leak on FreeBSD
zlib_inflateEnd was accidentally a wrapper for inflateInit instead of
inflateEnd, and hilarity ensues.

Fix the typo so we free memory instead of allocating more.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10225
Closes #10252
2020-04-28 09:14:30 -07:00
alex 47c9299fcc zfs_create: round up volume size to multiple of bs
Round up the volume size requested in `zfs create -V size` to the next
higher multiple of the volblocksize. Updates the man page and adds a
test to verify the new behavior.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reported-by: puffi <puffi@users.noreply.github.com>
Signed-off-by: Alex John <alex@stty.io>
Closes #8541 
Closes #10196
2020-04-24 19:04:34 -07:00
Tom Caputi aa646323db Fix missing ivset guid with resumed raw base recv
This patch corrects a bug introduced in 61152d1069. When
resuming a raw base receive, the dmu_recv code always sets
drc->drc_fromsnapobj to the object ID of the previous
snapshot. For incrementals, this is correct, but for base
sends, this should be left at 0. The presence of this ID
eventually allows a check to run which determines whether
or not the incoming stream and the previous snapshot have
matching IVset guids. This check fails becuase it is not
meant to run when there is no previous snapshot. When it
does fail, the user receives an error stating that the
incoming stream has the problem outlined in errata 4.

This patch corrects this issue by simply ensuring
drc->drc_fromsnapobj is left as 0 for base receives.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #10234 
Closes #10239
2020-04-24 19:00:32 -07:00
Brian Behlendorf 6de3e59bdd Fix unitialized variable in zstream redup command
Fix uninitialized variable in `zstream redup` command.  The compiler
may determine the 'stream_offset' variable can be uninitialized
because not all rdt_lookup() exit paths set it.  This should never
happen in practice as documented by the assert, but initialize it
regardless to resolve the warning.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10241
Closes #10244
2020-04-23 15:54:38 -07:00
Matthew Ahrens 5d4ed9614f change libspl list member names to match kernel
This aids in debugging, so that we can use the same infrastructure to
walk zfs's list_t in the kernel module and in the userland libraries
(e.g. when debugging ztest).

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10236
2020-04-23 15:53:14 -07:00
Matthew Ahrens 196bee4cfd Remove deduplicated send/receive code
Deduplicated send streams (i.e. `zfs send -D` and `zfs receive` of such
streams) are deprecated.  Deduplicated send streams can be received by
first converting them to non-deduplicated with the `zstream redup`
command.

This commit removes the code for sending and receiving deduplicated send
streams.  `zfs send -D` will now print a warning, ignore the `-D` flag,
and generate a regular (non-deduplicated) send stream.  `zfs receive` of
a deduplicated send stream will print an error message and fail.

The resulting code simplification (especially in the kernel's support
for receiving dedup streams) should help enable future performance
enhancements.

Several new tests are added which leverage `zstream redup`.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Issue #7887
Issue #10117
Issue #10156
Closes #10212
2020-04-23 10:06:57 -07:00
Joao Carlos Mendes Luis 70e5ad31f6 Fix more leaks detected by ASAN
This commit fixes a bunch of missing free() calls in a10d50f99

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: João Carlos Mendes Luís <jonny@jonny.eng.br>
Closes #10219
2020-04-22 10:40:34 -07:00
Matthew Ahrens 32d805c3e2 Use a struct to organize metaslab-group-allocator fields
Each metaslab group (of which there is one per top-level vdev) has
several (4, by default) "metaslab group allocators".  Each "allocator"
has its own metaslab that it prefers to allocate from (the "primary"
allocator), and each can perform allocations concurrently with the other
allocators.  In addition to the primary metaslab, there are several
other fields that need to be tracked separately for each allocator.
These are currently stored as several arrays in the metaslab_group_t,
each array indexed by allocator number.

This change organizes all the metaslab-group-allocator-specific fields
into a new struct, metaslab_group_allocator_t.  The metaslab_group_t now
needs only one array indexed by the allocator number - which contains
the metaslab_group_allocator_t's.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10213
2020-04-22 10:26:56 -07:00
Niklas Haas a84c92f933 Don't attempt trimming "hole" vdevs
On zpools containing hole vdevs (e.g. removed log devices), the `zpool
trim` (and presumably `zpool initialize`) commands will attempt calling
their respective functions on "hole", which fails, as this is not a real
vdev.

Avoid this by removing HOLE vdevs in zpool_collect_leaves.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Niklas Haas <git@haasn.xyz>
Closes #10227
2020-04-21 09:29:31 -07:00
Matthew Ahrens 1f043c8be1 Fix zfs send progress reporting
The progress of a send is supposed to be reported by `zfs send -v`, but
it is not.  This works by creating a new user thread (with
pthread_create()) which does ZFS_IOC_SEND_PROGRESS ioctls to check how
much progress has been made.  This IOCTL finds the specified send (since
there may be multiple concurrent sends in the system).  The IOCTL also
checks that the specified send was started by the current process.

On Linux, different threads of the same process are represented as
different `struct task_struct`s (and, confusingly, have different
PID's).  To check if if two threads are in the same process, we need to
check if they have the same `struct task_struct:group_leader`.

We used to to this correctly, but it was inadvertently changed by
30af21b025 (Redacted Send) to simply check if the current
`struct task_struct` is the one that started the send.

This commit changes the code back to checking if the send was started by
a `struct task_struct` with the same `group_leader` as the calling
thread.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Chris Wedgwood <cw@f00f.org>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10215 
Closes #10216
2020-04-20 10:12:48 -07:00
Matthew Macy c614fd6e12 Use new FreeBSD API to largely eliminate object locking
Propagate changes in HEAD that mostly eliminate object locking.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10205
2020-04-17 09:30:26 -07:00
George Amanakis 9249f1272e Persistent L2ARC minor fixes
Minor fixes on persistent L2ARC improving code readability and fixing 
a typo in zdb.c when byte-swapping a log block. It also improves the 
pesist_l2arc_007_pos.ksh test by giving it more time to retrieve log 
blocks on the cache device.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam D. Moss <c@yotes.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10210
2020-04-17 09:27:40 -07:00
Ryan Moeller a7929f3137 Update FreeBSD tunables
Remove some obsolete legacy compat, rename some misnamed, and add some
missing tunables for FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10203
2020-04-15 11:14:47 -07:00
Ryan Moeller af99094dee Don't delete freebsd.run in distclean
Add a comment so the file is not empty.

The comment can be removed when FreeBSD-specific tests are added.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10206
2020-04-15 09:21:40 -07:00
Ryan Moeller 813c8564ee Fix SC2086 note in zpool.d/smart
./cmd/zpool/zpool.d/smart:78:32:
note: Double quote to prevent globbing and word splitting. [SC2086]

Reported by latest shellcheck on FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10194
2020-04-14 13:18:23 -07:00
alaviss 6b1139e82c sys/mnttab.h: include sys/stat.h for stat64
Musl libc defined `stat64` as a macro, which causes the build to fail
upon compiling os/linux/getmntany.c due to conflicts between the forward
declaration and the implementation.

This commit fixes that by including <sys/stat.h> in "sys/mnttab.h"
directly.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Hiếu Lê <leorize+oss@disroot.org>
Closes #10195
2020-04-14 11:47:40 -07:00
Matthew Macy 9f0a21e641 Add FreeBSD support to OpenZFS
Add the FreeBSD platform code to the OpenZFS repository.  As of this
commit the source can be compiled and tested on FreeBSD 11 and 12.
Subsequent commits are now required to compile on FreeBSD and Linux.
Additionally, they must pass the ZFS Test Suite on FreeBSD which is
being run by the CI.  As of this commit 1230 tests pass on FreeBSD
and there are no unexpected failures.

Reviewed-by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #898 
Closes #8987
2020-04-14 11:36:28 -07:00
Joao Carlos Mendes Luis 75c62019f3 Fix allocation errors, detected using ASAN
The test for VDEV_TYPE_INDIRECT is done after a memory allocation, and
could return from function without freeing it.  Since we don't need that
allocation yet, just postpone it.

Add a missing free() when buffer is no longer needed.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: João Carlos Mendes Luís <jonny@jonny.eng.br>
Closes #10193
2020-04-13 10:54:41 -07:00
Brian Behlendorf 791e480c6a Disable user space reference tracking
The memory and cpu cost of reference count tracking with the current
implementation is significant.  For this reason it has always been
disabled by default for the kmods.  Apply this same default to user
space so ztest doesn't always incur this performance penalty.

Our intention is to re-enable this by default for ztest once the code
has been optimized.  Since we expect to at some point provide a FUSE
implementation we wouldn't want this enabled by default for libzpool.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10189
2020-04-13 10:51:44 -07:00
alex c602b35cf7 ZTS: Fix and change testcase cache_010_neg
Commit 379ca9c removed the requirement on aux devices to be block
devices only but the test case cache_010_neg was not updated, making it
fail consistently.

This change changes the test to check that cache devices _can_ be
anything that presents a block interface. The testcase is renamed to
cache_010_pos and the exceptions for known failure removed from the test
runner.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reported-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Alex John <alex@stty.io>
Closes #10172
2020-04-13 10:50:41 -07:00
Matthew Ahrens 20f287855a zvol_write() can use dmu_tx_hold_write_by_dnode()
We can improve the performance of writes to zvols by using
dmu_tx_hold_write_by_dnode() instead of dmu_tx_hold_write().  This
reduces lock contention on the first block of the dnode object, and also
reduces the amount of CPU needed.  The benefit will be highest with
multi-threaded async writes (i.e. writes that don't call zil_commit()).

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10184
2020-04-10 21:14:01 -07:00
Brian Behlendorf 8080848254 Minor zstream redup command fixes
* Fix uninitialized variable in `zstream redup` command.  The
  'rdt.ddt_count' variable is uninitialized because it was
  allocated from the stack and not globally.  Initialize it.
  This was reported by gcc when compiling with debugging enabled.

    zstream_redup.c:157:16: error: 'rdt.ddt_count' may be used
    uninitialized in this function [-Werror=maybe-uninitialized]

* Remove the cmd/zstreamdump/.gitignore file.  It's no longer
  needed now that the zstreamdump command is a script.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10192
2020-04-10 21:10:09 -07:00
Matthew Ahrens c618f87cd2 Add zstream redup command to convert deduplicated send streams
Deduplicated send and receive is deprecated.  To ease migration to the
new dedup-send-less world, the commit adds a `zstream redup` utility to
convert deduplicated send streams to normal streams, so that they can
continue to be received indefinitely.

The new `zstream` command also replaces the functionality of
`zstreamdump`, by way of the `zstream dump` subcommand.  The
`zstreamdump` command is replaced by a shell script which invokes
`zstream dump`.

The way that `zstream redup` works under the hood is that as we read the
send stream, we build up a hash table which maps from `<GUID, object,
offset> -> <file_offset>`.

Whenever we see a WRITE record, we add a new entry to the hash table,
which indicates where in the stream file to find the WRITE record for
this block. (The key is `drr_toguid, drr_object, drr_offset`.)

For entries other than WRITE_BYREF, we pass them through unchanged
(except for the running checksum, which is recalculated).

For WRITE_BYREF records, we change them to WRITE records.  We find the
referenced WRITE record by looking in the hash table (for the record
with key `drr_refguid, drr_refobject, drr_refoffset`), and then reading
the record header and payload from the specified offset in the stream
file.  This is why the stream can not be a pipe.  The found WRITE record
replaces the WRITE_BYREF record, with its `drr_toguid`, `drr_object`,
and `drr_offset` fields changed to be the same as the WRITE_BYREF's
(i.e. we are writing the same logical block, but with the data supplied
by the previous WRITE record).

This algorithm requires memory proportional to the number of WRITE
records (same as `zfs send -D`), but the size per WRITE record is
relatively low (40 bytes, vs. 72 for `zfs send -D`).  A 1TB send stream
with 8KB blocks (`recordsize=8k`) would use around 5GB of RAM to
"redup".

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10124 
Closes #10156
2020-04-10 10:39:55 -07:00
George Amanakis 77f6826b83 Persistent L2ARC
This commit makes the L2ARC persistent across reboots. We implement
a light-weight persistent L2ARC metadata structure that allows L2ARC
contents to be recovered after a reboot. This significantly eases the
impact a reboot has on read performance on systems with large caches.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Saso Kiselkov <skiselkov@gmail.com>
Co-authored-by: Jorgen Lundman <lundman@lundman.net>
Co-authored-by: George Amanakis <gamanakis@gmail.com>
Ported-by: Yuxuan Shui <yshuiv7@gmail.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #925 
Closes #1823 
Closes #2672 
Closes #3744 
Closes #9582
2020-04-10 10:33:35 -07:00
Ryan Moeller 36a6e2335c Don't ignore zfs_arc_max below allmem/32
Set arc_c_min before arc_c_max so that when zfs_arc_min is set lower
than the default allmem/32 zfs_arc_max can also be set lower.

Add warning messages when tunables are being ignored.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10157
Closes #10158
2020-04-09 15:39:48 -07:00
Matthew Macy 8b27e08ed8 Add separate field for indicating that spa is in middle of split
By default it's not possible to open a device already owned by an
active vdev. It's necessary to make an exception to this for vdev
split. The FreeBSD platform code will make an exception if
spa_is splitting is set to to true.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10178
2020-04-09 09:59:31 -07:00
Brian Behlendorf 68dde63d13 Linux 5.7 compat: blk_alloc_queue()
Commit https://github.com/torvalds/linux/commit/3d745ea5 simplified
the blk_alloc_queue() interface by updating it to take the request
queue as an argument.  Add a wrapper function which accepts the new
arguments and internally uses the available interfaces.

Other minor changes include increasing the Linux-Maximum to 5.6 now
that 5.6 has been released.  It was not bumped to 5.7 because this
release has not yet been finalized and is still subject to change.

Added local 'struct zvol_state_os *zso' variable to zvol_alloc.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10181 
Closes #10187
2020-04-09 09:16:46 -07:00
Matthew Macy 01c4f2bf29 Use vn_io_fault_uiomove on FreeBSD to avoid potential deadlock
Added to prevent a possible deadlock, the following comments from
FreeBSD explain the issue.  The comment describing vn_io_fault_uiomove:

/*
 * Helper function to perform the requested uiomove operation using
 * the held pages for io->uio_iov[0].iov_base buffer instead of
 * copyin/copyout.  Access to the pages with uiomove_fromphys()
 * instead of iov_base prevents page faults that could occur due to
 * pmap_collect() invalidating the mapping created by
 * vm_fault_quick_hold_pages(), or pageout daemon, page laundry or
 * object cleanup revoking the write access from page mappings.
 *
 * Filesystems specified MNTK_NO_IOPF shall use vn_io_fault_uiomove()
 * instead of plain uiomove().
 */

This used for vn_io_fault which has the following motivation:

/*
 * The vn_io_fault() is a wrapper around vn_read() and vn_write() to
 * prevent the following deadlock:
 *
 * Assume that the thread A reads from the vnode vp1 into userspace
 * buffer buf1 backed by the pages of vnode vp2.  If a page in buf1 is
 * currently not resident, then system ends up with the call chain
 *   vn_read() -> VOP_READ(vp1) -> uiomove() -> [Page Fault] ->
 *     vm_fault(buf1) -> vnode_pager_getpages(vp2) -> VOP_GETPAGES(vp2)
 * which establishes lock order vp1->vn_lock, then vp2->vn_lock.
 * If, at the same time, thread B reads from vnode vp2 into buffer buf2
 * backed by the pages of vnode vp1, and some page in buf2 is not
 * resident, we get a reversed order vp2->vn_lock, then vp1->vn_lock.
 *
 * To prevent the lock order reversal and deadlock, vn_io_fault() does
 * not allow page faults to happen during VOP_READ() or VOP_WRITE().
 * Instead, it first tries to do the whole range i/o with pagefaults
 * disabled. If all pages in the i/o buffer are resident and mapped,
 * VOP will succeed (ignoring the genuine filesystem errors).
 * Otherwise, we get back EFAULT, and vn_io_fault() falls back to do
 * i/o in chunks, with all pages in the chunk prefaulted and held
 * using vm_fault_quick_hold_pages().
 *
 * Filesystems using this deadlock avoidance scheme should use the
 * array of the held pages from uio, saved in the curthread->td_ma,
 * instead of doing uiomove().  A helper function
 * vn_io_fault_uiomove() converts uiomove request into
 * uiomove_fromphys() over td_ma array.
 *
 * Since vnode locks do not cover the whole i/o anymore, rangelocks
 * make the current i/o request atomic with respect to other i/os and
 * truncations.
 */

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10177
2020-04-08 10:30:27 -07:00
Ryan Moeller 7e3df9db12 Finish refactoring for ZFS_MODULE_PARAM_CALL
Linux and FreeBSD have different parameters for tunable proc handler.
This has prevented FreeBSD from implementing the ZFS_MODULE_PARAM_CALL
macro.

To complete the sharing of ZFS_MODULE_PARAM_CALL declarations, create
per-platform definitions of the parameter list, ZFS_MODULE_PARAM_ARGS.

With the declarations wired up we discovered an incorrect scope prefix
for spa_slop_shift, so this is now fixed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10179
2020-04-07 10:06:22 -07:00
alex 2a15c6aab4 libzfs_pool: Remove unused check for ENOTBLK
Commit 379ca9c removed the check on aux devices to be block devices also
changing zfs_ioctl(hdl, ZFS_IOC_VDEV_ADD, ...) and
zfs_ioctl(hdl, ZFS_IOC_POOL_CREATE, ...) to never set ENOTBLK. This
change removes the dangling check for ENOTBLK that will never trigger.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reported-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Alex John <alex@stty.io>
Closes #10173
2020-04-07 10:04:40 -07:00
Ryan Moeller 4a21ec0560 ZTS: Fix non-portable date format
The delegate tests use `date(1)` to generate snapshot names, using
the format '%F-%T-%N' to get nanosecond resolution (since multiple
snapshots may be taken in the same second).  '%N' is not portable, and
causes tests to fail on FreeBSD.

Since the only purpose these timestamps serve is to create a unique
name, simply use $RANDOM instead.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10170
2020-04-06 16:07:35 -07:00
Paul Dagnelie 5a42ef04fd Add 'zfs wait' command
Add a mechanism to wait for delete queue to drain.

When doing redacted send/recv, many workflows involve deleting files 
that contain sensitive data. Because of the way zfs handles file 
deletions, snapshots taken quickly after a rm operation can sometimes 
still contain the file in question, especially if the file is very 
large. This can result in issues for redacted send/recv users who 
expect the deleted files to be redacted in the send streams, and not 
appear in their clones.

This change duplicates much of the zpool wait related logic into a 
zfs wait command, which can be used to wait until the internal
deleteq has been drained.  Additional wait activities may be added 
in the future. 

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Gallagher <john.gallagher@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #9707
2020-04-01 10:02:06 -07:00
Fabio Scaccabarozzi c9e3efdb3a Bugfix/fix uio partial copies
In zfs_write(), the loop continues to the next iteration without
accounting for partial copies occurring in uiomove_iov when 
copy_from_user/__copy_from_user_inatomic return a non-zero status.
This results in "zfs: accessing past end of object..." in the 
kernel log, and the write failing.

Account for partial copies and update uio struct before returning
EFAULT, leave a comment explaining the reason why this is done.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: ilbsmart <wgqimut@gmail.com>
Signed-off-by: Fabio Scaccabarozzi <fsvm88@gmail.com>
Closes #8673 
Closes #10148
2020-04-01 09:48:54 -07:00
Matthew Ahrens 0929c4de39 Improve ZVOL sync write performance by using a taskq
== Summary ==

Prior to this change, sync writes to a zvol are processed serially.
This commit makes zvols process concurrently outstanding sync writes in
parallel, similar to how reads and async writes are already handled.
The result is that the throughput of sync writes is tripled.

== Background ==

When a write comes in for a zvol (e.g. over iscsi), it is processed by
calling `zvol_request()` to initiate the operation.  ZFS is expected to
later call `BIO_END_IO()` when the operation completes (possibly from a
different thread).  There are a limited number of threads that are
available to call `zvol_request()` - one one per iscsi client (unless
using MC/S).  Therefore, to ensure good performance, the latency of
`zvol_request()` is important, so that many i/o operations to the zvol
can be processed concurrently.  In other words, if the client has
multiple outstanding requests to the zvol, the zvol should have multiple
outstanding requests to the storage hardware (i.e. issue multiple
concurrent `zio_t`'s).

For reads, and async writes (i.e. writes which can be acknowledged
before the data reaches stable storage), `zvol_request()` achieves low
latency by dispatching the bulk of the work (including waiting for i/o
to disk) to a taskq.  The taskq callback (`zvol_read()` or
`zvol_write()`) blocks while waiting for the i/o to disk to complete.
The `zvol_taskq` has 32 threads (by default), so we can have up to 32
concurrent i/os to disk in service of requests to zvols.

However, for sync writes (i.e. writes which must be persisted to stable
storage before they can be acknowledged, by calling `zil_commit()`),
`zvol_request()` does not use `zvol_taskq`.  Instead it blocks while
waiting for the ZIL write to disk to complete.  This has the effect of
serializing sync writes to each zvol.  In other words, each zvol will
only process one sync write at a time, waiting for it to be written to
the ZIL before accepting the next request.

The same issue applies to FLUSH operations, for which `zvol_request()`
calls `zil_commit()` directly.

== Description of change ==

This commit changes `zvol_request()` to use
`taskq_dispatch_ent(zvol_taskq)` for sync writes, and FLUSh operations.
Therefore we can have up to 32 threads (the taskq threads)
simultaneously calling `zil_commit()`, for a theoretical performance
improvement of up to 32x.

To avoid the locking issue described in the comment (which this commit
removes), we acquire the rangelock from the taskq callback (e.g.
`zvol_write()`) rather than from `zvol_request()`.  This applies to all
writes (sync and async), reads, and discard operations.  This means that
multiple simultaneously-outstanding i/o's which access the same block
can complete in any order.  This was previously thought to be incorrect,
but a review of the block device interface requirements revealed that
this is fine - the order is inherently not defined.  The shorter hold
time of the rangelock should also have a slight performance improvement.

For an additional slight performance improvement, we use
`taskq_dispatch_ent()` instead of `taskq_dispatch()`, which avoids a
`kmem_alloc()` and eliminates a failure mode.  This applies to all
writes (sync and async), reads, and discard operations.

== Performance results ==

We used a zvol as an iscsi target (server) for a Windows initiator
(client), with a single connection (the default - i.e. not MC/S).

We used `diskspd` to generate a workload with 4 threads, doing 1MB
writes to random offsets in the zvol.  Without this change we get
231MB/s, and with the change we get 728MB/s, which is 3.15x the original
performance.

We ran a real-world workload, restoring a MSSQL database, and saw
throughput 2.5x the original.

We saw more modest performance wins (typically 1.5x-2x) when using MC/S
with 4 connections, and with different number of client threads (1, 8,
32).

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10163
2020-03-31 10:50:44 -07:00
George Amanakis 37c22948e5 Reset l2ad_hand and l2ad_first in l2arc_evict
Increasing l2arc_write_size or l2arc_write_boost can result in
l2arc_write_buffers() not having enough space to perform its writes and
panic zio_write_phys().

Instead of resetting l2ad_hand to l2ad_start at the end of
l2arc_write_buffers() and not taking into account a possible
user-mediated increase of l2arc_write_max, we do this in l2arc_evict(),
right after l2arc_write_size() has run. If there is not enough space to
evict (ie we will exceed l2ad_end) we evict to the end of the device,
reset l2ad_hand to l2ad_start, set l2ad_first to 0 and iterate
l2arc_evict(). We avoid infinite iteration of l2arc_evict() by making
sure in l2arc_write_size() that l2ad_start + size does not exceed
l2ad_end.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10154
2020-03-31 10:46:48 -07:00
Ryan Moeller c96a32e1a3 ZTS: Skip udev actions in zvol_misc when not Linux
udev is only used on Linux.

Skip udev_wait and udev_cleanup when not on Linux.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10165
2020-03-31 10:35:14 -07:00
Ryan Moeller 9a51738b60 Let default arc_c_max be platform dependent
Linux changed the default max ARC size to 1/2 of physical memory to
deal with shortcomings of the Linux SLUB allocator.  Other platforms
do not require the same logic.

Implement an arc_default_max() function to determine a default max ARC
size in platform code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10155
2020-03-27 09:14:46 -07:00
Matthew Ahrens 3f38797338 Compile cityhash code into libzfs
Make the cityhash code compile into libzfs, in preparation for the new
"zstream" command.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10152
2020-03-27 09:11:22 -07:00
Ryan Moeller ef3331e703 ZTS: Wait for free space between quota tests
And in removal tests, sync the specific pool we are waiting on.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10146
2020-03-26 10:48:19 -07:00
Dirkjan Bussink 112c1bff94 Remove checks for null out value in encryption paths
These paths are never exercised, as the parameters given are always
different cipher and plaintext `crypto_data_t` pointers.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Attila Fueloep <attila@fueloep.org>
Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
Closes #9661 
Closes #10015
2020-03-26 10:41:57 -07:00
alex 1d2ddb9bb9 zfs_get: change time format string from %k to %H
Issue #10090 reported that snapshots created between midnight and 1 AM
are missing a padded zero in the creation property

This change fixes the bug reported in issue #10090 where snapshots
created between midnight and 1 AM were missing a padded zero in the
creation timestamp output.

The leading zero was missing because the time format string used `%k`
which formats the hour as a decimal number from 0 to 23 where single
digits are preceded by blanks[0] and is fixed by changing it to `%H`
which formats the hour as 00-23.

The difference in output is as below

```
-Thu Mar 26  0:39 2020
+Thu Mar 26 00:39 2020
```

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Alex John <alex@stty.io>
Closes #10090 
Closes #10153
2020-03-26 08:28:22 -07:00
Matthew Ahrens 652bdc9b0e Deprecate deduplicated send streams
Dedup send can only deduplicate over the set of blocks in the send
command being invoked, and it does not take advantage of the dedup table
to do so. This is a very common misconception among not only users, but
developers, and makes the feature seem more useful than it is. As a
result, many users are using the feature but not getting any benefit
from it.

Dedup send requires a nontrivial expenditure of memory and CPU to
operate, especially if the dataset(s) being sent is (are) not already
using a dedup-strength checksum.

Dedup send adds developer burden. It expands the test matrix when
developing new features, causing bugs in released code, and delaying
development efforts by forcing more testing to be done.

As a result, we are deprecating the use of `zfs send -D` and receiving
of such streams.  This change adds a warning to the man page, and also
prints the warning whenever dedup send or receive are used.

In a future release, we plan to:
1. remove the kernel code for generating deduplicated streams
2. make `zfs send -D` generate regular, non-deduplicated streams
3. remove the kernel code for receiving deduplicated streams
4. make `zfs receive` of deduplicated streams process them in userland
   to "re-duplicate" them, so that they can still be received.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #7887 
Closes #10117
2020-03-18 13:31:10 -07:00
Ryan Moeller 22df2457a7 Avoid core dump on invalid redaction bookmark
libzfs aborts and dumps core on EINVAL from the kernel when trying to
do a redacted send with a bookmark that is not a redaction bookmark.

Move redacted bookmark validation into libzfs.

Check if the bookmark given for redactions is actually a redaction
bookmark.  Print an error message and exit gracefully if it is not.

Don't abort on EINVAL in zfs_send_one.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10138
2020-03-18 12:54:12 -07:00
Avatat 4df8b2c373 Changed decimals to integers in the arcstat script
Changed interval value type from decimal to integer,
because of deprecation warning in Python 3.8 and above.
Also changed kstat values type from decimal to integer,
because all the values are integers.

Fixed behavior of arcstat when run without args.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Bartosz Zieba <bartosz@zieba.pro>
Closes #10132 
Closes #10142
2020-03-18 11:50:45 -07:00
Brian Behlendorf 5351951274 Fix zfs_rmnode() unlink / rollback issue
If a has rollback has occurred while a file is open and unlinked.
Then when the file is closed post rollback it will not exist in the
rolled back version of the unlinked object.  Therefore, the call to
zap_remove_int() may correctly return ENOENT and should be allowed.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6812 
Closes #9739
2020-03-18 11:47:07 -07:00
Brian Behlendorf 6b7028ec51 Fix cstyle warnings
Fix minor cstyle warnings accidentally introduced by 7145123b.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10143
2020-03-17 15:42:27 -07:00
Paul Dagnelie 7145123b0a Separate warning for incomplete and corrupt streams
This change adds a separate return code to zfs_ioc_recv that is used 
for incomplete streams, in addition to the existing return code for 
streams that contain corruption.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #10122
2020-03-17 10:30:33 -07:00
Attila Fülöp 5b3b79559c ICP: gcm-avx: Support architectures lacking the MOVBE instruction
There are a couple of x86_64 architectures which support all needed
features to make the accelerated GCM implementation work but the
MOVBE instruction. Those are mainly Intel Sandy- and Ivy-Bridge
and AMD Bulldozer, Piledriver, and Steamroller.

By using MOVBE only if available and replacing it with a MOV
followed by a BSWAP if not, those architectures now benefit from
the new GCM routines and performance is considerably better
compared to the original implementation.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam D. Moss <c@yotes.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Followup #9749 
Closes #10029
2020-03-17 10:24:38 -07:00
Mariusz Zaborski a57d3d45d6 Add option for forcible unmounting dataset while receiving snapshot.
Currently when the dataset is in use we can't receive snapshots.

    zfs send test/1@asd | zfs recv -FM test/2
    cannot unmount '/test/2': Device busy

This commits add option 'M' which attempts to forcibly unmount the
dataset.  Thanks to this we can enforce receiving snapshots in a
single step.

Note that this functionality is not supported on Linux because the
VFS will prevent active mounted filesystems from being unmounted,
even with the force option.  This is the intended VFS behavior.

Test cases were added to verify the expected behavior based on
the platform.

Discussed-with: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allanjude@freebsd.org>
External-issue: https://reviews.freebsd.org/D22306
Closes #9904
2020-03-17 10:08:32 -07:00
Ryan Moeller 80d98a8f3a ZTS: Use default_cleanup_noexit where needed
And add log_pass appropriately.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10136
2020-03-17 09:55:18 -07:00
Ryan Moeller e0d3284bc9 Exit status 256+signum is actually baked in to ksh
While #10121 did fix the signal numbers for FreeBSD/Darwin, it
incorrectly changed the expected encoding of exit status for commands
that exited on a signal.  The encoding 256+signum is a feature of the
shell.  Only the signal numbers themselves are platform-dependent.

Always use the encoding 256+signum when checking exit status for
signal exits.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10137
2020-03-17 09:49:58 -07:00
Ryan Moeller 4d32abaa87 libzfs: Fix bounds checks for float parsing
UINT64_MAX is not exactly representable as a double.

The closest representation is UINT64_MAX + 1, so we can use a >=
comparison instead of > for the bounds check.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10127
2020-03-16 11:56:29 -07:00
Matthew Ahrens 7261fc2e81 Improve zfs receive performance by batching writes
For each WRITE record in the stream, `zfs receive` creates a DMU
transaction (`dmu_tx_create()`) and writes this block's data into the
object.  If per-block overheads (as opposed to per-byte overheads)
dominate performance (as is often the case with small recordsize), the
per-dmu-transaction overheads can be significant.  For example, in some
workloads the `receieve_writer` thread is 100% on CPU, and more than
half of its CPU time is in these per-tx routines (e.g.
dmu_tx_hold_write, dmu_tx_assign, dmu_tx_commit).

To improve performance of `zfs receive`, this commit batches WRITE
records which are to nearby offsets of the same object, and uses one DMU
transaction to write them all.  By default the batch size is 1MB, which
for recordsize=8K reduces the number of DMU transactions by 128x for
full send streams (incrementals will depend on how "clumpy" the changed
blocks are).

This commit improves the performance of `dd if=stream | zfs recv`
from 78,800 blocks/sec to 98,100 blocks/sec (25% improvement).

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10099
2020-03-16 11:51:56 -07:00
Brian Behlendorf c94fb10917 Remove CI builder customization from TEST
The default options are reasonable for all of the CI builders.

* TEST_XFSTESTS_SKIP=yes  - This is already the default.
* TEST_ZTEST_TIMEOUT=3600 - Increased ztest run time only increases
  code coverage by a small degree.  Default 900s runs are sufficient.
* Disabling certain tests on 32-bit builders is no longer needed.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10129
2020-03-16 10:46:03 -07:00
Ryan Moeller d3fe62cb35 ZTS: Update flaky tests in zts-report
Some tests which pass on FreeBSD but fail on Linux had been put in the
"maybe" set.  Move these back to "known" under an "if Linux" check so
the expected outcome is clear.

Add some tests that have been found to be flaky on FreeBSD stable/12
to the "maybe" set.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10120
2020-03-13 09:29:10 -07:00
Matthew Ahrens 0fdd6106bb dmu_objset_from_ds must be called with dp_config_rwlock held
The normal lock order is that the dp_config_rwlock must be held before
the ds_opening_lock.  For example, dmu_objset_hold() does this.
However, dmu_objset_open_impl() is called with the ds_opening_lock held,
and if the dp_config_rwlock is not already held, it will attempt to
acquire it.  This may lead to deadlock, since the lock order is
reversed.

Looking at all the callers of dmu_objset_open_impl() (which is
principally the callers of dmu_objset_from_ds()), almost all callers
already have the dp_config_rwlock.  However, there are a few places in
the send and receive code paths that do not.  For example:
dsl_crypto_populate_key_nvlist, send_cb, dmu_recv_stream,
receive_write_byref, redact_traverse_thread.

This commit resolves the problem by requiring all callers ot
dmu_objset_from_ds() to hold the dp_config_rwlock.  In most cases, the
code has been restructured such that we call dmu_objset_from_ds()
earlier on in the send and receive processes, when we already have the
dp_config_rwlock, and save the objset_t until we need it in the middle
of the send or receive (similar to what we already do with the
dsl_dataset_t).  Thus we do not need to acquire the dp_config_rwlock in
many new places.

I also cleaned up code in dmu_redact_snap() and send_traverse_thread().

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #9662
Closes #10115
2020-03-12 10:55:02 -07:00
Alexander Motin fa130e010c Fix infinite scan on a pool with only special allocations
Attempt to run scrub or resilver on a new pool containing only special
allocations (special vdev added on creation) caused infinite loop
because of dsl_scan_should_clear() limiting memory usage to 5% of pool
size, which it calculated accounting only normal allocation class.

Addition of special and just in case dedup classes fixes the issue.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #10106 
Closes #8694
2020-03-12 10:52:03 -07:00
Ryan Moeller 94eb65b4c1 ZTS: Use correct signal numbers for status checks
Different operating systems encode exit status in different ways.
The logapi shell library assumes the Solaris meaning of exit codes,
which is not correct on other platforms.

Define the needed constants according to the platform we are running
on and use those to decode process exit status.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10121
2020-03-12 10:50:51 -07:00
Ryan Moeller cdbc34fc2b ZTS: Test boundary conditions in alloc_class_012
Issue #9142 describes an error in the checks for device removal that
can prevent removal of special allocation class vdevs in some
situations.

Enhance alloc_class/alloc_class_012_pos to check situations where this
bug occurs.

Update zts-report with knowledge of issue #9142.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10116 
Issue #9142
2020-03-12 10:50:01 -07:00
Ryan Moeller e70b127e05 ZTS: Wait for free space between write_dirs tests
Cleanup for write_dirs involves destroying a dataset filling a pool
and then recreating the dataset for the next test.  Due to the
asynchronous nature of free space accounting, recreating the dataset
can fail for lack of space, causing problems for the next test.

Add wait_freeing $TESTPOOL to wait for the space to be freed and then
sync_pool $TESTPOOL to update the space accounting before attempting
to recreate the test filesystem.

Only use a single disk to create the pool.  Make it a small file so it
does not take too long to fill.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10112
2020-03-12 10:48:46 -07:00
John Poduska e6b28efccc Prevent race condition in dnode_dest (#10101)
dnode_special_close() waits for the refcount of dn_holds to go to zero
without holding the dn_mtx. dnode_rele_and_unlock() does the final
remove to dn_holds with dn_mtx being held:

	refs = zfs_refcount_remove(&dn->dn_holds, tag);
	mutex_exit(&dn->dn_mtx);

So, there is a race condition after the remove until dn_mtx is
dropped. During that time, dnode_destroy() can get called, which ends
up in dnode_dest() calling mutex_destroy() and a panic since the lock
is still held.

This change adds a condvar to wait for the final dnode_rele_and_unlock()
to release the dn_mtx before calling dnode_destroy().

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: John Poduska <jpoduska@datto.com>
Closes #7814
Closes #10101
2020-03-12 10:25:56 -07:00
Mark Roper 1e9231ada8 Prevent deadlock in arc_read in Linux memory reclaim callback
Using zfs with Lustre, an arc_read can trigger kernel memory allocation
that in turn leads to a memory reclaim callback and a deadlock within a
single zfs process. This change uses spl_fstrans_mark and
spl_trans_unmark to prevent the reclaim attempt and the deadlock
(https://zfsonlinux.topicbox.com/groups/zfs-devel/T4db2c705ec1804ba).
The stack trace observed is:

    __schedule at ffffffff81610f2e
    schedule at ffffffff81611558
    schedule_preempt_disabled at ffffffff8161184a
    __mutex_lock at ffffffff816131e8
    arc_buf_destroy at ffffffffa0bf37d7 [zfs]
    dbuf_destroy at ffffffffa0bfa6fe [zfs]
    dbuf_evict_one at ffffffffa0bfaa96 [zfs]
    dbuf_rele_and_unlock at ffffffffa0bfa561 [zfs]
    dbuf_rele_and_unlock at ffffffffa0bfa32b [zfs]
    osd_object_delete at ffffffffa0b64ecc [osd_zfs]
    lu_object_free at ffffffffa06d6a74 [obdclass]
    lu_site_purge_objects at ffffffffa06d7fc1 [obdclass]
    lu_cache_shrink_scan at ffffffffa06d81b8 [obdclass]
    shrink_slab at ffffffff811ca9d8
    shrink_node at ffffffff811cfd94
    do_try_to_free_pages at ffffffff811cfe63
    try_to_free_pages at ffffffff811d01c4
    __alloc_pages_slowpath at ffffffff811be7f2
    __alloc_pages_nodemask at ffffffff811bf3ed
    new_slab at ffffffff81226304
    ___slab_alloc at ffffffff812272ab
    __slab_alloc at ffffffff8122740c
    kmem_cache_alloc at ffffffff81227578
    spl_kmem_cache_alloc at ffffffffa048a1fd [spl]
    arc_buf_alloc_impl at ffffffffa0befba2 [zfs]
    arc_read at ffffffffa0bf0924 [zfs]
    dbuf_read at ffffffffa0bf9083 [zfs]
    dmu_buf_hold_by_dnode at ffffffffa0c04869 [zfs]

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mark Roper <markroper@gmail.com>
Closes #9987
2020-03-12 10:24:43 -07:00
Olaf Faaland 61871518dd zloop.sh should call ZDB with pool name
Commit 54007c79 introduced an error, changing the final
argument to $ZDB from ztest to $ZTEST.  This argument
indicates the pool name, not the script, and so should
not have been changed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #10118
2020-03-11 10:02:23 -07:00
Ryan Moeller ddd9ef3a4f ZTS: Add a failsafe callback to run after each test
Tests that get killed do not have an opportunity to clean up.

There are many bad states this can leave the system in, but of
particular gravity is when zinject has been used to induce bad
behavior for one or more of the test disks.

Create a failsafe mechanism in test-runner.py that runs a callback
script after every test. The script is common to all tests so all
tests benefit from the protection.

Add an obligatory `zinject -c all` to clear all zinject state after
every test case is run.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10096
2020-03-10 11:00:56 -07:00
Matthew Ahrens 1dc32a67e9 Improve zfs send performance by bypassing the ARC
When doing a zfs send on a dataset with small recordsize (e.g. 8K),
performance is dominated by the per-block overheads.  This is especially
true with `zfs send --compressed`, which further reduces the amount of
data sent, for the same number of blocks.  Several threads are involved,
but the limiting factor is the `send_prefetch` thread, which is 100% on
CPU.

The main job of the `send_prefetch` thread is to issue zio's for the
data that will be needed by the main thread.  It does this by calling
`arc_read(ARC_FLAG_PREFETCH)`.  This has an immediate cost of creating
an arc_hdr, which takes around 14% of one CPU.  It also induces later
costs by other threads:

 * Since the data was only prefetched, dmu_send()->dmu_dump_write() will
   need to call arc_read() again to get the data.  This will have to
   look up the arc_hdr in the hash table and copy the data from the
   scatter ABD in the arc_hdr to a linear ABD in arc_buf.  This takes
   27% of one CPU.

 * dmu_dump_write() needs to arc_buf_destroy()  This takes 11% of one
   CPU.

 * arc_adjust() will need to evict this arc_hdr, taking about 50% of one
   CPU.

All of these costs can be avoided by bypassing the ARC if the data is
not already cached.  This commit changes `zfs send` to check for the
data in the ARC, and if it is not found then we directly call
`zio_read()`, reading the data into a linear ABD which is used by
dmu_dump_write() directly.

The performance improvement is best expressed in terms of how many
blocks can be processed by `zfs send` in one second.  This change
increases the metric by 50%, from ~100,000 to ~150,000.  When the amount
of data per block is small (e.g. 2KB), there is a corresponding
reduction in the elapsed time of `zfs send >/dev/null` (from 86 minutes
to 58 minutes in this test case).

In addition to improving the performance of `zfs send`, this change
makes `zfs send` not pollute the ARC cache.  In most cases the data will
not be reused, so this allows us to keep caching useful data in the MRU
(hit-once) part of the ARC.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10067
2020-03-10 10:51:04 -07:00
Ryan Moeller 9be70c3784 ZTS: Simplify some libtest functions
Don't echo the results of arithmetic expressions, it's not necessary.

Use hw.clockrate sysctl to get CPU freq instead of parsing dmesg.boot
for a line that might not even be there anymore.

Reduce bookkeeping in fill_fs, making it easier to follow.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10113
2020-03-10 10:44:14 -07:00
Richard Laager 5ecbb293c6 Fix zfs-functions packaging bug
This fixes a bug where the generated zfs-functions was being included
along with original zfs-functions.in in the make dist tarball.  This
caused an unfortunate series of events during build/packaging that
resulted in the RPM-installed /etc/zfs/zfs-functions listing the
paths as:

ZFS="/usr/local/sbin/zfs"
ZED="/usr/local/sbin/zed"
ZPOOL="/usr/local/sbin/zpool"

When they should have been:

ZFS="/sbin/zfs"
ZED="/sbin/zed"
ZPOOL="/sbin/zpool"

This affects init.d (non-systemd) distros like CentOS 6.

/etc/default/zfs and /etc/zfs/zfs-functions are also used by the
initramfs, so they need to be built even when init.d support is not.
They have been moved to the (new) etc/default and (existing) etc/zfs
source directories, respectively.

Fixes: #9443

Co-authored-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
2020-03-10 09:53:20 -07:00
Richard Laager 01243e72a5 initramfs: Eliminate substitutions
These are now handled in zfs-functions, so this is all duplicative and
unnecessary.

Signed-off-by: Richard Laager <rlaager@wiktel.com>
2020-03-10 09:53:20 -07:00
Richard Laager 49afc91387 Delete built init scripts in make clean
Previously, they were being deleted in make distclean.  This brings it
in line with the example:
https://www.gnu.org/software/automake/manual/html_node/Scripts.html

Signed-off-by: Richard Laager <rlaager@wiktel.com>
2020-03-10 09:53:20 -07:00
Richard Laager dc4dd46728 Make init scripts depend on Makefile
This brings it in line with the example:
https://www.gnu.org/software/automake/manual/html_node/Scripts.html

This way, if the substitution code is changed, they should update.

Signed-off-by: Richard Laager <rlaager@wiktel.com>
2020-03-10 09:53:20 -07:00
InsanePrawn ff2f960b24 Systemd mount generator: don't fail keyload from file if already loaded
Previously the generated keyload units for encryption roots with
keylocation=file://* didn't contain the code to detect if the key
was already loaded and would be marked failed in such situations.

Move the code to check whether the key is already loaded
from keylocation=prompt handling to general key loading code.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #10103
2020-03-09 11:09:09 -07:00
Ryan Moeller 2b95e91132 ZTS: Another round of changes for FreeBSD
Highlights:
* is_linux -> is_illumos swaps
* make block_device_wait more clever when paths are given
* slightly optimize default_cleanup_noexit
* remove platform differences in user_run
* temporarily expect non-libfetch behavior for keylocation=/foo/bar
* fix sharenfs exceptions
* don't test multihost property
* fix misc broken platform checks
* clear zinjected faults in removal_resume_export callback

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10092
2020-03-06 09:31:32 -08:00
Ryan Moeller f5f6fb03b7 Change default to overlay=on
Filesystems allow overlay mounts by default on FreeBSD and Linux.

Respect the native convention by switching the default to overlay=on,
while retaining the option to turn the property off for compatibility
with other operating systems' conventions.

Update documentation and tests accordingly.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10030
2020-03-06 09:28:19 -08:00
Ryan Moeller 788398c562 ZTS: Update zts-report exceptions for FreeBSD
The new zfs_sync_trim_* tests are skipped on FreeBSD.
Both of the previously failing tests are now passing.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10105
2020-03-06 09:26:38 -08:00
Brian Behlendorf 5a1abc4b5b ZTS: Speed up write_dirs cleanup
The write_dirs tests fill a filesystem with a bunch of files until it
is full.  In cleanup the files are truncated and removed individually.
These tests already take a while to run.

It is quicker and easier to destroy the whole dataset and create a new
one to replace it in the cleanup functions.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10098
2020-03-04 15:12:12 -08:00
Brian Behlendorf fa23c5be88 ZTS: Add missing quotes
`default_setup` takes a disk list as the first argument and has
optional additional arguments that control secondary functionality.
A couple of test setups mistakenly call `default_setup $DISKS`.

Add quotes so the second and subsequent disks are correctly included
in the pool as vdevs rather than triggering unwanted behavior from
`default_setup`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10097
2020-03-04 15:10:45 -08:00
Brian Behlendorf 4b06d05298 ZTS: Add zts-report exceptions for FreeBSD
There are three tests we expect to fail only on FreeBSD.
* link_count never exits and eventually times out:
 - @amotin tells me this test is probably not applicable to us
 - Skip on FreeBSD
* userobj feature does not activate immediately after pool upgrade
 - low impact; we are aware of this issue
* removal does not appear to condense on export
 - low impact; we are aware of this issue

Additionally removal_with_zdb passes on FreeBSD, so it is moved to
"maybe".

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10093
2020-03-04 15:09:40 -08:00
Brian Behlendorf f49db9b504 zio: dprintf_bp() if errors > 0 in zfs_blkptr_verify()
Also dprintf_bp() in case BLK_VERIFY_HALT of zfs_blkptr_verify_log()
since dprintf_bp() in zfs_blkptr_verify() will never be executed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Justin Keogh <commits@v6y.net>
Closes #10086
2020-03-04 15:08:41 -08:00
Brian Behlendorf d16c029f99 ZTS: Test the correct filesystem_limits behavior
See issue #8226: Property filesystem_limit does not work as documented

There have been previous attempts to fix the behavior on Linux, but so
far the issue is still open.  See PRs #8228, #8280.

The existing tests pass for the incorrect behavior.  This is a problem
on FreeBSD; we are failing the tests because we implement the feature
correctly.

I have adapted the tests based on the work by @loli10k in #8280 and
extended the changes to fix the snapshot_limit test as well.

Linux now fails these tests, so entries linking to the issue have been
added to the "maybe" group in zts-report.py.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10082
2020-03-04 15:07:52 -08:00
Brian Behlendorf 2288d41968 Add trim support to zpool wait
Manual trims fall into the category of long-running pool activities
which people might want to wait synchronously for. This change adds
support to 'zpool wait' for waiting for manual trim operations to
complete. It also adds a '-w' flag to 'zpool trim' which can be used to
turn 'zpool trim' into a synchronous operation.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: John Gallagher <john.gallagher@delphix.com>
Closes #10071
2020-03-04 15:07:11 -08:00
Matthew Ahrens b3212d2fa6 Improve performance of zio_taskq_member
__zio_execute() calls zio_taskq_member() to determine if we are running
in a zio interrupt taskq, in which case we may need to switch to
processing this zio in a zio issue taskq.  The call to
zio_taskq_member() can become a performance bottleneck when we are
processing a high rate of zio's.

zio_taskq_member() calls taskq_member() on each of the zio interrupt
taskqs, of which there are 21.  This is slow because each call to
taskq_member() does tsd_get(taskq_tsd), which on Linux is relatively
slow.

This commit improves the performance of zio_taskq_member() by having it
cache the value of tsd_get(taskq_tsd), reducing the number of those
calls to 1/21th of the current behavior.

In a test case running `zfs send -c >/dev/null` of a filesystem with
small blocks (average 2.5KB/block), zio_taskq_member() was using 6.7% of
one CPU, and with this change it is reduced to 1.3%.  Overall time to
perform the `zfs send` reduced by 10% (~150,000 block/sec to ~165,000
blocks/sec).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10070
2020-03-03 10:29:38 -08:00
Ryan Moeller 0a0f9a7dc6 ZTS: Provide for nested cleanup routines
Shared test library functions lack a simple way to ensure proper
cleanup in the event of a failure.  The `log_onexit` cleanup pattern
cannot be used in library functions because it uses one global
variable to store the cleanup command.

An example of where this is a serious issue is when a tunable that
artifically stalls kernel progress gets activated and then some check
fails.  Unless the caller knows about the tunable and sets it back,
the system will be left in a bad state.

To solve this problem, turn the global cleanup variable into a stack.
Provide push and pop functions to add additional cleanup steps and
remove them after it is safe again.

The first use of this new functionality is in attempt_during_removal,
which sets REMOVAL_SUSPEND_PROGRESS.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10080
2020-03-03 10:28:09 -08:00
Ryan Moeller 9bb907bc3f Make spa_history_zone platform-dependent in kernel
This function should only return "linux" on Linux.

Move the kernel part of the function out of common code.
Fix the tests for FreeBSD.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10079
2020-03-02 09:43:30 -08:00
Ryan Moeller 1289fbb557 ZTS: Change issue URL template to OpenZFS org
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10081
2020-03-02 09:42:22 -08:00
Matthew Macy d32eff3a27 Don't open zfs control device exclusively
With the FreeBSD platform changes that were made for #10073
it is no longer necessary on FreeBSD to open the control device
exclusively to get onexit callbacks invoked.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10076
2020-02-28 14:54:14 -08:00
Matthew Macy cf118ae8dc Don't call zrele on passed zp in zfs_xattr_owner_unlinked on FreeBSD
FreeBSD has a somewhat more cumbersome locking and refcounting
protocol for the platform counterpart to znode. We need to not call
zrele on the passed zp, but do need to do so on any intermediate zp.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10075
2020-02-28 14:53:18 -08:00
Matthew Macy ae9f92f6f3 Re-share zfsdev_getminor and zfs_onexit_fd_hold
By adding a zfs_file_private accessor to the common
interfaces and some extensions to FreeBSD platform
code it is now possible to share the implementations
for the aforementioned functions.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10073
2020-02-28 14:50:32 -08:00
Matthew Ahrens 9cdf7b1f6b Improve zfs destroy performance with zio_t-free zio_free()
When "zfs destroy" is run, it completes quickly, and in the background
we locate the blocks to free and free them.  This background activity
can be observed with `zpool get freeing` and `zpool wait -t free ...`.

This background activity is processed by a single thread (the spa_sync
thread) which calls zio_free() on each of the blocks to free.  With even
modest storage performance, the CPU consumption of zio_free() can be the
performance bottleneck.

Performance of zio_free() can be improved by not actually creating a
zio_t in the common case (non-dedup, non-gang), instead calling
metaslab_free() directly.  This avoids the CPU cost of allocating the
zio_t, and more importantly the cost of adding and later removing this
zio_t from the parent zio's child list.

The result is that performance of background freeing more than doubles,
from 0.6 million blocks per second to 1.3 million blocks per second.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10034
2020-02-28 14:49:44 -08:00
Ryan Moeller 6c0abcfddd ZTS: Fixup shebang in rsend_016, add to common.run
All other ksh scripts use /bin/ksh in the shebang.

Make rsend_016_neg consistent with the rest of the suite.

The test also was absent from any runfiles. Add it to common.run.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10051
2020-02-28 09:48:29 -08:00
Ryan Moeller f0410e9806 ZTS: Eliminate partitioning from zpool_add
Use file vdevs if we are short on $DISKS.
Also fixed vol recursion for FreeBSD in 004.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10060
2020-02-28 09:46:51 -08:00
Brian Behlendorf 3f99a3abc7 Fix CONFIG_MODULES=no Linux kernel config
When configuring as builtin (--enable-linux-builtin) for kernels
without loadable module support (CONFIG_MODULES=n) only the object
file is created.  Never a loadable kmod.

Update ZFS_LINUX_TRY_COMPILE to handle this in a manor similar to
the ZFS_LINUX_TEST_COMPILE_ALL macro.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9887
Closes #10063
2020-02-28 09:23:48 -08:00
Brian Behlendorf bd0d24e09b Linux 5.5 compat: blkg_tryget()
Commit https://github.com/torvalds/linux/commit/9e8d42a0f accidentally
converted the static inline function blkg_tryget() to GPL-only for
kernels built with CONFIG_PREEMPT_RCU=y and CONFIG_BLK_CGROUP=y.

Resolve the build issue by providing our own equivalent functionality
when needed which uses rcu_read_lock_sched() internally as before.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9745
Closes #10072
2020-02-28 08:58:39 -08:00
Ryan Moeller 2ce90dca91 arc_summary: Make get_descriptions per platform
Linux uses modinfo to get tunables descriptions, FreeBSD has to use
sysctl.

Move the existing function definition so it is defined that way on
Linux, and add a definition in terms of sysctl for FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10062
2020-02-27 17:15:06 -08:00
Ryan Moeller 276410387e pyzfs: Add constants for platform-specific errnos
FreeBSD doesn't have EBADE, ECHRNG, or ETIME.

Add constants for these and set them appropriately for the platform.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10061
2020-02-27 17:14:21 -08:00
Matthew Macy 13fac09868 Consolidate arc_buf allocation checks
The following check currently occurs in three separate locations
in dbuf.c.  This change consolidates those checks in to the
dbuf_alloc_arcbuf_from_arcbuf() function.

if (arc_is_encrypted(data)) {
...
} else if (compress_type != ZIO_COMPRESS_OFF) {
...
} else {
...
}

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10057
2020-02-27 17:12:44 -08:00
Ryan Moeller 3d5ba1cf29 ZTS: Misc fixes for FreeBSD
* Set geom debug flags in corrupt_blocks_at_level
* Use the right time zone for history tests
* Add missing commands.cfg entry for diskinfo
* Rewrite get_last_txg_synced to use zdb
* Don't check ulimits for sparse files
* Suspend removal before removing a vdev, not after

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10054
2020-02-27 09:38:34 -08:00
Matthew Ahrens ab9646166d ZTS: Fix zfs_receive_004_neg
`zfs recv` of an incremental stream that already exists is ignored, with
a message like:

    receiving incremental stream of pool/fs@incsnap into pool/fs@incsnap
    snap testpool/testfs@incsnap already exists; ignoring

And the command exits successfully (exit code 0).

The zfs_receive_004_neg test is expecting that a this case will fail,
with nonzero exit code.

The fix is to remove this specific command from the test case.  This
lets us check that the remaining commands do in fact fail.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10055
2020-02-27 09:37:34 -08:00
Brian Behlendorf 2c3a83701d Linux 5.6 compat: time_t
As part of the Linux kernel's y2038 changes the time_t type has been
fully retired.  Callers are now required to use the time64_t type.

Rather than move to the new type, I've removed the few remaining
places where a time_t is used in the kernel code.  They've been
replaced with a uint64_t which is already how ZFS internally
handled these values.

Going forward we should work towards updating the remaining user
space time_t consumers to the 64-bit interfaces.

Reviewed-by: Matthew Macy <mmacy@freebsd.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10052
Closes #10064
2020-02-27 09:31:02 -08:00
Brian Behlendorf ff5587d651 Linux 5.6 compat: ktime_get_raw_ts64()
The getrawmonotonic() and getrawmonotonic64() interfaces have been
fully retired.  Update gethrtime() to use the replacement interface
ktime_get_raw_ts64() which was introduced in the 4.18 kernel.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10052
Closes #10064
2020-02-27 09:30:45 -08:00
Matthew Macy 28caa74b19 Refactor dnode dirty context from dbuf_dirty
* Add dedicated donde_set_dirtyctx routine.
* Add empty dirty record on destroy assertion.
* Make much more extensive use of the SET_ERROR macro.

Reviewed-by: Will Andrews <wca@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9924
2020-02-26 16:09:17 -08:00
Ryan Moeller 647ff8e975 ZTS: Fix zfs_copies_002_pos
The function `get_used_prop` does not exist.

Use `get_prop used` instead.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10059
2020-02-26 14:29:12 -08:00
Ryan Moeller abef699866 ZTS: Adapt casenorm tests for FreeBSD
Several casenorm tests pass on FreeBSD but are expected to fail on
Linux.

Move the passing tests from "fail" to "maybe" so that passing on
FreeBSD is not unexpected.

Invert platform logic so FreeBSD doesn't use illumos-only zlook.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10050
2020-02-26 08:41:30 -08:00
Ryan Moeller 3a192f7d89 ZTS: Misc fixes for FreeBSD
* Check for mountd in is_shared to avoid timeout when not running
* Enhance robustness of some cleanup functions
* Simplify atime lookup
* Skip sharenfs validation for now
* Don't add mountpoint property to inheritance validation on FreeBSD

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10047
2020-02-25 16:23:27 -08:00
Ryan Moeller a33cb7e01a Add missing newline after zfs redact help message
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10045
2020-02-25 16:20:50 -08:00
Olaf Faaland 034313908a ZTS: zed_start should not fail if zed is already running
zed_start may be called in places where zed is not
typically already running, but this is not a requirement
of the tests.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #9974
2020-02-25 16:02:10 -08:00
Matthew Macy c6a6b4d50a Remove dead code error handling from dsl_crypt.c
Sleepable (KM_SLEEP) allocations cannot fail. Hence
error handling for them is not useful.

Reviewed-By: Tom Caputi <tcaputi@datto.com>
Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10031
2020-02-25 15:59:29 -08:00
Ryan Moeller 2757010166 ZTS: Move atime_003 to linux.run
This test verifies relatime behavior, which is only present on Linux.

Move the test to linux.run

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10046
2020-02-25 15:27:41 -08:00
Matthew Ahrens 521abcc49a Update README for OpenZFS
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10053
2020-02-25 11:43:20 -08:00
Dirkjan Bussink 327000ce04 Remove zfs_getattr and convoff dead code
The `convoff` function is called only in one code path in `zfs_space`.
Each caller of `zfs_space` is called with a `flock64_t` that has
`l_whence` set to `SEEK_SET`. This means that `convoff` always results
in a no-op as the `bfp` parameter has `l_whence` set to `SEEK_SET` and
`int whence` is `SEEK_SET` as well.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by:  Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
Closes #10006
2020-02-24 15:38:22 -08:00
Ryan Moeller 92bd4cabd6 ZTS: Misc fixes for FreeBSD
* Force UFS sync before snap in vol rollback tests
* rw is not a valid share option on FreeBSD, use ro instead
* zfs_unmount_nested: mountpoint is in the pool, rmdir *before* export
* Fix some more platform checks
* Fix disappearing group in delegate tests
* Don't try delegating for jailed, only root can set it

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10038
2020-02-24 10:17:55 -08:00
Matthew Ahrens 31a69fbccb Remove unused structs and members in dmu_send.c
There are several structs (and members of structs) related to redaction,
which are no longer used.  This commit removes them.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10039
2020-02-24 09:50:14 -08:00
Ryan Moeller 24fcd9fc5c ZTS: Eliminate partitioning from zpool_destroy
The zpool destroy tests partition a single disk to create two pools.

This can be done using two disks and no partitioning instead.
And temporarily allow vol recursion for FreeBSD while in here.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10036
2020-02-21 16:00:22 -08:00
Ryan Moeller b7dbbf6aa7 ZTS: Refactor is_shared, fix impl on FreeBSD
FreeBSD doesn't have a `share` command.  It does have showmount.

Split the separate platform impls out of is_shared_impl.
Dispatch to the correct platform impl function from is_shared.
Eliminate the use of is_shared_impl from tests.  is_shared works.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10037
2020-02-21 15:59:20 -08:00
Ryan Moeller f5f438194d ZTS: Move privilege tests to sunos.run
These tests are unspported on FreeBSD and Linux for lack of pfexec.

Move the privilege tests to sunos.run and remove the platform checks.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10035
2020-02-21 08:52:44 -08:00
Ryan Moeller 6a60841631 ZTS: Don't use lsblk on FreeBSD
These tests use lsblk to find the sector size of a disk.
FreeBSD doesn't have lsblk.

Use diskinfo -v to get sector size on FreeBSD.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>\
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10033
2020-02-21 08:38:34 -08:00
Ryan Moeller ca7ea23f8a ZTS: Fix userquota_006_pos on FreeBSD
FreeBSD uses `pw` for account management. `userquota_006_pos`
erroneously invokes the non-existent `groupdel` command on FreeBSD.

Use `pw groupdel -n` instead of `groupdel` on FreeBSD.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10032
2020-02-20 08:14:24 -08:00
Ryan Moeller b11375d74a ZTS: Check the right mount options on FreeBSD
FreeBSD does not support the "devices" and "nodevices" mount options.

Do not check these options on FreeBSD.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10028
2020-02-20 08:12:23 -08:00
Ryan Moeller 325a551232 ZTS: Fix faulty slog_replay_fs_001 test
This test is supposed to verify zil operations. For TX_WRITE, writes
must be synchronous in order to be entered in the zil. Linux seems to
be doing sync writes even when they are not asked for, but on FreeBSD
the test does not do what is intended.

Use dd oflag=sync for the parts of this test that are supposed to
result in TX_WRITE zil entries.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10022
2020-02-20 08:11:51 -08:00
Arvind Sankar 65635c3403 Fix icp include directories for in-tree build
When zfs is built in-tree using --enable-linux-builtin, the compile
commands are executed from the kernel build directory. If the build
directory is different from the kernel source directory, passing
-Ifs/zfs/icp will not find the headers as they are not present in the
build directory.

Fix this by adding @abs_top_srcdir@ to pull the headers from the zfs
source tree instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Closes #10021
2020-02-20 08:10:47 -08:00
Ryan Moeller 8136956716 ZTS: Eliminate partitioning from zpool_create etc
These tests can be made to work without a bunch of complex
partitioning of physical disks.

Use the 3 disks directly, creating a few file disks if needed for a
compelling reason.

Reduce the use of shared variables that don't have a clear utility.

Catch the fallout in tests that include cfg/shlib from zpool_create.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10002
2020-02-20 08:10:13 -08:00
Ryan Moeller 873cd182de ZTS: Fix zpool_create/create-o_ashift on FreeBSD
For some unknown reason, egrep was misbehaving with this pattern on
FreeBSD.  The command works fine run interactively from a shell, but
in the test the output of egrep is empty.

Work around the issue by using a filter in the awk script instead.

While here, add a bit of diagnostic output and other simplifications
to the awk script as well.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10023
2020-02-19 10:27:23 -08:00
Ryan Moeller 392556f0ef ZTS: Avoid nonportable cmp flag
FreeBSD doesn't have the -n flag for cmp.

Read the area for the first four labels from the disk to a separate
file to compare instead of using the special flag to limit the size.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10024
2020-02-19 09:03:31 -08:00
Mariusz Zaborski 89adffb7bc Add notice that forcefully unmount is not supported on Linux
The Linux VFS will never allow a filesystem which is in use to
be unmounted.  This behavior differs from other platforms like
FreeBSD which allow a filesystem to be force unmounted.  This 
will result in errors being returned to applications actively
using the filesystem.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org>
Closes #10013
2020-02-18 13:36:23 -08:00
Ryan Moeller 43849fdf3f ZTS: Move free to Linux commands list
FreeBSD does not have the free command. This command is only used by
Linux in a perf hostinfo function.

Move free from the list of common commands to the list of Linux
commands.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10011
2020-02-18 11:23:41 -08:00
Ryan Moeller 5f087dda78 Enable zpool events tunables and tests on FreeBSD
We have have made the necessary changes in our module code to expose
zevents through both devd and the zpool events ioctl. Now the tunables
can be exposed and zpool events tests can be enabled on both platforms.

A few minor tweaks to the tests were needed to accommodate the way wc
formats output on FreeBSD.

zed remains to be ported.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10008
2020-02-18 11:22:56 -08:00
Matthew Macy 8b3547a481 Factor out some dbuf subroutines and add state change tracing
Create dedicated dbuf_read_hole and dbuf_read_bonus.
Additionally, add a dtrace probe to allow state change tracing.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Will Andrews <wca@FreeBSD.org>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Authored-by: Will Andrews <wca@FreeBSD.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9923
2020-02-18 11:21:37 -08:00
Richard Laager f244846462 Prefer org.openzfs for features and properties
Moving forward, we wish to use org.openzfs (no dash) rather than
org.open-zfs or org.zfsonlinux for feature GUIDs and property names.
The existing feature GUIDs cannot be changed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #10003
2020-02-18 09:36:50 -08:00
Ryan Moeller fb63fc0c03 ZTS: Move cksum to common system commands
The cksum command is used by delegate tests. We have it on FreeBSD,
so it should not have been moved to the Linux commands list.

Move it back to the common commands list.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #10007
2020-02-16 12:49:49 -08:00
DeHackEd d09dc5980c Honour sync=disabled when relinking tpmfiles
Unlinked files don't respect synchronous flush commands, but when they get relinked
their state is unknown. Previously we force flushed all such files even when
sync=disabled. Correct this case.

Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: DHE <git@dehacked.net>
Closes #10005
2020-02-16 12:44:08 -08:00
InsanePrawn ecbbdac799 Systemd mount generator: Generate noauto units; add control properties
This commit refactors the systemd mount generators and makes the
following major changes:

- The generator now generates units for datasets marked canmount=noauto,
  too. These units are NOT WantedBy local-fs.target.
  If there are multiple noauto datasets for a path, no noauto unit will
  be created. Datasets with canmount=on are prioritized.

- Introduces handling of new user properties which are now included in
  the zfs-list.cache files:
    - org.openzfs.systemd:requires:
      List of units to require for this mount unit
    - org.openzfs.systemd:requires-mounts-for:
      List of mounts to require by this mount unit
    - org.openzfs.systemd:before:
      List of units to order after this mount unit
    - org.openzfs.systemd:after:
      List of units to order before this mount unit
    - org.openzfs.systemd:wanted-by:
      List of units to add a Wants dependency on this mount unit to
    - org.openzfs.systemd:required-by:
      List of units to add a Requires dependency on this mount unit to
    - org.openzfs.systemd:nofail:
      Toggles between a wants and a requires dependency.
    - org.openzfs.systemd:ignore:
      Do not generate a mount unit for this dataset.

  Consult the updated man page for detailed documentation.

- Restructures and extends the zfs-mount-generator(8) man page with the
  above properties, information on unit ordering and a license header.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #9649
2020-02-14 15:32:55 -08:00
InsanePrawn 9d2f3b7f94 Systemd mount generator: Silence shellcheck warnings
Silences a warning about an intentionally unquoted variable.
Fixes a warning caused by strings split across lines by slightly
refactoring keyloadcmd.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #9649
2020-02-14 15:32:33 -08:00
Jason King 13b5a4d5c0 Support setting user properties in a channel program
This adds support for setting user properties in a
zfs channel program by adding 'zfs.sync.set_prop'
and 'zfs.check.set_prop' to the ZFS LUA API.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Co-authored-by: Sara Hartse <sara.hartse@delphix.com>
Contributions-by: Jason King <jason.king@joyent.com>
Signed-off-by: Sara Hartse <sara.hartse@delphix.com>
Signed-off-by: Jason King <jason.king@joyent.com>
Closes #9950
2020-02-14 13:41:42 -08:00
Matthew Ahrens 4fe3a842bb Remove limit on number of async zio_frees of non-dedup blocks
The module parameter zfs_async_block_max_blocks limits the number of
blocks that can be freed by the background freeing of filesystems and
snapshots (from "zfs destroy"), in one TXG.  This is useful when freeing
dedup blocks, becuase each zio_free() of a dedup block can require an
i/o to read the relevant part of the dedup table (DDT), and will also
dirty that block.

zfs_async_block_max_blocks is set to 100,000 by default.  For the more
typical case where dedup is not used, this can have a negative
performance impact on the rate of background freeing (from "zfs
destroy").  For example, with recordsize=8k, and TXG's syncing once
every 5 seconds, we can free only 160MB of data per second, which may be
much less than the rate we can write data.

This change increases zfs_async_block_max_blocks to be unlimited by
default.  To address the dedup freeing issue, a new tunable is
introduced, zfs_max_async_dedup_frees, which limits the number of
zio_free()'s of dedup blocks done by background destroys, per txg.  The
default is 100,000 free's (same as the old zfs_async_block_max_blocks
default).

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10000
2020-02-14 08:39:46 -08:00
Ryan Moeller 0f1832106d Make zpool.d/iostat work on FreeBSD
There are slight differences in the iostat commands between FreeBSD and
Linux.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9979
2020-02-14 08:37:40 -08:00
Andrew J. Hesford db0ad393b1 Use POSIX stdout/stderr redirect in configure macro
This PR fixes an issue wherein redirecting stdout and stderr when 
building kernel modules in configure tests relied on a bashism that 
does not work as expected when /bin/sh is not bash.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-By: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Andrew J. Hesford <ajh@sideband.org>
Closes #9990 
Closes #9998
2020-02-14 08:30:29 -08:00
Ryan Moeller 4f4ddf98ee ZTS: Misc test fixes for FreeBSD
Add missing logic for FreeBSD to a few test scripts.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9994
2020-02-13 13:52:34 -08:00
Ryan Moeller 71439163bb ZTS: Don't include zpool_create.shlib in zpool_add
The zpool_add tests include zpool_create.shlib for a few silly
variables.

Don't use those variables for the file names. Include zpool_add.kshlib
for whatever variables we still need.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9997
2020-02-13 12:11:25 -08:00
Ryan Moeller 3bdc4f6314 ZTS: Eliminate partitioning from zpool_remove
These tests do not need to use partitions.

Get rid of the partitioning and just use the disks directly.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9996
2020-02-13 12:10:35 -08:00
Ryan Moeller 3e725f0ad2 ZTS: Eliminate partitioning from write_dirs
These tests do not need to use partitions.

Get rid of the partitioning and just use the disks directly.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9995
2020-02-13 12:08:59 -08:00
Ryan Moeller 5ceccda5cb ZTS: Cleanup some cleanup functions
Cleanup functions should make a best effort to clean up as much as
possible.

Do a consistency pass in a bunch of tests to make the cleanup
functions less prone to failure and fix a few typos here and there.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9993
2020-02-13 12:05:32 -08:00
Ryan Moeller 523bc0d548 Fix a typo/whitespace in tests README
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9991
2020-02-13 12:04:47 -08:00
Ryan Moeller b90b01cbc8 ZTS: Use ECKSUM instead of EBADE in libzfs test
Linux defines ECKSUM as EBADE, FreeBSD defines it as EINTEGRITY.

Test for ECKSUM instead of EBADE so we don't have to define EBADE for
this test on FreeBSD.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9992
2020-02-13 12:03:01 -08:00
Richard Laager d1d65bb367 zfs-mount-generator: Fix escaping for /
The correct name for the mount unit for / is "-.mount", not ".mount".

Reviewed-by: InsanePrawn <insane.prawny@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Antonio Russo <antonio.e.russo@gmail.com>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #9970
2020-02-13 11:55:59 -08:00
Matthew Ahrens f49b7a0d8e fix zstreamdump -C
zstreamdump -C always fails.  It is not calculating the checksums, but
it's still trying to verify that the (non-calculated) checksum matches
the one stored in the send stream.

This change makes zstreamdump -C not verify checksums.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #9983
2020-02-13 11:24:57 -08:00
Matthew Ahrens 2adc6b35ae Missed wakeup when growing kmem cache
When growing the size of a (VMEM or KVMEM) kmem cache, spl_cache_grow()
always does taskq_dispatch(spl_cache_grow_work), and then waits for the
KMC_BIT_GROWING to be cleared by the taskq thread.

The taskq thread (spl_cache_grow_work()) does:
1. allocate new slab and add to list
2. wake_up_all(skc_waitq)
3. clear_bit(KMC_BIT_GROWING)

Therefore, the waiting thread can wake up before GROWING has been
cleared.  It will see that the growing has not yet completed, and go
back to sleep until it hits the 100ms timeout.

This can have an extreme performance impact on workloads that alloc/free
more than fits in the (statically-sized) magazines.  These workloads
allocate and free slabs with high frequency.

The problem can be observed with `funclatency spl_cache_grow`, which on
some workloads shows that 99.5% of the time it takes <64us to allocate
slabs, but we spend ~70% of our time in outliers, waiting for the 100ms
timeout.

The fix is to do `clear_bit(KMC_BIT_GROWING)` before
`wake_up_all(skc_waitq)`.

A future investigation should evaluate if we still actually need to
taskq_dispatch() at all, and if so on which kernel versions.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #9989
2020-02-13 11:23:02 -08:00
Alexander Motin 465e4e795e Remove duplicate dbufs accounting
Since AVL already has embedded element counter, use dn_dbufs_count
only for dbufs not counted there (bonus buffers) and just add them.
This removes two atomics per dbuf life cycle.

According to profiler it reduces time spent by dbuf_destroy() inside
bottlenecked dbuf_evict_thread() from 13.36% to 9.20% of the core.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #9949
2020-02-13 11:20:42 -08:00
Ryan Moeller 610eec452d ZTS: Move user_namespace test to linux.run
Namespaces is a Linux feature not available on other platforms.

Move the user_namespace test out of common.run to linux.run.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9982
2020-02-12 13:06:00 -08:00
Ryan Moeller 834f274fbf ZTS: Interpret env vars in faketty on FreeBSD
This was missed in review. On FreeBSD, script does not understand
environment variables being passed as a command.

Use env to make faketty handle env vars on FreeBSD.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9981
2020-02-12 13:04:51 -08:00
Ryan Moeller 9c536b9a78 ZTS: Fix zdb_display_block on FreeBSD
Missed this in the review, but wc output on FreeBSD is indented,
so string comparisons mismatch when comparing to an unindented number.

Compare counts as integers instead of strings.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9980
2020-02-12 13:01:55 -08:00
Ryan Moeller e7be5c47bd Move zfs_version_kernel to platform code
Linux uses sysfs to determine the module version, FreeBSD uses a
different method.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9978
2020-02-12 13:00:19 -08:00
Ryan Moeller 1bbeb6d755 ZTS: Move zpool_split_wholedisks to linux.run
This test uses the scsi_debug Linux kernel module.

Move the test to linux.run until we have an alternative to scsi_debug
worked out on FreeBSD.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9984
2020-02-12 12:59:28 -08:00
Justin Keogh 12f7b90c93 zdb: Always print symlink target
When zdb is printing paths, also print the symlink target if it exists.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Justin Keogh <commits@v6y.net>
Closes #9925
2020-02-12 11:36:05 -08:00
Christian Schwarz 948f0c4419 zcp: add zfs.sync.bookmark
Add support for bookmark creation and cloning.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #9571
2020-02-11 13:19:17 -08:00
Christian Schwarz a73f361fdb Implement bookmark copying
This feature allows copying existing bookmarks using

    zfs bookmark fs#target fs#newbookmark

There are some niche use cases for such functionality,
e.g. when using bookmarks as markers for replication progress.

Copying redaction bookmarks produces a normal bookmark that
cannot be used for redacted send (we are not duplicating
the redaction object).

ZCP support for bookmarking (both creation and copying) will be
implemented in a separate patch based on this work.

Overview:

- Terminology:
    - source = existing snapshot or bookmark
    - new/bmark = new bookmark
- Implement bookmark copying in `dsl_bookmark.c`
  - create new bookmark node
  - copy source's `zbn_phys` to new's `zbn_phys`
  - zero-out redaction object id in copy
- Extend existing bookmark ioctl nvlist schema to accept
  bookmarks as sources
  - => `dsl_bookmark_create_nvl_validate` is authoritative
- use `dsl_dataset_is_before` check for both snapshot
  and bookmark sources
- Adjust CLI
  - refactor shortname expansion logic in `zfs_do_bookmark`
- Update man pages
  - warn about redaction bookmark handling
- Add test cases
  - CLI
  - pyyzfs libzfs_core bindings

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #9571
2020-02-11 13:19:12 -08:00
Matthew Macy 7b49bbc816 Address Coverity warnings in #9902
Coverity reports the variable may be NULL, but due to the
way the dirty records are handled this cannot be the case.
Add a comment and VERIFY to make this clear and silence
the warning.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9962
2020-02-11 13:12:41 -08:00
Brian Behlendorf dceeca5bbd Add missing dmu_buf_unlock_parent() calls to dbuf_read_impl()
As explained by the comment in dbuf_read() and above dbuf_read_impl().
Under all circumstances the parent lock specified by dblt should be
dropped when existing dbuf_read_impl().  This was not being done for
two exist paths.  Additionally, ensure the mutex is unlocked before
dropping the parent lock.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9968
2020-02-10 14:54:12 -08:00
Paul Zuchowski bc67cba7c0 Fix zdb -R with 'b' flag
zdb -R :b fails due to the indirect block being compressed,
and the 'b' and 'd' flag not working in tandem when specified.
Fix the flag parsing code and create a zfs test for zdb -R
block display.  Also fix the zio flags where the dotted notation
for the vdev portion of DVA (i.e. 0.0:offset:length) fails.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #9640
Closes #9729
2020-02-10 14:00:05 -08:00
Graham Christensen dda702fd16 bash scripts: use /usr/bin/env for bash shebangs
Not all systems / distros have a `/bin/bash`, and these scripts are
more difficult to run at development time.

For example, my system is NixOS which doesn't have a /bin/bash. This
is not a problem for NixOS building ZFS as a package: the build
environment automatically replaces these shebangs with corrected
paths.

The problem is much more annoying at development time: either the
scripts don't run, or I correct them for my local machine and deal with
a perpetually dirty work tree.

Before committing this patch I confirmed there are existing scripts
which use `/usr/bin/env` to locate bash, so I am thinking this is a
safe transformation.

There are a handful of other shebangs in this repository which don't
work on my system. This patch is useful on its own specifically for
`commitcheck.sh`, otherwise I can't validate my commits before
submission.

Here are the remaining shebangs which NixOS systems won't have:

       1274 #!/bin/ksh -p
         91 #!/bin/ksh
         89 #! /bin/ksh -p
          2 #!/bin/sed -f
          1 #!/usr/bin/perl -w
          1 #!/usr/bin/ksh
          1 #!/bin/nawk -f

plus this which will create an invalid shebang in
`tests/zfs-tests/tests/functional/mv_files/mv_files_common.kshlib`:

        echo "#!/bin/ksh" > $TEST_BASE_DIR/exitsZero.ksh

I chose to leave those alone for now, and gauge the interest in this
much smaller patch first.

The fixes for these are easy enough by simply using `/usr/bin/env ksh`:

         91 #!/bin/ksh
          1 #!/usr/bin/ksh

The fix for the other set is much trickier. Quoting the GNU coreutils
manual:

    Most operating systems (e.g. GNU/Linux, BSDs) treat all text after
    the first space as a single argument. When using env in a script it
    is thus not possible to specify multiple arguments.

and not all `env`'s support arguments.

Mine (GNU Coreutils 8.31) does, though this feature is new since
April 2018, GNU Coreutils 8.30:
https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=668306ed86c8c79b0af0db8b9c882654ebb66db2

and worse, requires the -S argument:

    -S, --split-string=S  process and split S into separate arguments;
                          used to pass multiple arguments on shebang
                          lines

Example:

    $ seq 1 2 | $(nix-build '<nixpkgs>' -A coreutils)/bin/env "sort -nr"
    /nix/[...]-coreutils-8.31/bin/env: ‘sort -nr’: No such file or directory
    /nix/[...]-coreutils-8.31/bin/env: use -[v]S to pass options in shebang lines

    $ seq 1 2 | $(nix-build '<nixpkgs>' -A coreutils)/bin/env "-S sort -nr"
    2
    1

GNU Coreutils says FreeBSD's `env` does, though I wonder if FreeBSD's
would be unhappy with the `-S`:
https://www.gnu.org/software/coreutils/manual/html_node/env-invocation.html#env-invocation

BusyBox v1.30.1 does not, and does not have a `-S`-like option:

    $ seq 1 2 | $(nix-build '<nixpkgs>' -A busybox)/bin/env "sort -nr"
    env: can't execute 'sort -nr': No such file or directory

Toybox 0.8.1 also does not, and also does not have a `-S` option:

    $ seq 1 2 | $(nix-build '<nixpkgs>' -A toybox)/bin/env "sort -nr"
    env: exec sort -nr: No such file or directory

---

At any rate, if this patch merges and the remaining ~1,500 are updated,
the much larger patch should probably include a checkstyle-like test
asserting all new shebangs use `/usr/bin/env`. I also don't mind
dealing with NixOS weirdness if the project would prefer that.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Graham Christensen <graham@grahamc.com>
Closes #9893
2020-02-10 13:13:46 -08:00
Ryan Moeller 57940b435c Share some code for spa deadman tunables
We need to do the same thing to update all spas on any OS for these
tunables, so let's share the code.

While here let's match the types of the literals initializing the
variables with the type of the variable.

Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9964
2020-02-10 13:11:30 -08:00
Ryan Moeller 572b5b302a ZTS: Test zvol I/O in different volmodes
We found that our zvol code had some issues with volmode=dev that were
not revealed by ZTS.

Add some basic I/O operations to exercise more code paths in the
volmode test.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9953
2020-02-10 13:10:25 -08:00
Attila Fülöp 31b160f0a6 ICP: Improve AES-GCM performance
Currently SIMD accelerated AES-GCM performance is limited by two
factors:

a. The need to disable preemption and interrupts and save the FPU
state before using it and to do the reverse when done. Due to the
way the code is organized (see (b) below) we have to pay this price
twice for each 16 byte GCM block processed.

b. Most processing is done in C, operating on single GCM blocks.
The use of SIMD instructions is limited to the AES encryption of the
counter block (AES-NI) and the Galois multiplication (PCLMULQDQ).
This leads to the FPU not being fully utilized for crypto
operations.

To solve (a) we do crypto processing in larger chunks while owning
the FPU. An `icp_gcm_avx_chunk_size` module parameter was introduced
to make this chunk size tweakable. It defaults to 32 KiB. This step
alone roughly doubles performance. (b) is tackled by porting and
using the highly optimized openssl AES-GCM assembler routines, which
do all the processing (CTR, AES, GMULT) in a single routine. Both
steps together result in up to 32x reduction of the time spend in
the en/decryption routines, leading up to approximately 12x
throughput increase for large (128 KiB) blocks.

Lastly, this commit changes the default encryption algorithm from
AES-CCM to AES-GCM when setting the `encryption=on` property.

Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-By: Jason King <jason.king@joyent.com>
Reviewed-By: Tom Caputi <tcaputi@datto.com>
Reviewed-By: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #9749
2020-02-10 12:59:50 -08:00
Matthew Macy fa3922df75 Factor out dbuf_sync_bonus
Factor the portion of dbuf_sync_leaf() responsible for handling bonus
buffers out in to its own dbuf_sync_bonus() helper function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9909
2020-02-07 14:22:29 -08:00
Igor K 818d4a87fd ZTS: Add an is_dilos function for future ZTS updates
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Igor Kozhukhov <igor@dilos.org>
Closes #9960
2020-02-07 12:32:52 -08:00
Ryan Moeller 9825e7ad1d ZTS: Use wc -c instead of --bytes for portability
FreeBSD does not have the long opts for wc.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9963
2020-02-07 12:31:38 -08:00
Brian Behlendorf 795699a6cc Linux 5.6 compat: timestamp_truncate()
The timestamp_truncate() function was added, it replaces the existing
timespec64_trunc() function.  This change renames our wrapper function
to be consistent with the upstream name and updates the compatibility
code for older kernels accordingly.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9956
Closes #9961
2020-02-07 11:04:32 -08:00
Brian Behlendorf 0dd7364853 Linux 5.6 compat: struct proc_ops
The proc_ops structure was introduced to replace the use of of the
file_operations structure when registering proc handlers.  This
change creates a new kstat_proc_op_t typedef for compatibility
which can be used to pass around the correct structure.

This change additionally adds the 'const' keyword to all of the
existing proc operations structures.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9961
2020-02-07 11:03:53 -08:00
Alexander Motin e0ce98d57c Reduce number of atomic_add() calls in aggsum
Previous code used 4 atomics to do aggsum_flush_bucket() and 2 more to
re-borrow after the flush.  But since asc_borrowed and asc_delta are
accessed only while holding asc_lock, it makes no any sense to modify
as_lower_bound and as_upper_bound in multiple steps.  Instead of that
the new code uses only 2 atomics in all the cases, one per as_*_bound
variable.  I think even that is overkill, simple atomic store and
load could be used here, since all modifications are done under the
as_lock, but there are no such primitives in ZFS code now.

While there, make borrow code consider previous borrow value, so that
on mixed request patterns reduce chance of needing to borrow again if
much larger request follows tiny one that needed borrow.

Also reduce as_numbuckets from uint64_t to u_int.  It makes no sense
to use so large division operation on every aggsum_add().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #9930
2020-02-06 13:21:06 -08:00
Romain Dolbeau 77122f9d68 Replace static per-cpu with dynamic per-cpu data
This solves the issue of loading the spl module on RISC-V.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Romain Dolbeau <romain.dolbeau@european-processor-initiative.eu>
Closes #9942
2020-02-06 09:26:13 -08:00
Romain Dolbeau af09c050e9 Fix static data to link with -fno-common
-fno-common is the new default in GCC 10, replacing -fcommon in
GCC <= 9, so static data must only be allocated once.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Romain Dolbeau <romain.dolbeau@european-processor-initiative.eu>
Closes #9943
2020-02-06 09:25:29 -08:00
Ryan Moeller 5b7e6a3617 Fix unknown cc flag -fno-ipa-sra
Clang does not recognize -fno-ipa-sra, so add a check for it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9946
2020-02-06 09:24:37 -08:00
Gerardwx 81acb1edcb Suggest using visudo to edit
Suggest visudo which allows editing the sudoers file in a safe fashion.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gerardwx <gerardw@alum.mit.edu>
Closes #9918
2020-02-05 11:31:53 -08:00
Alexander Motin cbd8f5b759 Few microoptimizations to dbuf layer
Move db_link into the same cache line as db_blkid and db_level.
It allows significantly reduce avl_add() time in dbuf_create() on
systems with large RAM and huge number of dbufs per dnode.

Avoid few accesses to dbuf_caches[].size, which is highly congested
under high IOPS and never stays in cache for a long time.  Use local
value we are receiving from zfs_refcount_add_many() any way.

Remove cache_size_bytes_max bump from dbuf_evict_one().  I don't see
a point to do it on dbuf eviction after we done it on insertion in
dbuf_rele_and_unlock().

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #9931
2020-02-05 11:08:44 -08:00
Matthew Macy cccbed9f98 Convert dbuf dirty record record list to a list_t
Additionally pull in state machine comments about
upcoming async cow work.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9902
2020-02-05 11:07:19 -08:00
Alexander Motin 741db5a346 Prepare ks_data before calling kstat_install()
It violated sequence described in kstat.h, and at least on FreeBSD
kstat_install() uses provided names to create the sysctls.  If the
names are not available at the time, it ends up bad.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #9933
2020-02-04 08:49:12 -08:00
韩朴宇 52c487a0e4 Use -Werror to check if the compiler supports specific options
Be default, clang treats unknown warning option as warning.
We need to use -Werror to make it an error.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: 12101111 <w12101111@gmail.com>
Closes #9927
2020-02-04 08:47:37 -08:00
Ryan Moeller 8c4987c489 Restore aclmode and remove acltype on FreeBSD
This replaces the placeholder ZFS_PROP_PRIVATE with ZFS_PROP_ACLMODE,
matching what is done in the NFSv4 ACLs PR (#9709).

On FreeBSD we hide ZFS_PROP_ACLTYPE, while on Linux we hide
ZFS_PROP_ACLMODE.

The tests already assume this arrangement.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9913
2020-02-04 08:40:07 -08:00
Ryan Moeller 07bc2bc231 Fix const-correctness in raidz math
Clang warns (errors) that "cast from 'const void *' to 'struct v *'
drops const qualifier."

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9917
2020-02-03 10:52:41 -08:00
Matthew Ahrens ec21397127 async zvol minor node creation interferes with receive
When we finish a zfs receive, dmu_recv_end_sync() calls
zvol_create_minors(async=TRUE).  This kicks off some other threads that
create the minor device nodes (in /dev/zvol/poolname/...).  These async
threads call zvol_prefetch_minors_impl() and zvol_create_minor(), which
both call dmu_objset_own(), which puts a "long hold" on the dataset.
Since the zvol minor node creation is asynchronous, this can happen
after the `ZFS_IOC_RECV[_NEW]` ioctl and `zfs receive` process have
completed.

After the first receive ioctl has completed, userland may attempt to do
another receive into the same dataset (e.g. the next incremental
stream).  This second receive and the asynchronous minor node creation
can interfere with one another in several different ways, because they
both require exclusive access to the dataset:

1. When the second receive is finishing up, dmu_recv_end_check() does
dsl_dataset_handoff_check(), which can fail with EBUSY if the async
minor node creation already has a "long hold" on this dataset.  This
causes the 2nd receive to fail.

2. The async udev rule can fail if zvol_id and/or systemd-udevd try to
open the device while the the second receive's async attempt at minor
node creation owns the dataset (via zvol_prefetch_minors_impl).  This
causes the minor node (/dev/zd*) to exist, but the udev-generated
/dev/zvol/... to not exist.

3. The async minor node creation can silently fail with EBUSY if the
first receive's zvol_create_minor() trys to own the dataset while the
second receive's zvol_prefetch_minors_impl already owns the dataset.

To address these problems, this change synchronously creates the minor
node.  To avoid the lock ordering problems that the asynchrony was
introduced to fix (see #3681), we create the minor nodes from open
context, with no locks held, rather than from syncing contex as was
originally done.

Implementation notes:

We generally do not need to traverse children or prefetch anything (e.g.
when running the recv, snapshot, create, or clone subcommands of zfs).
We only need recursion when importing/opening a pool and when loading
encryption keys.  The existing recursive, asynchronous, prefetching code
is preserved for use in these cases.

Channel programs may need to create zvol minor nodes, when creating a
snapshot of a zvol with the snapdev property set.  We figure out what
snapshots are created when running the LUA program in syncing context.
In this case we need to remember what snapshots were created, and then
try to create their minor nodes from open context, after the LUA code
has completed.

There are additional zvol use cases that asynchronously own the dataset,
which can cause similar problems.  E.g. changing the volmode or snapdev
properties.  These are less problematic because they are not recursive
and don't touch datasets that are not involved in the operation, there
is still potential for interference with subsequent operations.  In the
future, these cases should be similarly converted to create the zvol
minor node synchronously from open context.

The async tasks of removing and renaming minors do not own the objset,
so they do not have this problem.  However, it may make sense to also
convert these operations to happen synchronously from open context, in
the future.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-65948
Closes #7863
Closes #9885
2020-02-03 09:33:14 -08:00
Prakash Surya 147622b2f1 Disable "-fipa-sra" when using "--enable-debuginfo"
To better enable dynamic tracing tools (e.g. "bpftrace") this change
disables the "-fipa-sra" compilation optimization. This way, function
signatures are not changed by the compiler, which allows us to better
attach to kprobes and kretprobes with dynamic tracing tools. Otherwise,
the compiler may append ".isra" to the function name, and possibly
change the function arguments as well.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Prakash Surya <prakash.surya@delphix.com>
Closes #9921
2020-02-03 09:21:05 -08:00
Ryan Moeller 591505dc27 ZTS: Only use ext4 on Linux
And while here, add a workaround for FreeBSD to ensure dirty data is
synced before taking a snapshot, by remounting read-only, then remount
again read-write after the snapshot.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9914
2020-01-31 09:00:47 -08:00
Ryan Moeller fe7c15985b Left-align index props
Index type props display as strings, which should be aligned to the
left not to the right.

Before:
```
FreeBSD-13_0-CURRENT-r356528 ➜  ~ zfs list -ro name,aclmode,mountpoint
NAME        ACLMODE  MOUNTPOINT
p0      passthrough  /p0
p0/foo      discard  /p0/foo
```

After:
```
FreeBSD-13_0-CURRENT-r356528 ➜  ~ zfs list -ro name,aclmode,mountpoint
NAME    ACLMODE      MOUNTPOINT
p0      passthrough  /p0
p0/foo  discard      /p0/foo
```

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9912
2020-01-31 08:55:51 -08:00
Ryan Moeller 12430796d6 Use the correct spelling of 'jailed' in tests
FreeBSD has jails, not zones.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9899
2020-01-31 08:52:29 -08:00
Ryan Moeller a3bddd49f8 ZTS: Fix a few defaults
Linux was missing a default value for DEV_DSKDIR. Set it to /dev.
Fix resulting fallout.

SLICE_PREFIX seems like a good candidate for including in the defaults.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9898
2020-01-31 08:51:23 -08:00
Ryan Moeller 9d8ce2457d ZTS: FreeBSD tunable for DISABLE_IVSET_GUID_CHECK
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9907
2020-01-29 11:25:15 -08:00
Ryan Moeller 395a6b19d3 ZTS: Simplify zero_partitions on FreeBSD
We can avoid a great deal of `sleep 3` by simply destroying the whole
partition table in one shot with gpart.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9908
2020-01-29 11:24:18 -08:00
Ryan Moeller 74b4d349b3 ZTS: Reverse constrained path lookup order
FreeBSD base system zfs utils are in /sbin. ZoF utils install to
/usr/local/sbin.

Ensure we link to the ZoF utils not the base utils when searching for
utils to constrain paths to for the tests.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9906
2020-01-29 11:23:20 -08:00
Ryan Moeller 0ecd910923 ZTS: Don't use edonr on FreeBSD
FreeBSD doesn't support feature@edonr.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9901
2020-01-28 08:38:02 -08:00
Ryan Moeller 7a298ae975 ZTS: Eliminate random and shuf, consolidate code
Both GNU and FreeBSD sort have -R to randomize input.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9900
2020-01-28 08:36:33 -08:00
Graham Christensen 25df8fb42f Add an .editorconfig; document git whitespace settings
Most of the projects I work on don't use tabs, and while authoring my
first patch I had to wrestle with my editor to not introduce
whitespace editors.

The `.editorconfig` file is supported by a large number of editors
out of the box, and many more with plugins.

As a first-time contributor, I can't say for certain these settings
are totally correct, but thus far git and my editor are satisfied
enough.

I considered adding `git config --local format.signOff true` but
wanted to respect the warning:

    format.signOff
        A boolean value which lets you enable the -s/--signoff
        option of format-patch by default.  Note: Adding the
        Signed-off-by: line to a patch should be a conscious act and
        means that you certify you have the rights to submit this
        work under the same open source license. Please see the
        SubmittingPatches document for further discussion.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Graham Christensen <graham@grahamc.com>
Closes #9892
2020-01-27 13:32:52 -08:00
Ryan Moeller e4c439021f ZTS: Move more tests out of common.run
These tests won't run on all platforms as currently implemented:
 * add_nested_replacing_spare (needs zed)
 * fault (needs zed)
 * mmp (needs multihost_history)
 * umount_unlink_drained (needs procfs)
 * zpool_expand (needs udev events and zed)
 * zpool_reopen (needs scsi_debug)
 * zvol_swap_003_pos (needs to modify vfstab)
 * zvol_swap_00[56]_pos (needs swaphigh/swaplen)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9871
2020-01-27 13:29:25 -08:00
Ned Bass a3403164d7 zdb: add support for object ranges for zdb -d
Allow a range of object identifiers to dump with -d. This may
be useful when dumping a large dataset and you want to break
it up into multiple phases, or to resume where a previous scan
left off. Object type selection flags are supported to reduce
the performance overhead of verbosely dumping unwanted objects,
and to reduce the amount of post-processing work needed to
filter out unwanted objects from zdb output.

This change extends existing syntax in a backward-compatible
way. That is, the base case of a range is to specify a single
object identifier to dump. Ranges and object identifiers can
be intermixed as command line parameters.

Usage synopsis:

    Object ranges take the form <start>:<end>[:<flags>]
        start    Starting object number
        end      Ending object number, or -1 for no upper bound
        flags    Optional flags to select object types:
         A    All objects (this is the default)
         d    ZFS directories
         f    ZFS files
         m    SPA space maps
         z    ZAPs
         -    Negate effect of next flag

Examples:

 # Dump all file objects
 zdb -dd tank/fish 0:-1:f

 # Dump all file and directory objects
 zdb -dd tank/fish 0:-1:fd

 # Dump all types except file and directory objects
 zdb -dd tank/fish 0:-1:A-f-d

 # Dump object IDs in a specific range
 zdb -dd tank/fish 1000:2000

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Closes #9832
2020-01-24 11:00:46 -08:00
Tony Nguyen 8e9e90bba3 Performance tests, some variables missing PERF_ prefix
Adding the expected PERF_ prefix to RANDSEED, COMPPERCENT,
and COMPCHUNK.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Nguyen <tony.nguyen@delphix.com>
Closes #9877
2020-01-23 21:18:02 -08:00
John Wren Kennedy 0d37c2bb2e ZTS: zpool offline requires whole disk name
When used with non-loop devices, zdb_004_pos fails because the disk
argument provided is the partition rather than the expected whole disk
name.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes #9876
2020-01-23 21:14:55 -08:00
Christian Schwarz 20ea8540a6 dsl_bookmark_create_check: fix NULL pointer deref if dbca_errors == NULL
Discovered in preparation of zcp support for creating bookmarks.
Handle the case where dbca_errors is NULL.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #9880
2020-01-23 21:13:42 -08:00
Christian Schwarz 3aea3c9d54 entity_namecheck: doc comment: include space as allowed character
The helper function valid_char already allows it but
the doc comment was out of date.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #9879
2020-01-23 21:11:54 -08:00
Christian Schwarz 603059e41a scripts/zfs-test.sh: example for -t
Add an example for running a single test case.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #9878
2020-01-23 21:11:05 -08:00
Ryan Moeller dbfec5cc09 ZTS: Get xattr tests running on FreeBSD
This mostly involves reworking platform checks to make illumos the
exception (thanks to their unusual way of exposing xattrs). Other
platforms are able to take advantage of the recently added xattr
wrappers in libtest.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9872
2020-01-23 17:14:40 -08:00
Romain Dolbeau 35b07497c6 Add AltiVec RAID-Z
Implements the RAID-Z function using AltiVec SIMD.
This is basically the NEON code translated to AltiVec.

Note that the 'fletcher' algorithm requires 64-bits
operations, and the initial implementations of AltiVec
(PPC74xx a.k.a. G4, PPC970 a.k.a. G5) only has up to
32-bits operations, so no 'fletcher'.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Romain Dolbeau <romain.dolbeau@european-processor-initiative.eu>
Closes #9539
2020-01-23 11:01:24 -08:00
Ryan Moeller 1a69856034 ZTS: Add missing export for DEV_DSKDIR default
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9873
2020-01-23 09:38:09 -08:00
Christian Schwarz 0ea03c7c82 dmu_send: redacted: fix memory leak on invalid redaction/from bookmark
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #9867
2020-01-23 09:34:31 -08:00
Christian Schwarz f658f61c72 cmd/zfs: redact: better error message for common usage errors
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #9867
2020-01-23 09:33:53 -08:00
Christian Schwarz 7b53e2e5a9 cmd/zfs: send: meaningful error message for incorrect redaction bookmark
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #9867
2020-01-23 09:33:10 -08:00
Matthew Macy 3d91490f7c Simplify FreeBSD's locking requirements in zfs_replay.c
Now that the FreeBSD zfs_vnops code avoids asserting that
a vnode lock is held when z_replay is true we can limit
the FreeBSD specific changes to the couple of changes
where it is necessary to drop the vnode locks because
a function returns with it held.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9865
2020-01-22 17:55:56 -08:00
Jason King e2ef1cbf04 Support inheriting properties in channel programs
This adds support in channel programs to inherit properties analogous
to `zfs inherit` by adding `zfs.sync.inherit` and `zfs.check.inherit`
functions to the ZFS LUA API.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jason King <jason.king@joyent.com>
Closes #9738
2020-01-22 17:03:17 -08:00
Richard Laager 79add96766 Order zfs-import-*.service after multipathd
If someone is using both multipathd and ZFS, they are probably using
them together.  Ordering the zpool imports after multipathd is ready
fixes import issues for multipath configurations.

Tested-by: Mike Pastore <mike@oobak.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #9863
2020-01-22 12:45:25 -08:00
Matthew Macy 5206b8228e Disable get_numeric_property for xattr on FreeBSD
FreeBSD doesn't have a mount flag for determining the
disposition of xattr. Disable so that it is fetched
by the default route so that 'zfs get xattr' returns
the correct value.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9862
2020-01-21 15:06:10 -08:00
Matthew Macy af26a86958 Update tunable macro usage for disable_ivset_guid_check
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9861
2020-01-21 15:05:23 -08:00
Matthew Macy d3c1e45b7a Re-consolidate zio_delay_interrupt
With recent SPL changes there is no longer any need for a per
platform version.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9860
2020-01-21 15:04:13 -08:00
Ryan Moeller 09436c5d88 ZTS: Eliminate partitioning in zpool_import setup
There doesn't seem to be a need for this complexity.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9854
2020-01-17 12:41:45 -08:00
Ryan Moeller 914bd02f29 ZTS: Make DVA pattern in zdb tests more robust
Ensure the capture ends at the first DVA in case there are multiple
DVAs on the same line by only capturing up to the first '>' character.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9851
2020-01-17 12:41:03 -08:00
Brian Behlendorf 70835c5b75 Unify target_cpu handling
Over the years several slightly different approaches were used
in the Makefiles to determine the target architecture.  This
change updates both the build system and Makefile to handle
this in a consistent fashion.

TARGET_CPU is set to i386, x86_64, powerpc, aarch6 or sparc64
and made available in the Makefiles to be used as appropriate.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9848
2020-01-17 12:40:09 -08:00
Brian Behlendorf e5030fbc28 ZTS: Enable zpool_create_008_pos.ksh
Remove the blkid version check from zpool_create_008_pos.ksh
so the test case will not be skipped.

All versions of blkid tested by the CI are either new enough
to not suffer from this issue, or have been patched as is
the case with CentOS 7 (libblkid-2.23.2-61).

Additionally, add a block_device_wait after device partitioning
to ensure the expected partitions will exist.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9853
2020-01-17 12:38:13 -08:00
Ryan Moeller 31712a7ea8 ZTS: Fix incorrect is_physical_device usage
This check isn't meant to be used for command substitution.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9844
2020-01-16 13:26:26 -08:00
Paul Zuchowski f12e42cccf zdb -d should accept the numeric objset id
As an alternative to the dataset name, zdb now allows the decimal 
or hexadecimal objset ID to be specified.  When permanent errors
are reported as 2 hexadecimal numbers (objset ID : object ID) in 
zpool status; you can now use 'zdb <pool>[/objset ID] object' to
determine the names of the objset and object which have the error.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #9733
2020-01-16 09:22:49 -08:00
Tony Nguyen 1b64627e73 ZFS performance suite should use JSON fio output
Making the default FIO output format be JSON thus easier to post process
performance results. To get previous 'normal' output format,
PERF_FIO_FORMAT can be set prior to invoking zfs-tests.sh. For example:

'PERF_FIO_FORMAT=normal ./zfs-tests.sh -T perf -r ./runfiles/perf.run'

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Nguyen <tony.nguyen@delphix.com>
Closes #9847
2020-01-16 09:16:16 -08:00
Ryan Moeller 1bb5f5e2b4 ZTS: Fix ksh path in btree tests
Every other ksh script has /bin/ksh in the shebang.
If we're going to assume a path, we can at least be consistent.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9845
2020-01-15 16:23:29 -08:00
Ryan Moeller 6f481612f3 ZTS: Avoid using PCRE with grep in zdb tests
On FreeBSD grep does not support Perl extensions

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9841
2020-01-15 09:27:22 -08:00
Ryan Moeller 83c30248ef ZTS: Fix is_physical_device on FreeBSD
This should have been using egrep.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9840
2020-01-15 09:26:26 -08:00
Ryan Moeller 900010daf1 ZTS: Remove some path workarounds for FreeBSD
These are no longer needed after fixing device name matching for whole
disks in libzutil on FreeBSD.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9839
2020-01-15 09:24:55 -08:00
Ryan Moeller 2476f10306 ZTS: Catalog tunable names for tests in tunables.cfg
Update tests to use the variables for tunable names.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9831
2020-01-14 14:57:28 -08:00
Tom Caputi 61152d1069 Fix errata #4 handling for resuming streams
Currently, the handling for errata #4 has two issues which allow
the checks for this issue to be bypassed using resumable sends.
The first issue is that drc->drc_fromsnapobj is not set in the
resuming code as it is in the non-resuming code. This causes
dsl_crypto_recv_key_check() to skip its checks for the
from_ivset_guid. The second issue is that resumable sends do not
clean up their on-disk state if they fail the checks in
dmu_recv_stream() that happen before any data is received.

As a result of these two bugs, a user can attempt a resumable send
of a dataset without a from_ivset_guid. This will fail the initial
dmu_recv_stream() checks, leaving a valid resume state. The send
can then be resumed, which skips those checks, allowing the receive
to be completed.

This commit fixes these issues by setting drc->drc_fromsnapobj in
the resuming receive path and by ensuring that resumablereceives
are properly cleaned up if they fail the initial dmu_recv_stream()
checks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #9818 
Closes #9829
2020-01-14 12:25:20 -08:00
Richard Laager f744f36ce5 Document zfs change-key caveats
As discussed on the 2019-01-07 OpenZFS Leadership Meeting, we need to be
clear about the limitations of `zfs change-key`.  Changing the user key
does not change the master key, nor does it currently overwrite the old
wrapped master key on disk.

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Garrett Fields <ghfields@gmail.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #9819
2020-01-14 10:11:07 -08:00
loli10K 7e2da7786e KMC_KVMEM disrupts kv_alloc() memory alignment expectations
On kernels with KASAN enabled the following failure can be observed as
soon as the zfs module is loaded:

  VERIFY(IS_P2ALIGNED(ptr, PAGE_SIZE)) failed
  PANIC at spl-kmem-cache.c:228:kv_alloc()

The problem is kmalloc() has never guaranteed aligned allocations; this
requirement resulted in zfsonlinux/spl@8b45dda which removed all
kmalloc() usage in kv_alloc().

Until a GFP_ALIGNED flag (or equivalent functionality) is provided by
the kernel this commit partially reverts 66955885 and 6d948c35 to
prevent k(v)malloc() allocations in kv_alloc().

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Michael Niewöhner <foss@mniewoehner.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9813
2020-01-14 09:09:59 -08:00
Kyle Evans 68a192e4b7 libzfs: add zfs_mount_at() function
zfs_mount_at() mounts a dataset at an arbitrary mountpoint rather than
at the configured mountpoint. This may be used by consumers that wish to
temporarily expose a dataset at another mountpoint without altering
dataset/pool properties.

This will be used by FreeBSD's libbe be_mount(), which mounts a boot
environment at an arbitrary mountpoint.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Kyle Evans <kevans@FreeBSD.org>
Closes #9833
2020-01-14 08:49:54 -08:00
Brian Behlendorf e458fcca75 Change http://zfsonlinux.org links to https://zfsonlinux.org
Update the project website links contained in to repository to
reference the secure https://zfsonlinux.org address.

Reviewed-By: Richard Laager <rlaager@wiktel.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Garrett Fields <ghfields@gmail.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9837
2020-01-13 16:43:59 -08:00
Brian Behlendorf a4c3e3c74f ZTS: Remove obsolete zts-report.py exceptions
The disk_reason and udev_reason exceptions can be removed since
they apply to now unsupported kernel versions (<v3.10).

The checks in the test cases were kept for the purposes of
documentation and as useful sanity checks for the test environment.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9828
2020-01-13 16:42:23 -08:00
Ryan Moeller 602f667285 ZTS: Clean up properties.shlib a bit
Fixes the last property having an empty value on FreeBSD and makes the
code a bit more readable.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9834
2020-01-13 15:21:42 -08:00
Ryan Moeller 6e1c594d64 ZTS: Create xattr helpers to hide platform
Create xattr helpers to hide platform and update usage in tests.

This does not generally aim to enable all xattr tests yet, but it is a
necessary step in that direction.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9826
2020-01-10 13:24:59 -08:00
Tom Caputi ba0ba69e50 Add 'zfs send --saved' flag
This commit adds the --saved (-S) to the 'zfs send' command.
This flag allows a user to send a partially received dataset,
which can be useful when migrating a backup server to new
hardware. This flag is compatible with resumable receives, so
even if the saved send is interrupted, it can be resumed.
The flag does not require any user / kernel ABI changes or any
new feature flags in the send stream format.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Alek Pinchuk <apinchuk@datto.com>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Reviewed-by: Christian Schwarz <me@cschwarz.com>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #9007
2020-01-10 10:16:58 -08:00
Brian Behlendorf 9ab6109fb5 ZTS: Improve zts-auto_offline_001_pos
The zts-auto_offline_001_pos test could exceed the 10 minute test
limit and be KILLED by the test infrastructure.  To prevent this
speed up the test case by:

* Removing redundant pool configurations.  Each of the following
  vdev types is tested once: mirror, raidz, cache, and special.

* The block_device_wait function need only wait on the block
  device which has been removed as part of the test.

Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9827
2020-01-10 09:14:49 -08:00
Kjeld Schouten-Lebbing 36e5b4a94b Performance tests, fio enhancements
- Set fixed chunk pattern, for sane compression
- Adjust buffer to blocksize, for cross blocksize repeatability
- Use fixed seed, for improved repeatability
- Move comp-percent and comp-chunk to variables
- set variables (mostly) to old defaults

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #9793
2020-01-09 11:21:33 -08:00
Ryan Moeller 90ae48733c ZTS: Provide an alternative to shuf for FreeBSD
There was a shuf package but the upstream for the port has recently
disappeared, so it is no longer available.

Create a function to hide the usage of shuf. Implement using seq|random
on FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9824
2020-01-09 09:31:17 -08:00
Nick Black 4abd7d80b2 Remove parameter names from libzfs.h signatures
Most of libzfs.h doesn't provide names for the parameters
in its signatures. These few functions included them. That
wouldn't be a problem, per se, but the 'lines' parameter
conflicts with the 'lines' #define from terminfo's term.h,
present for at least a decade. This makes it difficult to
compile code making use of both ZFS and terminfo.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Nick Black <dankamongmen@gmail.com>
Closes #9821
2020-01-08 17:50:05 -08:00
Ryan Moeller f8d55b95a5 ZTS: Eliminate functions named 'random'
The name overlaps with a command needed by FreeBSD.
There is also no sense having two 'random' functions that do nearly
the same thing, so consolidate to just the more general one and name
it 'random_int_between'.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9820
2020-01-08 09:08:30 -08:00
lorenz 028e3b3b1a Avoid here-documents in systemd mount generator
On some systems - openSUSE, for example - there is not yet a writeable
temporary file system available, so bash bails out with an error,

  'cannot create temp file for here-document: Read-only file system',

on the here documents in zfs-mount-generator. The simple fix is to
change these into a multi-line echo statement.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-By: Richard Laager <rlaager@wiktel.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Lorenz Hüdepohl <dev@stellardeath.org>
Closes #9802
2020-01-07 17:03:00 -08:00
DeHackEd 67709516db zfs-module-parameters(5): add all missing items
I ran a report against the output of `modinfo zfs.ko`. This commit adds
everything missing and corrects a few renamed module parameters.
Specifically:

* zfs_checksums_per second renamed in ad796b8a3
* vdev_ms_count_limit renamed in c853f382d

Also fixes some variable type inconsistencies (unsigned int => uint)

Reviewed-by: George Amanakis <gamanakis@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: DHE <git@dehacked.net>
Closes #9809
2020-01-07 12:41:53 -08:00
Ryan Moeller 798b2b8477 ZTS: Add sunos.run to dist data
This was missed when the file was introduced.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9815
2020-01-07 12:38:12 -08:00
loli10K c24fa4b19a Fix "zpool add -n" for dedup, special and log devices
For dedup, special and log devices "zpool add -n" does not print
correctly their vdev type:

~# zpool add -n pool dedup /tmp/dedup special /tmp/special log /tmp/log
would update 'pool' to the following configuration:
	pool
	  /tmp/normal
	  /tmp/dedup
	  /tmp/special
	  /tmp/log

This could lead storage administrators to modify their ZFS pools to
unexpected and unintended vdev configurations.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9783 
Closes #9390
2020-01-06 15:40:06 -08:00
Brian Behlendorf bc9cef11fd Fix QAT allocation failure return value
When qat_compress() fails to allocate the required contiguous memory
it mistakenly returns success.  This prevents the fallback software
compression from taking over and (un)compressing the block.

Resolve the issue by correctly setting the local 'status' variable
on all exit paths.  Furthermore, initialize it to CPA_STATUS_FAIL
to ensure qat_compress() always fails safe to guard against any
similar bugs in the future.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9784
Closes #9788
2020-01-06 11:17:53 -08:00
Brian Behlendorf 581ca28169 ZTS: Cleanup partition tables
The cleanup_devices function should remove any partitions created
on the device and force the partition table to be reread.  This
is needed to ensure that blkid has an up to date version of what
devices and partitions are used by zfs.

The cleanup_devices call was removed from inuse_008_pos.ksh since
it operated on partitions instead of devices and was not needed.

Lastly ddidecode may be called by parted and was therefore added
to the constrained path.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9806
2020-01-06 11:14:19 -08:00
Brian Behlendorf 33dc49e0c0 Fix ZPOOL_VDEV_NAME_PATH option description
The corresponding zpool status option is -P and not -p.  Update
this description to reference the correct option.

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9803
2020-01-06 10:43:32 -08:00
Ryan Moeller 665684d721 ZTS: Add helper for disk device check
Replace `test -b` and equivalents with `is_disk_device`, so that `-c`
is used instead on FreeBSD which has no block cache layer for devices.

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9795
2020-01-03 09:10:17 -08:00
Ryan Moeller d7164b27be ZTS: Move dumpdev tests to sunos.run
Neither FreeBSD nor Linux support dumping to zvols.

DilOS still uses these tests, so the files are kept and the tests have
been relocated to sunos.run.

An `is_illumos` function was added to libtest.shlib to eliminate some
awkward platform checks.

A few functions that are not expected to be used outside of illumos
have been sanitized of extraneous FreeBSD adaptations.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9794
2020-01-03 09:08:23 -08:00
Brian Behlendorf cc618d179e Static symbols exported by ICP
The crypto_cipher_init_prov and crypto_cipher_init are declared static
and should not be exported by the ICP.  This resolves the following
warnings observed when building with the 5.4 kernel.

WARNING: "crypto_cipher_init" [.../icp] is a static EXPORT_SYMBOL
WARNING: "crypto_cipher_init_prov" [.../icp] is a static EXPORT_SYMBOL

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9791
2020-01-02 18:08:45 -08:00
Ryan Moeller 54007c791f Add FreeBSD core handling in zloop.sh
And use the correct path to libtool and ztest.

Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9790
2020-01-02 13:48:06 -08:00
Kjeld Schouten-Lebbing 82e996c261 Improve Pull Request guidelines
- Splits PR advice into two sections.
- Add "co-authored-by" instructions.
- Add description of draft PR and when using it is appropriate.
- Reword ZFS Test Suite checklist question.
- Link to zfs-tests.sh and zloop.sh.

Reviewed-By: Marcel Schilling <marcel.schilling@mdc-berlin.de>
Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-By: Richard Laager <rlaager@wiktel.com>
Co-Authored-By: Marcel Schilling <marcel.schilling@mdc-berlin.de>
Co-Authored-By: Brian Behlendorf <behlendorf1@llnl.gov>
Co-Authored-By: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #9753
2019-12-30 09:24:41 -08:00
Ned Bass 8b3438e503 zdb: print block checksums with 6 d's of verbosity
Include checksums in the output of 'zdb -dddddd' along
with other indirect block information already displayed.

Example output follows (with long lines trimmed):

$ zdb -dddddd tank/fish 128
Dataset tank/fish [ZPL], ID 259, cr_txg 10, 16.2M, 93 objects, rootbp DV

    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
       128    2   128K   128K   634K     512     1M  100.00  ZFS plain f
                                               168   bonus  System attri
    dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED
    dnode maxblkid: 7
    path    /c
    uid     0
    gid     0
    atime    Sat Dec 21 10:49:26 2019
    mtime    Sat Dec 21 10:49:26 2019
    ctime    Sat Dec 21 10:49:26 2019
    crtime    Sat Dec 21 10:49:26 2019
    gen    41
    mode    100755
    size    964592
    parent    34
    links    1
    pflags    40800000104
Indirect blocks:
               0 L1  0:2c0000:400 0:c021e00:400 20000L/400P F=8 B=41/41
               0  L0 0:227800:13800 20000L/13800P F=1 B=41/41 cksum=167a
           20000  L0 0:25ec00:17c00 20000L/17c00P F=1 B=41/41 cksum=2312
           40000  L0 0:276800:18400 20000L/18400P F=1 B=41/41 cksum=24e0
           60000  L0 0:2a7800:18800 20000L/18800P F=1 B=41/41 cksum=25be
           80000  L0 0:28ec00:18c00 20000L/18c00P F=1 B=41/41 cksum=2579
           a0000  L0 0:24d000:11c00 20000L/11c00P F=1 B=41/41 cksum=140a
           c0000  L0 0:23b000:12000 20000L/12000P F=1 B=41/41 cksum=164e
           e0000  L0 0:221e00:5a00 20000L/5a00P F=1 B=41/41 cksum=9de790

        segment [0000000000000000, 0000000000100000) size    1M

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Closes #9765
2019-12-30 09:14:40 -08:00
Ben Cordero 153db76197 zfs-load-key.sh: ${ZFS} is not the zfs binary
A change[1] was merged yesterday that should refer
to the zfs binary in the initramfs, but is actually
an unset shell variable.

This commit changes this line to call `zfs` directly
like the surrounding code.

[1]: cb5b875b273235a4a3ed28e16f416d5bb8865166

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Garrett Fields <ghfields@gmail.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Ben Cordero <bencord0@condi.me>
Closes #9780
2019-12-29 11:25:00 -08:00
Brian Behlendorf 2c47bd575a ZTS: Fix pool_state cleanup
The externally faulted vdev should be brought back online and have
its errors cleared before the pool is destroyed.  Failure to do so
will leave a vdev with a valid active label.  This vdev may then
not be used to create a new pool without the -f flag potentially
leading to subsequent test failures.

Additionally remove an unreachable log_pass from setup.ksh.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9777
2019-12-28 08:43:23 -08:00
Brian Behlendorf edb24bec3b ZTS: Replace /var/tmp with $TEST_BASE_DIR
Remove a few hardcoded instances of /var/tmp.  This should use
the $TEST_BASE_DIR in order to allow the ZTS to be optionally
run in an alternate directory using `zfs-tests.sh -d <path>`.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9775
2019-12-27 16:41:16 -08:00
Brian Behlendorf a6f6ef8bda ZTS: zfs_program_json
As of Python 3.5 the default behavior of json.tool was changed to
preserve the input order rather than lexical order.  The test case
expects the output to be sorted so apply the --sort-keys option
to the json.tool command when using Python 3.5 and the option is
supported.

    https://docs.python.org/3/library/json.html#module-json.tool

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9774
2019-12-27 12:12:41 -08:00
Brian Behlendorf 590ff61a55 ZTS: devices_001_pos and devices_002_neg
Update the devices_001_pos and devices_002_neg test cases such that the
special block device file created is backed by a ZFS volume.  Specifying
a specific device allows the major and minor numbers to be easily
determined.  Furthermore, this avoids the potentially dangerous behavior
of opening the first block device we happen to find under /dev/.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9773
2019-12-27 12:11:27 -08:00
Steve Mokris d5c97f3de7 Avoid some crashes when importing a pool with corrupt metadata
- Skip invalid DVAs when importing pools in readonly mode
  (in addition to when the config is untrusted).

- Upon encountering a DVA with a null VDEV, fail gracefully
  instead of panicking with a NULL pointer dereference.

Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Steve Mokris <smokris@softpixel.com>
Closes #9022
2019-12-26 10:57:05 -08:00
sam-lunt ad353e2147 In initramfs, do not prompt if keylocation is "file://"
If the encryption key is stored in a file, the initramfs should not
prompt for the password. For example, this could be the case if the boot
partition is stored on removable media that is only present at boot time

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Garrett Fields <ghfields@gmail.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Sam Lunt <samuel.j.lunt@gmail.com>
Closes #9764
2019-12-26 10:55:20 -08:00
Nick Black 8cda5c5ce9 libspl: declare aok extern in header
Rather than defining a new instance of 'aok' in every compilation
unit which includes this header, there is a single instance
defined in zone.c, and the header now only declares an extern.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Nick Black <dankamongmen@gmail.com>
Closes #9752
2019-12-26 10:52:14 -08:00
Brian Behlendorf 635a01aafd Cancel initialize and TRIM before vdev_metaslab_fini()
Any running 'zpool initialize' or TRIM must be cancelled prior
to the vdev_metaslab_fini() call in spa_vdev_remove_log() which
will unload the metaslabs and set ms->ms_group == NULL.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8602
Closes #9751
2019-12-26 10:50:23 -08:00
Brian Behlendorf 80bde2c4ba ZTS: Test case failures
* large_dnode_008_pos - Force a pool sync before invoking zdb to
  ensure the updated dnode blocks have been persisted to disk.

* refreserv_raidz - Wait for the /dev/zvol links to be both created
  and removed, this is important because the same device volume
  names are being used repeatedly.

* btree_test - Add missing .gitignore file for btree_test binary.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9769
2019-12-26 10:49:07 -08:00
Brian Behlendorf c587b2c7d8 Update maximum kernel version to 5.4
Increase the maximum supported kernel version to 5.4.  This was
verified using the Fedora 5.4.2-300.fc31.x86_64 kernel.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9754
Closes #9759
2019-12-23 14:24:36 -08:00
Ryan Moeller 54aefa6abf Fix test pattern in zts-report.py
The pattern was not updated to match when the test output changed to
include a platform identifier for platform specific tests.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9750
2019-12-20 09:31:59 -08:00
Tony Hutter 9fb2771aa5 Colorize zpool status output
If the ZFS_COLOR env variable is set, then use ANSI color
output in zpool status:

- Column headers are bold
- Degraded or offline pools/vdevs are yellow
- Non-zero error counters and faulted vdevs/pools are red
- The 'status:' and 'action:' sections are yellow if they're
  displaying a warning.

This also includes a new 'faketty' function in libtest.shlib that is
compatible with FreeBSD (code provided by @freqlabs).

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #9340
2019-12-19 16:26:07 -08:00
Brian Behlendorf 5e8ac05590 ZTS: Various test case fixes
* devices_001_pos and devices_002_neg - Failing after FreeBSD ZTS
  merged due to missing 'function' keyword for create_dev_file_linux.

* pool_state - Occasionally fails due to an insufficient delay
  before checking 'zpool status'.  Increasing the delay from 1 to 3
  seconds resolved the issue in local testing.

* procfs_list_basic - Fails when run in-tree because the logged
  command is actually 'lt-zfs'.  Updated the regex accordingly.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9748
2019-12-19 15:32:56 -08:00
John Wren Kennedy 523fc80069 Tests for btree implementation used by range trees
Additional test cases for the btree implementation, see #9181.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes #9717
2019-12-19 11:53:54 -08:00
Ryan Moeller a3640486ff Update zfs.sh work on FreeBSD
Extend the zfs.sh script to load and unload zfs kmods on FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9746
2019-12-19 09:31:16 -08:00
Brian Behlendorf d16a207f2e cppcheck: (warning) Possible null pointer dereference: nvh
Move the 'nvh = (void *)buf' assignment after the 'buf == NULL'
check to resolve the warning.  Interestingly, cppcheck 1.88
correctly determines that the existing code is safe, while
cppcheck 1.86 reports the warning.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9732
2019-12-18 17:25:57 -08:00
Brian Behlendorf 487bddad67 cppcheck: (error) Address of local auto-variable assigned
Suppress autoVariables warnings in the lua interpreter.  The usage
here while unconventional in intentional and the same as upstream.

[module/lua/ldebug.c:327]: (error) Address of local auto-variable
    assigned to a function parameter.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9732
2019-12-18 17:25:42 -08:00
Brian Behlendorf 1e49b288cb cppcheck: (error) Null pointer dereference: who_perm
As indicated by the VERIFY the local who_perm variable can never
be NULL in parse_fs_perm().  Due to the existence of the is_set
conditional, which is always true, cppcheck 1.88 was reporting
a possible NULL reference.  Resolve the issue by removing the
extraneous is_set variable.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9732
2019-12-18 17:25:23 -08:00
Brian Behlendorf 070402f112 cppcheck: (warning) Possible null pointer dereference: dnp
The dnp argument can only be set to NULL when the DNODE_DRY_RUN flag
is set.  In which case, an early return path will be executed and a
NULL pointer dereference at the given location is impossible.  Add
an additional ASSERT to silence the cppcheck warning and document
that dbp must never be NULL at the point in the function.

[module/zfs/dnode.c:1566]: (warning) Possible null pointer deref: dnp

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9732
2019-12-18 17:25:13 -08:00
Brian Behlendorf fe20400db5 cppcheck: (error) Memory leak: vtoc
Resolve the reported memory leak by using a dedicated local vptr
variable to store the pointer reported by calloc().  Only assign the
passed **vtoc function argument on success, in all other cases vptr
is freed.

[lib/libefi/rdwr_efi.c:403]: (error) Memory leak: vtoc
[lib/libefi/rdwr_efi.c:422]: (error) Memory leak: vtoc
[lib/libefi/rdwr_efi.c:440]: (error) Memory leak: vtoc
[lib/libefi/rdwr_efi.c:454]: (error) Memory leak: vtoc
[lib/libefi/rdwr_efi.c:470]: (error) Memory leak: vtoc

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9732
2019-12-18 17:25:04 -08:00
Ubuntu abfdb83607 cppcheck: (error) Shifting signed 64-bit value by 63 bits
As of cppcheck 1.82 surpress the warning regarding shifting too many
bits for __divdi3() implemention.  The algorithm used here is correct.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9732
2019-12-18 17:24:42 -08:00
Ubuntu 7cf1fe6331 cppcheck: (error) Uninitialized variable
As of cppcheck 1.82 warnings are issued when using the list_for_each_*
functions with an uninitialized variable.  Functionally, this is fine
but to resolve the warning initialize these variables.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9732
2019-12-18 17:24:29 -08:00
Ubuntu 5215fdd43c cppcheck: (error) Uninitialized variable
Resolve the following uninitialized variable warnings.  In practice
these were unreachable due to the goto.  Replacing the goto with a
return resolves the warning and yields more readable code.

[module/icp/algs/modes/ccm.c:892]: (error) Uninitialized variable: ccm_param
[module/icp/algs/modes/ccm.c:893]: (error) Uninitialized variable: ccm_param
[module/icp/algs/modes/gcm.c:564]: (error) Uninitialized variable: gcm_param
[module/icp/algs/modes/gcm.c:565]: (error) Uninitialized variable: gcm_param
[module/icp/algs/modes/gcm.c:599]: (error) Uninitialized variable: gmac_param
[module/icp/algs/modes/gcm.c:600]: (error) Uninitialized variable: gmac_param

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9732
2019-12-18 17:23:54 -08:00
Garrett Fields b8a899e94f Exchanged two "${ZFS} get -H -o value" commands
Initramfs uses "get_fs_value()" elsewhere.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Signed-off-by: Garrett Fields <ghfields@gmail.com>
Closes #9736
2019-12-18 12:32:31 -08:00
Matthew Macy 7839c4b5e1 Update ZTS to work on FreeBSD
Update the common ZTS scripts and individual test cases as needed 
in order to allow them to be run on FreeBSD.  The high level goal
is to provide compatibility wrappers whenever possible to minimize
changes to individual test cases.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9692
2019-12-18 12:29:43 -08:00
Romain Dolbeau 118fc3ef07 Minor performance fix for NEON RAID-Z
The NEON code replicates too closely the SSE code, including
a masked 16-bits shift. But NEON, like AltiVec (#9539), has
unsigned 8-bits shift, so use that instead and drop the masking.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Romain Dolbeau <romain.dolbeau@european-processor-initiative.eu>
Closes #9725
2019-12-17 19:34:52 -08:00
Thomas Geppert fe564845c0 Create symbolic links in /dev/disk/by-vdev for nvme disk devices
The existing rules miss nvme disk devices because of the trailing
digits in the KERNEL device name, e.g. nvme0n1. Partitions of nvme
disk devices are already properly handled by the existing rule for
ENV{DEVTYPE}=="partition".

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Thomas Geppert <geppi@digitx.de>
Closes #9730
2019-12-17 17:50:20 -08:00
Kjeld Schouten-Lebbing 3035f14d82 Moves Codecov Ignore to LCOV
Rely on ax_code_coverage to exclude test directories.

- Removes broken codecov ignore
- Places ignore section in ax_code_coverage
- Forwards users from codecov to LCOV for ignores

Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #9726
2019-12-17 17:47:58 -08:00
Garrett Fields e8b199faec Force systems with kernel option "quiet" to display prompt for password
On systems that utilize TTY for password entry, if the kernel 
option "quiet" is set, the system would appear to freeze on a 
blank screen, when in fact it is waiting for password entry 
from the user.

Since TTY is the fallback method, this has no effect on systemd 
or plymouth password prompting.

By temporarily setting "printk" to "7", running the command, 
then resuming with the original "printk" state, the user can 
see the password prompt.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Garrett Fields <ghfields@gmail.com>
Closes #9731
2019-12-17 17:45:06 -08:00
Richard Laager ad97643773 initramfs: setup keymapping and video for prompts
From Steve Langasek <steve.langasek@canonical.com>:
> The poorly-named 'FRAMEBUFFER' option in initramfs-tools controls
> whether the console_setup and plymouth scripts are included and used
> in the initramfs. These are required for any initramfs which will be
> prompting for user input: console_setup because without it the user's
> configured keymap will not be set up, and plymouth because you are
> not guaranteed to have working video output in the initramfs without
> it (e.g. some nvidia+UEFI configurations with the default GRUB
> behavior).

> The zfs initramfs script may need to prompt the user for passphrases
> for encrypted zfs datasets, and we don't know definitively whether
> this is the case or not at the time the initramfs is constructed (and
> it's difficult to dynamically populate initramfs config variables
> anyway), therefore the zfs-initramfs package should just set
> FRAMEBUFFER=yes in a conf snippet the same way that the
> cryptsetup-initramfs package does
> (/usr/share/initramfs-tools/conf-hooks.d/cryptsetup).

https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1856408

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Steve Langasek <steve.langasek@canonical.com>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #9723
2019-12-16 16:54:51 -08:00
Matthew Macy ba434b18ec Fix zfs_xattr_owner_unlinked on FreeBSD and comment
Explain FreeBSD VFS' unfortunate idiosyncratic locking requirements.
There is no functional change for other platforms.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9720
2019-12-16 09:49:05 -08:00
Tomohiro Kusumi ddb4e69db5 Don't fail to apply umask for O_TMPFILE files
Apply umask to `mode` which will eventually be applied to inode.
This is needed since VFS doesn't apply umask for O_TMPFILE files.

(Note that zpl_init_acl() applies `ip->i_mode &= ~current_umask();`
only when POSIX ACL is used.)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #8997 
Closes #8998
2019-12-13 15:02:23 -08:00
Tom Caputi c317c8c811 Allow empty ds_props_obj to be destroyed
Currently, 'zfs list' and 'zfs get' commands can be slow when
working with snapshots that have a ds_props_obj. This is
because the code that discovers all of the properties for these
snapshots needs to read this object for each snapshot, which
almost always ends up causing an extra random synchronous read
for each snapshot. This performance penalty exists even if the
properties on that snapshot have been unset because the object
is normally only freed when the snapshot is freed, even though
it is only created when it is needed.

This patch allows the user to regain 'zfs list' performance on
these snapshots by destroying the ds_props_obj when it no longer
has any entries left. In practice on a production machine, this
optimization seems to make 'zfs list' about 55% faster.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #9704
2019-12-13 11:51:39 -08:00
Matthew Macy 13a9a6f5e8 Make zfs_replay.c work on FreeBSD
FreeBSD's vfs currently doesn't permit file systems
to do their own locking. To avoid having to have
duplicate zfs functions with and without locking add
locking here. With luck these changes can be removed
in the future.

Reviewed-by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9715
2019-12-13 07:54:10 -08:00
Matthew Ahrens 9bb0d89c5c Fix use-after-free of vd_path in spa_vdev_remove()
After spa_vdev_remove_aux() is called, the config nvlist is no longer
valid, as it's been replaced by the new one (with the specified device
removed).  Therefore any pointers into the nvlist are no longer valid.
So we can't save the result of
`fnvlist_lookup_string(nv, ZPOOL_CONFIG_PATH)` (in vd_path) across the
call to spa_vdev_remove_aux().

Instead, use spa_strdup() to save a copy of the string before calling
spa_vdev_remove_aux.

Found by AddressSanitizer:

ERROR: AddressSanitizer: heap-use-after-free on address ...
READ of size 34 at 0x608000a1fcd0 thread T686
    #0 0x7fe88b0c166d  (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x5166d)
    #1 0x7fe88a5acd6e in spa_strdup spa_misc.c:1447
    #2 0x7fe88a688034 in spa_vdev_remove vdev_removal.c:2259
    #3 0x55ffbc7748f8 in ztest_vdev_aux_add_remove ztest.c:3229
    #4 0x55ffbc769fba in ztest_execute ztest.c:6714
    #5 0x55ffbc779a90 in ztest_thread ztest.c:6761
    #6 0x7fe889cbc6da in start_thread
    #7 0x7fe8899e588e in __clone

0x608000a1fcd0 is located 48 bytes inside of 88-byte region
freed by thread T686 here:
    #0 0x7fe88b14e7b8 in __interceptor_free
    #1 0x7fe88ae541c5 in nvlist_free nvpair.c:874
    #2 0x7fe88ae543ba in nvpair_free nvpair.c:844
    #3 0x7fe88ae57400 in nvlist_remove_nvpair nvpair.c:978
    #4 0x7fe88a683c81 in spa_vdev_remove_aux vdev_removal.c:185
    #5 0x7fe88a68857c in spa_vdev_remove vdev_removal.c:2221
    #6 0x55ffbc7748f8 in ztest_vdev_aux_add_remove ztest.c:3229
    #7 0x55ffbc769fba in ztest_execute ztest.c:6714
    #8 0x55ffbc779a90 in ztest_thread ztest.c:6761
    #9 0x7fe889cbc6da in start_thread

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #9706
2019-12-11 15:38:21 -08:00
Ryan Moeller 957c7aa23c Relocate common quota functions to shared code
The quota functions are common to all implementations and can be
moved to common code.  As a simplification they were moved to the
Linux platform code in the initial refactoring.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9710
2019-12-11 12:12:08 -08:00
Matthew Macy 4bc721965f Add FreeBSD jail support hooks
Add the 'zfs jail/unjail' subcommands along with the relevant 
documentation from FreeBSD.  This feature is not supported on
Linux and still requires the match kernel ioctls which will
be included when the FreeBSD platform code is integrated.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9686
2019-12-11 11:58:37 -08:00
Matthew Macy 657ce25357 Eliminate Linux specific inode usage from common code
Change many of the znops routines to take a znode rather
than an inode so that zfs_replay code can be largely shared
and in the future the much of the znops code may be shared.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9708
2019-12-11 11:53:57 -08:00
Paul Zuchowski f0bf435176 zio_decompress_data always ASSERTs successful decompression
This interferes with zdb_read_block trying all the decompression
algorithms when the 'd' flag is specified, as some are
expected to fail.  Also control the output when guessing
algorithms, try the more common compression types first, allow
specifying lsize/psize, and fix an uninitialized variable.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #9612 
Closes #9630
2019-12-10 15:51:58 -08:00
Fabian-Gruenbichler b119e2c6f1 SIMD: Use alloc_pages_node to force alignment
fxsave and xsave require the target address to be 16-/64-byte aligned.

kmalloc(_node) does not (yet) offer such fine-grained control over
alignment[0,1], even though it does "the right thing" most of the time
for power-of-2 sizes. unfortunately, alignment is completely off when
using certain debugging or hardening features/configs, such as KASAN,
slub_debug=Z or the not-yet-upstream SLAB_CANARY.

Use alloc_pages_node() instead which allows us to allocate page-aligned
memory. Since fpregs_state is padded to a full page anyway, and this
code is only relevant for x86 which has 4k pages, this approach should
not allocate any unnecessary memory but still guarantee the needed
alignment.

0: https://lwn.net/Articles/787740/
1: https://lore.kernel.org/linux-block/20190826111627.7505-1-vbabka@suse.cz/

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9608 
Closes #9674
2019-12-10 12:53:25 -08:00
Matthew Macy 362ae8d11f Abstract away platform specific superblock references
The zfsvfs->z_sb field is Linux specified and should be abstracted.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9697
2019-12-10 09:21:07 -08:00
John Poduska 7bda69a1a9 ZTS: Fixes for spurious failures of resilver_restart_001 test
The resilver restart test was reported as failing about 2% of the
time. Two issues were found:

- The event log wasn't large enough, so resilver events were missing
- One 'zpool sync' wasn't enough for resilver to start after zinject

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: John Poduska <jpoduska@datto.com>
Issue #9588 
Closes #9677 
Closes #9703
2019-12-10 09:10:36 -08:00
Matthew Macy 3c502d3b75 Exclude data from cores unconditionally and metadata conditionally
This change allows us to align the code dump logic across platforms.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9691
2019-12-09 12:29:56 -08:00
Matthew Macy ea79e90f99 Mark dsl_dataset_deactivate_feature_impl static
The dsl_dataset_deactivate_feature_impl() function is private and
should be marked as such.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9696
2019-12-09 12:26:33 -08:00
Brian Behlendorf a25861dcae ZTS: Fix zpool_reopen_001_pos
Update the vdev_disk_open() retry logic to use a specified number
of milliseconds to be more robust.  Additionally, on failure log
both the time waited and requested timeout to the internal log.

The default maximum allowed open retry time has been increased
from 500ms to 1000ms.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9680
2019-12-09 11:09:14 -08:00
Kjeld Schouten 5927917987 Fixes typo from #9681
Going from #9672 to 9681 I made a typo.  This removes that typo.

    The pattern folder /* will not match recursively in the folder.
    Please use this folder /**/*

    source: https://docs.codecov.io/docs/ignoring-paths

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #9700
2019-12-09 10:10:55 -08:00
Matthew Macy 0dcef9b966 Disable sysfs feature checks on FreeBSD
The sysfs infrastructure for reporting supported features and
properties is Linux specific.  Disable it on FreeBSD until it can
be extended to be more portable.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9684
2019-12-06 09:44:29 -08:00
Kjeld Schouten 69b7e3f92d Set send_realloc_files.ksh to use properties.shlib
This sets send_realloc_files.ksh to use properties.shlib
(like the other compression related tests)

It was missing from #9645

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Issue #9645
Closes #9679
2019-12-06 09:37:48 -08:00
Attila Fülöp 3ac34ca375 ICP: Fix out of bounds write
If gcm_mode_encrypt_contiguous_blocks() is called more than once
in succession, with the accumulated lengths being less than
blocksize, ctx->copy_to will be incorrectly advanced. Later, if
out is NULL, the bcopy at line 114 will overflow
ctx->gcm_copy_to since ctx->gcm_remainder_len is larger than the
ctx->gcm_copy_to buffer can hold.

The fix is to set ctx->copy_to only if it's not already set.

For ZoL the issue may be academic, since in all my testing I wasn't
able to hit neither of both conditions needed to trigger it, but
other consumers can easily do so.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #9660
2019-12-06 09:36:19 -08:00
Kjeld Schouten f784828416 Fix codecov ignore, wrong syntax
The current codecov ignore syntax is incorrect.
Corrected it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #9681
2019-12-06 09:35:02 -08:00
Matthew Macy 1f654753ba Remove stale ASSERTV comment
Followup for #9671, remove stale comment.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Issue #9671 
Closes #9682
2019-12-06 09:33:27 -08:00
Matthew Macy f95704ca5e Disable EDONR on FreeBSD
FreeBSD uses its own crypto framework in-kernel which, at this time,
has no EDONR implementation.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9664
2019-12-05 13:10:29 -08:00
Matthew Macy 054a049841 Add ZFS_IOC offsets for FreeBSD
FreeBSD requires three additional ioctls, they are ZFS_IOC_NEXTBOOT,
ZFS_IOC_JAIL, and ZFS_IOC_UNJAIL.  These have been added after the
Linux-specific ioctls.  The range 0x80-0xFF has been reserved for 
future optional platform-specific ioctls.  Any platform may choose
to implement these as appropriate.

None of the existing ioctl numbers have been changed to maintain
compatibility.  For Linux no vectors have been registered for the
new ioctls and they are reported as unsupported.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9667
2019-12-05 13:06:51 -08:00
Matthew Macy e64e84eca5 Refactor deadman set failmode to be cross platform
Update zfs_deadman_failmode to use the ZFS_MODULE_PARAM_CALL
wrapper, and split the common and platform specific portions.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9670
2019-12-05 12:40:45 -08:00
Matthew Macy 2a8ba608d3 Replace ASSERTV macro with compiler annotation
Remove the ASSERTV macro and handle suppressing unused 
compiler warnings for variables only in ASSERTs using the 
__attribute__((unused)) compiler annotation.  The annotation
is understood by both gcc and clang.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9671
2019-12-05 12:37:00 -08:00
George Amanakis 12395c7b0b Fix reporting of L2ARC hits/misses in arc_summary3
arc_summary3 reports L2ARC hits and misses as Bytes, whereas they
should be reported as events. arc_summary2 reports these correctly.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #9669
2019-12-04 13:24:55 -08:00
Matthew Macy be627fc847 Refactor zfs_context.h to build on FreeBSD
- on Linux move Linux specific headers to zfs_context_os.h
- on FreeBSD move FreeBSD specific definitions to zfs_context_os.h
- remove duplicate tsd_ definitions
- remove unused AT_TYPE

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9668
2019-12-04 13:12:57 -08:00
Kjeld Schouten 618b6adfbf Refactor compression algorithm selection for tests
- Moves compression algorithms for tests to properties.shlib
- Removes all compression algorithms levels from general tests
- Replaces on with lz4 for compression tests
- Removes random algorithm selection, if not needed
- Cleans copyright header formatting

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Michael Niewöhner <foss@mniewoehner.de>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #9645
2019-12-04 13:10:12 -08:00
Paul Zuchowski 5a08977374 Fix zdb_read_block using zio after it is destroyed
The checksum display code of zdb_read_block uses a zio
to read in the block and then calls zio_checksum_compute.
Use a new zio in the call to zio_checksum_compute not the zio
from the read which has been destroyed by zio_wait.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #9644
Closes #9657
2019-12-03 14:37:15 -08:00
Attila Fülöp 54c8366e39 ICP: Fix null pointer dereference and use after free
In gcm_mode_decrypt_contiguous_blocks(), if vmem_alloc() fails,
bcopy is called with a NULL pointer destination and a length > 0.
This results in undefined behavior. Further ctx->gcm_pt_buf is
freed but not set to NULL, leading to a potential write after
free and a double free due to missing return value handling in
crypto_update_uio(). The code as is may write to ctx->gcm_pt_buf
in gcm_decrypt_final() and may free ctx->gcm_pt_buf again in
aes_decrypt_atomic().

The fix is to slightly rework error handling and check the return
value in crypto_update_uio().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #9659
2019-12-03 10:28:47 -08:00
Kjeld Schouten 7af72863fd Codecov tweaks
Modify the Codecov settings to provide a more realistic and stable
report.  The following change were made:

- Precision has been limited to whole percents only, but will round
  to nearest. This means 0.0-0.49 will round to zero (no change) and
  0.51 will round to 1%.

- Exclude the tests/zfs-tests directory from the report.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #9650
2019-12-03 10:23:48 -08:00
Alexander Motin 5ff2249fa5 Fix use-after-free in case of L2ARC prefetch failure
In case L2ARC read failed, l2arc_read_done() creates _different_ ZIO
to read data from the original storage device.  Unfortunately pointer
to the failed ZIO remains in hdr->b_l1hdr.b_acb->acb_zio_head, and if
some other read try to bump the ZIO priority, it will crash.

The problem is reproducible by corrupting L2ARC content and reading
some data with prefetch if l2arc_noprefetch tunable is changed to 0.
With the default setting the issue is probably not reproducible now.

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #9648
2019-12-03 09:59:30 -08:00
Brian Behlendorf 624222ae31 Increase allowed 'special_small_blocks' maximum value
There may be circumstances where it's desirable that all blocks
in a specified dataset be stored on the special device.  Relax
the artificial 128K limit and allow the special_small_blocks
property to be set up to 1M.  When blocks >1MB have been enabled
via the zfs_max_recordsize module option, this limit is increased
accordingly.

Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9131
Closes #9355
2019-12-03 09:58:03 -08:00
Matthew Macy b3673342c7 Wrap module_param_call() routines under __linux__
The module_param_call() functionality is currently still
Linux-specific and should be wrapped accordingly.

Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9666
2019-12-03 09:56:15 -08:00
Matthew Macy bff8fb395b Mark write_record static
The write_record() function is private and should be marked as such.

Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9665
2019-12-03 09:51:44 -08:00
Matthew Macy 74d1d74959 Move linux qsort def to platform header
Moving qsort to the platform header allows each platform to
provide an appropriate sorting implementation.

Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9663
2019-12-03 09:49:40 -08:00
Michael Niewöhner e69bb31b71 Adapt gitignore for modules
Remove the specific gitignore rules for module left-overs and add a
generic one in modules/.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes #9656
2019-12-02 13:23:47 -08:00
Matthew Macy 5142032106 Move zfs_cmd_t copyin/copyout to platform code
FreeBSD needs to cope with multiple version of the zfs_cmd_t
structure. Allowing the platform code to pre and post
process the cmd structure makes it possible to work with
legacy tooling.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9624
2019-12-02 10:08:27 -08:00
Matthew Macy 42a826eed3 Add FreeBSD required defines to mntent.h
Linux and FreeBSD use different names for suid / setuid.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9632
2019-11-30 15:49:09 -08:00
Matthew Macy 758699b6f1 Restructure nvlist_nv_alloc to work on FreeBSD
KM_PUSHPAGE is an Illumosism - On FreeBSD it's
aliased to the same malloc flag as KM_SLEEP.
The compiler naturally rejects multiple case
statements with the same value.  This is effectively
a no-op since all callers pass a specific KM_* flag.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9643
2019-11-30 15:45:06 -08:00
Ryan Moeller 101f9b1771 Add FreeBSD code to arc_summary and arcstat
Adding the FreeBSD code allows arc_summary and arcstat
to be used on FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9641
2019-11-30 15:43:23 -08:00
Matthew Macy c54687cd23 Make asm-x86_64/atomic.S build on FreeBSD
Include the required headers for FreeBSD.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9634
2019-11-30 15:42:00 -08:00
Matthew Macy f348c78f97 Mark Linux fallocate extensions as specific to Linux
fallocate(2) is a Linux-specific system call which in unavailable
on other platforms.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9633
2019-11-30 15:40:22 -08:00
Matthew Macy 77323bcf53 Add FreeBSD support to zio_crypto.h
Minimal compatibility changes for FreeBSD.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9631
2019-11-30 15:38:16 -08:00
Matthew Macy a5b762ab1d Resolve ZoF differences in zfs_ioctl.h
FreeBSD needs to be able to pass the jail id to the jail/unjail ioctls
and the struct file in the device structure is unused.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9625
2019-11-30 15:35:54 -08:00
Igor K a7c358845b ZTS: Fix attach-o_ashift.ksh for multiple platforms
The `-o ashift` option must appear after attach to be properly
interpreted by getopt(3) on all platforms.

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Igor Kozhukhov <igor@dilos.org>
Closes #9636
2019-11-27 14:55:43 -08:00
Matthew Macy d6f67df63c Minor diff reduction with ZoF in include/sys
- move linux/ includes to platform headers
- add void * io_bio to zio for tracking the underlying bio
- add freebsd specific fields to abd_scatter

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9615
2019-11-27 11:11:03 -08:00
InsanePrawn c940bf0c37 Fix encryption logic in systemd mount generator
Previously the generator would skip a dataset if it wasn't mountable by
'zfs mount -a' (legacy/none mountpoint, canmount off/noauto). This also
skipped the generation of key-load units for such datasets, breaking
the dependency handling for mountable child datasets.

Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #9611
2019-11-27 10:54:49 -08:00
InsanePrawn 70d2dd922b Fix non-absolute path in systemd mount generator
Systemd will ignore units that try to execute programs from non-absolute
paths. Use hardcoded /bin/sh instead.

Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #9611
2019-11-27 10:54:24 -08:00
InsanePrawn d8ce455c1e Fix small typo in systemd mount generator
Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #9611
2019-11-27 10:53:37 -08:00
Paul Zuchowski 7c1bf0cf27 Implement -A (ignore ASSERTs) for zdb
The command line switch -A (ignore ASSERTs) has always been available 
in zdb but was never connected up to the correct global variable.

There are times when you need zdb to ignore asserts and keep dumping 
out whatever information it can get despite the ASSERT(s) failing. 
It was always intended to be part of zdb but was incomplete.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #9610
2019-11-27 10:45:56 -08:00
Brian Behlendorf 9e17e6f254 Remove zfs_vdev_elevator module option
As described in commit f81d5ef6 the zfs_vdev_elevator module
option is being removed.  Users who require this functionality
should update their systems to set the disk scheduler using a
udev rule.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #8664
Closes #9417
Closes #9609
2019-11-27 10:35:49 -08:00
jwpoduska 3c819a2c7d Prevent unnecessary resilver restarts
If a device is participating in an active resilver, then it will have a
non-empty DTL. Operations like vdev_{open,reopen,probe}() can cause the
resilver to be restarted (or deferred to be restarted later), which is
unnecessary if the DTL is still covered by the current scan range. This
is similar to the logic in vdev_dtl_should_excise() where the DTL can
only be excised if it's max txg is in the resilvered range.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Gallagher <john.gallagher@delphix.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: John Poduska <jpoduska@datto.com>
Issue #840 
Closes #9155
Closes #9378
Closes #9551
Closes #9588
2019-11-27 10:15:01 -08:00
Paul Zuchowski 894f6696b4 Add display of checksums to zdb -R
The function zdb_read_block (zdb -R) was always intended to have a :c 
flag which would read the DVA and length supplied by the user, and 
display the checksum. Since we don't know which checksum goes with 
the data, we should calculate and display them all.

For each checksum in the table, read in the data at the supplied 
DVA:length, calculate the checksum, and display it. Update the man 
page and create a zfs test for the new feature.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #9607
2019-11-27 10:08:18 -08:00
Mauricio Faria de Oliveira 0c46813805 Check for unlinked znodes after igrab()
The changes in commit 41e1aa2a / PR #9583 introduced a regression on
tmpfile_001_pos: fsetxattr() on a O_TMPFILE file descriptor started
to fail with errno ENODATA:

    openat(AT_FDCWD, "/test", O_RDWR|O_TMPFILE, 0666) = 3
    <...>
    fsetxattr(3, "user.test", <...>, 64, 0) = -1 ENODATA

The originally proposed change on PR #9583 is not susceptible to it,
so just move the code/if-checks around back in that way, to fix it.

Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Original-patch-by: Heitor Alves de Siqueira <halves@canonical.com>
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Closes #9602
2019-11-21 12:24:03 -08:00
Matthew Macy da92d5cbb3 Add zfs_file_* interface, remove vnodes
Provide a common zfs_file_* interface which can be implemented on all 
platforms to perform normal file access from either the kernel module
or the libzpool library.

This allows all non-portable vnode_t usage in the common code to be 
replaced by the new portable zfs_file_t.  The associated vnode and
kobj compatibility functions, types, and macros have been removed
from the SPL.  Moving forward, vnodes should only be used in platform
specific code when provided by the native operating system.

Reviewed-by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9556
2019-11-21 09:32:57 -08:00
Brian Behlendorf 67a6c3bc9f ZTS: tst.terminate_by_signal increase test threshold
The tst.terminate_by_signal test case may occasionally fail when
running in a less consistent virtual environment.  For all observed
failures the process was terminated correctly but it took longer than
expected resulting in too many snapshot being created.

To minimize the likelyhood of this occuring increase the threshold
from 50 to 90 snapshots.  The larger limit will still verifiy that
the channel program was correctly terminated early.

Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9601
2019-11-20 17:26:32 -08:00
George Melikov 540312df7f ZTS: Casenorm fix unicode interpretation
Use `printf` to properly interpret unicode characters.

Illumos uses a utility called `zlook` to allow additional flags to be 
provided to readdir and lookup for testing.  This functionality could 
be ported to Linux, but even without it several of the tests can be 
enabled by instead using the standard `test` command.

Additional, work is required to enable the remaining test cases.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Issue #7633 
Closes #8812
2019-11-19 16:23:27 -08:00
InsanePrawn 8221bcf1e4 Remove requirement for -d 1 for zfs list and zfs get with bookmarks
df58307 removed the need to specify -d 1 when zfs list and zfs get are
called with -t snapshot on a datset. This commit extends the same
behaviour to -t bookmark.

This commit also introduces the 'snap' shorthand for snapshots from
zfs list to zfs get.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #9589
2019-11-18 16:44:28 -08:00
Brian Behlendorf 7ae3f8dc8f Partially revert 5a6ac4c
Reinstate the zpl_revalidate() functionality to resolve a regression
where dentries for open files during a rollback are not invalidated.

The unrelated functionality for automatically unmounting .zfs/snapshots
was not reverted.  Nor was the addition of shrink_dcache_sb() to the
zfs_resume_fs() function.

This issue was not immediately caught by the CI because the test case
intended to catch it was included in the list of ZTS tests which may
occasionally fail for unrelated reasons.  Remove all of the rollback
tests from this list to help identify the frequency of any spurious
failures.

The rollback_003_pos.ksh test case exposes a real issue with the
long standing code which needs to be investigated.  Regardless,
it has been enable with a small workaround in the test case itself.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9587
Closes #9592
2019-11-18 13:05:56 -08:00
Heitor Alves de Siqueira 41e1aa2a06 Break out of zfs_zget early if unlinked znode
If zp->z_unlinked is set, we're working with a znode that has been
marked for deletion. If that's the case, we can skip the "goto again"
loop and return ENOENT, as the znode should not be discovered.

Reviewed-by: Richard Yao <ryao@gentoo.org>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Heitor Alves de Siqueira <halves@canonical.com>
Closes #9583
2019-11-15 09:56:05 -08:00
InsanePrawn cc1a1e17d9 Remove inappropiate error message suggesting to use '-r'
Removes an incorrect error message from libzfs that suggests applying
'-r' when a zfs subcommand is called with a filesystem path while
expecting either a snapshot or bookmark path.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #9574
2019-11-15 09:52:11 -08:00
Kjeld Schouten 64c77c4bad Change zed.service to zfs-zed.service in man page
zed.service does not exist
replaced with correct service name in man.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes #9581
2019-11-13 10:23:23 -08:00
loli10K 7ba964cc3f Prevent NULL pointer dereference in blkg_tryget() on EL8 kernels
blkg_tryget() as shipped in EL8 kernels does not seem to handle NULL
@blkg as input; this is different from its mainline counterpart where
NULL is accepted.  To prevent dereferencing a NULL pointer when dealing
with block devices which do not set a root_blkg on the request queue
perform the NULL check in vdev_bio_associate_blkg().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9546 
Closes #9577
2019-11-13 10:19:06 -08:00
Michael Niewöhner 8aaa10a9a0 Check for __GFP_RECLAIM instead of GFP_KERNEL
Check for __GFP_RECLAIM instead of GFP_KERNEL because zfs modifies
IO and FS flags which breaks the check for GFP_KERNEL.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes #9034
2019-11-13 10:05:23 -08:00
Michael Niewöhner 6d948c3519 Add kmem_cache flag for forcing kvmalloc
This adds a new KMC_KVMEM flag was added to enforce use of the
kvmalloc allocator in kmem_cache_create even for large blocks, which
may also increase performance in some specific cases (e.g. zstd), too.

Default to KVMEM instead of VMEM in spl_kmem_cache_create.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes #9034
2019-11-13 10:05:23 -08:00
Michael Niewöhner 66955885e2 Make use of kvmalloc if available and fix vmem_alloc implementation
This patch implements use of kvmalloc for GFP_KERNEL allocations, which
may increase performance if the allocator is able to allocate physical
memory, if kvmalloc is available as a public kernel interface (since
v4.12). Otherwise it will simply fall back to virtual memory (vmalloc).

Also fix vmem_alloc implementation which can lead to slow allocations
since the first attempt with kmalloc does not make use of the noretry
flag but tells the linux kernel to retry several times before it fails.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes #9034
2019-11-13 10:05:10 -08:00
Michael Niewöhner c025008df5 Add missing documentation for some KMC flags
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes #9034
2019-11-13 09:34:51 -08:00
Brian Behlendorf 94a570e39c Fix zpool create -o <property> error message
When `zpool create -o <property>` is run without root permissions
and the pool property requested is not specifically enumerated in
zpool_valid_proplist().  Then an incorrect error message referring
to an invalid property is printed rather than the expected permission
denied error.

Specifying a pool property at create time should be handled the same
way as filesystem properties in zfs_valid_proplist().  There should
not be default zfs_error_aux() set for properties which are not
listed.

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9550 
Closes #9568
2019-11-13 09:23:14 -08:00
Ross Williams c5ebfbbe19 Reorganize zpool(8) man page into sections
Moved subcommand topics into individual manpages. Reordered and 
grouped the list of subcommands by topic.

Moved concepts overview to `zpoolconcepts.8` and the long list of
available pool properties to `zpoolprops.8`.

Internal cross-references copied from `zpool.8` needed to be 
converted to `.Xr` external references to new subcommand manual 
pages.

Move `autotrim` into lexical order, autotrim tacked onto the end
of a list. Now it is in alphabetical order.

Clarify attach/detach description. Description was too specific to
command syntax. Overview clarifies reason for attaching or detaching
a device.

Clarify replace description, don't refer to subcommand arguments.

Clarify split command description, say what split actually does and
why you'd want to do it.

Clarify description of upgrade, and simplify the zpool.8 wording of
the zpool-upgrade(8) description.

Clarify description of import, detail what zpool-import(8) actually 
does.

Add appropriate SEE ALSO sections. Divided zpool subcommand manual 
pages need their own SEE ALSO sections. Also modified fsck.zfs.8 
to point directly to zfs-scrub.8 and zed.8.in to include a direct
reference to zfs-events.8

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ross Williams <ross@ross-williams.net>
Closes #9564
2019-11-13 09:21:07 -08:00
Ross Williams 870fc32fc9 Reorganize zfs(8) man page into sections
Most subcommands got their own manpages (e.g. create). Some related
commands grouped into a single manpage and symlinks created (e.g. set, 
get, and inherit). I did this when topics were either too short to 
warrant their own file or so interrelated that a user would want to
refer between commands in the same file.

Corrected .Sx internal references to .Xr cross refs; lots of .Sx 
references from when text was all in zfs.8 needed to be changed to 
.Xr zfs-$SUBCOMMAND 8 cross references.

Divided subcommand list in zfs(8) into sections of related 
functionality. This required writing new descriptions for some 
commands.

Preserved ".Os Linux", `.Os` macro parsing behavior differs between 
mandoc from the "BSD" mandoc package (available on Ubuntu) and man 
from Ubuntu's man-db package, which calls groff to format the manpages.

Groff handles the `.Os` macro differently and wrongly, defaulting 
it to "BSD" in `/usr/share/groff/*/tmac/mdoc/doc-common`, instead of
getting the default from `uname`.

A future set of changes will introduce build-time preprocessing of
manpages for platform-specific documentation and can insert the
correct operating system name.

Added SEE ALSO sections, the newly-divided zfs-*.8 subcommand man
pages needed their own SEE ALSO sections pointing to related
subcommands and, in some cases, documentation from other packages
(e.g. zfs-share.8).

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ross Williams <ross@ross-williams.net>
Closes #9559
2019-11-12 11:17:40 -08:00
Matthew Macy 1f2f46be95 Add wrapper stub for zfs_cmd ioctl to libzpool
FreeBSD needs a wrapper for handling zfs_cmd ioctls.
In libzfs this is handled by zfs_ioctl. However, here
we need to wrap the call directly.

Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9511
2019-11-12 10:40:39 -08:00
Brian Behlendorf 066e825221 Linux compat: Minimum kernel version 3.10
Increase the minimum supported kernel version from 2.6.32 to 3.10.
This removes support for the following Linux enterprise distributions.

    Distribution     | Kernel | End of Life
    ---------------- | ------ | -------------
    Ubuntu 12.04 LTS | 3.2    | Apr 28, 2017
    SLES 11          | 3.0    | Mar 32, 2019
    RHEL / CentOS 6  | 2.6.32 | Nov 30, 2020

The following changes were made as part of removing support.

* Updated `configure` to enforce a minimum kernel version as
  specified in the META file (Linux-Minimum: 3.10).

    configure: error:
        *** Cannot build against kernel version 2.6.32.
        *** The minimum supported kernel version is 3.10.

* Removed all `configure` kABI checks and matching C code for
  interfaces which solely predate the Linux 3.10 kernel.

* Updated all `configure` kABI checks to fail when an interface is
  missing which was in the 3.10 kernel up to the latest 5.1 kernel.
  Removed the HAVE_* preprocessor defines for these checks and
  updated the code to unconditionally use the verified interface.

* Inverted the detection logic in several kABI checks to match
  the new interface as it appears in 3.10 and newer and not the
  legacy interface.

* Consolidated the following checks in to individual files. Due
  the large number of changes in the checks it made sense to handle
  this now.  It would be desirable to group other related checks in
  the same fashion, but this as left as future work.

  - config/kernel-blkdev.m4 - Block device kABI checks
  - config/kernel-blk-queue.m4 - Block queue kABI checks
  - config/kernel-bio.m4 - Bio interface kABI checks

* Removed the kABI checks for sops->nr_cached_objects() and
  sops->free_cached_objects().  These interfaces are currently unused.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9566
2019-11-12 08:59:06 -08:00
Ryan Moeller 035ebb3653 Allow platform dependent path stripping for vdevs
On Linux the full path preceding devices is stripped when formatting
vdev names. On FreeBSD we only want to strip "/dev/". Hide the
implementation details of path stripping behind zfs_strip_path().

Make zfs_strip_partition_path() static in Linux implementation while
here, since it is never used outside of the file it is defined in.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9565
2019-11-11 12:15:44 -08:00
Pavel Snajdr 5a6ac4cffc Remove zpl_revalidate
This patch removes the need for zpl_revalidate altogether.

There were 3 main reasons why we used d_revalidate:

1. periodic automounted snapshots umount deferral
2. negative dentries created before snapshot rollback
3. stale inodes referenced by dentry cache after snapshot rollback

Periodic snapshots deferral solution introduces zfs_exit_fs function,
which is called as a part of ZFS_EXIT(zfsvfs_t) macro.

Negative dentries and stale inodes are solved by flushing the dcache
for the particular dataset on zfs_resume_fs call.

This patch also removes now unused HAVE_S_D_OP configure test.

Reviewed-by: Aleksa Sarai <cyphar@cyphar.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #8774 
Closes #9549
2019-11-11 09:34:21 -08:00
Alexander Motin f15d6a5457 Improve logging of 128KB writes
Before my ZIL space optimization few years ago 128KB writes were logged
as two 64KB+ records in two 128KB log blocks.  After that change it
became ~127KB+/1KB+ in two 128KB log blocks to free space in the second
block for another record.  Unfortunately in case of 128KB only writes,
when space in the second block remained unused, that change increased
write latency by unbalancing checksum computation and write times
between parallel threads.  It also didn't help with SLOG space
efficiency in that case.

This change introduces new 68KB log block size, used for both writes
below 67KB and 128KB-sharp writes.  Writes of 68-127KB are still using
one 128KB block to not increase processing overhead.  Writes above
131KB are still using full 128KB blocks, since possible saving there
is small.  Mixed loads will likely also fall back to previous 128KB,
since code uses maximum of the last 16 requested block sizes.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Closes #9409
2019-11-11 09:27:59 -08:00
Ryan Moeller 2f1ca8a32a Isolate code specific to Linux in cmd/
Use sys.platform to choose the correct implementation of functions and
values of variables for the platform being run on.

Reword some comments to avoid describing implementation details in the
wrong places.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9561
2019-11-11 09:24:04 -08:00
Witaut Bajaryn 6c7023a532 Skip loading already loaded key
Don't ask for the password / try to load the key if the key for the 
encryptionroot is already loaded.  The user might have loaded the key 
manually or by other means before the scripts get called.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Witaut Bajaryn <vitaut.bayaryn@gmail.com>
Closes #9495
Closes #9529
2019-11-08 14:34:07 -08:00
M. Zhou 734de7ced1 Add a notice in /etc/defaults/zfs for systemd users
Some systemd users may want to change configurations in 
/etc/defaults/zfs, but these settings won't affect systemd 
services.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mo Zhou <cdluminate@gmail.com>
Closes #9544
2019-11-06 11:36:33 -08:00
Romain Dolbeau 4254e40729 Preliminary support for RV64G
This adds basic support for RISC-V, specifically RV64G.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Romain Dolbeau <romain.dolbeau@european-processor-initiative.eu>
Closes #9540
2019-11-06 10:56:09 -08:00
Matthew Macy 27ece2ee4d Move platform specific parts of zfs_znode.h to platform code
Some of the znode fields are different and functions
consuming an inode don't exist on FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9536
2019-11-06 10:54:25 -08:00
Pavel Zakharov 1c47c2c42c zvol_wait should ignore redacted zvols
zvol_wait waits for zvol links to be created under /dev/zvol for each zvol.
Links are not created for redacted zvols so we should ignore those.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Closes #9545
2019-11-06 10:51:19 -08:00
Prakash Surya ae38e00968 Add tracepoints for taskq entry lifetime events
This adds some new DTRACE_PROBE* endpoints so that we can observe taskq
latencies on a system. Additionally, a new "taskqlatency.bt" script is
added to do this observation via "bpftrace". Lastly, a "zfs-trace.sh"
script is added to wrap "bpftrace" with the proper options required to
run and use "taskqlatency.bt".

For example, with these changes in place, a user can run the following:

    $ cd ./contrib/bpftrace
    $ sudo ./zfs-trace.sh taskqlatency.bt
    Attaching 6 probes...
    ^C

Here's some example output, showing latency information for time spent
executing the taskq entry's function:

    @exec_lat_us[dp_sync_taskq, userquota_updates_task]:
    [2, 4)                 5 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
    [4, 8)                 0 |                                                    |
    [8, 16)                1 |@@@@@@@@@@                                          |
    [16, 32)               2 |@@@@@@@@@@@@@@@@@@@@                                |

    @exec_lat_us[z_wr_int_h, zio_execute]:
    [8, 16)               16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
    [16, 32)               2 |@@@@@@                                              |

    @exec_lat_us[z_wr_iss_h, zio_execute]:
    [16, 32)               4 |@@@@@@@@@@@@@@@@                                    |
    [32, 64)              13 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
    [64, 128)              1 |@@@@                                                |

    @exec_lat_us[z_ioctl_int, zio_execute]:
    [2, 4)                 1 |@@@@                                                |
    [4, 8)                11 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
    [8, 16)                8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@               |

    @exec_lat_us[dp_sync_taskq, sync_dnodes_task]:
    [2, 4)                 1 |@@@@@@                                              |
    [4, 8)                 7 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@       |
    [8, 16)                8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
    [16, 32)               2 |@@@@@@@@@@@@@                                       |
    [32, 64)               4 |@@@@@@@@@@@@@@@@@@@@@@@@@@                          |
    [64, 128)              1 |@@@@@@                                              |
    [128, 256)             0 |                                                    |
    [256, 512)             1 |@@@@@@

Here's some example output, showing latency information for time spent
waiting on the taskq, prior to starting execution of entry's function:

    @queue_lat_us[dp_sync_taskq]:
    [2, 4)                 1 |@@@@                                                |
    [4, 8)                 7 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                      |
    [8, 16)                2 |@@@@@@@@                                            |
    [16, 32)               3 |@@@@@@@@@@@@@                                       |
    [32, 64)              12 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
    [64, 128)              6 |@@@@@@@@@@@@@@@@@@@@@@@@@@                          |
    [128, 256)             0 |                                                    |
    [256, 512)             1 |@@@@                                                |

    @queue_lat_us[z_wr_iss]:
    [4, 8)                 4 |@@@@                                                |
    [8, 16)               13 |@@@@@@@@@@@@@@@                                     |
    [16, 32)               6 |@@@@@@@                                             |
    [32, 64)               2 |@@                                                  |
    [64, 128)             12 |@@@@@@@@@@@@@@                                      |
    [128, 256)            15 |@@@@@@@@@@@@@@@@@@                                  |
    [256, 512)            33 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@             |
    [512, 1K)             27 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                    |
    [1K, 2K)               7 |@@@@@@@@                                            |
    [2K, 4K)              14 |@@@@@@@@@@@@@@@@                                    |
    [4K, 8K)              14 |@@@@@@@@@@@@@@@@                                    |
    [8K, 16K)             23 |@@@@@@@@@@@@@@@@@@@@@@@@@@@                         |
    [16K, 32K)            43 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

    @queue_lat_us[z_wr_int]:
    [2, 4)                10 |@@@@@                                               |
    [4, 8)                71 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@           |
    [8, 16)               88 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
    [16, 32)              50 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                       |
    [32, 64)              65 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@              |
    [64, 128)             43 |@@@@@@@@@@@@@@@@@@@@@@@@@                           |
    [128, 256)            19 |@@@@@@@@@@@                                         |
    [256, 512)             3 |@                                                   |
    [512, 1K)              1 |                                                    |

Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Prakash Surya <prakash.surya@delphix.com>
Closes #9525
2019-11-01 13:14:54 -07:00
Prakash Surya e5d1c27e30 Enable use of DTRACE_PROBE* macros in "spl" module
This change modifies some of the infrastructure for enabling the use of
the DTRACE_PROBE* macros, such that we can use tehm in the "spl" module.

Currently, when the DTRACE_PROBE* macros are used, they get expanded to
create new functions, and these dynamically generated functions become
part of the "zfs" module.

Since the "spl" module does not depend on the "zfs" module, the use of
DTRACE_PROBE* in the "spl" module would result in undefined symbols
being used in the "spl" module. Specifically, DTRACE_PROBE* would turn
into a function call, and the function being called would be a symbol
only contained in the "zfs" module; which results in a linker and/or
runtime error.

Thus, this change adds the necessary logic to the "spl" module, to
mirror the tracing functionality available to the "zfs" module. After
this change, we'll have a "trace_zfs.h" header file which defines the
probes available only to the "zfs" module, and a "trace_spl.h" header
file which defines the probes available only to the "spl" module.

Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Prakash Surya <prakash.surya@delphix.com>
Closes #9525
2019-11-01 13:13:43 -07:00
Matthew Macy 4a2ed90013 Wrap Linux module macros
MODULE_VERSION is already defined on FreeBSD. Wrap all of the
used MODULE_* macros for the sake of consistency and portability.

Add a user space noop version to reduce the need for _KERNEL ifdefs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9542
2019-11-01 10:41:03 -07:00
Matthew Macy bd4dde8ef7 Prefix struct rangelock
A struct rangelock already exists on FreeBSD.  Add a zfs_ prefix as
per our convention to prevent any conflict with existing symbols.
This change is a follow up to 2cc479d0.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9534
2019-11-01 10:37:33 -07:00
Matthew Macy bbc18de83a Remove ECKSUM alias in zinject
The custom ECKSUM errno is defined as appropriate by the
platform specific os/linux/spl/sys/errno.h header.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9537
2019-11-01 10:31:42 -07:00
Matthew Macy 32682b0c03 Fix icp build on FreeBSD
- ROTATE_LEFT is not used by amd64, move it down within
  the scope it's used to silence a clang warning.

- __unused is an alias for the compiler annotation
  __attribute__((__unused__)) on FreeBSD.  Rename the
  field to ____unused.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9538
2019-11-01 10:27:53 -07:00
Matthew Macy 156f74fc03 Return an error code from zfs_acl_chmod_setattr
The FreeBSD implementation can fail, allow this function to
fail and add the required error handling for Linux.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9541
2019-11-01 10:19:11 -07:00
Matthew Macy c4ae27c763 Move sha2.h to platform code
FreeBSD has its own sha routines that the port uses.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9530
2019-10-31 15:45:58 -07:00
Brian Behlendorf ab44be142a ZTS: Fix removal_with_errors
The removal_with_errors.ksh test case could occasionally complete
the removal process instead of canceling due to an injected error.

To prevent this false positive, export and import the pool between
test phases to flush the ARC cache.  Furthermore, double the amount
of data in the pool to increase the removal time.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9528
2019-10-31 12:57:42 -07:00
Matthew Macy a5308e252d Don't cast away const
Follow up to 511fce6b which missed a cast.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9533
2019-10-31 10:38:03 -07:00
Matthew Macy 59055a0164 Include prototypes for vdev_initialize
Address two prototype related warnings emitted by clang.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9535
2019-10-31 10:09:01 -07:00
Matthew Macy 2a3aa5a109 Factor Linux specific code out of spa_misc.c
Move these Linux module parameter get/set helpers in to
platform specific code.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9457
2019-10-31 09:52:22 -07:00
alaviss 936e2d6d3e dracut/zfs-load-key.sh: properly remove prefixes
Removes the 'ZFS=' prefix from $BOOTFS instead of $root. This makes sure
that the 'zfs:' prefix remains stripped so that users with
'root=zfs:dataset' cmdline can have key loaded on boot again.

Reviewed-by: Garrett Fields <ghfields@gmail.com>
Reviewed-by: Dacian Reece-Stremtan <dacianstremtan@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Hiếu Lê <leorize+oss@disroot.org>
Closes #9520
2019-10-30 14:38:41 -07:00
Brian Behlendorf f63e2fe132 Fix contrib/zcp/Makefile.am
Remove the stray leading + from the Makefile.  This was
preventing the autosnap.lua channel program from being
properly included by `make dist`.

Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9527
2019-10-30 12:37:49 -07:00
Romain Dolbeau 0b2a642351 Add AVX512BW variant of fletcher
It is much faster than AVX512F when byteswapping on Skylake-SP
and newer, as we can do the byteswap in a single vshufb instead
of many instructions.

Reviewed by: Gvozden Neskovic <neskovic@gmail.com>
Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Romain Dolbeau <romain.dolbeau@atos.net>
Closes #9517
2019-10-30 12:26:14 -07:00
Tom Caputi bae11ba8dc Fix 'zfs change-key' with unencrypted child
Currently, when you call 'zfs change-key' on an encrypted dataset
that has an unencrypted child, the code will trigger a VERIFY.
This VERIFY is leftover from before we allowed unencrypted
datasets to exist underneath encrypted ones. This patch fixes the
issue by simply replacing the VERIFY with an early return when
recursing through datasets.

Reviewed by: Jason King <jason.brian.king@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #9524
2019-10-30 11:27:28 -07:00
Matthew Macy d46f0deb03 Add wrapper for Linux BLKFLSBUF ioctl
FreeBSD has no analog. Buffered block devices were removed a decade
plus ago.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9508
2019-10-28 09:53:39 -07:00
Matthew Macy 4a22ba5be0 Minor spa portability fixes
- FreeBSD's rootpool import code uses spa_config_parse
- Move the zvol_create_minors call out from under the
  spa_namespace_lock in spa_import. It isn't needed and it causes
  a lock order reversal on FreeBSD.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9499
2019-10-28 09:51:53 -07:00
Chunwei Chen 7125a109dc Fix zpool history unbounded memory usage
In original implementation, zpool history will read the whole history
before printing anything, causing memory usage goes unbounded. We fix
this by breaking it into read-print iterations.

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #9516
2019-10-28 09:49:44 -07:00
loli10K e35704647e Fix for ARC sysctls ignored at runtime
This change leverage module_param_call() to run arc_tuning_update()
immediately after the ARC tunable has been updated as suggested in
cffa8372 code review.

A simple test case is added to the ZFS Test Suite to prevent future
regressions in functionality.

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9487  
Closes #9489
2019-10-26 15:22:19 -07:00
Matthew Macy 6963414d70 Remove unneeded header from libzpool/kernel.c
The sys/signal.h header doesn't exist on FreeBSD, nor is
it needed on Linux.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9510
2019-10-26 10:07:59 -07:00
Matthew Macy 9e2ca0b3c6 Fix header guard typo
Fix header guard typo accidentally introduced by #9497.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9514
2019-10-26 10:04:10 -07:00
Ryan Moeller fdf5593232 ZTS: Move more tests to linux.run
Tests that rely on special filesystems that are specific to Linux
should only be run on Linux.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9512
2019-10-25 20:03:46 -07:00
Matthew Macy 5f3eceb98f Remove unused headers from uu_string.c
The malloc.h include is gratuitous and runs in to the following error
on FreeBSD:

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9509
2019-10-25 13:53:50 -07:00
Matthew Macy 24cf9f4eb2 Use zfs_ioctl() in zinject.c
Consistently use the `zfs_ioctl()` wrapper since `ioctl()` cannot be
called directly due to differing semantics between platforms.

Follow up PR to #9492.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9507
2019-10-25 13:50:34 -07:00
Matthew Macy 0ee89a1252 Remove non-portable pointer is valid assert
This assert makes non portable assumptions about the state of memory
returned by the memory allocator.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9506
2019-10-25 13:46:07 -07:00
Matthew Macy c392c5aec0 Move final zvol_remove_minors to common code
This logic is not platform dependent and should reside in the
common code.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9505
2019-10-25 13:42:54 -07:00
Matthew Macy 1952fe0e25 Move platform dependent errno aliases
EBADE, EBADR, and ENOANO do not exist on FreeBSD

The libspl errno.h is similarly platform dependent.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9498
2019-10-25 13:40:50 -07:00
Matthew Macy 68a1b1589a Remove sdt.h
It's mostly a noop on ZoL and it conflicts with platforms that 
support dtrace.  Remove this header to resolve the conflict.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9497
2019-10-25 13:38:37 -07:00
Tom Caputi b4238327b4 Fix incremental recursive encrypted receive
Currently, incremental recursive encrypted receives fail to work
for any snapshot after the first. The reason for this is because
the check in zfs_setup_cmdline_props() did not properly realize
that when the user attempts to use '-x encryption' in this
situation, they are not really overriding the existing encryption
property and instead are attempting to prevent it from changing.
This resulted in an error message stating: "encryption property
'encryption' cannot be set or excluded for raw or incremental
streams".

This problem is fixed by updating the logic to expect this use
case.

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #9494
2019-10-24 10:51:01 -07:00
Ryan Moeller 28f7427ab1 ZTS: Move tmpfile tests to linux.run
O_TMPFILE is not available on FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9503
2019-10-24 10:49:19 -07:00
Ryan Moeller 4c6225b688 ZTS: Consistency pass for .ksh extensions
* Use .ksh extension for ksh scripts, not .sh
* Remove .ksh extension from tests in common.run

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9502
2019-10-24 10:47:47 -07:00
Brian Behlendorf 10fa254539 Linux 4.14, 4.19, 5.0+ compat: SIMD save/restore
Contrary to initial testing we cannot rely on these kernels to
invalidate the per-cpu FPU state and restore the FPU registers.
Nor can we guarantee that the kernel won't modify the FPU state
which we saved in the task struck.

Therefore, the kfpu_begin() and kfpu_end() functions have been
updated to save and restore the FPU state using our own dedicated
per-cpu FPU state variables.

This has the additional advantage of allowing us to use the FPU
again in user threads.  So we remove the code which was added to
use task queues to ensure some functions ran in kernel threads.

Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #9346
Closes #9403
2019-10-24 10:17:33 -07:00
Matthew Macy b834b58ae6 Use zfs_ioctl with zfs_cmd_t in libzfs
Consistently use the `zfs_ioctl()` wrapper since `ioctl()` cannot be
called directly due to differing semantics between platforms.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9492
2019-10-23 17:29:43 -07:00
chrisrd 05d07ba9a7 Don't call arc_buf_destroy on unallocated arc_buf
Fixes an obvious issue of calling arc_buf_destroy() on an
unallocated arc_buf.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Closes #9453
2019-10-23 15:26:17 -07:00
Matthew Macy 64b2e7d7ec Use platform independent error code for libzfs_run_process_impl
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9493
2019-10-23 13:48:31 -07:00
Matthew Macy 685abd595c Use correct format string when printing int8
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9486
2019-10-20 20:37:30 -07:00
Matthew Macy 8b2d097c17 Remove gratuitous Linux only include in ztest & zdb
We don't need to include stdio_ext.h

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9483
2019-10-19 17:08:19 -07:00
John Wren Kennedy 4063440b73 ZTS: Written props test fails with 4k disks
With 4k disks, this test will fail in the last section because the
expected human readable value of 20.0M is reported as 20.1M. Rather than
use the human readable property, switch to the parsable property and
verify that the values are reasonably close.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes #9477
2019-10-18 13:27:02 -04:00
Serapheim Dimitropoulos 9f3c72a2a8 Name anonymous enum of KMC_BIT constants
Giving a name to this enum makes it discoverable from
debugging tools like DRGN and SDB. For example, with
the name proposed on this patch we can iterate over
these values in DRGN:
```
>>> prog.type('enum kmc_bit').enumerators
(('KMC_BIT_NOTOUCH', 0), ('KMC_BIT_NODEBUG', 1),
('KMC_BIT_NOMAGAZINE', 2), ('KMC_BIT_NOHASH', 3),
('KMC_BIT_QCACHE', 4), ('KMC_BIT_KMEM', 5),
('KMC_BIT_VMEM', 6), ('KMC_BIT_SLAB', 7),
...
```
This enables SDB to easily pretty-print the flags of
the spl_kmem_caches in the system like this:
```
> spl_kmem_caches -o "name,flags,total_memory"
name                                       flags total_memory
------------------------ ----------------------- ------------
abd_t                    KMC_NOMAGAZINE|KMC_SLAB        4.5MB
arc_buf_hdr_t_full       KMC_NOMAGAZINE|KMC_SLAB       12.3MB
... <cropped> ...
ddt_cache                               KMC_VMEM      583.7KB
ddt_entry_cache          KMC_NOMAGAZINE|KMC_SLAB         0.0B
... <cropped> ...
zio_buf_1048576             KMC_NODEBUG|KMC_VMEM         0.0B
... <cropped> ...
```

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #9478
2019-10-18 13:25:44 -04:00
Serapheim Dimitropoulos 851eda3566 Update skc_obj_alloc for spl kmem caches that are backed by Linux
Currently, for certain sizes and classes of allocations we use
SPL caches that are backed by caches in the Linux Slab allocator
to reduce fragmentation and increase utilization of memory. The
way things are implemented for these caches as of now though is
that we don't keep any statistics of the allocations that we
make from these caches.

This patch enables the tracking of allocated objects in those
SPL caches by making the trade-off of grabbing the cache lock
at every object allocation and free to update the respective
counter.

Additionally, this patch makes those caches visible in the
/proc/spl/kmem/slab special file.

As a side note, enabling the specific counter for those caches
enables SDB to create a more user-friendly interface than
/proc/spl/kmem/slab that can also cross-reference data from
slabinfo. Here is for example the output of one of those
caches in SDB that outputs the name of the underlying Linux
cache, the memory of SPL objects allocated in that cache,
and the percentage of those objects compared to all the
objects in it:
```
> spl_kmem_caches | filter obj.skc_name == "zio_buf_512" | pp
name        ...            source total_memory util
----------- ... ----------------- ------------ ----
zio_buf_512 ... kmalloc-512[SLUB]       16.9MB    8
```

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #9474
2019-10-18 13:24:28 -04:00
Matthew Macy c9c9c1e213 OpenZFS restructuring - ARC memory pressure
Factor Linux specific memory pressure handling out of ARC.  Each
platform will have different available interfaces for managing memory
pressure.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9472
2019-10-18 13:23:19 -04:00
Ryan Moeller 4313a5b4c5 Detect if sed supports --in-place
Not all versions of sed have the --in-place flag. Detect support for
the flag during ./configure and provide a fallback mechanism for those
systems where sed's behavior differs. The autoconf variable
${ac_inplace} can be used to choose the correct flags for editing a
file in place with sed.

Replace violating usages in Makefile.am with ${ac_inplace}.

Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9463
2019-10-16 19:19:48 -07:00
Matthew Macy 08f530c699 Make zfsdev_getminor signature cross platform
Only pass the file descriptor to make zfsdev_get_miror() portable.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9466
2019-10-16 18:43:52 -07:00
Matthew Macy 0e939e434a Move linux specific mmp module_param_call handler to platform code
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9465
2019-10-16 18:37:31 -07:00
Matthew Macy cf2eba666e Add default case to lua kernel code
Some platforms, e.g. FreeBSD, support user space setjmp
semantics in kernel.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9450
2019-10-16 18:36:15 -07:00
Matthew Macy 47d57dbccf Fix signature for private functions without header declarations
Clang will complain if a function has no prior declaration

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9467
2019-10-16 18:34:19 -07:00
Matthew Macy 177c79dfbe Remove dead code and cleanup scoping in dmu_send.c
This addresses a number of problems with dmu_send.c:

* bp_span is unused which makes clang complain
* dump_write conflicts with FreeBSD's existing core dump code
* range_alloc is private to the file and not declared in any headers
  causing clang to complain

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9432
2019-10-13 19:25:19 -07:00
Paul Dagnelie 511fce6b1f Don't call sizeof on void
We get the sizeof the appropriate type, and don't cast away const.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #9455
2019-10-13 19:21:51 -07:00
Matthew Macy cdbba101f4 Move zfs_onexit_fd_hold to platform code
FreeBSD has a very different implementation.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9442
2019-10-13 19:19:39 -07:00
Matthew Macy c324701332 Move zio_delay_interrupt to platform code
FreeBSD has its own implementation as do other platforms.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9439
2019-10-13 19:15:27 -07:00
Brian Behlendorf d292307212 Modify sharenfs=on default behavior
While it may sometimes be convenient to export an NFS filesystem with
no_root_squash it should not be the default behavior.  Align the
default behavior with the Linux NFS server defaults.  To restore
the previous behavior use 'zfs set sharenfs="no_root_squash,..."'.

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9397
Closes #9425
2019-10-13 19:13:26 -07:00
chrisrd 2c6fa6eafb Typo fix in comment: dso_dryrun
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Closes #9452
2019-10-11 10:33:13 -07:00
Brian Behlendorf ea25ed2365 ZTS: Fix zpool_status_-s
After commit 5e74ac51 which split and reordered the run files the
`zpool_status_-s` test began failing.  The new ordering placed
the test after a previous test which used `zpool replace` to replace
a disk but did not clear its label.  This resulted in the next test,
`zpool_status_-s`, failing because of the potentially active
pool being detected on the replaced vdev.

    /dev/loop0 is part of potentially active pool 'testpool'

Use the default_mirror_setup_noexit() and default_cleanup_noexit()
functions to create the pool in `zpool_status_-s`.  They use the -f
flag by default.

In the `scrub_after_resilver` test wipe the label during cleanup
to prevent future failures if the tests are again reordered.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9451
2019-10-11 10:22:20 -07:00
Paul Dagnelie 516a83f886 Function name and comment updates
Rename certain functions for more consistency when they share common 
features. Make comments clearer about what arguments should be passed 
to the insert and add functions.

Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #9441
2019-10-11 10:13:21 -07:00
Matthew Macy bce795ad7a Remove linux/mod_compat.h from common code
It is no longer necessary; mod_compat.h is included from zfs_context.h.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9449
2019-10-11 10:10:20 -07:00
Matthew Macy af1698f59b Expose dmu_buf_hold_array_by_dnode to platform code
FreeBSD uses this in its pager ops routines

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9431
2019-10-11 10:06:18 -07:00
Richard Yao 803884217f Implement ZPOOL_IMPORT_UDEV_TIMEOUT_MS
Since 0.7.0, zpool import would unconditionally block on udev for 30
seconds. This introduced a regression in initramfs environments that
lack udev (particularly mdev based environments), yet use a zfs userland
tools intended for the system that had been built against udev. Gentoo's
genkernel is the main example, although custom user initramfs
environments would be similarly impacted unless special builds of the
ZFS userland utilities were done for them.  Such environments already
have their own mechanisms for blocking until device nodes are ready
(such as genkernel's scandelay parameter), so it is unnecessary for
zpool import to block on a non-existent udev until a timeout is reached
inside of them.

Rather than trying to intelligently determine whether udev is available
on the system to avoid unnecessarily blocking in such environments, it
seems best to just allow the environment to override the timeout.  I
propose that we add an environment variable called
ZPOOL_IMPORT_UDEV_TIMEOUT_MS. Setting it to 0 would restore the 0.6.x
behavior that was more desirable in mdev based initramfs environments.
This allows the system user land utilities to be reused when building
mdev-based initramfs archives.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #9436
2019-10-11 09:52:48 -07:00
Ryan Moeller d96635a5c7 Clarify loop variable name in zfs copies test
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9445
2019-10-11 09:50:46 -07:00
Ryan Moeller 767f65cf73 Fix some style nits in tests
Mostly whitespace changes, no functional changes intended.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9447
2019-10-11 09:49:48 -07:00
loli10K 715c996d3b Fix pool creation with feature@allocation_classes disabled
When "feature@allocation_classes" is not enabled on the pool no vdev
with "special" or "dedup" allocation type should be allowed to exist in
the vdev tree.

Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9427 
Closes #9429
2019-10-10 16:39:41 -07:00
Matthew Macy 2516a87821 Move get_temporary_prop to platform code
Temporary property handling at the VFS layer requires
platform specific code.

Reviewed-by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9401
2019-10-10 15:59:34 -07:00
Matthew Macy 6501906280 Add kmem cache accessors
Make the metaslab platform agnostic again by adding
accessor functions which can be implemented by each
platform.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9404
2019-10-10 15:45:52 -07:00
Matthew Macy eedb3a62b9 Make zil_async_to_sync visible to platform code
FreeBSD's zvol platform code requires access to the
zil_async_to_sync() function.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9440
2019-10-10 15:39:44 -07:00
Brian Behlendorf f3dc4a85e9 Update zfs program command usage
Update the zfs(8) man page to clearly describe that arguments for
channel programs are to be listed after the -- sentinel which
terminates argument processing.  This behavior is supported by
getopt on Linux, FreeBSD, and Illumos according to each platforms
respective man pages.

    zfs program [-jn] [-t instruction-limit] [-m memory-limit]
        pool script [--] arg1 ...

Reviewed-by: Clint Armstrong <clint@clintarmstrong.net>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9056 
Closes #9428
2019-10-10 09:49:45 -07:00
Matthew Macy e4f5fa1229 Fix strdup conflict on other platforms
In the FreeBSD kernel the strdup signature is:

```
char	*strdup(const char *__restrict, struct malloc_type *);
```

It's unfortunate that the developers have chosen to change
the signature of libc functions - but it's what I have to
deal with.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9433
2019-10-10 09:47:06 -07:00
Matthew Macy c5858ff946 Make clang happy with vdev_raidz_ code
The macros are used to generate code for conditions without a
corresponding branch. This is not a problem in practice, but
clang has no way of knowing that. Add a default branch with a
VERIFY(0) to indicate that it "can't happen"

```
In file included from \
/usr/home/mmacy/devel/ZoF/module/zfs/vdev_raidz_math_sse2.c:607:
/usr/home/mmacy/devel/ZoF/module/zfs/vdev_raidz_math_impl.h:281:3: \
error: no case matching constant switch condition '3' [-Werror]
```

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9434
2019-10-10 09:45:37 -07:00
Ryan Moeller 381d91de7d Sort list of generated files in configure.ac
No functional change, just a quality of life improvement.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9426
2019-10-09 10:54:22 -07:00
Ryan Moeller 5e74ac51c7 Move platform independent tests to a shared runfile
Tests that aren't limited to running on Linux can be moved to a common
runfile to be shared with other platforms.

The test runner and wrapper script are enhanced to allow specifying
multiple runfiles as a comma-separated list. The default runfiles are
now "common.run,PLATFORM.run" where PLATFORM is determined at run time.

Sections in runfiles that share a path with another runfile can append
a colon separator and an identifier to the path in the section
name, ie `[tests/functional/atime:Linux]`, to avoid overriding the tests
specified by other runfiles.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9391
2019-10-09 10:39:26 -07:00
Paul Dagnelie ca5777793e Reduce loaded range tree memory usage
This patch implements a new tree structure for ZFS, and uses it to 
store range trees more efficiently.

The new structure is approximately a B-tree, though there are some 
small differences from the usual characterizations. The tree has core 
nodes and leaf nodes; each contain data elements, which the elements 
in the core nodes acting as separators between its children. The 
difference between core and leaf nodes is that the core nodes have an 
array of children, while leaf nodes don't. Every node in the tree may 
be only partially full; in most cases, they are all at least 50% full 
(in terms of element count) except for the root node, which can be 
less full. Underfull nodes will steal from their neighbors or merge to 
remain full enough, while overfull nodes will split in two. The data 
elements are contained in tree-controlled buffers; they are copied 
into these on insertion, and overwritten on deletion. This means that 
the elements are not independently allocated, which reduces overhead, 
but also means they can't be shared between trees (and also that 
pointers to them are only valid until a side-effectful tree operation 
occurs). The overhead varies based on how dense the tree is, but is 
usually on the order of about 50% of the element size; the per-node 
overheads are very small, and so don't make a significant difference. 
The trees can accept arbitrary records; they accept a size and a 
comparator to allow them to be used for a variety of purposes.

The new trees replace the AVL trees used in the range trees today. 
Currently, the range_seg_t structure contains three 8 byte integers 
of payload and two 24 byte avl_tree_node_ts to handle its storage in 
both an offset-sorted tree and a size-sorted tree (total size: 64 
bytes). In the new model, the range seg structures are usually two 4 
byte integers, but a separate one needs to exist for the size-sorted 
and offset-sorted tree. Between the raw size, the 50% overhead, and 
the double storage, the new btrees are expected to use 8*1.5*2 = 24 
bytes per record, or 33.3% as much memory as the AVL trees (this is 
for the purposes of storing metaslab range trees; for other purposes, 
like scrubs, they use ~50% as much memory).

We reduced the size of the payload in the range segments by teaching 
range trees about starting offsets and shifts; since metaslabs have a 
fixed starting offset, and they all operate in terms of disk sectors, 
we can store the ranges using 4-byte integers as long as the size of 
the metaslab divided by the sector size is less than 2^32. For 512-byte
sectors, this is a 2^41 (or 2TB) metaslab, which with the default
settings corresponds to a 256PB disk. 4k sector disks can handle 
metaslabs up to 2^46 bytes, or 2^63 byte disks. Since we do not 
anticipate disks of this size in the near future, there should be 
almost no cases where metaslabs need 64-byte integers to store their 
ranges. We do still have the capability to store 64-byte integer ranges 
to account for cases where we are storing per-vdev (or per-dnode) trees, 
which could reasonably go above the limits discussed. We also do not 
store fill information in the compact version of the node, since it 
is only used for sorted scrub.

We also optimized the metaslab loading process in various other ways
to offset some inefficiencies in the btree model. While individual
operations (find, insert, remove_from) are faster for the btree than 
they are for the avl tree, remove usually requires a find operation, 
while in the AVL tree model the element itself suffices. Some clever 
changes actually caused an overall speedup in metaslab loading; we use 
approximately 40% less cpu to load metaslabs in our tests on Illumos.

Another memory and performance optimization was achieved by changing 
what is stored in the size-sorted trees. When a disk is heavily 
fragmented, the df algorithm used by default in ZFS will almost always 
find a number of small regions in its initial cursor-based search; it 
will usually only fall back to the size-sorted tree to find larger 
regions. If we increase the size of the cursor-based search slightly, 
and don't store segments that are smaller than a tunable size floor 
in the size-sorted tree, we can further cut memory usage down to 
below 20% of what the AVL trees store. This also results in further 
reductions in CPU time spent loading metaslabs.

The 16KiB size floor was chosen because it results in substantial memory 
usage reduction while not usually resulting in situations where we can't 
find an appropriate chunk with the cursor and are forced to use an 
oversized chunk from the size-sorted tree. In addition, even if we do 
have to use an oversized chunk from the size-sorted tree, the chunk 
would be too small to use for ZIL allocations, so it isn't as big of a 
loss as it might otherwise be. And often, more small allocations will 
follow the initial one, and the cursor search will now find the 
remainder of the chunk we didn't use all of and use it for subsequent 
allocations. Practical testing has shown little or no change in 
fragmentation as a result of this change.

If the size-sorted tree becomes empty while the offset sorted one still 
has entries, it will load all the entries from the offset sorted tree 
and disregard the size floor until it is unloaded again. This operation 
occurs rarely with the default setting, only on incredibly thoroughly 
fragmented pools.

There are some other small changes to zdb to teach it to handle btrees, 
but nothing major.
                                           
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed by: Sebastien Roy seb@delphix.com
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #9181
2019-10-09 10:36:03 -07:00
Igor K d0a84ba92b ZTS: Fix mmp_hostid test
Correctly use the `mntpnt_fs` variable, and include additional
logic to ensure the /etc/hostid is correct set up and cleaned up.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Igor Kozhukhov <igor@dilos.org>
Closes #9349
2019-10-08 13:40:17 -07:00
George Melikov 7b50929851 module/Makefile.in: don't run xargs if empty
If stdin if empty - don't run xargs command,
otherwise we can get `cp: missing file operand`
error.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #9418
2019-10-08 10:10:23 -07:00
Brian Behlendorf 94bcf6f5e3 ZTS: Fix trim/trim_config and trim/autotrim_config
There have been occasional CI failures which occur when the trimmed
vdev size exactly matches the target size.  Resolve this by slightly
relaxing the conditional and checking for -ge rather than -gt.  In
all of the cases observer, the values match exactly.  For example:

    Failure /mnt/trim-vdev1 is 768 MB which is not -gt than 768 MB

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9399
2019-10-04 12:38:07 -07:00
Brian Behlendorf 6bd4f4545d Fix automount for root filesystems
Commit 093bb64 resolved an automount failures for chroot'd processes
but inadvertently broke automounting for root filesystems where the
vfs_mntpoint is NULL.  Resolve the issue by checking for NULL in order
to generate the correct path.

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9381
Closes #9384
2019-10-04 12:30:51 -07:00
Matthew Macy 2cc479d049 Rename rangelock_ functions to zfs_rangelock_
A rangelock KPI already exists on FreeBSD.  Add a zfs_ prefix as
per our convention to prevent any conflict with existing symbols.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9402
2019-10-03 15:54:29 -07:00
Tony Nguyen 64b6c47d90 dbuf_hold_impl() cleanup to improve cached read performance
Currently every dbuf_hold_impl() incurs kmem_alloc() and kmem_free()
which can be costly for cached read performance.

This change reverts the dbuf_hold_impl() fix stack commit, i.e.
fc5bb51f08 to eliminate the extra
kmem_alloc() and kmem_free() operations and improve cached read
performance. With the change, each dbuf_hold_impl() frame uses 40 bytes
more, total of 800 for 20 recursive levels. Linux kernel stack sizes are
8K and 16K for 32bit and 64bit, respectively, so stack overrun risk is
limited.

Sample stack output comparisons with 50 PB file and recordsize=512
Current code
 11)     2240      64   arc_alloc_buf+0x4a/0xd0 [zfs]
 12)     2176     264   dbuf_read_impl.constprop.16+0x2e3/0x7f0 [zfs]
 13)     1912     120   dbuf_read+0xe5/0x520 [zfs]
 14)     1792      56   dbuf_hold_impl_arg+0x572/0x630 [zfs]
 15)     1736      64   dbuf_hold_impl_arg+0x508/0x630 [zfs]
 16)     1672      64   dbuf_hold_impl_arg+0x508/0x630 [zfs]
 17)     1608      40   dbuf_hold_impl+0x23/0x40 [zfs]
 18)     1568      40   dbuf_hold_level+0x32/0x60 [zfs]
 19)     1528      16   dbuf_hold+0x16/0x20 [zfs]

dbuf_hold_impl() cleanup
 11)     2320      64   arc_alloc_buf+0x4a/0xd0 [zfs]
 12)     2256     264   dbuf_read_impl.constprop.17+0x2e3/0x7f0 [zfs]
 13)     1992     120   dbuf_read+0xe5/0x520 [zfs]
 14)     1872      96   dbuf_hold_impl+0x50f/0x5e0 [zfs]
 15)     1776     104   dbuf_hold_impl+0x4df/0x5e0 [zfs]
 16)     1672     104   dbuf_hold_impl+0x4df/0x5e0 [zfs]
 17)     1568      40   dbuf_hold_level+0x32/0x60 [zfs]
 18)     1528      16   dbuf_hold+0x16/0x20 [zfs]

Performance observations on 8K recordsize filesystem:
- 8/128/1024K at 1-128 sequential cached read, ~3% improvement

Testing done on Ubuntu 18.04 with 4.15 kernel, 8vCPUs and SSD storage on
VMware ESX.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Nguyen <tony.nguyen@delphix.com>
Closes #9351
2019-10-03 15:33:38 -07:00
Matthew Macy 73cdcc6323 OpenZFS restructuring - libzfs
Factor Linux specific functionality out of libzfs.

Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Macy <mmacy@FreeBSD.org>
Closes #9377
2019-10-03 10:33:16 -07:00
Matthew Macy 7c5eff9400 OpenZFS restructuring - libzutil
Factor Linux specific functionality out of libzutil.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9356
2019-10-03 10:20:44 -07:00
Brian Behlendorf e1c216fb0c ZTS: Fix upgrade_readonly_pool
Update cleanup_upgrade to use destroy_dataset and destroy_pool
when performing cleanup.  These wrappers retry if the pool is busy
preventing occasional failures like those observed when running
tests upgrade_readonly_pool.  For example:

    SUCCESS: test enabled == enabled
    User accounting upgrade is not executed on readonly pool
    NOTE: Performing local cleanup via log_onexit (cleanup_upgrade)
    cannot destroy 'testpool': pool is busy
    ERROR: zpool destroy testpool exited 1

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9400
2019-10-03 09:39:13 -07:00
Didier Roche 8ae8b2a144 Workaround to avoid a race when /var/lib is a persistent dataset
If /var/lib is a dataset not under <pool>/ROOT/<root_dataset>, as
proposed in the ubuntu root on zfs upstream guide
(https://github.com/zfsonlinux/zfs/wiki/Ubuntu-18.04-Root-on-ZFS),
we end up with a race where some services, like systemd-random-seed
are writing under /var/lib, while zfs-mount is called. zfs mount will
then potentially fail because of /var/lib isn't empty and so, can't be
mounted.
Order those 2 units for now (more may be needed) as we can't declare
virtually a provide mount point to match
"RequiresMountsFor=/var/lib/systemd/random-seed" from
systemd-random-seed.service.
The optional generator for zfs 0.8 fixes it, but it's not enabled
by default nor necessarily required.

Example:
- rpool/ROOT/ubuntu (mountpoint = /)
- rpool/var/ (mountpoint = /var)
- rpool/var/lib  (mountpoint = /var/lib)

Both zfs-mount.service and systemd-random-seed.service are starting
After=systemd-remount-fs.service. zfs-mount.service should be done
before local-fs.target while systemd-random-seed.service should finish
before sysinit.target (which is a later target).
Ideally, we would have a way for zfs mount -a unit to declare all paths
or move systemd-random-seed after local-fs.target.

Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Didier Roche <didrocks@ubuntu.com>
Closes #9360
2019-10-02 10:51:55 -07:00
Matthew Macy d31277abb1 OpenZFS restructuring - libspl
Factor Linux specific pieces out of libspl.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9336
2019-10-02 10:39:48 -07:00
Matthew Macy 6360e2779e Add inode accessors to common code
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9389
2019-10-02 09:15:12 -07:00
Matthew Macy 13a4027a7c OpenZFS restructuring - arc_stats
Make arc_stats visible to platform code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9386
2019-10-01 16:35:05 -07:00
Matthew Macy 7111c86ca3 Enable clang to use intrinsics for lz4
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9385
2019-10-01 13:17:32 -07:00
dacianstremtan bd76e6817c Fix for zfs-dracut regression
Line 31 and 32 overwrote the ${root} variable which broke mount-zfs.sh
We have create a new variable for the dataset instead of overwriting the
${root} variable in zfs-load-key.sh${root} variable in zfs-load-key.sh

Reviewed-by: Kash Pande <kash@tripleback.net>
Reviewed-by: Garrett Fields <ghfields@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Dacian Reece-Stremtan <dacianstremtan@gmail.com>
Closes #8913 
Closes #9379
2019-10-01 12:54:27 -07:00
Brian Behlendorf 608f8749a1 Perform KABI checks in parallel
Reduce the time required for ./configure to perform the needed
KABI checks by allowing kbuild to compile multiple test cases in
parallel.  This was accomplished by splitting each test's source
code from the logic handling whether that code could be compiled
or not.

By introducing this split it's possible to minimize the number of
times kbuild needs to be invoked.  As importantly, it means all of
the tests can be built in parallel.  This does require a little extra
care since we expect some tests to fail, so the --keep-going (-k)
option must be provided otherwise some tests may not get compiled.
Furthermore, since a failure during the kbuild modpost phase will
result in an early exit; the final linking phase is limited to tests
which passed the initial compilation and produced an object file.

Once everything has been built the configure script proceeds as
previously.  The only significant difference is that it now merely
needs to test for the existence of a .ko file to determine the
result of a given test.  This vastly speeds up the entire process.

New test cases should use ZFS_LINUX_TEST_SRC to declare their test
source code and ZFS_LINUX_TEST_RESULT to check the result.  All of
the existing kernel-*.m4 files have been updated accordingly, see
config/kernel-current-time.m4 for a basic example.  The legacy
ZFS_LINUX_TRY_COMPILE macro has been kept to handle special cases
but it's use is not encouraged.

                  master (secs)   patched (secs)
                  -------------   ----------------
autogen.sh        61              68
configure         137             24  (~17% of current run time)
make -j $(nproc)  44              44
make rpms         287             150

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8547 
Closes #9132
Closes #9341
2019-10-01 12:50:34 -07:00
Prakash Surya 99573cc053 Timeout waiting for ZVOL device to be created
We've seen cases where after creating a ZVOL, the ZVOL device node in
"/dev" isn't generated after 20 seconds of waiting, which is the point
at which our applications gives up on waiting and reports an error.

The workload when this occurs is to "refresh" 400+ ZVOLs roughly at the
same time, based on a policy set by the user. This refresh operation
will destroy the ZVOL, and re-create it based on a snapshot.

When this occurs, we see many hundreds of entries on the "z_zvol" taskq
(based on inspection of the /proc/spl/taskq-all file). Many of the
entries on the taskq end up in the "zvol_remove_minors_impl" function,
and I've measured the latency of that function:

Function = zvol_remove_minors_impl
msecs               : count     distribution
  0 -> 1          : 0        |                                        |
  2 -> 3          : 0        |                                        |
  4 -> 7          : 1        |                                        |
  8 -> 15         : 0        |                                        |
 16 -> 31         : 0        |                                        |
 32 -> 63         : 0        |                                        |
 64 -> 127        : 1        |                                        |
128 -> 255        : 45       |****************************************|
256 -> 511        : 5        |****                                    |

That data is from a 10 second sample, using the BCC "funclatency" tool.
As we can see, in this 10 second sample, most calls took 128ms at a
minimum. Thus, some basic math tells us that in any 20 second interval,
we could only process at most about 150 removals, which is much less
than the 400+ that'll occur based on the workload.

As a result of this, and since all ZVOL minor operations will go through
the single threaded "z_zvol" taskq, the latency for creating a single
ZVOL device can be unreasonably large due to other ZVOL activity on the
system. In our case, it's large enough to cause the application to
generate an error and fail the operation.

When profiling the "zvol_remove_minors_impl" function, I saw that most
of the time in the function was spent off-cpu, blocked in the function
"taskq_wait_outstanding". How this works, is "zvol_remove_minors_impl"
will dispatch calls to "zvol_free" using the "system_taskq", and then
the "taskq_wait_outstanding" function is used to wait for all of those
dispatched calls to occur before "zvol_remove_minors_impl" will return.

As far as I can tell, "zvol_remove_minors_impl" doesn't necessarily have
to wait for all calls to "zvol_free" to occur before it returns. Thus,
this change removes the call to "taskq_wait_oustanding", so that calls
to "zvol_free" don't affect the latency of "zvol_remove_minors_impl".

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Gallagher <john.gallagher@delphix.com>
Signed-off-by: Prakash Surya <prakash.surya@delphix.com>
Closes #9380
2019-10-01 12:33:12 -07:00
Matthew Macy 3283f137d7 OpenZFS restructuring - zpool
Factor Linux specific functions out of the zpool command.
    
Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9333
2019-09-30 12:16:06 -07:00
Matthew Macy 7bb0c29468 OpenZFS restructuring - zfs_ioctl
Refactor the zfs ioctls in to platform dependent and independent bits.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Sean Eric Fagan <sef@ixsystems.com>
Signed-off-by: Matthew Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9301
2019-09-27 10:46:28 -07:00
Ben McGough 3768db24ab Adding slack notifier
Allow ZED notification via slack incoming webhook.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Ben McGough <bmcgough@fredhutch.org>
Closes #9076
Closes #9350
2019-09-26 09:52:10 -07:00
Tom Caputi bb61cc3185 Fix encryption hierarchy issues with zfs recv -d
Currently, the recv_fix_encryption_hierarchy() function accepts
'destsnap' as one of its parameters. Originally, this was intended
to be the top-level dataset of a receive (whether or not the
receive was recursive). Unfortunately, this parameter actually is
simply the input that is passed in from the command line. When
the user specifies 'zfs recv -d', this string is actually only the
name of the receiving pool since the rest of the name is derived
from the send stream. This causes the function to fail, leaving
some datasets with an invalid encryption hierarchy.

This patch resolves this problem by passing in the top_zfs variable
instead. In order to make this work, this patch also includes some
changes that ensure the value is always present when we need it.

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #9273
Closes #9309
2019-09-25 17:02:32 -07:00
Ryan Moeller 479d7d3ca6 ZTS: Fix typos in zfs_destroy tests cleanup
lot_must -> log_must

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9362
2019-09-25 11:42:04 -07:00
Brian Behlendorf 5e5cefbaee ZTS: harden xattr/cleanup.ksh
When the xattr/cleanup.ksh script is unable to remove the test group
due to an active process then it will not call default_cleanup.  This
will result in a zvol_ENOSPC/setup failure when attempting to create
the /mnt/testdir directory which will already exist.

Resolve the issue by performing the default_cleanup before removing
the test user and group to ensure this step always happens.  Also
allow one more retry to further minimize the likelihood of the
cleanup failing.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9358
2019-09-25 09:24:45 -07:00
Brian Behlendorf f81d5ef686 Add warning for zfs_vdev_elevator option removal
Originally the zfs_vdev_elevator module option was added as a
convenience so the requested elevator would be automatically set
on the underlying block devices.  At the time this was simple
because the kernel provided an API function which did exactly this.

This API was then removed in the Linux 4.12 kernel which prompted
us to add compatibly code to set the elevator via a usermodehelper.
While well intentioned this introduced a bug which could cause a
system hang, that issue was subsequently fixed by commit 2a0d4188.

In order to avoid future bugs in this area, and to simplify the code,
this functionality is being deprecated.  A console warning has been
added to notify any existing consumers and the documentation updated
accordingly.  This option will remain for the lifetime of the 0.8.x
series for compatibility but if planned to be phased out of master.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #8664
Closes #9317
2019-09-25 09:23:29 -07:00
Matthew Macy 5df7e9d85c OpenZFS restructuring - zvol
Refactor the zvol in to platform dependent and independent bits.

Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9295
2019-09-25 09:20:30 -07:00
loli10K d359e99c38 diff_cb() does not handle large dnodes
Trying to 'zfs diff' a snapshot with large dnodes will incorrectly try
to access its interior slots when dnodesize > sizeof(dnode_phys_t).
This is normally not an issue because the interior slots are
zero-filled, which report_dnode() handles calling
report_free_dnode_range(). However this is not the case for encrypted
large dnodes or filesystem using many SA based xattrs where the extra
data past the legacy dnode size boundary is interpreted as a
dnode_phys_t.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7678 
Closes #8931 
Closes #9343
2019-09-24 12:01:37 -07:00
Ryan Moeller 73d7820bba Use signed types to prevent subtraction overflow
The difference between the sizes could be positive or negative. Leaving
the types as unsigned means the result overflows when the difference is
negative and removing the labs() means we'll have introduced a bug. The
subtraction results in the correct value when the unsigned integer is
interpreted as a signed integer by labs().

Clang doesn't see that we're doing a subtraction and abusing the types.
It sees the result of the subtraction, an unsigned value, being passed
to an absolute value function and emits a warning which we treat as an
error.

Reviewed by: Youzhong Yang <youzhong@gmail.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9355
2019-09-22 15:27:53 -07:00
Kody A Kantor d49d7336dd Disabled resilver_defer feature leads to looping resilvers
When a disk is replaced with another on a pool with the resilver_defer
feature present, but not enabled the resilver activity restarts during
each spa_sync. This patch checks to make sure that the resilver_defer
feature is first enabled before requesting a deferred resilver.

This was originally fixed in illumos-joyent as OS-7982.

Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Signed-off-by: Kody A Kantor <kody@kkantor.com>
External-issue: illumos-joyent OS-7982
Closes #9299 
Closes #9338
2019-09-22 15:25:39 -07:00
Ryan Moeller afc8f0a6ff Refactor libzfs_error_init newlines
Move the trailing newlines from the error message strings to the format
strings to more closely match the other error messages.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9330
2019-09-18 09:05:57 -07:00
Andriy Gapon dd262c9681 Fix dsl_scan_ds_clone_swapped logic
The was incorrect with respect to swapping dataset IDs both in the
on-disk ZAP object and the in-memory queue.

In both cases, if ds1 was already present, then it would be first
replaced with ds2 and then ds would be replaced back with ds1.
Also, both cases did not properly handle a situation where both ds1 and
ds2 are already queued.  A duplicate insertion would be attempted and
its failure would result in a panic.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
Closes #9140 
Closes #9163
2019-09-18 09:04:45 -07:00
loli10K fcd37b622b Device removal of indirect vdev panics the kernel
This commit fixes a NULL pointer dereference triggered in
spa_vdev_remove_top_check() by trying to "zpool remove" an indirect
vdev.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9327
2019-09-16 10:46:59 -07:00
loli10K b24771a8c9 Prevent gcc -Werror=maybe-uninitialized warnings in spa_wait_common()
This commit fixes the following build failure detected on Debian9
(GCC 6.3.0):

     CC [M]  module/zfs/spa.o
   module/zfs/spa.c: In function ‘spa_wait_common.part.31’:
   module/zfs/spa.c:9468:6: error: ‘in_progress’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
      if (!in_progress || spa->spa_waiters_cancel || error)
         ^
   cc1: all warnings being treated as errors

Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Gallagher <john.gallagher@delphix.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9326
2019-09-16 10:46:02 -07:00
loli10K 7e15647ce9 ZTS: Fix /usr/bin/env: 'python2': No such file or directory
Since 4f342e45 env(1) must be able to find a "python2" executable in
the "constrained path" on systems configured with --with-python=2.x
otherwise the ZFS Test Suite won't be able to use Python scripts.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9325
2019-09-16 10:44:51 -07:00
Tom Caputi 637f0c6019 Fix clone handling with encryption roots
Currently, spa_keystore_change_key_sync_impl() does not recurse
into clones when updating encryption roots for either a call to
'zfs promote' or 'zfs change-key'. This can cause children of
these clones to end up in a state where they point to the wrong
dataset as the encryption root. It can also trigger ASSERTs in
some cases where the code checks reference counts on wrapping
keys. This patch fixes this issue by ensuring that this function
properly recurses into clones during processing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alek Pinchuk <apinchuk@datto.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #9267 
Closes #9294
2019-09-16 10:07:33 -07:00
loli10K 2a0d41889e Scrubbing root pools may deadlock on kernels without elevator_change() (#9321)
Originally the zfs_vdev_elevator module option was added as a
convenience so the requested elevator would be automatically set
on the underlying block devices. At the time this was simple
because the kernel provided an API function which did exactly this.

This API was then removed in the Linux 4.12 kernel which prompted
us to add compatibly code to set the elevator via a usermodehelper.

Unfortunately changing the evelator via usermodehelper requires reading
some userland binaries, most notably modprobe(8) or sh(1), from a zfs
dataset on systems with root-on-zfs. This can deadlock the system if
used during the following call path because it may need, if the data
is not already cached in the ARC, reading directly from disk while
holding the spa config lock as a writer:

  zfs_ioc_pool_scan()
    -> spa_scan()
      -> spa_scan()
        -> vdev_reopen()
          -> vdev_elevator_switch()
            -> call_usermodehelper()

While the usermodehelper waits sh(1), modprobe(8) is blocked in the
ZIO pipeline trying to read from disk:

  INFO: task modprobe:2650 blocked for more than 10 seconds.
       Tainted: P           OE     5.2.14
  modprobe        D    0  2650    206 0x00000000
  Call Trace:
   ? __schedule+0x244/0x5f0
   schedule+0x2f/0xa0
   cv_wait_common+0x156/0x290 [spl]
   ? do_wait_intr_irq+0xb0/0xb0
   spa_config_enter+0x13b/0x1e0 [zfs]
   zio_vdev_io_start+0x51d/0x590 [zfs]
   ? tsd_get_by_thread+0x3b/0x80 [spl]
   zio_nowait+0x142/0x2f0 [zfs]
   arc_read+0xb2d/0x19d0 [zfs]
   ...
   zpl_iter_read+0xfa/0x170 [zfs]
   new_sync_read+0x124/0x1b0
   vfs_read+0x91/0x140
   ksys_read+0x59/0xd0
   do_syscall_64+0x4f/0x130
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

This commit changes how we use the usermodehelper functionality from
synchronous (UMH_WAIT_PROC) to asynchronous (UMH_NO_WAIT) which prevents
scrubs, and other vdev_elevator_switch() consumers, from triggering the
aforementioned issue.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Issue #8664 
Closes #9321
2019-09-13 18:09:59 -07:00
John Gallagher e60e158eff Add subcommand to wait for background zfs activity to complete
Currently the best way to wait for the completion of a long-running
operation in a pool, like a scrub or device removal, is to poll 'zpool
status' and parse its output, which is neither efficient nor convenient.

This change adds a 'wait' subcommand to the zpool command. When invoked,
'zpool wait' will block until a specified type of background activity
completes. Currently, this subcommand can wait for any of the following:

 - Scrubs or resilvers to complete
 - Devices to initialized
 - Devices to be replaced
 - Devices to be removed
 - Checkpoints to be discarded
 - Background freeing to complete

For example, a scrub that is in progress could be waited for by running

    zpool wait -t scrub <pool>

This also adds a -w flag to the attach, checkpoint, initialize, replace,
remove, and scrub subcommands. When used, this flag makes the operations
kicked off by these subcommands synchronous instead of asynchronous.

This functionality is implemented using a new ioctl. The type of
activity to wait for is provided as input to the ioctl, and the ioctl
blocks until all activity of that type has completed. An ioctl was used
over other methods of kernel-userspace communiction primarily for the
sake of portability.

Porting Notes:
This is ported from Delphix OS change DLPX-44432. The following changes
were made while porting:

 - Added ZoL-style ioctl input declaration.
 - Reorganized error handling in zpool_initialize in libzfs to integrate
   better with changes made for TRIM support.
 - Fixed check for whether a checkpoint discard is in progress.
   Previously it also waited if the pool had a checkpoint, instead of
   just if a checkpoint was being discarded.
 - Exposed zfs_initialize_chunk_size as a ZoL-style tunable.
 - Updated more existing tests to make use of new 'zpool wait'
   functionality, tests that don't exist in Delphix OS.
 - Used existing ZoL tunable zfs_scan_suspend_progress, together with
   zinject, in place of a new tunable zfs_scan_max_blks_per_txg.
 - Added support for a non-integral interval argument to zpool wait.

Future work:
ZoL has support for trimming devices, which Delphix OS does not. In the
future, 'zpool wait' could be extended to add the ability to wait for
trim operations to complete.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: John Gallagher <john.gallagher@delphix.com>
Closes #9162
2019-09-13 18:09:06 -07:00
Chengfei ZHu 7238cbd4d3 QAT related bug fixes
1. Fix issue:  Kernel BUG with QAT during decompression  #9276.
   Now it is uninterruptible for a specific given QAT request,
   but Ctrl-C interrupt still works in user-space process.

2. Copy the digest result to the buffer only when doing encryption,
   and vise-versa for decryption.

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chengfei Zhu <chengfeix.zhu@intel.com>
Closes #9276 
Closes #9303
2019-09-12 13:33:44 -07:00
Ryan Moeller 4f342e45be Canonicalize Python shebangs
/usr/bin/env python3 is the suggested[1] shebang for Python in general
(likewise for python2) and is conventional across platforms. This eases
development on systems where python is not installed in /usr/bin
(FreeBSD for example) and makes it possible to develop in virtual
environments (venv) for isolating dependencies.

Many packaging guidelines discourage the use of /usr/bin/env, but since
this is the canonical way of writing shebangs in the Python community,
many packaging scripts are already equipped to handle substituting the
appropriate absolute path to python automatically.

Some RPM package builders lacking brp-mangle-shebangs need a small
fallback mechanism in the package spec to stamp the appropriate shebang
on installed Python scripts.

[1]: https://docs.python.org/3/using/unix.html?#miscellaneous

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9314
2019-09-12 13:32:32 -07:00
Matthew Macy b01a6574ae Move objnode handling to common code
objnode is OS agnostic and used only by dmu_redact.c.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9315
2019-09-12 13:31:09 -07:00
Matthew Macy 74756182d2 Enable compiler to typecheck logging
Annotate spa logging declarations with printflike
Workaround gcc bug (non disable-able warning) by
replacing "" with " "

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9316
2019-09-12 13:28:26 -07:00
Matthew Macy d66620681d OpenZFS restructuring - move linux tracing code to platform directories
Move Linux specific tracing headers and source to platform directories
and update the build system.

Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9290
2019-09-11 14:25:53 -07:00
Tom Caputi 5815f7ac30 Fix stalled txg with repeated noop scans
Currently, the DSL scan code figures out when it should suspend
processing and allow a txg to continue by calling the function
dsl_scan_check_suspend(). Unfortunately, this function only
allows the scan to suspend at a level 0 block. In the event that
the system is scanning a bunch of empty snapshots or a resilver
is running with a high enough scn_cur_min_txg, the scan will
stop processing each dataset at the root level, deciding it
has nothing left to do. This means that the check_suspend
function is never called and the txg remains stuck until a
dataset is found that has data to scan.

This patch fixes the problem by allowing scans to suspend at
the root level of the objset. For backwards compatibility, we
use the bookmark <objsetid, 0, 0, 0> when we suspend here so
that older versions of the code will work as intended.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #9300
2019-09-11 11:16:48 -07:00
Brian Behlendorf 5b51c15861 kmodtool: depmod path
Determine the location of depmod on the system, either /sbin/depmod or
/usr/sbin/depmod.  Then use that path when generating the specfile.

Additionally, update the Requires lines to reference the package which
provides depmod rather than the binary itself.  For CentOS/RHEL 7+8
and all supported Fedora releases this is the kmod package, and for
CentOS/RHEL 6 it is the module-init-tools package.

Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8724
Closes #9310
2019-09-11 11:14:50 -07:00
Brian Behlendorf 490e23cdf4 copy-builtin: SPL must be in Kbuild first (again)
Commit bced7e3 accidentally reintroduced issue #7595 which was
previously addressed by 517d247.  Re-apply the original fix to
resolve the issue and include a comment to make it clear the
ordering is important.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Matthew Thode <prometheanfire@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9302
Closes #9208
2019-09-11 11:09:50 -07:00
Brian Behlendorf 25f06d677a Fix /etc/hostid on root pool deadlock
Accidentally introduced by dc04a8c which now takes the SCL_VDEV lock
as a reader in zfs_blkptr_verify().  A deadlock can occur if the
/etc/hostid file resides on a dataset in the same pool.  This is
because reading the /etc/hostid file may occur while the caller is
holding the SCL_VDEV lock as a writer.  For example, to perform a
`zpool attach` as shown in the abbreviated stack below.

To resolve the issue we cache the system's hostid when initializing
the spa_t, or when modifying the multihost property.  The cached
value is then relied upon for subsequent accesses.

Call Trace:
    spa_config_enter+0x1e8/0x350 [zfs]
    zfs_blkptr_verify+0x33c/0x4f0 [zfs] <--- trying read lock
    zio_read+0x6c/0x140 [zfs]
    ...
    vfs_read+0xfc/0x1e0
    kernel_read+0x50/0x90
    ...
    spa_get_hostid+0x1c/0x38 [zfs]
    spa_config_generate+0x1a0/0x610 [zfs]
    vdev_label_init+0xa0/0xc80 [zfs]
    vdev_create+0x98/0xe0 [zfs]
    spa_vdev_attach+0x14c/0xb40 [zfs] <--- grabbed write lock

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9256 
Closes #9285
2019-09-10 13:42:30 -07:00
Ryan Moeller 562e1c0327 Add/generalize abstractions in arc_summary3
Code for interfacing with procfs for kstats and tunables is Linux-
specific. A more generic interface can be used for the abstractions of
loading kstats and various tunable parameters, allowing other platforms
to implement the functions cleanly. In a similar vein, determining the
ZFS/SPL version can be abstracted away in order for other platforms to
provide their own implementations of this function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9279
2019-09-10 13:27:53 -07:00
Ryan Moeller 6122948b3b Add/generalize abstraction in arc_summary2
A more generic interface can be used for the abstraction of loading
kstats, allowing other platforms to implement the function cleanly.

In a similar vein, loading tunables can be abstracted away in order for
other platforms to provide their own implementations of this function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9277
2019-09-10 12:17:54 -07:00
Brian Behlendorf b88ca2acf5 Enable SIMD for encryption
When adding the SIMD compatibility code in e5db313 the decryption of a
dataset wrapping key was left in a user thread context.  This was done
intentionally since it's a relatively infrequent operation.  However,
this also meant that the encryption context templates were initialized
using the generic operations.  Therefore, subsequent encryption and
decryption operations would use the generic implementation even when
executed by an I/O pipeline thread.

Resolve the issue by initializing the context templates in an I/O
pipeline thread.  And by updating zio_do_crypt_uio() to dispatch any
encryption operations to a pipeline thread when called from the user
context.  For example, when performing a read from the ARC.

Tested-by: Attila Fülöp <attila@fueloep.org>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9215
Closes #9296
2019-09-10 10:45:46 -07:00
John Wren Kennedy b63e2d881f ZTS: Introduce targeted corruption in file blocks
filetest_001_pos verifies that various checksum algorithms detect
corruption by overwriting the underlying vdev on which a file resides.
It is possible for the overwrite to miss the blocks of a file, causing a
spurious failure. This change introduces a function to corrupt the
individual blocks of a file as determined by zdb.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes #9288
2019-09-09 16:11:07 -07:00
Ryan Moeller 43f4495bde Clean up do_vol_test in zfs_copies tests
Get rid of the `get_used_prop` function. `get_prop used` works fine.

Fix the comment describing the function parameters. The type does not
have a default, and mntp is also used for ext2.

Rename the variable for the number of copies from `copy` to `copies`.

Use a `case` statement to match the type parameter, order the cases
alphabetically, and add a little sanity checking for good measure.

Use eval to make sure the output of commands is silenced rather than
the log messages when redirecting output to /dev/null.

Simplify cases where zfs requires special behavior.

Don't allow the test to loop forever in the event space usage does not
change. Bail out of the loop and fail after an arbitrary number of
iterations.

Add more information to the log message when the test fails, to help
debugging.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9286
2019-09-09 16:04:05 -07:00
Olaf Faaland 4bbf0477a7 BuildRequires libtirpc-devel needed for RHEL 8
Building against RHEL 8 requires libtirpc-devel, as with fedora 28.
Add rhel8 and centos8 options to the test, to account for that.

BuildRequires Originally added for fedora 28 via commit
1a62a305be

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #9289
2019-09-06 11:30:07 -07:00
Matthew Macy bced7e3aaa OpenZFS restructuring - move platform specific sources
Move platform specific Linux source under module/os/linux/
and update the build system accordingly.  Additional code
restructuring will follow to make the common code fully
portable.
    
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Macy <mmacy@FreeBSD.org>
Closes #9206
2019-09-06 11:26:26 -07:00
Tom Caputi 870e7a52c1 Fix noop receive of raw send stream
Currently, the noop receive code fails to work with raw send streams
and resuming send streams. This happens because zfs_receive_impl()
reads the DRR_BEGIN payload without reading the payload itself.
Normally, the kernel expects to read this itself, but in this case
the recv_skip() code runs instead and it is not prepared to handle
the stream being left at any place other than the beginning of a
record.

This patch resolves this issue by manually reading the DRR_BEGIN
payload in the dry-run case. This patch also includes a number of
small fixups in this code path.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #9221
Closes #9173
2019-09-05 16:22:05 -07:00
Ryan Moeller 8e2c502cf3 Clean up zfs_clone_010_pos
Remove a lot of unnecessary setting and incrementing of `i`.

Remove unused variable `j`.

Instead of calling out to Python in a loop to generate the same string
repeatedly, generate the string once using shell constructs before
entering the loop.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9284
2019-09-05 16:20:09 -07:00
Matthew Macy 03fdcb9adc Make module tunables cross platform
Adds ZFS_MODULE_PARAM to abstract module parameter
setting to operating systems other than Linux.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9230
2019-09-05 14:49:49 -07:00
Serapheim Dimitropoulos 65a91b166e metaslab_verify_weight_and_frag() shouldn't cause side-effects
`metaslab_verify_weight_and_frag()` a verification function and
by the end of it there shouldn't be any side-effects.

The function calls `metaslab_weight()` which in turn calls
`metaslab_set_fragmentation()`. The latter can dirty and otherwise
not dirty metaslab fro the next TXGand set `metaslab_condense_wanted`
if the spacemaps were just upgraded (meaning we just enabled the
SPACEMAP_HISTOGRAM feature through upgrade).

This patch adds a new flag as a parameter to `metaslab_weight()` and
`metaslab_set_fragmentation()` making the dirtying of the metaslab
optional.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #9185 
Closes #9282
2019-09-05 09:57:55 -07:00
Ryan Moeller 240c015ac6 Refactor checksum operations in tests
md5sum in particular but also sha256sum to a lesser extent is used
in several areas of the test suite for computing checksums. The vast
majority of invocations are followed by `| awk '{ print $1 }'`.

Introduce functions to wrap up `md5sum $file | awk '{ print $1 }'` and
likewise for sha256sum. These also serve as a convenient interface for
alternative implementations on other platforms.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9280
2019-09-05 09:51:59 -07:00
Matthew Macy 006e9a4088 OpenZFS restructuring - move platform specific headers
Move platform specific Linux headers under include/os/linux/.
Update the build system accordingly to detect the platform.
This lays some of the initial groundwork to supporting building
for other platforms.

As part of this change it was necessary to create both a user
and kernel space sys/simd.h header which can be included in
either context.  No functional change, the source has been
refactored and the relevant #include's updated.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Matthew Macy <mmacy@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9198
2019-09-05 09:34:54 -07:00
loli10K d02186ee2b Fix zpool subcommands error message with some unsupported options
Both 'detach' and 'online' zpool subcommands, when provided with an
unsupported option, forget to print it in the error message:

   # zpool online -t rpool vda3
   invalid option ''
   usage:
      online [-e] <pool> <device> ...

This changes fixes the error message in order to include the actual
option that is not supported.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9270
2019-09-04 13:36:25 -07:00
loli10K a49dbbbe06 Fix zfs-dkms .deb package warning in prerm script
Debian zfs-dkms package generated by alien doesn't call the prerm script
(rpm's %preun) with an integer as first parameter, which results in the
following warning when the package is uninstalled:

   "zfs-dkms.prerm: line 3: [: remove: integer expression expected"

Modify the if-condition to avoid the warning.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9271
2019-09-03 15:20:39 -07:00
Ryan Moeller a6a2877774 Use the right booleans
TRUE and FALSE happen to be defined, but we should use B_TRUE and
B_FALSE for the sake of consistency.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9264
2019-09-03 12:44:08 -07:00
Igor K e242b67cee Fix panic on DilOS with kstat per dataset statistics
Account for ZFS_MAX_DATASET_NAME_LEN in kstat data size.  This value
is ignored in the Linux kstat code but resolves the issue for other
platforms.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Igor Kozhukhov <igor@dilos.org>
Closes #9254 
Closes #9151
2019-09-03 12:12:31 -07:00
Pavel Zakharov a91e4790a6 zvol_wait script should ignore partially received zvols
Partially received zvols won't have links in /dev/zvol.

Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Closes #9260
2019-09-03 11:29:52 -07:00
Andriy Gapon ebeb6f23bf Always refuse receving non-resume stream when resume state exists
This fixes a hole in the situation where the resume state is left from
receiving a new dataset and, so, the state is set on the dataset itself
(as opposed to %recv child).

Additionally, distinguish incremental and resume streams in error
messages.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
Closes #9252
2019-09-03 10:56:55 -07:00
Igor K 1a504d27df ZTS: Fix removal_cancel.ksh
Create a larger file to extend the time required to perform the 
removal.  Occasional failures were observed due to the removal
completing before the cancel could be requested.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Igor Kozhukhov <igor@dilos.org>
Closes #9259
2019-09-03 10:46:40 -07:00
loli10K 6988f3ed9a Fix Intel QAT / ZFS compatibility on v4.7.1+ kernels
This change use the compat code introduced in 9cc1844a.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9268 
Closes #9269
2019-09-03 10:36:33 -07:00
George Wilson 1e52716257 maxinflight can overflow in spa_load_verify_cb()
When running on larger memory systems, we can overflow the value of
maxinflight. This can result in maxinflight having a value of 0 causing
the system to hang.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Wilson <george.wilson@delphix.com>
Closes #9272
2019-09-02 19:17:51 -07:00
Andrea Gelmini a57c82fc50 Fix typos
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9251
2019-09-02 18:17:39 -07:00
Andrea Gelmini c6e457dffb Fix typos in tests/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9250
2019-09-02 18:14:53 -07:00
Andrea Gelmini cb14aa4ca9 Fix typos in tests/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9249
2019-09-02 18:13:19 -07:00
Andrea Gelmini 4001f09055 Fix typos in tests/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9247
2019-09-02 18:12:01 -07:00
Andrea Gelmini 220dd4ae84 Fix typos in tests/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9246
2019-09-02 18:10:31 -07:00
Andrea Gelmini 24739cd5b0 Fix typos in tests/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9244
2019-09-02 18:08:56 -07:00
Andrea Gelmini 37e42197a4 Fix typos in tests/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9243
2019-09-02 18:07:35 -07:00
Andrea Gelmini ade306a9d4 Fix typos in tests/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9242
2019-09-02 17:58:26 -07:00
Andrea Gelmini e1cfd73f7f Fix typos in module/zfs/
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9240
2019-09-02 17:56:41 -07:00
Andrea Gelmini 7859537768 Fix typos in lib/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9237
2019-09-02 17:53:27 -07:00
Andrea Gelmini c953960048 Fix typos in tests/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9248
2019-08-30 16:53:48 -07:00
Andrea Gelmini b520706f29 Fix typos in tests/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9245
2019-08-30 16:52:00 -07:00
Andrea Gelmini 9f5c1bc609 Fix typos in module/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9241
2019-08-30 14:32:18 -07:00
Andrea Gelmini 9d40bdf414 Fix typos in modules/icp/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9239
2019-08-30 14:26:07 -07:00
Andrea Gelmini cf7c5a030e Fix typos in include/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9238
2019-08-30 09:53:15 -07:00
Andrea Gelmini 0463c95501 Fix typos in etc/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9236
2019-08-30 09:46:52 -07:00
Andrea Gelmini cd6b910b64 Fix typos in contrib/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9235
2019-08-30 09:44:43 -07:00
Andrea Gelmini ad0b23b14a Fix typos in cmd/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9234
2019-08-30 09:43:30 -07:00
Andrea Gelmini ac3d4d0cf6 Fix typos in man/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9233
2019-08-30 09:41:35 -07:00
Andrea Gelmini 2b96f77423 Fix typos in config/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9232
2019-08-30 09:40:30 -07:00
Igor K d39c71d365 Fix refquota_007_neg.ksh
Must use 'zfs' instead of '$ZFS' which is undefined.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Igor Kozhukhov <igor@dilos.org>
Closes #9257
2019-08-30 09:32:25 -07:00
Paul Dagnelie 475aa97cab Prevent metaslab_sync panic due to spa_final_dirty_txg
If a pool enables the SPACEMAP_HISTOGRAM feature shortly before being
exported, we can enter a situation that causes a kernel panic. Any metaslabs
that are loaded during the final dirty txg and haven't already been condensed
will cause metaslab_sync to proceed after the final dirty txg so that the
condense can be performed, which there are assertions to prevent. Because of
the nature of this issue, there are a number of ways we can enter this
state. Rather than try to prevent each of them one by one, potentially missing
some edge cases, we instead cut it off at the point of intersection; by
preventing metaslab_sync from proceeding if it would only do so to perform a
condense and we're past the final dirty txg, we preserve the utility of the
existing asserts while preventing this particular issue.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #9185
Closes #9186
Closes #9231
Closes #9253
2019-08-30 09:28:31 -07:00
Georgy Yakovlev e2fcfa70e3 etc/init.d/zfs-functions.in: remove arch warning
Remove the x86_64 warning, it's no longer the case that this is the
only supported architecture.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Closes: #9177
2019-08-29 13:14:48 -07:00
Ryan Moeller 815a645692 Simplify deleting partitions in libtest
Eliminate unnecessary code duplication. We can use a for-loop instead
of a while-loop. There is no need to echo $DISKSARRAY in a subshell or
return 0. Declare all variables with typeset.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9224
2019-08-29 13:11:29 -07:00
Ryan Moeller f66ad580cc Use compatible arg order in tests
BSD getopt() and getopt_long() want options before arguments.
Reorder arguments to zfs/zpool in tests to put all the options first.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9228
2019-08-29 11:03:09 -07:00
Paul Dagnelie eef0f4d84e Keep more metaslabs loaded
With the other metaslab changes loaded onto a system, we can 
significantly reduce the memory usage of each loaded metaslab and 
unload them on demand if there is memory pressure. However, none 
of those changes actually result in us keeping more metaslabs loaded. 
If we don't keep more metaslabs loaded, we will still have to wait 
for demand-loading to finish when no loaded metaslab can satisfy our 
allocation, which can cause ZIL performance issues. In addition,
performance is traditionally measured by IOs per unit time, while 
unloading is currently done on a txg-count basis. Txgs can take a 
widely varying range of times, from tenths of a second to several 
seconds. This can result in confusing, hard to predict behavior.

This change simply adds a time-based component to metaslab unloading. 
A metaslab will remain loaded for one minute and 8 txgs (by default) 
after it was last used, unless it is evicted due to memory pressure.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
External-issue: DLPX-65016
External-issue: DLPX-65047
Closes #9197
2019-08-29 10:20:36 -07:00
Pavel Zakharov e6cebbf86e zfs_handle used after being closed/freed in change_one callback
This is a typical case of use after free. We would call zfs_close(zhp) 
which would free the handle, and then call zfs_iter_children() on that 
handle later.  This change ensures that the zfs_handle is only closed 
when we are ready to return.

Running `zfs inherit -r sharenfs pool` was failing with an error
code without any error messages. After some debugging I've pinpointed 
the issue to be memory corruption, which would cause zfs to try to 
issue an ioctl to the wrong device and receive ENOTTY.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alek Pinchuk <apinchuk@datto.com>
Signed-off-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Issue #7967 
Closes #9165
2019-08-28 15:02:58 -07:00
Tony Nguyen 8d04284281 Use smaller default slack/delta value for schedule_hrtimeout_range()
For interrupt coalescing, cv_timedwait_hires() uses a 100us slack/delta
for calls to schedule_hrtimeout_range(). This 100us slack can be costly
for small writes.

This change improves small write performance by passing resolution `res`
parameter to schedule_hrtimeout_range() to be used as delta/slack. A new
tunable `spl_schedule_hrtimeout_slack_us` is added to preserve old
behavior when desired.

Performance observations on 8K recordsize filesystem:
- 8K random writes at 1-64 threads, up to 60% improvement for one thread
  and smaller gains as thread count increases. At >64 threads, 2-5%
  decrease in performance was observed.
- 8K sequential writes, similar 60% improvement for one thread and
  leveling out around 64 threads. At >64 threads, 5-10% decrease in
  performance was observed.
- 128K sequential write sees 1-5 for the 128K. No observed regression at
  high thread count.

Testing done on Ubuntu 18.04 with 4.15 kernel, 8vCPUs and SSD storage on
VMware ESX.

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Signed-off-by: Tony Nguyen <tony.nguyen@delphix.com>
Closes #9217
2019-08-28 14:56:54 -07:00
Brian Behlendorf 07a328dde4 ZTS: Temporarily disable several upgrade tests
Until issues #9185 and #9186 have been resolved the following zpool
upgrade tests are being disabled to prevent CI failures.

  zpool_upgrade_002_pos,
  zpool_upgrade_003_pos,
  zpool_upgrade_004_pos,
  zpool_upgrade_007_pos,
  zpool_upgrade_008_pos

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #9185
Issue #9186
Closes #9225
2019-08-28 14:52:08 -07:00
Don Brady 28c91ab66d Tag ABD pages for exclusion in kernel crash dumps
Tag the ABD data pages so that they can be identified for exclusion 
from kernel crash dumps. Eliminating the zfs file data allows for 
significantly smaller crash dump files. Note that ZFS in illumos has 
always excluded the zfs data pages from a kernel crash dump.

This change tags ARC scatter data pages so they can be identified from 
the makedumpfile(8) command. That command is used to create smaller 
dump files by ignoring some memory regions and using compression. It 
already filters file data from the VFS page cache and will now be able 
to exclude ZFS file data pages from the dump file.

A corresponding change to makeumpfile(8) is required to identify ZFS 
data pages.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #8899
2019-08-28 10:44:46 -07:00
Chunwei Chen 035e96118b Fix zil replay panic when TX_REMOVE followed by TX_CREATE
If TX_REMOVE is followed by TX_CREATE on the same object id, we need to
make sure the object removal is completely finished before creation. The
current implementation relies on dnode_hold_impl with
DNODE_MUST_BE_ALLOCATED returning ENOENT. While this check seems to work
fine before, in current version it does not guarantee the object removal
is completed.

We fix this by checking if DNODE_MUST_BE_FREE returns successful
instead. Also add test and remove dead code in dnode_hold_impl.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #7151
Closes #8910
Closes #9123
Closes #9145
2019-08-28 10:42:02 -07:00
Ryan Moeller 9c9dcd6e04 Prefer for (;;) to while (TRUE)
Defining a special constant to make an infinite loop is excessive,
especially when the name clashes with symbols commonly defined on
some platforms (ie FreeBSD).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9219
2019-08-28 10:38:40 -07:00
Andriy Gapon e6203d288a zfs_ioc_snapshot: check user-prop permissions on snapshotted datasets
Previously, the permissions were checked on the pool which was obviously
incorrect.

After this change, zfs_check_userprops() only validates the properties
without any permission checks.  The permissions are checked individually
for each snapshotted dataset.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
Closes #9179 
Closes #9180
2019-08-27 13:45:53 -07:00
Richard Allen f335b8ffe1 Fix Plymouth passphrase prompt in initramfs script
Entering the ZFS encryption passphrase under Plymouth wasn't working
because in the ZFS initrd script, Plymouth was calling zfs via
"--command", which wasn't passing through the filesystem argument to
zfs load-key properly (it was passing through the single quotes around
the filesystem name intended to handle spaces literally,
which zfs load-key couldn't understand).

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Garrett Fields <ghfields@gmail.com>
Signed-off-by: Richard Allen <belperite@gmail.com>
Issue #9193
Closes #9202
2019-08-27 13:44:02 -07:00
Tom Caputi e7a2fa70c3 Fix deadlock in 'zfs rollback'
Currently, the 'zfs rollback' code can end up deadlocked due to
the way the kernel handles unreferenced inodes on a suspended fs.
Essentially, the zfs_resume_fs() code path may cause zfs to spawn
new threads as it reinstantiates the suspended fs's zil. When a
new thread is spawned, the kernel may attempt to free memory for
that thread by freeing some unreferenced inodes. If it happens to
select inodes that are a a part of the suspended fs a deadlock
will occur because freeing inodes requires holding the fs's
z_teardown_inactive_lock which is still held from the suspend.

This patch corrects this issue by adding an additional reference
to all inodes that are still present when a suspend is initiated.
This prevents them from being freed by the kernel for any reason.

Reviewed-by: Alek Pinchuk <apinchuk@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #9203
2019-08-27 09:55:51 -07:00
Ryan Moeller 142f84dd19 Restore :: in Makefile.am
The double-colon looked like a typo, but it's actually an obscure
feature. Rules with :: may appear multiple times and are run
independently of one another in the order they appear. The use of ::
for distclean-local was conventional, not accidental.

Add comments to indicate the intentional use of double-colon rules.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9210
2019-08-26 11:48:31 -07:00
Paul Dagnelie 95f0144675 Add regression test for "zpool list -p"
Other than this test, zpool list -p is not well tested by any of the 
automated tests.  Add a test for zpool list -p.

Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #9134
2019-08-25 18:33:03 -07:00
Ryan Moeller a18f8bce5c Split argument list, satisfy shellcheck SC2086
Split the arguments for ${TEST_RUNNER} across multiple lines for
clarity. Also added quotes in the message to match the invoked command.

Unquoted variables in argument lists are subject to splitting. In this
particular case we can't quote the variable because it is an optional
argument. Use the method suggested in the description linked below,
instead.

The technique is to use an unquoted variable with an alternate value.

https://github.com/koalaman/shellcheck/wiki/SC2086

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9212
2019-08-25 18:30:39 -07:00
Brian Behlendorf 4302698be1 ZTS: Fix in-tree dbufstats test case
Commit a887d653 updated the dbufstats such that escalated privileges
are required.  Since all tests under cli_user are run with normal
privileges move this test case to a location where it will be run
required privileges.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Michael Niewöhner <foss@mniewoehner.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9118 
Closes #9196
2019-08-22 17:37:48 -07:00
Brian Behlendorf d1d1f8c37d Fix install error introduced by #9089 (#9205)
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
2019-08-22 17:31:38 -07:00
Ryan Moeller 97c54ea818 Make slog test setup more robust
The slog tests fail when attempting to create pools using file vdevs
that already exist from previous test runs. Remove these files in the
setup for the test.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9194
2019-08-22 17:26:50 -07:00
yshui 19d61d63fa zfs-mount-genrator: dependencies should be space-separated
Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Closes #9174
2019-08-22 17:11:17 -07:00
Paul Dagnelie d1484fb189 Fix install error introduced by #9089
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
2019-08-22 12:01:41 -07:00
Brian Behlendorf 31b548ffb9 ZTS: Use decimal values when setting tunables
The mdb_set_uint32 function requires that the values passed in be
decimal.  This was overlooked initially because the matching Linux
function accepts both decimal and hexadecimal values.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Igor Kozhukhov <igor@dilos.org>
Closes #9125
Closes #9195
2019-08-22 10:36:57 -07:00
Brian Behlendorf a3ba6e589f Fix automake program name transformations (#9190)
Automake can perform program name transformations at install time.
However, arc_summary has its own name transformation taking place,
which interferes with the automake transforms. The automake transforms
must be taken into account in order to resolve the conflict.

Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
2019-08-22 09:53:45 -07:00
Mauricio Faria de Oliveira 2f74950c5e Document ZFS_DKMS_ENABLE_DEBUGINFO in userland configuration
Document the ZFS_DKMS_ENABLE_DEBUGINFO option in the userland
configuration file, as done with the other ZFS_DKMS_* options.

It has been introduced with commit e45c1734a6 ("dkms: Enable
debuginfo option to be set with zfs sysconfig file") but isn't
mentioned anywhere other than the 'dkms.conf' file (generated).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Closes #9191
2019-08-22 09:48:48 -07:00
Ryan Moeller 0154a1e539 Dedup IOC enum values in libzfs_input_check
Reuse enum value ZFS_IOC_BASE for `('Z' << 8)`.
This is helpful on FreeBSD where ZFS_IOC_BASE has a different value and
`('Z' << 8)` is wrong.

Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9188
2019-08-22 09:46:09 -07:00
Ryan Moeller f591a581d6 Enhance ioctl number checks
When checking ZFS_IOC_* numbers, print which numbers are wrong rather
than silently failing.

Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9187
2019-08-22 09:44:11 -07:00
Brian Behlendorf 20f7b917aa ZTS: Fix vdev_zaps_005_pos on CentOS 6
The ancient version of blkid (v2.17.2) used in CentOS 6 will not
detect the newly created pool unless it has been written to.
Force a pool sync so `zpool import` will detect the newly created
pool.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9199
2019-08-22 08:53:44 -07:00
Tony Hutter a9ebdfdd43 Linux 5.3: Fix switch() fall though compiler errors
Fix some switch() fall-though compiler errors:

    abd.c:1504:9: error: this statement may fall through

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #9170
2019-08-21 09:29:23 -07:00
Ryan Moeller f66a1f88fb Minor cleanup in Makefile.am
Split long lines where adding license info to dist archive.

Remove extra colon from target line.

Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9189
2019-08-21 09:01:59 -07:00
Alexey Smirnoff c759b33a51 zfs-functions.in: in_mtab() always returns 1
$fs used with the wrong sed command where should be $mntpnt instead
to match a variable exported by read_mtab()

The fix is mostly to reuse the sed command found in read_mtab()

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Michael Niewöhner <foss@mniewoehner.de>
Signed-off-by: Alexey Smirnoff <fling@member.fsf.org>
Closes #9168
2019-08-20 16:26:19 -07:00
Ryan Moeller 92a9e1da60 Fix automake program name transformations
Automake can perform program name transformations at install time.
However, arc_summary has its own name transformation taking place,
which interferes with the automake transforms. The automake transforms
must be taken into account in order to resolve the conflict.

Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
2019-08-20 17:46:40 -04:00
Matthew Ahrens 325d288c5d Add fast path for zfs_ioc_space_snaps() handling of empty_bpobj
When there are many snapshots, calls to zfs_ioc_space_snaps() (e.g. from
`zfs destroy -nv pool/fs@snap1%snap10000`) can be very slow, resulting
in poor performance because we are holding the dp_config_rwlock the
entire time, blocking spa_sync() from continuing.  With around ten
thousand snapshots, we've seen up to 500 seconds in this ioctl,
iterating over up to 50,000,000 bpobjs, ~99% of which are the empty
bpobj.

By creating a fast path for zfs_ioc_space_snaps() handling of the
empty_bpobj, we can achieve a ~5x performance improvement of this ioctl
(when there are many snapshots, and the deadlist is mostly
empty_bpobj's).

Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-58348
Closes #8744
2019-08-20 11:34:52 -07:00
jdike 3beb0a7694 Fix lockdep circular locking false positive involving sa_lock
There are two different deadlock scenarios, but they share a common
link, which is
thread 1 holding sa_lock and trying to get zap->zap_rwlock:
    zap_lockdir_impl+0x858/0x16c0 [zfs]
    zap_lockdir+0xd2/0x100 [zfs]
    zap_lookup_norm+0x7f/0x100 [zfs]
    zap_lookup+0x12/0x20 [zfs]
    sa_setup+0x902/0x1380 [zfs]
    zfsvfs_init+0x3d6/0xb20 [zfs]
    zfsvfs_create+0x5dd/0x900 [zfs]
    zfs_domount+0xa3/0xe20 [zfs]

and thread 2 trying to get sa_lock, either in sa_setup:
   sa_setup+0x742/0x1380 [zfs]
   zfsvfs_init+0x3d6/0xb20 [zfs]
   zfsvfs_create+0x5dd/0x900 [zfs]
   zfs_domount+0xa3/0xe20 [zfs]
or in sa_build_index:
   sa_build_index+0x13d/0x790 [zfs]
   sa_handle_get_from_db+0x368/0x500 [zfs]
   zfs_znode_sa_init.isra.0+0x24b/0x330 [zfs]
   zfs_znode_alloc+0x3da/0x1a40 [zfs]
   zfs_zget+0x39a/0x6e0 [zfs]
   zfs_root+0x101/0x160 [zfs]
   zfs_domount+0x91f/0xea0 [zfs]

From there, there are different locking paths back to something
holding zap->zap_rwlock.

The deadlock scenarios involve multiple different ZFS filesystems
being mounted.  sa_lock is common to these scenarios, and the sa
struct involved is private to a mount.  Therefore, these must be
referring to different sa_lock instances and these deadlocks can't
occur in practice.

The fix, from Brian Behlendorf, is to remove sa_lock from lockdep
coverage by initializing it with MUTEX_NOLOCKDEP.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jeff Dike <jdike@akamai.com>
Closes #9110
2019-08-19 16:04:26 -07:00
Dominic Pearson ff4b68eedc Linux 5.3 compat: Makefile subdir-m no longer supported
Uses obj-m instead, due to kernel changes.

See LKML: Masahiro Yamada, Tue, 6 Aug 2019 19:03:23 +0900

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Dominic Pearson <dsp@technoanimal.net>
Closes #9169
2019-08-19 15:22:52 -07:00
colmbuckley f6fbe25664 Set "none" scheduler if available (initramfs)
Existing zfs initramfs script logic will attempt to set the 'noop' 
scheduler if it's available on the vdev block devices. Newer kernels 
have the similar 'none' scheduler on multiqueue devices; this change 
alters the initramfs script logic to also attempt to set this scheduler 
if it's available.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Garrett Fields <ghfields@gmail.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Colm Buckley <colm@tuatha.org>
Closes #9042
2019-08-19 15:11:46 -07:00
Paul Dagnelie 1a26cb6160 Add more refquota tests
It used to be possible for zfs receive (and other operations related 
to clone swap) to bypass refquotas. This can cause a number of issues, 
and there should be an automated test for it.

Added tests for rollback and receive not overriding refquota.

Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #9139
2019-08-19 15:06:53 -07:00
Paul Dagnelie f09fda5071 Cap metaslab memory usage
On systems with large amounts of storage and high fragmentation, a huge 
amount of space can be used by storing metaslab range trees. Since 
metaslabs are only unloaded during a txg sync, and only if they have 
been inactive for 8 txgs, it is possible to get into a state where all 
of the system's memory is consumed by range trees and metaslabs, and 
txgs cannot sync. While ZFS knows how to evict ARC data when needed, 
it has no such mechanism for range tree data. This can result in boot 
hangs for some system configurations.

First, we add the ability to unload metaslabs outside of syncing 
context. Second, we store a multilist of all loaded metaslabs, sorted 
by their selection txg, so we can quickly identify the oldest 
metaslabs.  We use a multilist to reduce lock contention during heavy 
write workloads. Finally, we add logic that will unload a metaslab 
when we're loading a new metaslab, if we're using more than a certain 
fraction of the available memory on range trees.

Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #9128
2019-08-16 09:08:21 -06:00
Michael Niewöhner 9323aad14d initramfs: fixes for (debian) initramfs
* contrib/initramfs: include /etc/default/zfs and /etc/zfs/zfs-functions
At least debian needs /etc/default/zfs and /etc/zfs/zfs-functions for
its initramfs. Include both in build when initramfs is configured.

* contrib/initramfs: include 60-zvol.rules and zvol_id
Include 60-zvol.rules and zvol_id and set udev as predependency instead
of debians zdev. This makes debians additional zdev hook unneeded.

* Correct initconfdir substitution for some distros
Not every Linux distro is using @sysconfdir@/default but @initconfdir@
which is already determined by configure. Let's use it.

* systemd: prevent possible conflict between systemd and sysvinit
Systemd will not load a sysvinit service if a unit exists with the same
name. This prevents conflicts between sysvinit and systemd.
In ZFS there is one sysvinit service that does not have a systemd
service but a target counterpart, zfs-import.target.
Usually it does not make any sense to install both but it is possisble.
Let's prevent any conflict by masking zfs-import.service by default.
This does not harm even if init.d/zfs-import does not exist.

Reviewed-by: Chris Wedgwood <cw@f00f.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Tested-by: Alex Ingram <reimu@reimuhakurei.net>
Tested-by: Dreamcat4 <dreamcat4@gmail.com>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes #7904 
Closes #9089
2019-08-16 09:02:32 -06:00
Serapheim Dimitropoulos 0f8ff49eb6 dmu_tx_wait() hang likely due to cv_signal() in dsl_pool_dirty_delta()
Even though the bug's writeup (Github issue #9136) is very detailed,
we still don't know exactly how we got to that state, thus I wasn't
able to reproduce the bug. That said, we can make an educated guess
combining the information on filled issue with the code.

From the fact that `dp_dirty_total` was 0 (which is less than
`zfs_dirty_data_max`) we know that there was one thread that set it to
0 and then signaled one of the waiters of `dp_spaceavail_cv` [see
`dsl_pool_dirty_delta()` which is also the only place that
`dp_dirty_total` is changed].  Thus, the only logical explaination
then for the bug being hit is that the waiter that just got awaken
didn't go through `dsl_pool_dirty_data()`. Given that this function
is only called by `dsl_pool_dirty_space()` or `dsl_pool_undirty_space()`
I can only think of two possible ways of the above scenario happening:

[1] The waiter didn't call into any of the two functions - which I
    find highly unlikely (i.e. why wait on `dp_spaceavail_cv` to begin
    with?).
[2] The waiter did call in one of the above function but it passed 0 as
    the space/delta to be dirtied (or undirtied) and then the callee
    returned immediately (e.g both `dsl_pool_dirty_space()` and
    `dsl_pool_undirty_space()` return immediately when space is 0).

In any case and no matter how we got there, the easy fix would be to
just broadcast to all waiters whenever `dp_dirty_total` hits 0. That
said and given that we've never hit this before, it would make sense
to think more on why the above situation occured.

Attempting to mimic what Prakash was doing in the issue filed, I
created a dataset with `sync=always` and started doing contiguous
writes in a file within that dataset. I observed with DTrace that even
though we update the pool's dirty data accounting when we would dirty
stuff, the accounting wouldn't be decremented incrementally as we were
done with the ZIOs of those writes (the reason being that
`dbuf_write_physdone()` isn't be called as we go through the override
code paths, and thus `dsl_pool_undirty_space()` is never called). As a
result we'd have to wait until we get to `dsl_pool_sync()` where we
zero out all dirty data accounting for the pool and the current TXG's
metadata.

In addition, as Matt noted and I later verified, the same issue would
arise when using dedup.

In both cases (sync & dedup) we shouldn't have to wait until
`dsl_pool_sync()` zeros out the accounting data. According to the
comment in that part of the code, the reasons why we do the zeroing,
have nothing to do with what we observe:
````
/*
 * We have written all of the accounted dirty data, so our
 * dp_space_towrite should now be zero.  However, some seldom-used
 * code paths do not adhere to this (e.g. dbuf_undirty(), also
 * rounding error in dbuf_write_physdone).
 * Shore up the accounting of any dirtied space now.
 */
dsl_pool_undirty_space(dp, dp->dp_dirty_pertxg[txg & TXG_MASK], txg);
````

Ideally what we want to do is to undirty in the accounting exactly what
we dirty (I use the word ideally as we can still have rounding errors).
This would make the behavior of the system more clear and predictable.

Another interesting issue that I observed with DTrace was that we
wouldn't update any of the pool's dirty data accounting whenever we
would dirty and/or undirty MOS data. In addition, every time we would
change the size of a dbuf through `dbuf_new_size()` we wouldn't update
the accounted space dirtied in the appropriate dirty record, so when
ZIOs are done we would undirty less that we dirtied from the pool's
accounting point of view.

For the first two issues observed (sync & dedup) this patch ensures
that we still update the pool's accounting when we undirty data,
regardless of the write being physical or not.

For changes in the MOS, we first ensure to zero out the pool's dirty
data accounting in `dsl_pool_sync()` after we synced the MOS. Then we
can go ahead and enable the update of the pool's dirty data accounting
wheneve we change MOS data.

Another fix is that we now update the accounting explicitly for
counting errors in `dbuf_write_done()`.

Finally, `dbuf_new_size()` updates the accounted space of the
appropriate dirty record correctly now.

The problem is that we still don't know how the bug came up in the
issue filled. That said the issues fixed seem to be very relevant, so
instead of going with the broadcasting solution right away,
I decided to leave this patch as is.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
External-issue: DLPX-47285
Closes #9137
2019-08-15 17:53:53 -06:00
Tony Nguyen c8bbf7c00b Improve write performance by using dmu_read_by_dnode()
In zfs_log_write(), we can use dmu_read_by_dnode() rather than
dmu_read() thus avoiding unnecessary dnode_hold() calls.

We get a 2-5% performance gain for large sequential_writes tests, >=128K
writes to files with recordsize=8K.

Testing done on Ubuntu 18.04 with 4.15 kernel, 8vCPUs and SSD storage on
VMware ESX.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Nguyen <tony.nguyen@delphix.com>
Closes #9156
2019-08-15 17:36:24 -06:00
Serapheim Dimitropoulos 0e37a0f4f3 Assert that a dnode's bonuslen never exceeds its recorded size
This patch introduces an assertion that can catch pitfalls in
development where there is a mismatch between the size of
reads and writes between a *_phys structure and its respective
in-core structure when bonus buffers are used.

This debugging-aid should be complementary to the verification
done by ztest in ztest_verify_dnode_bt().

A side to this patch is that we now clear out any extra bytes
past a bonus buffer's new size when the buffer is shrinking.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #8348
2019-08-15 08:44:57 -06:00
Paul Zuchowski e2b31b58e8 Make txg_wait_synced conditional in zfsvfs_teardown
The call to txg_wait_synced in zfsvfs_teardown should
be made conditional on the objset having dirty data.
This can prevent unnecessary txg_wait_synced during
some unmount operations.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #9115
2019-08-15 08:27:13 -06:00
Paul Dagnelie dc04a8c757 Prevent race in blkptr_verify against device removal
When we check the vdev of the blkptr in zfs_blkptr_verify, we can run 
into a race condition where that vdev is temporarily unavailable. This 
happens when a device removal operation and the old vdev_t has been 
removed from the array, but the new indirect vdev has not yet been 
inserted.

We hold the spa_config_lock while doing our sensitive verification. 
To ensure that we don't deadlock, we only grab the lock if we don't 
have config_writer held. In addition, I had to const the tags of the 
refcounts and the spa_config_lock arguments.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #9112
2019-08-13 21:24:43 -06:00
Chunwei Chen 8e556c5ebc Fix out-of-order ZIL txtype lost on hardlinked files
We should only call zil_remove_async when an object is removed. However,
in current implementation, it is called whenever TX_REMOVE is called. In
the case of hardlinked file, every unlink will generate TX_REMOVE and
causing operations to be dropped even when the object is not removed.

We fix this by only calling zil_remove_async when the file is fully
unlinked.

Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #8769
Closes #9061
2019-08-13 21:21:27 -06:00
Prakash Surya 475ebd763a Fix device expansion when VM is powered off
When running on an ESXi based VM, I've found that "zpool online -e" will
not expand the zpool, if the disk was expanded in ESXi while the VM was
powered off.

For example, take the following scenario:

 1. VM running on top of VMware ESXi
 2. ZFS pool created with a given device "sda" of size 8GB
 3. VM powered off
 4. Device "sda" size expanded to 16GB
 5. VM powered on
 6. "zpool online -e" used on device "sda"

In this situation, after (2) the zpool will be roughly 8GB in size.
After (6), the expectation is the zpool's size will expand to roughly
16GB in size; i.e. expand to the new size of the "sda" device.
Unfortunately, I've seen that after (6), the zpool size does not change.

What's happening is after (5), the EFI label of the "sda" device will be
such that fields "efi_last_u_lba", "efi_last_lba", and "efi_altern_lba"
all reflect the new size of the disk; i.e. "33554398", "33554431", and
"33554431" respectively.

Thus, the check that we perform in "efi_use_whole_disk":

    if ((efi_label->efi_altern_lba == 1) || (efi_label->efi_altern_lba
        >= efi_label->efi_last_lba)) {

This will return true, and then we return from the function without
having expanded the size of the zpool/device.

In contrast, if we remove steps (3) and (5) in the sequence above, i.e.
the device is expanded while the VM is powered on, things change. In
that case, the fields "efi_last_u_lba" and "efi_altern_lba" do not
change (i.e. they still reflect the old 8GB device size), but the
"efi_last_lba" field does change (i.e. it now reflects the new 16GB
device size). Thus, when we evaluate the same conditional in
"efi_use_whole_disk", it'll return false, so the zpool is expanded.

Taking all of this into account, this PR updates "efi_use_whole_disk" to
properly expand the zpool when the underlying disk is expanded while the
VM is powered off.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Prakash Surya <prakash.surya@delphix.com>
Closes #9111
2019-08-13 21:18:53 -06:00
Allan Jude d2a32912b9 Mark dsl_livelist_should_disable() static
This function is not used outside of dsl_dataset.c

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Closes #9154
2019-08-13 21:16:23 -06:00
George Wilson c8242a96ba spa_load_verify() may consume too much memory
When a pool is imported it will scan the pool to verify the integrity 
of the data and metadata. The amount it scans will depend on the 
import flags provided. On systems with small amounts of memory or 
when importing a pool from the crash kernel, it's possible for 
spa_load_verify to issue too many I/Os that it consumes all the memory 
of the system resulting in an OOM message or a hang.

To prevent this, we limit the amount of memory that the initial pool
scan can consume. This change will, by default, use 1/16th of the ARC
for scan I/Os to prevent running the system out of memory during import.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: George Wilson george.wilson@delphix.com
External-issue: DLPX-65237
External-issue: DLPX-65238
Closes #9146
2019-08-13 08:11:57 -06:00
Tomohiro Kusumi a43570c5f3 Change boolean-like uint8_t fields in znode_t to boolean_t
Given znode_t is an in-core structure, it's more readable to have
them as boolean. Also co-locate existing boolean fields with them
for space efficiency (expecting 8 booleans to be packed/aligned).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #9092
2019-08-13 07:58:02 -06:00
Richard Yao fccbd1d6e2 Drop KMC_NOEMERGENCY
This is not implemented. If it were implemented, using it would risk
deadlocks on pre-3.18 kernels. Lets just drop it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Michael Niewöhner <foss@mniewoehner.de>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #9119
2019-08-13 07:46:12 -06:00
Serapheim Dimitropoulos 3b9edd7b17 Introduce getting holds and listing bookmarks through ZCP
Consumers of ZFS Channel Programs can now list bookmarks,
and get holds from datasets. A minor-refactoring was also
applied to distinguish between user and system properties
in ZCP.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Ported-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Dan Kimmel <dan.kimmel@delphix.com>

OpenZFS-issue: https://illumos.org/issues/8862
Closes #7902
2019-08-12 10:02:34 -07:00
Serapheim Dimitropoulos 2081db7982 Sort log spacemap tunables in alphabetical order
Beside the whole commit being a nit in reality it should
bring the diffs of the spa_log_spacemap.c source file
between ZoL and delphix/zfs to 0.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #9143
2019-08-12 09:49:07 -07:00
Paul Dagnelie c81f1790e2 Metaslab max_size should be persisted while unloaded
When we unload metaslabs today in ZFS, the cached max_size value is
discarded. We instead use the histogram to determine whether or not we
think we can satisfy an allocation from the metaslab. This can result in
situations where, if we're doing I/Os of a size not aligned to a
histogram bucket, a metaslab is loaded even though it cannot satisfy the
allocation we think it can. For example, a metaslab with 16 entries in
the 16k-32k bucket may have entirely 16kB entries. If we try to allocate
a 24kB buffer, we will load that metaslab because we think it should be
able to handle the allocation. Doing so is expensive in CPU time, disk
reads, and average IO latency. This is exacerbated if the write being
attempted is a sync write.

This change makes ZFS cache the max_size after the metaslab is
unloaded. If we ever get a free (or a coalesced group of frees) larger
than the max_size, we will update it. Otherwise, we leave it as is. When
attempting to allocate, we use the max_size as a lower bound, and
respect it unless we are in try_hard. However, we do age the max_size
out at some point, since we expect the actual max_size to increase as we
do more frees. A more sophisticated algorithm here might be helpful, but
this works reasonably well.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #9055
2019-08-05 14:34:27 -07:00
DeHackEd 99e755d653 Don't wakeup unnecessarily in 'zpool events -f'
ZED can prevent CPU's from properly sleeping.

Rather than periodically waking up in the zevents code, just go to sleep and wait for a wakeup.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: DHE <git@dehacked.net>
Closes #9091
2019-08-05 11:35:47 -07:00
Serapheim Dimitropoulos 8098465558 Test cancelling a removal in ZTS
This patch adds a new test that sanity checks cancelling a removal.

Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #9101
2019-08-05 10:50:20 -07:00
jdike 48be0dfba1 lockdep false positive - move txg_kick() outside of ->dp_lock
This fixes a lockdep warning by breaking a link between ->tx_sync_lock
and ->dp_lock.

The deadlock envisioned by lockdep is this:
    thread 1 holds db->db_mtx and tries to get dp->dp_lock:
	dsl_pool_dirty_space+0x70/0x2d0 [zfs]
	dbuf_dirty+0x778/0x31d0 [zfs]

    thread 2 holds bpo->bpo_lock and tries to get db->db_mtx:
        dmu_buf_will_dirty_impl
        dmu_buf_will_dirty+0x6b/0x6c0 [zfs]
        bpobj_iterate_impl+0xbe6/0x1410 [zfs]

    thread 3 holds tx->tx_sync_lock and tries to get bpo->bpo_lock:
        bpobj_space+0x63/0x470 [zfs]
        dsl_scan_active+0x340/0x3d0 [zfs]
        txg_sync_thread+0x3f2/0x1370 [zfs]

    thread 4 holds dp->dp_lock and tries to get tx->tx_sync_lock
       txg_kick+0x61/0x420 [zfs]
       dsl_pool_need_dirty_delay+0x1c7/0x3f0 [zfs]

This patch is orginally from Brian Behlendorf and slightly simplified
by me.

It breaks this cycle in thread 4 by moving the call from
dsl_pool_need_dirty_delay to txg_kick outside the section controlled
by dp->dp_lock.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Signed-off-by: Jeff Dike <jdike@akamai.com>
Closes #9094
2019-07-31 14:53:39 -07:00
Serapheim Dimitropoulos cae97c88c4 List log_spacemap feature in zpool-features.5 manual
Update zpool-features.5 manpage to describe the log_spacemap feature.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #9096
2019-07-31 09:29:01 -07:00
Clint Armstrong f489458ad7 Add channel program for property based snapshots
Channel programs that many users find useful should be included with zfs
in the /contrib directory. This is the first of these contributions. A
channel program to recursively take snapshots of datasets with the
property com.sun:auto-snapshot=true.

Reviewed-by: Kash Pande <kash@tripleback.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Clint Armstrong <clint@clintarmstrong.net>
Closes #8443 
Closes #9050
2019-07-30 16:02:19 -07:00
Serapheim Dimitropoulos 1ba4f3e7b4 9072 handle error of zap_cursor_retrieve() for log spacemap zap
In spa_ld_log_sm_metadata(), it is possible for zap_cursor_retrieve()
to return errors other than the expected ENOENT (e.g. when we are at
the end of the zap). Ensure that these error cases are handled
correctly by the import path.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #9074
2019-07-30 13:20:01 -07:00
Serapheim Dimitropoulos 2fcf4481a6 mismerged log spacemap comment for metaslab_verify_weight_and_frag
When the log spacemap commit was merged in ZoL, the
metaslab_verify_unflushed_changes() debugging function
was deleted as the feature was pretty much stable by
then. Unfortunately though there was a reference to
it from a comment in metaslab_verify_weight_and_frag().

This patch deletes the reference and pastes that
comment as is.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #9097
2019-07-30 10:13:44 -07:00
Michael Niewöhner a6c8289473 install path fixes
* rpm: correct pkgconfig path

pkconfig files get installed to $datarootdir/pkgconfig but rpm expects
them to be at $datadir. This works when $datarootdir==$datadir which is
the case most of the time but will fail when they differ.

* install: make initramfs-tools path static

Since initramfs-tools' path is nothing we can control as it is an
external package it does not make any sense to install zfs additions
anywhere else. Simply use /usr/share/initramfs-tools as path.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes #9087
2019-07-30 10:06:09 -07:00
Michael Niewöhner 85ce79bbc8 Increase default zcmd allocation to 256K
When creating hundreds of clones (for example using containers with
LXD) cloning slows down as the number of clones increases over time.
The reason for this is that the fetching of the clone information
using a small zcmd buffer requires two ioctl calls, one to determine
the size and a second to return the data. However, this requires
gathering the data twice, once to determine the size and again to
populate the zcmd buffer to return it to userspace.
These are expensive ioctl() calls, so instead, make the default buffer
size much larger: 256K.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes #9084
2019-07-30 09:59:38 -07:00
Matthew Ahrens 0eb8ba6ab6 Improve performance by using dmu_tx_hold_*_by_dnode()
In zfs_write() and dmu_tx_hold_sa(), we can use dmu_tx_hold_*_by_dnode()
instead of dmu_tx_hold_*(), since we already have a dbuf from the target
dnode in hand.  This eliminates some calls to dnode_hold(), which can be
expensive.  This is especially impactful if several threads are
accessing objects that are in the same block of dnodes, because they
will contend for that dbuf's lock.

We are seeing 10-20% performance wins for the sequential_writes tests in
the performance test suite, when doing >=128K writes to files with
recordsize=8K.

This also removes some unnecessary casts that are in the area.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #9081
2019-07-30 09:18:30 -07:00
Brian Behlendorf 1e620c9872 Revert "Develop tests for issues #5866 and #8858"
This reverts commit 693c1fc478.  This
change resulted in a kmem leak being observed in existing code which
needs to be identified and addressed.

Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #8978
Closes #9090
2019-07-29 12:46:56 -07:00
Brian Behlendorf adf495e239 Fix channel programs on s390x
When adapting the original sources for s390x the JMP_BUF_CNT was
mistakenly halved due to an incorrect assumption of the size of
a unsigned long.  They are 8 bytes for the s390x architecture.
Increase JMP_BUF_CNT accordingly.

Authored-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reported-by: Colin Ian King <canonical.com>
Tested-by: Colin Ian King <canonical.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8992
Closes #9080
2019-07-28 18:15:26 -07:00
George Wilson 453bb4791e Race between zfs-share and zfs-mount services
When a system boots the zfs-mount.service and the
zfs-share.service can start simultaneously. What may be
unclear is that sharing a filesystem will first mount
the filesystem if it's not already mounted. This means
that both service can race to mount the same fileystem.
This race can result in a SEGFAULT or EBUSY conditions.

This change explicitly defines the start ordering between the
two services such that the zfs-mount.service is solely
responsible for mounting filesystems eliminating the race
between "zfs mount -a" and "zfs share -a" commands.

Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Wilson <george.wilson@delphix.com>
Closes #9083
2019-07-28 18:13:56 -07:00
Paul Zuchowski 693c1fc478 Develop tests for issues #5866 and #8858
Provide zfstest coverage for these two issues which
were a panic accessing extended attributes and
a problem comparing 64 bit and 32 bit generation
numbers.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Issue #5866
Issue #8858 
Closes #8978
2019-07-26 17:52:13 -07:00
Tomohiro Kusumi 9fb6abe5ad Implement secpolicy_vnode_setid_retain()
Don't unconditionally return 0 (i.e. retain SUID/SGID).
Test CAP_FSETID capability.

https://github.com/pjd/pjdfstest/blob/master/tests/chmod/12.t
which expects SUID/SGID to be dropped on write(2) by non-owner fails
without this. Most filesystems make this decision within VFS by using
a generic file write for fops.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #9035 
Closes #9043
2019-07-26 13:52:30 -07:00
Matthew Ahrens 4b5c9d9f97 zed crashes when devid not present
zed core dumps due to a NULL pointer in zfs_agent_iter_vdev(). The
gs_devid is NULL, but the nvl has a "devid" entry.

zfs_agent_post_event() checks that ZFS_EV_VDEV_GUID or DEV_IDENTIFIER is
present in nvl, but then later it and zfs_agent_iter_vdev() assume that
DEV_IDENTIFIER is present and thus gs_devid is set.

Typically this is not a problem because usually either all vdevs have
devid's, or none of them do. Since zfs_agent_iter_vdev() first checks if
the vdev has devid before dereferencing gs_devid, the problem isn't
typically encountered. However, if some vdevs have devid's and some do
not, then the problem is easily reproduced.  This can happen if the pool
has been moved from a system that has devid's to one that does not.

The fix is for zfs_agent_iter_vdev() to only try to match the devid's if
both nvl and gsp have devid's present.

Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-65090
Closes #9054
Closes #9060
2019-07-26 12:07:48 -07:00
Sara Hartse 37f03da8ba Fast Clone Deletion
Deleting a clone requires finding blocks are clone-only, not shared
with the snapshot. This was done by traversing the entire block tree
which results in a large performance penalty for sparsely
written clones.

This is new method keeps track of clone blocks when they are
modified in a "Livelist" so that, when it’s time to delete,
the clone-specific blocks are already at hand.

We see performance improvements because now deletion work is
proportional to the number of clone-modified blocks, not the size
of the original dataset.

Reviewed-by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Sara Hartse <sara.hartse@delphix.com>
Closes #8416
2019-07-26 10:54:14 -07:00
Tomohiro Kusumi d274ac5460 Don't directly cast unsigned long to void*
Cast to uintptr_t first for portability on integer to/from pointer
conversion.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #9065
2019-07-25 11:59:20 -07:00
Matthew Ahrens 1ff46825e2 Replace zf_rwlock with a mutex
The rwlock implementation on linux does not perform as well as mutexes.
We can realize a performance benefit by replacing the zf_rwlock with a
mutex.  Local microbenchmarks show ~50% improvement, and over NFS we see
~5% improvement on several of the ZFS Performance Tests cases,
especially randwrite and seq_write.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #9062
2019-07-25 11:57:58 -07:00
Tomohiro Kusumi 09276fde1c Fix module_param() type for zfs_read_chunk_size
zfs_read_chunk_size is unsigned long.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #9051
2019-07-19 11:23:56 -07:00
Tony Hutter d84c5a120e Move some tests to cli_user/zpool_status
The tests in tests/functional/cli_root/zpool_status should all require
root. However, linux.run has "user =" specified for those tests, which
means they run as a normal user.  When I removed that line to run them
as root, the following tests did not pass:

zpool_status_003_pos
zpool_status_-c_disable
zpool_status_-c_homedir
zpool_status_-c_searchpath

These tests need to be run as a normal user.  To fix this, move these
tests to a new tests/functional/cli_user/zpool_status directory.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #9057
2019-07-19 11:21:54 -07:00
Serapheim Dimitropoulos 7f31908913 Tricky semantics of ms_max_size in metaslab_should_allocate()
metaslab_should_allocate() is used in two places:
[1] When trying to select a metaslab to allocate from
[2] When trying to allocate from a metaslab

In [2] we always expect the metaslab to be loaded, and after
the refactoring of the log spacemap changes, whenever we load
a metaslab we set ms_max_size to the biggest range in the
ms_allocatable tree. Thus, when it is used in [2], if that
field is 0, it means that the metaslab doesn't have any
segments that can be used for allocations now (though it may
have some free space but that space can be in the freeing,
freed, or deferred trees).

In [1] a metaslab can be loaded or unloaded at which point 0
can either mean the metaslab doesn't have any space or the
metaslab is just not loaded thus we go ahead and try to make
an estimation based on its weight.

The issue here is when we call the above function for [2] and
the metaslab doesn't have any allocatable space, we still go
ahead and check its ms_weight which may be out of date because
we haven't ran metaslab_sync_done() yet. At that point we are
allowing an allocation to be attempted even though we know
there is no range that is allocatable.

This patch fixes this issue by explicitly checking if the
metaslab is loaded and if it is, the ms_max_size is used.

Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #9045
2019-07-19 11:19:50 -07:00
Serapheim Dimitropoulos 43a8536260 Race condition between spa async threads and export
In the past we've seen multiple race conditions that have
to do with open-context threads async threads and concurrent
calls to spa_export()/spa_destroy() (including the one
referenced in issue #9015).

This patch ensures that only one thread can execute the
main body of spa_export_common() at a time, with subsequent
threads returning with a new error code created just for
this situation, eliminating this way any race condition
bugs introduced by concurrent calls to this function.

Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #9015 
Closes #9044
2019-07-18 13:02:33 -07:00
Serapheim Dimitropoulos 1c44a5c97f hdr_recl calls zthr_wakeup() on destroyed zthr
There exists a race condition were hdr_recl() calls
zthr_wakeup() on a destroyed zthr. The timeline is the
following:

[1] hdr_recl() runs first and goes intro zthr_wakeup()
    because arc_initialized is set.
[2] arc_fini() is called by another thread, zeroes
    that flag, destroying the zthr, and goes into
    buf_init().
[3] hdr_recl() tries to enter the destroyed mutex
    and we blow up.

This patch ensures that the ARC's zthrs are not offloaded
any new work once arc_initialized is set and then destroys
them after all of the ARC state has been deleted.

Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #9047
2019-07-18 12:55:29 -07:00
Serapheim Dimitropoulos bac15c1198 zdb: don't print log spacemap stats in pools without the feature
Creating a pool with not features enabled and running
`zdb -mmmmmm on` it before the patch:

```
Log Space Maps in Pool:

Log Space Map Obsolete Entry Statistics:
0        valid entries out of 0        - txg 0
0        valid entries out of 0        - total
```

After this patch the above output goes away.

Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #9048
2019-07-18 12:54:03 -07:00
Tomohiro Kusumi f79121d114 Fix wrong comment on zcr_blksz_{min,max}
These aren't tunable; illumos has this comment fixed in
"3742 zfs comments need cleaner, more consistent style",
so sync with that.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #9052
2019-07-18 12:48:46 -07:00
Pavel Zakharov 26b6047469 New service that waits on zvol links to be created
The zfs-volume-wait.service scans existing zvols and waits for their
links under /dev to be created. Any service that depends on zvol
links to be there should add a dependency on zfs-volumes.target.
By default, this target is not enabled.

Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: John Gallagher <john.gallagher@delphix.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Zakharov <pzakharov@delphix.com>
Closes #8975
2019-07-17 15:33:05 -07:00
Brian Behlendorf d64dd3b62a Retire unused spl_{mutex,rwlock}_{init_fini}
These functions are unused and can be removed along
with the spl-mutex.c and spl-rwlock.c source files.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9029
2019-07-17 15:13:53 -07:00
Brian Behlendorf e7a99dab2b Linux 5.3 compat: retire rw_tryupgrade()
The Linux kernel's rwsem's have never provided an interface to
allow a reader to be upgraded to a writer.  Historically, this
functionality has been implemented by a SPL wrapper function.
However, this approach depends on internal knowledge of the
rw_semaphore and is therefore rather brittle.

Since the ZFS code must always be able to fallback to rw_exit()
and rw_enter() when an rw_tryupgrade() fails; this functionality
isn't critical.  Furthermore, the only potentially performance
sensitive consumer is dmu_zfetch() and no decrease in performance
was observed with this change applied.  See the PR comments for
additional testing details.

Therefore, it is being retired to make the build more robust and
to simplify the rwlock implementation.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9029
2019-07-17 15:08:56 -07:00
Brian Behlendorf 041205afee Linux 5.3 compat: rw_semaphore owner
Commit https://github.com/torvalds/linux/commit/94a9717b updated the
rwsem's owner field to contain additional flags describing the rwsem's
state.  Rather then update the wrappers to mask out these bits, the
code no longer relies on the owner stored by the kernel.  This does
increase the size of a krwlock_t but it makes the implementation
less sensitive to future kernel changes.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9029
2019-07-17 15:07:46 -07:00
jdike a649768a17 Fix lockdep recursive locking false positive in dbuf_destroy
lockdep reports a possible recursive lock in dbuf_destroy.

It is true that dbuf_destroy is acquiring the dn_dbufs_mtx
on one dnode while holding it on another dnode.  However,
it is impossible for these to be the same dnode because,
among other things,dbuf_destroy checks MUTEX_HELD before
acquiring the mutex.

This fix defines a class NESTED_SINGLE == 1 and changes
that lock to call mutex_enter_nested with a subclass of
NESTED_SINGLE.

In order to make the userspace code compile,
include/sys/zfs_context.h now defines mutex_enter_nested and
NESTED_SINGLE.

This is the lockdep report:

[  122.950921] ============================================
[  122.950921] WARNING: possible recursive locking detected
[  122.950921] 4.19.29-4.19.0-debug-d69edad5368c1166 #1 Tainted: G           O
[  122.950921] --------------------------------------------
[  122.950921] dbu_evict/1457 is trying to acquire lock:
[  122.950921] 0000000083e9cbcf (&dn->dn_dbufs_mtx){+.+.}, at: dbuf_destroy+0x3c0/0xdb0 [zfs]
[  122.950921]
               but task is already holding lock:
[  122.950921] 0000000055523987 (&dn->dn_dbufs_mtx){+.+.}, at: dnode_evict_dbufs+0x90/0x740 [zfs]
[  122.950921]
               other info that might help us debug this:
[  122.950921]  Possible unsafe locking scenario:

[  122.950921]        CPU0
[  122.950921]        ----
[  122.950921]   lock(&dn->dn_dbufs_mtx);
[  122.950921]   lock(&dn->dn_dbufs_mtx);
[  122.950921]
                *** DEADLOCK ***

[  122.950921]  May be due to missing lock nesting notation

[  122.950921] 1 lock held by dbu_evict/1457:
[  122.950921]  #0: 0000000055523987 (&dn->dn_dbufs_mtx){+.+.}, at: dnode_evict_dbufs+0x90/0x740 [zfs]
[  122.950921]
               stack backtrace:
[  122.950921] CPU: 0 PID: 1457 Comm: dbu_evict Tainted: G           O      4.19.29-4.19.0-debug-d69edad5368c1166 #1
[  122.950921] Hardware name: Supermicro H8SSL-I2/H8SSL-I2, BIOS 080011  03/13/2009
[  122.950921] Call Trace:
[  122.950921]  dump_stack+0x91/0xeb
[  122.950921]  __lock_acquire+0x2ca7/0x4f10
[  122.950921]  lock_acquire+0x153/0x330
[  122.950921]  dbuf_destroy+0x3c0/0xdb0 [zfs]
[  122.950921]  dbuf_evict_one+0x1cc/0x3d0 [zfs]
[  122.950921]  dbuf_rele_and_unlock+0xb84/0xd60 [zfs]
[  122.950921]  dnode_evict_dbufs+0x3a6/0x740 [zfs]
[  122.950921]  dmu_objset_evict+0x7a/0x500 [zfs]
[  122.950921]  dsl_dataset_evict_async+0x70/0x480 [zfs]
[  122.950921]  taskq_thread+0x979/0x1480 [spl]
[  122.950921]  kthread+0x2e7/0x3e0
[  122.950921]  ret_from_fork+0x27/0x50

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jeff Dike <jdike@akamai.com>
Closes #8984
2019-07-17 09:18:24 -07:00
Brian Behlendorf 095b5412b3 Fix CONFIG_X86_DEBUG_FPU build failure
When CONFIG_X86_DEBUG_FPU is defined the alternatives_patched symbol
is pulled in as a dependency which results in a build failure.  To
prevent this undefine CONFIG_X86_DEBUG_FPU to disable the WARN_ON_FPU()
macro and rely on WARN_ON_ONCE debugging checks which were previously
added.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9041
Closes #9049
2019-07-17 09:14:36 -07:00
Michael Niewöhner 5784c7c36e Add missing __GFP_HIGHMEM flag to vmalloc
Make use of __GFP_HIGHMEM flag in vmem_alloc, which is required for
some 32-bit systems to make use of full available memory.
While kernel versions >=4.12-rc1 add this flag implicitly, older
kernels do not.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes #9031
2019-07-17 09:09:22 -07:00
Tomohiro Kusumi c8802ba08d Use zfsctl_snapshot_hold() wrapper
zfs_refcount_*() are to be wrapped by zfsctl_snapshot_*() in this file.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #9039
2019-07-17 09:07:53 -07:00
Brian Behlendorf 8062b7686a Minor style cleanup
Resolve an assortment of style inconsistencies including
use of white space, typos, capitalization, and line wrapping.
There is no functional change.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9030
2019-07-16 17:22:31 -07:00
Brian Behlendorf 3b03ff2276 Fix get_special_prop() build failure
The cast of the size_t returned by strlcpy() to a uint64_t by the
VERIFY3U can result in a build failure when CONFIG_FORTIFY_SOURCE
is set.  This is due to the additional hardening.  Since the token
is expected to always fit in strval the VERIFY3U has been removed.
If somehow it doesn't, it will still be safely truncated.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #8999
Closes #9020
2019-07-16 14:14:12 -07:00
Mike Gerdts d45d7f08fa Add zfs create dryrun
Adds the ability to sanity check zfs create arguments and to see the
value of any additional properties that will local to the dataset.  For
example, automation that may need to adjust quota on a parent filesystem
before creating a volume may call `zfs create -nP -V <size> <volume>` to
obtain the value of refreservation.  This adds the following options to
zfs create:

- -n dry-run (no-op)
- -v verbose
- -P parseable (implies verbose)

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Jerry Jelinek <jerry.jelinek@joyent.com>
Signed-off-by: Mike Gerdts <mike.gerdts@joyent.com>
Closes #8974
2019-07-16 11:19:24 -07:00
Serapheim Dimitropoulos 93e28d661e Log Spacemap Project
= Motivation

At Delphix we've seen a lot of customer systems where fragmentation
is over 75% and random writes take a performance hit because a lot
of time is spend on I/Os that update on-disk space accounting metadata.
Specifically, we seen cases where 20% to 40% of sync time is spend
after sync pass 1 and ~30% of the I/Os on the system is spent updating
spacemaps.

The problem is that these pools have existed long enough that we've
touched almost every metaslab at least once, and random writes
scatter frees across all metaslabs every TXG, thus appending to
their spacemaps and resulting in many I/Os. To give an example,
assuming that every VDEV has 200 metaslabs and our writes fit within
a single spacemap block (generally 4K) we have 200 I/Os. Then if we
assume 2 levels of indirection, we need 400 additional I/Os and
since we are talking about metadata for which we keep 2 extra copies
for redundancy we need to triple that number, leading to a total of
1800 I/Os per VDEV every TXG.

We could try and decrease the number of metaslabs so we have less
I/Os per TXG but then each metaslab would cover a wider range on
disk and thus would take more time to be loaded in memory from disk.
In addition, after it's loaded, it's range tree would consume more
memory.

Another idea would be to just increase the spacemap block size
which would allow us to fit more entries within an I/O block
resulting in fewer I/Os per metaslab and a speedup in loading time.
The problem is still that we don't deal with the number of I/Os
going up as the number of metaslabs is increasing and the fact
is that we generally write a lot to a few metaslabs and a little
to the rest of them. Thus, just increasing the block size would
actually waste bandwidth because we won't be utilizing our bigger
block size.

= About this patch

This patch introduces the Log Spacemap project which provides the
solution to the above problem while taking into account all the
aforementioned tradeoffs. The details on how it achieves that can
be found in the references sections below and in the code (see
Big Theory Statement in spa_log_spacemap.c).

Even though the change is fairly constraint within the metaslab
and lower-level SPA codepaths, there is a side-change that is
user-facing. The change is that VDEV IDs from VDEV holes will no
longer be reused. To give some background and reasoning for this,
when a log device is removed and its VDEV structure was replaced
with a hole (or was compacted; if at the end of the vdev array),
its vdev_id could be reused by devices added after that. Now
with the pool-wide space maps recording the vdev ID, this behavior
can cause problems (e.g. is this entry referring to a segment in
the new vdev or the removed log?). Thus, to simplify things the
ID reuse behavior is gone and now vdev IDs for top-level vdevs
are truly unique within a pool.

= Testing

The illumos implementation of this feature has been used internally
for a year and has been in production for ~6 months. For this patch
specifically there don't seem to be any regressions introduced to
ZTS and I have been running zloop for a week without any related
problems.

= Performance Analysis (Linux Specific)

All performance results and analysis for illumos can be found in
the links of the references. Redoing the same experiments in Linux
gave similar results. Below are the specifics of the Linux run.

After the pool reached stable state the percentage of the time
spent in pass 1 per TXG was 64% on average for the stock bits
while the log spacemap bits stayed at 95% during the experiment
(graph: sdimitro.github.io/img/linux-lsm/PercOfSyncInPassOne.png).

Sync times per TXG were 37.6 seconds on average for the stock
bits and 22.7 seconds for the log spacemap bits (related graph:
sdimitro.github.io/img/linux-lsm/SyncTimePerTXG.png). As a result
the log spacemap bits were able to push more TXGs, which is also
the reason why all graphs quantified per TXG have more entries for
the log spacemap bits.

Another interesting aspect in terms of txg syncs is that the stock
bits had 22% of their TXGs reach sync pass 7, 55% reach sync pass 8,
and 20% reach 9. The log space map bits reached sync pass 4 in 79%
of their TXGs, sync pass 7 in 19%, and sync pass 8 at 1%. This
emphasizes the fact that not only we spend less time on metadata
but we also iterate less times to convergence in spa_sync() dirtying
objects.
[related graphs:
stock- sdimitro.github.io/img/linux-lsm/NumberOfPassesPerTXGStock.png
lsm- sdimitro.github.io/img/linux-lsm/NumberOfPassesPerTXGLSM.png]

Finally, the improvement in IOPs that the userland gains from the
change is approximately 40%. There is a consistent win in IOPS as
you can see from the graphs below but the absolute amount of
improvement that the log spacemap gives varies within each minute
interval.
sdimitro.github.io/img/linux-lsm/StockVsLog3Days.png
sdimitro.github.io/img/linux-lsm/StockVsLog10Hours.png

= Porting to Other Platforms

For people that want to port this commit to other platforms below
is a list of ZoL commits that this patch depends on:

Make zdb results for checkpoint tests consistent
db587941c5

Update vdev_is_spacemap_addressable() for new spacemap encoding
419ba59145

Simplify spa_sync by breaking it up to smaller functions
8dc2197b7b

Factor metaslab_load_wait() in metaslab_load()
b194fab0fb

Rename range_tree_verify to range_tree_verify_not_present
df72b8bebe

Change target size of metaslabs from 256GB to 16GB
c853f382db

zdb -L should skip leak detection altogether
21e7cf5da8

vs_alloc can underflow in L2ARC vdevs
7558997d2f

Simplify log vdev removal code
6c926f426a

Get rid of space_map_update() for ms_synced_length
425d3237ee

Introduce auxiliary metaslab histograms
928e8ad47d

Error path in metaslab_load_impl() forgets to drop ms_sync_lock
8eef997679

= References

Background, Motivation, and Internals of the Feature
- OpenZFS 2017 Presentation:
youtu.be/jj2IxRkl5bQ
- Slides:
slideshare.net/SerapheimNikolaosDim/zfs-log-spacemaps-project

Flushing Algorithm Internals & Performance Results
(Illumos Specific)
- Blogpost:
sdimitro.github.io/post/zfs-lsm-flushing/
- OpenZFS 2018 Presentation:
youtu.be/x6D2dHRjkxw
- Slides:
slideshare.net/SerapheimNikolaosDim/zfs-log-spacemap-flushing-algorithm

Upstream Delphix Issues:
DLPX-51539, DLPX-59659, DLPX-57783, DLPX-61438, DLPX-41227, DLPX-59320
DLPX-63385

Reviewed-by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #8442
2019-07-16 10:11:49 -07:00
Antonio Russo df834a7ccc Enable zfs-mount-generator by default
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com>
Closes #8750
Closes #8848
2019-07-15 16:31:57 -07:00
Antonio Russo f88d069cbb systemd encryption key support
Modify zfs-mount-generator to produce a dependency on new
zfs-import-key-*.service units, dynamically created at boot to call
zfs load-key for the encryption root, before attempting to mount any
encrypted datasets.

These units are created by zfs-mount-generator, and RequiresMountsFor on
the keyfile, if present, or call systemd-ask-password if a passphrase is
requested.

This patch includes suggestions from @Fabian-Gruenbichler, @ryanjaeb and
@rlaager, as well an adaptation of @rlaager's script to retry on
incorrect password entry.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com>
Closes #8750
Closes #8848
2019-07-15 16:31:47 -07:00
Tomohiro Kusumi 6993e01202 Drop redundant POSIX ACL check in zpl_init_acl()
ZFS_ACLTYPE_POSIXACL has already been tested in zpl_init_acl(),
so no need to test again on POSIX ACL access.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #9009
2019-07-15 16:26:52 -07:00
Brian Behlendorf 64f3d39ae4 Export dnode symbols
External consumers such as Lustre require access to the dnode
interfaces in order to correctly manipulate dnodes.

Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #8994
Closes #9027
2019-07-15 16:11:55 -07:00
Tom Caputi 9949b856a0 Ensure dsl_destroy_head() decrypts objsets
This patch corrects a small issue where the dsl_destroy_head()
code that runs when the async_destroy feature is disabled would
not properly decrypt the dataset before beginning processing.
If the dataset is not able to be decrypted, the optimization
code now simply does not run and the dataset is completely
destroyed in the DSL sync task.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #9021
2019-07-15 16:08:42 -07:00
Tomohiro Kusumi ff9630d1a8 Disable unused pathname::pn_path* (unneeded in Linux)
struct pathname is originally from Solaris VFS, and it has been used
in ZoL to merely call VOP from Linux VFS interface without API change,
therefore pathname::pn_path* are unused and unneeded. Technically,
struct pathname is a wrapper for C string in ZoL.

Saves stack a bit on lookup and unlink.

(#if0'd members instead of removing since comments refer to them.)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #9025
2019-07-15 13:57:56 -07:00
Brian Behlendorf e5db313494 Linux 5.0 compat: SIMD compatibility
Restore the SIMD optimization for 4.19.38 LTS, 4.14.120 LTS,
and 5.0 and newer kernels.  This is accomplished by leveraging
the fact that by definition dedicated kernel threads never need
to concern themselves with saving and restoring the user FPU state.
Therefore, they may use the FPU as long as we can guarantee user
tasks always restore their FPU state before context switching back
to user space.

For the 5.0 and 5.1 kernels disabling preemption and local
interrupts is sufficient to allow the FPU to be used.  All non-kernel
threads will restore the preserved user FPU state.

For 5.2 and latter kernels the user FPU state restoration will be
skipped if the kernel determines the registers have not changed.
Therefore, for these kernels we need to perform the additional
step of saving and restoring the FPU registers.  Invalidating the
per-cpu global tracking the FPU state would force a restore but
that functionality is private to the core x86 FPU implementation
and unavailable.

In practice, restricting SIMD to kernel threads is not a major
restriction for ZFS.  The vast majority of SIMD operations are
already performed by the IO pipeline.  The remaining cases are
relatively infrequent and can be handled by the generic code
without significant impact.  The two most noteworthy cases are:

  1) Decrypting the wrapping key for an encrypted dataset,
     i.e. `zfs load-key`.  All other encryption and decryption
     operations will use the SIMD optimized implementations.

  2) Generating the payload checksums for a `zfs send` stream.

In order to avoid making any changes to the higher layers of ZFS
all of the `*_get_ops()` functions were updated to take in to
consideration the calling context.  This allows for the fastest
implementation to be used as appropriate (see kfpu_allowed()).

The only other notable instance of SIMD operations being used
outside a kernel thread was at module load time.  This code
was moved in to a taskq in order to accommodate the new kernel
thread restriction.

Finally, a few other modifications were made in order to further
harden this code and facilitate testing.  They include updating
each implementations operations structure to be declared as a
constant.  And allowing "cycle" to be set when selecting the
preferred ops in the kernel as well as user space.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8754 
Closes #8793 
Closes #8965
2019-07-12 09:31:20 -07:00
Nick Mattis d230a65c3b Fixes: #8934 Large kmem_alloc
Large allocation over the spl_kmem_alloc_warn value was being performed.
Switched to vmem_alloc interface as specified for large allocations.
Changed the subsequent frees to match.

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: nmattis <nickm970@gmail.com>
Closes #8934
Closes #9011
2019-07-10 15:54:49 -07:00
Attila Fülöp c3fba9091b Fix ZTS killed processes detection
log_neg_expect was using the wrong exit status to detect if a process
got killed by SIGSEGV or SIGBUS, resulting in false positives.

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #9003
2019-07-10 11:44:52 -07:00
Shaun Tancheff 02fad9260a pkg-utils python sitelib for SLES15
Use python -Esc to set __python_sitelib.

Reviewed-by: Neal Gompa <ngompa@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shaun Tancheff <stancheff@cray.com>
Closes #8969
2019-07-09 13:02:40 -07:00
Tomohiro Kusumi ab5036df1c Fix race in parallel mount's thread dispatching algorithm
Strategy of parallel mount is as follows.

1) Initial thread dispatching is to select sets of mount points that
 don't have dependencies on other sets, hence threads can/should run
 lock-less and shouldn't race with other threads for other sets. Each
 thread dispatched corresponds to top level directory which may or may
 not have datasets to be mounted on sub directories.

2) Subsequent recursive thread dispatching for each thread from 1)
 is to mount datasets for each set of mount points. The mount points
 within each set have dependencies (i.e. child directories), so child
 directories are processed only after parent directory completes.

The problem is that the initial thread dispatching in
zfs_foreach_mountpoint() can be multi-threaded when it needs to be
single-threaded, and this puts threads under race condition. This race
appeared as mount/unmount issues on ZoL for ZoL having different
timing regarding mount(2) execution due to fork(2)/exec(2) of mount(8).
`zfs unmount -a` which expects proper mount order can't unmount if the
mounts were reordered by the race condition.

There are currently two known patterns of input list `handles` in
`zfs_foreach_mountpoint(..,handles,..)` which cause the race condition.

1) #8833 case where input is `/a /a /a/b` after sorting.
 The problem is that libzfs_path_contains() can't correctly handle an
 input list with two same top level directories.
 There is a race between two POSIX threads A and B,
  * ThreadA for "/a" for test1 and "/a/b"
  * ThreadB for "/a" for test0/a
 and in case of #8833, ThreadA won the race. Two threads were created
 because "/a" wasn't considered as `"/a" contains "/a"`.

2) #8450 case where input is `/ /var/data /var/data/test` after sorting.
 The problem is that libzfs_path_contains() can't correctly handle an
 input list containing "/".
 There is a race between two POSIX threads A and B,
  * ThreadA for "/" and "/var/data/test"
  * ThreadB for "/var/data"
 and in case of #8450, ThreadA won the race. Two threads were created
 because "/var/data" wasn't considered as `"/" contains "/var/data"`.
 In other words, if there is (at least one) "/" in the input list,
 the initial thread dispatching must be single-threaded since every
 directory is a child of "/", meaning they all directly or indirectly
 depend on "/".

In both cases, the first non_descendant_idx() call fails to correctly
determine "path1-contains-path2", and as a result the initial thread
dispatching creates another thread when it needs to be single-threaded.
Fix a conditional in libzfs_path_contains() to consider above two.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #8450
Closes #8833
Closes #8878
2019-07-09 09:31:46 -07:00
loli10K a54f92b4b1 Fix dracut Debian/Ubuntu packaging
This commit ensures make(1) targets that build .deb packages fail if
alien(1) can't convert all .rpm files; additionally it also updates
the zfs-dracut package name which was changed to "noarch" in ca4e5a7.

Reviewed-by: Neal Gompa <ngompa@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #8990 
Closes #8991
2019-07-09 09:28:05 -07:00
loli10K 1d20b763bb zfs send does not handle invalid input gracefully
Due to some changes introduced in 30af21b 'zfs send' can crash when
provided with invalid inputs: this change attempts to add more checks
to the affected code paths.

Reviewed-by: Attila Fülöp <attila@fueloep.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9001
2019-07-08 15:10:23 -07:00
Paul Dagnelie f664f1ee7f Decrease contention on dn_struct_rwlock
Currently, sequential async write workloads spend a lot of time 
contending on the dn_struct_rwlock. This lock is responsible for 
protecting the entire block tree below it; this naturally results 
in some serialization during heavy write workloads. This can be 
resolved by having per-dbuf locking, which will allow multiple 
writers in the same object at the same time.

We introduce a new rwlock, the db_rwlock. This lock is responsible 
for protecting the contents of the dbuf that it is a part of; when 
reading a block pointer from a dbuf, you hold the lock as a reader. 
When writing data to a dbuf, you hold it as a writer. This allows 
multiple threads to write to different parts of a file at the same 
time.

Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Matt Ahrens matt@delphix.com
Reviewed by: George Wilson george.wilson@delphix.com
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
External-issue: DLPX-52564
External-issue: DLPX-53085
External-issue: DLPX-57384
Closes #8946
2019-07-08 13:18:50 -07:00
Brad Lewis cb70964221 8659 static dtrace probes unavailable on non-GPL modules
ZFS tracing efforts are hampered by the inability to access zfs static
probes(probes using DTRACE_PROBE macros). The probes are available via
tracepoints for GPL modules only.  The build could be modified to
generate a function for each unique DTRACE_PROBE invocation. These could
be then accessed via kprobes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Brad Lewis <brad.lewis@delphix.com>
Closes #8659 
Closes #8663
2019-07-08 11:20:53 -07:00
Brian Behlendorf 1086f54219 Revert "Fail early on bio corruption confirmed on 5.2-rc1"
This reverts commit aa7aab6c45.
The change is not compatible with CentOS 6's 2.6.32 based kernel
due to differnces in the bio layer.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #8961
2019-07-05 20:38:56 -07:00
Tom Caputi 52a83dc694 Remove VERIFY from dsl_dataset_crypt_stats()
This patch fixes an issue where dsl_dataset_crypt_stats() would
VERIFY that it was able to hold the encryption root. This function
should instead silently continue without populating the related
field in the nvlist, as is the convention for this code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8976
2019-07-05 16:53:14 -07:00
Paul Dagnelie fe0ea84812 Don't activate metaslabs with weight 0
We return ENOSPC in metaslab_activate if the metaslab has weight 0, 
to avoid activating a metaslab with no space available.  For sanity 
checking, we also assert that there is no free space in the range 
tree in that case.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #8968
2019-07-05 16:45:20 -07:00
loli10K 3b5fe2c351 Fix zfs "redact" misc issues
* zfs redact error messages do not end with newline character
 * 30af21b0 inadvertently removed some ZFS_PROP comments
 * man/zfs: zfs redact <redaction_snapshot> is not optional

Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #8988
2019-07-05 16:38:17 -07:00
Mike Gerdts 341166c843 OpenZFS 9318 - vol_volsize_to_reservation does not account for raidz skip blocks
When a volume is created in a pool with raidz vdevs and
volblocksize != 128k, the volume can reference more space than is
reserved with the automatically calculated refreservation.  There
are two deficiencies in vol_volsize_to_reservation that contribute
to this:

  1) Skip blocks may be added to keep each allocation a multiple
     of parity + 1. This is the dominating factor when volblocksize
     is close to 2^ashift.

  2) raidz deflation for 128 KB blocks is different for most other
     block sizes.

See "The theory of raidz space accounting" comment in
libzfs_dataset.c for a full explanation.

Authored by: Mike Gerdts <mike.gerdts@joyent.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Kody Kantor <kody.kantor@joyent.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Dan McDonald <danmcd@joyent.com>
Ported-by: Mike Gerdts <mike.gerdts@joyent.com>

Porting Notes:
* ZTS: wait for zvols to exist before writing
* ZTS: use log_must_busy with {zpool|zfs} destroy

OpenZFS-issue: https://www.illumos.org/issues/9318
OpenZFS-commit: https://github.com/illumos/illumos-gate/commit/b73ccab0
Closes #8973
2019-07-05 15:35:15 -07:00
Paul Zuchowski 6dbca94f0c Improve "Unable to automount" error message.
Having the mountpoint and dataset name both in the message made it
confusing to read.  Additionally, convert this to a zfs_dbgmsg rather than
sending it to the console.

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #8959
2019-07-03 13:05:02 -07:00
Tomohiro Kusumi aa7aab6c45 Fail early on bio corruption confirmed on 5.2-rc1
Unable to import zpool with "Large kmem_alloc" warning due to
corrupted bio's with invalid # of page vectors.
See #8867 for details.

Fail early with ENOMEM.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #8867 
Closes #8961
2019-07-03 13:03:22 -07:00
Brian Behlendorf 46db9d6161 Check b_freeze_cksum under ZFS_DEBUG_MODIFY conditional
The b_freeze_cksum field can only have data when ZFS_DEBUG_MODIFY
is set.  Therefore, the EQUIV check must be wrapped accordingly.
For the same reason the ASSERT in arc_buf_fill() in unsafe.
However, since it's largely redundant it has simply been removed.

Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8979
2019-07-03 13:01:54 -07:00
Matthew Ahrens aa79910726 Fix typo in zpool-features.5, section bookmark_written
The full property name includes "delphix", not "delphxi".

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #8985
2019-07-03 12:57:05 -07:00
Tom Caputi 2ba59fa9f1 Fix error text for EINVAL in zfs_receive_one()
This small patch fixes the EINVAL case for zfs_receive_one(). A
missing 'else' has been added to the two possible cases, which
will ensure the intended error message is printed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8977
2019-07-02 17:29:59 -07:00
Tomohiro Kusumi df358db7c3 Don't use d_path() for automount mount point for chroot'd process
Chroot'd process fails to automount snapshots due to realpath(3)
failure in mount.zfs(8).

Construct a mount point path from sb of the ctldir inode and dirent
name, instead of from d_path(), so that chroot'd process doesn't get
affected by its view of fs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #8903 
Closes #8966
2019-07-02 08:25:23 -07:00
loli10K ac6d54160a Fix zfs(8) mandoc errors
This was accidentally introduced in 765d1f06:

mandoc: ./man/man8/zfs.8: ERROR: skipping item outside list: It Ar filesystem Ns | Ns Ar mountpoint
mandoc: ./man/man8/zfs.8: ERROR: skipping item outside list: It Xo
mandoc: ./man/man8/zfs.8: ERROR: skipping end of block that is not open: Xc
mandoc: ./man/man8/zfs.8: ERROR: skipping item outside list: It Xo
mandoc: ./man/man8/zfs.8: ERROR: skipping end of block that is not open: Xc

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #8980
2019-07-02 08:19:55 -07:00
George Wilson 681a85cb01 nopwrites on dmu_sync-ed blocks can result in a panic
After device removal, performing nopwrites on a dmu_sync-ed block
will result in a panic. This panic can show up in two ways:
1. an attempt to issue an IOCTL in vdev_indirect_io_start()
2. a failed comparison of zio->io_bp and zio->io_bp_orig in
   zio_done()
To resolve both of these panics, nopwrites of blocks on indirect
vdevs should be ignored and new allocations should be performed on
concrete vdevs.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes #8957
2019-06-28 12:40:23 -07:00
Tom Caputi 765d1f0644 Add 'zfs umount -u' for encrypted datasets
This patch adds the ability for the user to unload keys for
datasets as they are being unmounted. This is analogous to
'zfs mount -l'.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alek Pinchuk <apinchuk@datto.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes: #8917
Closes: #8952
2019-06-28 12:38:37 -07:00
Paul Dagnelie 679b0f2abf Concurrent small allocation defeats large allocation
With the new parallel allocators scheme, there is a possibility for 
a problem where two threads, allocating from the same allocator at 
the same time, conflict with each other. There are two primary cases 
to worry about. First, another thread working on another allocator
activates the same metaslab that the first thread was trying to
activate. This results in the first thread needing to go back and
reselect a new metaslab, even though it may have waited a long time
for this metaslab to load. Second, another thread working on the same
allocator may have activated a different metaslab while the first
thread was waiting for its metaslab to load. Both of these cases
can cause the first thread to be significantly delayed in issuing 
its IOs. The second case can also cause metaslab load/unload churn; 
because the metaslab is loaded but not fully activated, we never set 
the selected_txg, which results in the metaslab being immediately 
unloaded again. This process can repeat many times, wasting disk and 
cpu resources. This is more likely to happen when the IO of the first 
thread is a larger one (like a ZIL write) and the other thread is 
doing a smaller write, because it is more likely to find an 
acceptable metaslab quickly.

There are two primary changes. The first is to always proceed with 
the allocation when returning from metaslab_activate if we were 
preempted in either of the ways described in the previous section. 
The second change is to set the selected_txg before we do the call 
to activate so that even if the metaslab is not used for an 
allocation, we won't immediately attempt to unload it.

Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
External-issue: DLPX-61314
Closes #8843
2019-06-26 11:00:12 -07:00
Paul Dagnelie 3fab4d9e08 zdb -vvvvv on ztest pool dies with "out of memory"
ztest creates some extremely large files as part of its 
operation. When zdb tries to dump a large enough file, it 
can run out of memory or spend an extremely long time 
attempting to print millions or billions of uint64_ts.

We cap the amount of data from a uint64 object that we 
are willing to read and print.

Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
External-issue: DLPX-53814
Closes #8947
2019-06-25 12:50:37 -07:00
Alexander Motin fc7546777b Avoid extra taskq_dispatch() calls by DMU
DMU sync code calls taskq_dispatch() for each sublist of os_dirty_dnodes
and os_synced_dnodes.  Since the number of sublists by default is equal
to number of CPUs, it will dispatch equal, potentially large, number of
tasks, waking up many CPUs to handle them, even if only one or few of
sublists actually have any work to do.

This change adds check for empty sublists to avoid this.

Reviewed by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Closes #8909
2019-06-25 12:03:38 -07:00
loli10K 5279ae918b Redacted Send/Receive causes zdb to dump core
When used with verbosity >= 4 zdb fails an assertion in dump_bookmarks()
because it expects snprintf() to retun 0 on success.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #8948
2019-06-24 18:06:26 -07:00
loli10K 746d4a451e Fix bp_embedded_type enum definition
With the addition of BP_EMBEDDED_TYPE_REDACTED in 30af21b0 a couple of
codepaths make wrong assumptions and could potentially result in errors.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #8951
2019-06-24 18:02:17 -07:00
Igor K 13d454c6ca -Y option for zdb is valid
The -Y option was added for ztest to test split block reconstruction.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Igor Kozhukhov <igor@dilos.org>
Closes #8926
2019-06-24 17:58:11 -07:00
Matthew Ahrens 59ec30a329 Remove code for zfs remap
The "zfs remap" command was disabled by
6e91a72fe3, because it has little utility
and introduced some tricky bugs.  This commit removes the code for it,
the associated ZFS_IOC_REMAP ioctl, and tests.

Note that the ioctl and property will remain, but have no functionality.
This allows older software to fail gracefully if it attempts to use
these, and avoids a backwards incompatibility that would be introduced if
we renumbered the later ioctls/props.

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #8944
2019-06-24 16:44:01 -07:00
Tom Caputi 53864800f6 Fix error message on promoting encrypted dataset
This patch corrects the error message reported when attempting
to promote a dataset outside of its encryption root.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8905 
Closes #8935
2019-06-24 16:42:52 -07:00
Brian Behlendorf 8f12a4f8d2 Fix out-of-tree build failures
Resolve the incorrect use of srcdir and builddir references for
various files in the build system.  These have crept in over time
and went unnoticed because when building in the top level directory
srcdir and builddir are identical.

With this change it's again possible to build in a subdirectory.

    $ mkdir obj
    $ cd obj
    $ ../configure
    $ make

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8921 
Closes #8943
2019-06-24 09:32:47 -07:00
Tomohiro Kusumi cc9625c47c Add missing .gitignore from "Implement Redacted Send/Receive"
30af21b025 needed to ignore get_diff executable.

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #8950
2019-06-23 11:51:29 -07:00
Don Brady 186898bbb5 OpenZFS 9425 - channel programs can be interrupted
Problem Statement
=================
ZFS Channel program scripts currently require a timeout, so that hung or
long-running scripts return a timeout error instead of causing ZFS to get
wedged. This limit can currently be set up to 100 million Lua instructions.
Even with a limit in place, it would be desirable to have a sys admin
(support engineer) be able to cancel a script that is taking a long time.

Proposed Solution
=================
Make it possible to abort a channel program by sending an interrupt signal.In
the underlying txg_wait_sync function, switch the cv_wait to a cv_wait_sig to
catch the signal. Once a signal is encountered, the dsl_sync_task function can
install a Lua hook that will get called before the Lua interpreter executes a
new line of code. The dsl_sync_task can resume with a standard txg_wait_sync
call and wait for the txg to complete.  Meanwhile, the hook will abort the
script and indicate that the channel program was canceled. The kernel returns
a EINTR to indicate that the channel program run was canceled.

Porting notes: Added missing return value from cv_wait_sig()

Authored by: Don Brady <don.brady@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Robert Mustacchi <rm@joyent.com>
Ported-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Don Brady <don.brady@delphix.com>

OpenZFS-issue: https://www.illumos.org/issues/9425
OpenZFS-commit: https://github.com/illumos/illumos-gate/commit/d0cb1fb926
Closes #8904
2019-06-22 16:51:46 -07:00
Matthew Ahrens cb9e5b7e84 dn_struct_rwlock can not be held in dmu_tx_try_assign()
The thread calling dmu_tx_try_assign() can't hold the dn_struct_rwlock
while assigning the tx, because this can lead to deadlock. Specifically,
if this dnode is already assigned to an earlier txg, this thread may
need to wait for that txg to sync (the ERESTART case below).  The other
thread that has assigned this dnode to an earlier txg prevents this txg
from syncing until its tx can complete (calling dmu_tx_commit()), but it
may need to acquire the dn_struct_rwlock to do so (e.g. via
dmu_buf_hold*()).

This commit adds an assertion to dmu_tx_try_assign() to ensure that this
deadlock is not inadvertently introduced.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #8929
2019-06-22 16:48:54 -07:00
gordan-bobic ca4e5a785f Remove arch and relax version dependency
Remove arch and relax version dependency for zfs-dracut
package.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gordan Bobic <gordan@redsleeve.org>
Issue #8913 
Closes #8914
2019-06-22 16:47:18 -07:00
Harry Mallon 8b14cb46bf Add libnvpair to libzfs pkg-config
Functions such as `fnvlist_lookup_nvlist` need libnvpair to be linked.
Default pkg-config file did not contain it.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Harry Mallon <hjmallon@gmail.com>
Closes #8919
2019-06-22 16:43:11 -07:00
Don Brady a9cd8bfde7 Let zfs mount all tolerate in-progress mounts
The zfs-mount service can unexpectedly fail to start when zfs 
encounters a mount that is in progress. This service uses 
zfs mount -a, which has a window between the time it checks if 
the dataset was mounted and when the actual mount (via mount.zfs 
binary) occurs.

The reason for the racing mounts is that both zfs-mount.target 
and zfs-share.target are allowed to execute concurrently after 
the import.  This is more of an issue with the relatively recent 
addition of parallel mounting, and we should consider serializing
the mount and share targets.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Allan Jude <allanjude@freebsd.org>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #8881
2019-06-22 16:41:21 -07:00
Allan Jude fb6e6f1ffb zstreamdump: add per-record-type counters and an overhead counter
Count the bytes of payload for each replication record type

Count the bytes of overhead (replication records themselves)

Include these counters in the output summary at the end of the run.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Sponsored-By: Klara Systems and Catalogic
Closes #8432
2019-06-22 16:33:44 -07:00
Paul Dagnelie 2b09628b59 Fix comments on zfs_bookmark_phys
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #8945
2019-06-22 16:32:26 -07:00
Tomohiro Kusumi d5bf1cf179 Fix build break by "Implement Redacted Send/Receive"
30af21b025 broke build on Fedora. gcc can detect potential overflow
on compile-time. Consider strlen of already copied string.

Also change strn to strl variants per suggestion from @behlendorf
and @ofaaland.

--
libzfs_input_check.c: In function 'test_redact':
libzfs_input_check.c:711:2: error: 'strncat' specified bound 288 equals
 destination size [-Werror=stringop-overflow=]
  strncat(bookmark, "#testbookmark", sizeof (bookmark));
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #8939
2019-06-22 16:30:59 -07:00
Paul Dagnelie a370182fed Add SCSI_PASSTHROUGH to zvols to enable UNMAP support
When exporting ZVOLs as SCSI LUNs, by default Windows will not
issue them UNMAP commands. This reduces storage efficiency in
many cases.

We add the SCSI_PASSTHROUGH flag to the zvol's device queue,
which lets the SCSI target logic know that it can handle SCSI
commands.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Gallagher <john.gallagher@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #8933
2019-06-21 09:40:56 -07:00
loli10K 3976fd65d3 Redacted Send/Receive broke zfs(8) help message
Since 30af21b0 was merged 'zfs send' help message format is broken
and lists "-r" as a valid option: this commit corrects these
small issues.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #8942
2019-06-21 09:38:15 -07:00
Tomohiro Kusumi 9585497208 Prevent pointer to an out-of-scope local variable
`show_str` could be a pointer to a local variable in stack
which is out-of-scope by the time
`return (snprintf(buf, buflen, "%s\n", show_str));`
is called.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #8924 
Closes #8940
2019-06-20 18:31:52 -07:00
Matthew Ahrens accd6d9dc4 dedup=verify doesn't clear the blkptr's dedup flag
The logic to handle strong checksum collisions where the data doesn't
match is incorrect. It is not clearing the dedup bit of the blkptr,
which can cause a panic later in zio_ddt_free() due to the dedup table
not matching what is in the blkptr.

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-48097
Closes #8936
2019-06-20 18:30:40 -07:00
Igor K a64f8276c7 Update vdev_ops_t from illumos
Align vdev_ops_t from illumos for better compatibility.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Igor Kozhukhov <igor@dilos.org>
Closes #8925
2019-06-20 18:29:02 -07:00
Tom Caputi da68988708 Allow unencrypted children of encrypted datasets
When encryption was first added to ZFS, we made a decision to
prevent users from creating unencrypted children of encrypted
datasets. The idea was to prevent users from inadvertently
leaving some of their data unencrypted. However, since the
release of 0.8.0, some legitimate reasons have been brought up
for this behavior to be allowed. This patch simply removes this
limitation from all code paths that had checks for it and updates
the tests accordingly.

Reviewed-by: Jason King <jason.king@joyent.com>
Reviewed-by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8737 
Closes #8870
2019-06-20 12:29:51 -07:00
dacianstremtan 84b4201f32 Replace whereis with type in zfs-lib.sh
The whereis command should not be used since it may not exist 
in the initramfs.  The dracut plymouth module also uses the type
command instead of whereis.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Garrett Fields <ghfields@gmail.com>
Signed-off-by: Dacian Reece-Stremtan <dacianstremtan@gmail.com>
Closes #8920 
Closes #8938
2019-06-20 12:27:14 -07:00
Matthew Ahrens 050d720c43 Remove dedupditto functionality
If dedup is in use, the `dedupditto` property can be set, causing ZFS to
keep an extra copy of data that is referenced many times (>100x).  The
idea was that this data is more important than other data and thus we
want to be really sure that it is not lost if the disk experiences a
small amount of random corruption.

ZFS (and system administrators) rely on the pool-level redundancy to
protect their data (e.g. mirroring or RAIDZ).  Since the user/sysadmin
doesn't have control over what data will be offered extra redundancy by
dedupditto, this extra redundancy is not very useful.  The bulk of the
data is still vulnerable to loss based on the pool-level redundancy.
For example, if particle strikes corrupt 0.1% of blocks, you will either
be saved by mirror/raidz, or you will be sad.  This is true even if
dedupditto saved another 0.01% of blocks from being corrupted.

Therefore, the dedupditto functionality is rarely enabled (i.e. the
property is rarely set), and it fulfills its promise of increased
redundancy even more rarely.

Additionally, this feature does not work as advertised (on existing
releases), because scrub/resilver did not repair the extra (dedupditto)
copy (see https://github.com/zfsonlinux/zfs/pull/8270).

In summary, this seldom-used feature doesn't work, and even if it did it
wouldn't provide useful data protection.  It has a non-trivial
maintenance burden (again see https://github.com/zfsonlinux/zfs/pull/8270).

We should remove the dedupditto functionality.  For backwards
compatibility with the existing CLI, "zpool set dedupditto" will still
"succeed" (exit code zero), but won't have any effect.  For backwards
compatibility with existing pools that had dedupditto enabled at some
point, the code will still be able to understand dedupditto blocks and
free them when appropriate.  However, ZFS won't write any new dedupditto
blocks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Alek Pinchuk <apinchuk@datto.com>
Issue #8270 
Closes #8310
2019-06-19 14:54:02 -07:00
Tomohiro Kusumi fb0be12d7b Use ZFS_DEV macro instead of literals
The rest of the code/comments use ZFS_DEV, so sync with that.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #8912
2019-06-19 12:27:31 -07:00
Michael Niewöhner 0b755ec3d5 Fix memory leak in check_disk()
Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes #8897  
Closes #8911
2019-06-19 11:53:37 -07:00
Olaf Faaland c308b1dd63 kmod-zfs-devel rpm should provide kmod-spl-devel
When configure is run with --with-spec=redhat, and rpms are built, the
kmod-zfs-devel package is missing

Provides: kmod-spl-devel = %{version}

which is required by software such as Lustre which builds against zfs
kmods.  Adding it makes it easier for such software to build against
both zfs-0.7 (where SPL is separate and may be missing) and zfs-0.8.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #8930
2019-06-19 11:44:44 -07:00
Brian Behlendorf 4ca457b065 ZTS: Fix mmp_interval failure
The mmp_interval test case was failing on Fedora 30 due to the built-in
'echo' command terminating the script when it was unable to write to
the sysfs module parameter.  This change in behavior was observed with
ksh-2020.0.0-alpha1.  Resolve the issue by using the external cat
command which fails gracefully as expected.

Additionally, remove some incorrect quotes around the $? return values.

Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8906
2019-06-19 10:39:28 -07:00
Paul Dagnelie 30af21b025 Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to 
a target system. One possible use case for this feature is to not 
transmit sensitive information to a data warehousing, test/dev, or 
analytics environment. Another is to save space by not replicating 
unimportant data within a given dataset, for example in backup tools 
like zrepl.

Redacted send/receive is a three-stage process. First, a clone (or 
clones) is made of the snapshot to be sent to the target. In this 
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction 
snapshot" (or snapshots). Second, the new zfs redact command is used 
to create a redaction bookmark. The redaction bookmark stores the 
list of blocks in a snapshot that were modified by the redaction 
snapshot(s). Finally, the redaction bookmark is passed as a parameter 
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive 
or unwanted information, and those blocks are not included in the send 
stream.  When sending from the redaction bookmark, the blocks it 
contains are considered as candidate blocks in addition to those 
blocks in the destination snapshot that were modified since the 
creation_txg of the redaction bookmark.  This step is necessary to 
allow the target to rehydrate data in the case where some blocks are 
accidentally or unnecessarily modified in the redaction snapshot.

The changes to bookmarks to enable fast space estimation involve 
adding deadlists to bookmarks. There is also logic to manage the 
life cycles of these deadlists.

The new size estimation process operates in cases where previously 
an accurate estimate could not be provided. In those cases, a send 
is performed where no data blocks are read, reducing the runtime 
significantly and providing a byte-accurate size estimate.

Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 09:48:12 -07:00
Alexander Motin c1b5801bb5 Minimize aggsum_compare(&arc_size, arc_c) calls.
For busy ARC situation when arc_size close to arc_c is desired.  But
then it is quite likely that aggsum_compare(&arc_size, arc_c) will need
to flush per-CPU buckets to find exact comparison result.  Doing that
often in a hot path penalizes whole idea of aggsum usage there, since it
replaces few simple atomic additions with dozens of lock acquisitions.

Replacing aggsum_compare() with aggsum_upper_bound() in code increasing
arc_p when ARC is growing (arc_size < arc_c) according to PMC profiles
allows to save ~5% of CPU time in aggsum code during sequential write
to 12 ZVOLs with 16KB block size on large dual-socket system.

I suppose there some minor arc_p behavior change due to lower precision
of the new code, but I don't think it is a big deal, since it should
affect only very small window in time (aggsum buckets are flushed every
second) and in ARC size (buckets are limited to 10 average ARC blocks
per CPU).

Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Closes #8901
2019-06-14 14:07:33 -07:00
Ryan Moeller b1b4ac2708 Python config cleanup
Don't require Python at configure/build unless building pyzfs.
Move ZFS_AC_PYTHON_MODULE to always-pyzfs.m4 where it is used.
Make test syntax more consistent.

Sponsored by: iXsystems, Inc.
Reviewed-by: Neal Gompa <ngompa@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #8895
2019-06-13 13:15:46 -07:00
Matthew Ahrens 7218b29e4b lz4_decompress_abd declared but not defined
`lz4_decompress_abd` is declared in zio_compress.h but it is not defined
anywhere. The declaration should be removed.

Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-47477
Closes #8894
2019-06-13 13:14:34 -07:00
Matthew Ahrens 53dce5acc6 panic in removal_remap test on 4K devices
If the zfs_remove_max_segment tunable is changed to be not a multiple of
the sector size, then the device removal code will malfunction and try
to create mappings that are smaller than one sector, leading to a panic.

On debug bits this assertion will fail in spa_vdev_copy_segment():
    ASSERT3U(DVA_GET_ASIZE(&dst), ==, size);

On nondebug, the system panics with a stack like:
    metaslab_free_concrete()
    metaslab_free_impl()
    metaslab_free_impl_cb()
    vdev_indirect_remap()
    free_from_removing_vdev()
    metaslab_free_impl()
    metaslab_free_dva()
    metaslab_free()

Fortunately, the default for zfs_remove_max_segment is 1MB, so this
can't occur by default.  We hit it during this test because
removal_remap.ksh changes zfs_remove_max_segment to 1KB. When testing on
4KB-sector disks, we hit the bug.

This change makes the zfs_remove_max_segment tunable more robust,
automatically rounding it up to a multiple of the sector size. We also
turn some key assertions into VERIFY's so that similar bugs would be
caught before they are encoded on disk (and thus avoid a
panic-reboot-loop).

Reviewed-by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-61342
Closes #8893
2019-06-13 13:12:39 -07:00
Matthew Ahrens be89734a29 compress metadata in later sync passes
Starting in sync pass 5 (zfs_sync_pass_dont_compress), we disable
compression (including of metadata).  Ostensibly this helps the sync
passes to converge (i.e. for a sync pass to not need to allocate
anything because it is 100% overwrites).

However, in practice it increases the average number of sync passes,
because when we turn compression off, a lot of block's size will change
and thus we have to re-allocate (not overwrite) them.  It also increases
the number of 128KB allocations (e.g. for indirect blocks and spacemaps)
because these will not be compressed.  The 128K allocations are
especially detrimental to performance on highly fragmented systems,
which may have very few free segments of this size, and may need to load
new metaslabs to satisfy 128K allocations.

We should increase zfs_sync_pass_dont_compress.  In practice on a highly
fragmented system we see a few 5-pass txg's, a tiny number of 6-pass
txg's, and no txg's with more than 6 passes.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-63431
Closes #8892
2019-06-13 13:10:18 -07:00
Alexander Motin ae5c78e0b1 Move write aggregation memory copy out of vq_lock
Memory copy is too heavy operation to do under the congested lock.
Moving it out reduces congestion by many times to almost invisible.
Since the original zio removed from the queue, and the child zio is
not executed yet, I don't see why would the copy need protection.
My guess it just remained like this from the time when lock was not
dropped here, which was added later to fix lock ordering issue.

Multi-threaded sequential write tests with both HDD and SSD pools
with ZVOL block sizes of 4KB, 16KB, 64KB and 128KB all show major
reduction of lock congestion, saving from 15% to 35% of CPU time
and increasing throughput from 10% to 40%.

Reviewed-by: Richard Yao <ryao@gentoo.org>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:  Alexander Motin <mav@FreeBSD.org>
Closes #8890
2019-06-13 13:08:24 -07:00
Matthew Ahrens d3230d761a looping in metaslab_block_picker impacts performance on fragmented pools
On fragmented pools with high-performance storage, the looping in
metaslab_block_picker() can become the performance-limiting bottleneck.
When looking for a larger block (e.g. a 128K block for the ZIL), we may
search through many free segments (up to hundreds of thousands) to find
one that is large enough to satisfy the allocation. This can take a long
time (up to dozens of ms), and is done while holding the ms_lock, which
other threads may spin waiting for.

When this performance problem is encountered, profiling will show
high CPU time in metaslab_block_picker, as well as in mutex_enter from
various callers.

The problem is very evident on a test system with a sync write workload
with 8K writes to a recordsize=8k filesystem, with 4TB of SSD storage,
84% full and 88% fragmented. It has also been observed on production
systems with 90TB of storage, 76% full and 87% fragmented.

The fix is to change metaslab_df_alloc() to search only up to 16MB from
the previous allocation (of this alignment). After that, we will pick a
segment that is of the exact size requested (or larger). This reduces
the number of iterations to a few hundred on fragmented pools (a ~100x
improvement).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-62324
Closes #8877
2019-06-13 13:06:15 -07:00
Tulsi Jain 9c7da9a95a Restrict filesystem creation if name referred either '.' or '..'
This change restricts filesystem creation if the given name
contains either '.' or '..'

Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: TulsiJain <tulsi.jain@delphix.com>
Closes #8842 
Closes #8564
2019-06-13 08:56:15 -07:00
Matthew Ahrens 3475724ea4 ztest: dmu_tx_assign() gets ENOSPC in spa_vdev_remove_thread()
When running zloop, we occasionally see the following crash:

    dmu_tx_assign(tx, TXG_WAIT) == 0 (0x1c == 0)
    ASSERT at ../../module/zfs/vdev_removal.c:1507:spa_vdev_remove_thread()/sbin/ztest(+0x89c3)[0x55faf567b9c3]

The error value 0x1c is ENOSPC.

The transaction used by spa_vdev_remove_thread() should not be able to
fail due to being out of space. i.e. we should not call
dmu_tx_hold_space().  This will allow the removal thread to schedule its
work even when the pool is low on space.  The "slop space" will provide
enough free space to sync out the txg.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-37853
Closes #8889
2019-06-13 08:48:42 -07:00
Tomohiro Kusumi daddbdc7cc Fix lockdep warning on insmod
sysfs_attr_init() is required to make lockdep happy for dynamically
allocated sysfs attributes. This fixed #8868 on Fedora 29 running
kernel-debug.

This requirement was introduced in 2.6.34.
See include/linux/sysfs.h for what it actually does.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #8868 
Closes #8884
2019-06-12 17:15:06 -07:00
Matthew Ahrens d9b4bf0665 fat zap should prefetch when iterating
When iterating over a ZAP object, we're almost always certain to iterate
over the entire object. If there are multiple leaf blocks, we can
realize a performance win by issuing reads for all the leaf blocks in
parallel when the iteration begins.

For example, if we have 10,000 snapshots, "zfs destroy -nv
pool/fs@1%9999" can take 30 minutes when the cache is cold. This change
provides a >3x performance improvement, by issuing the reads for all ~64
blocks of each ZAP object in parallel.

Reviewed-by: Andreas Dilger <andreas.dilger@whamcloud.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-58347
Closes #8862
2019-06-12 13:13:09 -07:00
Matthew Ahrens d9cd66e45f Target ARC size can get reduced to arc_c_min
Sometimes the target ARC size is reduced to arc_c_min, which impacts
performance.  We've seen this happen as part of the random_reads
performance regression test, where the ARC size is reduced before the
reads test starts which impacts how long it takes for system to reach
good IOPS performance.

We call arc_reduce_target_size when arc_reap_cb_check() returns TRUE,
and arc_available_memory() is less than arc_c>>arc_shrink_shift.

However, arc_available_memory() could easily be low, even when arc_c is
low, because we can have tons of unused bufs in the abd kmem cache. This
would be especially true just after the DMU requests a bunch of stuff be
evicted from the ARC (e.g. due to "zpool export").

To fix this, the ARC should reduce arc_c by the requested amount, not
all the way down to arc_size (or arc_c_min), which can be very small.

Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-59431
Closes #8864
2019-06-12 13:06:55 -07:00
bnjf 10269e02f9 Fix typo in vdev_raidz_math.c
Fix typo in vdev_raidz_math.c

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brad Forschinger <github@bnjf.id.au>
Closes #8875 
Closes #8880
2019-06-12 13:03:33 -07:00
Matthew Ahrens 5662fd5794 single-chunk scatter ABDs can be treated as linear
Scatter ABD's are allocated from a number of pages.  In contrast to
linear ABD's, these pages are disjoint in the kernel's virtual address
space, so they can't be accessed as a contiguous buffer.  Therefore
routines that need a linear buffer (e.g. abd_borrow_buf() and friends)
must allocate a separate linear buffer (with zio_buf_alloc()), and copy
the contents of the pages to/from the linear buffer.  This can have a
measurable performance overhead on some workloads.

https://github.com/zfsonlinux/zfs/commit/87c25d567fb7969b44c7d8af63990e
("abd_alloc should use scatter for >1K allocations") increased the use
of scatter ABD's, specifically switching 1.5K through 4K (inclusive)
buffers from linear to scatter.  For workloads that access blocks whose
compressed sizes are in this range, that commit introduced an additional
copy into the read code path.  For example, the
sequential_reads_arc_cached tests in the test suite were reduced by
around 5% (this is doing reads of 8K-logical blocks, compressed to 3K,
which are cached in the ARC).

This commit treats single-chunk scattered buffers as linear buffers,
because they are contiguous in the kernel's virtual address space.

All single-page (4K) ABD's can be represented this way.  Some multi-page
ABD's can also be represented this way, if we were able to allocate a
single "chunk" (higher-order "page" which represents a power-of-2 series
of physically-contiguous pages).  This is often the case for 2-page (8K)
ABD's.

Representing a single-entry scatter ABD as a linear ABD has the
performance advantage of avoiding the copy (and allocation) in
abd_borrow_buf_copy / abd_return_buf_copy.  A performance increase of
around 5% has been observed for ARC-cached reads (of small blocks which
can take advantage of this), fixing the regression introduced by
87c25d567.

Note that this optimization is only possible because all physical memory
is always mapped into the kernel's address space.  This is not the case
for HIGHMEM pages, so the optimization can not be made on 32-bit
systems.

Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #8580
2019-06-11 09:02:31 -07:00
Matthew Ahrens b8738257c2 make zil max block size tunable
We've observed that on some highly fragmented pools, most metaslab
allocations are small (~2-8KB), but there are some large, 128K
allocations.  The large allocations are for ZIL blocks.  If there is a
lot of fragmentation, the large allocations can be hard to satisfy.

The most common impact of this is that we need to check (and thus load)
lots of metaslabs from the ZIL allocation code path, causing sync writes
to wait for metaslabs to load, which can take a second or more.  In the
worst case, we may not be able to satisfy the allocation, in which case
the ZIL will resort to txg_wait_synced() to ensure the change is on
disk.

To provide a workaround for this, this change adds a tunable that can
reduce the size of ZIL blocks.

External-issue: DLPX-61719
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #8865
2019-06-10 11:48:42 -07:00
Alexander Motin 5a902f5aaa Fix comparison signedness in arc_is_overflowing()
When ARC size is very small, aggsum_lower_bound(&arc_size) may return
negative values, that due to unsigned comparison caused delays, waiting
for arc_adjust() to "fix" it by calling aggsum_value(&arc_size).  Use
of signed comparison there fixes the problem.

Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Closes #8873
2019-06-10 09:52:25 -07:00
Tom Caputi c08c30ed13 Fix incorrect error message for raw receive
This patch fixes an incorrect error message that comes up when
doing a non-forcing, raw, incremental receive into a dataset
that has a newer snapshot than the "from" snapshot. In this
case, the current code prints a confusing message about an IVset
guid mismatch.

This functionality is supported by non-raw receives as an
undocumented feature, but was never supported by the raw receive
code. If this is desired in the future, we can probably figure
out a way to make it work.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Issue #8758 
Closes #8863
2019-06-10 09:45:08 -07:00
Richard Elling cfc16f8ba8 Improve ZTS block_device_wait debugging
The udevadm settle timeout can be 120 or 180 seconds by default
for some distributions. If a long delay is experienced, it could
be due to some strangeness in a malfunctioning device that isn't
related to the devices under test. To help debug this condition,
a notice is given if settle takes too long.

Arguments can now be passed to block_device_wait. The expected
arguments are block device pathnames.

Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Elling <Richard.Elling@RichardElling.com>
Closes #8839
2019-06-10 09:21:19 -07:00
Richard Elling 4cb1b541d4 Block_device_wait does not return an error code
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Elling <Richard.Elling@RichardElling.com>
Closes #8839
2019-06-10 09:21:08 -07:00
Richard Elling bef70afaa6 Remove redundant redundant remove
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Elling <Richard.Elling@RichardElling.com>
Closes #8839
2019-06-10 09:20:42 -07:00
Richard Elling 2ff615b2bf Fix logic error in setpartition function
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Elling <Richard.Elling@RichardElling.com>
Closes #8839
2019-06-10 09:19:32 -07:00
Eli Schwartz 215e4fe4d2 arc_summary: prefer python3 version and install when there is no python
This matches the behavior of other python scripts, such as arcstat and
dbufstat, which are always installed but whose install-exec-hook actions
will simply touch up the shebang if a python interpreter was configured
*and* that interpreter is a python2 interpreter.

Fixes installation in a minimal build chroot without python available.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@freqlabs.com>
Signed-off-by: Eli Schwartz <eschwartz@archlinux.org>
Closes #8851
2019-06-10 09:08:53 -07:00
Samuel VERSCHELDE 01d1e88b1a Fix %post and %postun generation in kmodtool
During zfs-kmod RPM build, $(uname -r) gets unintentionally evaluated on
the build host, once and for all. It should be evaluated during the
execution of the scriptlets on the installation host. Escaping the $
character avoids evaluating it during build.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Samuel Verschelde <stormi-xcp@ylix.fr>
Closes #8866
2019-06-10 09:06:58 -07:00
Paul Dagnelie 893a6d62c1 Allow metaslab to be unloaded even when not freed from
On large systems, the memory used by loaded metaslabs can become
a concern. While range trees are a fairly efficient data structure, 
on heavily fragmented pools they can still consume a significant 
amount of memory. This problem is amplified when we fail to unload 
metaslabs that we aren't using. Currently, we only unload a metaslab 
during metaslab_sync_done; in order for that function to be called 
on a given metaslab in a given txg, we have to have dirtied that 
metaslab in that txg. If the dirtying was the result of an allocation, 
we wouldn't be unloading it (since it wouldn't be 8 txgs since it 
was selected), so in effect we only unload a metaslab during txgs 
where it's being freed from.

We move the unload logic from sync_done to a new function, and 
call that function on all metaslabs in a given vdev during 
vdev_sync_done().

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #8837
2019-06-06 19:10:43 -07:00
Jorgen Lundman 876d76be34 Avoid updating zfs_gitrev.h when rev is unchanged
Build process would always re-compile spa_history.c due to touching
zfs_gitrev.h - avoid if no change in gitrev.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Reviewed-by: Allan Jude <allanjude@freebsd.org>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes #8860
2019-06-06 19:01:41 -07:00
Tom Caputi fd7a657bff Reinstate raw receive check when truncating
This patch re-adds a check that was removed in 369aa50. The check
confirms that a raw receive is not occuring before truncating an
object's dn_maxblkid. At the time, it was believed that all cases
that would hit this code path would be handled in other places,
but that was not the case.

Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8852 
Closes #8857
2019-06-06 13:47:33 -07:00
Allan Jude b7109a413c l2arc_apply_transforms: Fix typo in comment
Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Closes #8822
2019-06-06 13:14:48 -07:00
Serapheim Dimitropoulos cb020f0d86 Reduced IOPS when all vdevs are in the zfs_mg_fragmentation_threshold
Historically while doing performance testing we've noticed that IOPS
can be significantly reduced when all vdevs in the pool are hitting
the zfs_mg_fragmentation_threshold percentage. Specifically in a
hypothetical pool with two vdevs, what can happen is the following:
Vdev A would go above that threshold and only vdev B would be used.
Then vdev B would pass that threshold but vdev A would go below it
(we've been freeing from A to allocate to B). The allocations would
go back and forth utilizing one vdev at a time with IOPS taking a hit.

Empirically, we've seen that our vdev selection for allocations is
good enough that fragmentation increases uniformly across all vdevs
the majority of the time. Thus we set the threshold percentage high
enough to avoid hitting the speed bump on pools that are being pushed
to the edge. We effectively disable its effect in the majority of the
cases but we don't remove (at least for now) just in case we hit any
weird behavior in the future.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #8859
2019-06-06 13:08:41 -07:00
Garrett Fields 627f5117a3 If $ZFS_BOOTFS contains guid, replace the guid portion with $pool
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Garrett Fields <ghfields@gmail.com>
Closes #8356
2019-06-06 13:04:35 -07:00
Tom Caputi 3ce85b5e60 Fix integer overflow of ZTOI(zp)->i_generation
The ZFS on-disk format stores each inode's generation ID as a 64
bit number on disk and in-core. However, the Linux kernel's inode
is only a 32 bit number. In most places, the code handles this
correctly, but the cast is missing in zfs_rezget(). For many pools,
this isn't an issue since the generation ID is computed as the
current txg when the inode is created and many pools don't have
more than 2^32 txgs.

For the pools that have more txgs, this issue causes any inode with
a high enough generation number to report IO errors after a call to
"zfs rollback" while holding the file or directory open. This patch
simply adds the missing cast.

Reviewed-by: Alek Pinchuk <apinchuk@datto.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8858
2019-06-06 12:59:39 -07:00
Don Brady 8e91c5ba6a hkdf_test binary should only have one icp instance
The build for test binary hkdf_test was linking both against libicp 
and libzpool. This results in two instances of libicp inside the 
binary but the call to icp_init() only initializes one of them!

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #8850
2019-06-05 14:21:25 -07:00
Tomohiro Kusumi 1a6889947e Drop objid argument in zfs_znode_alloc() (sync with OpenZFS)
Since zfs_znode_alloc() already takes dmu_buf_t*, taking another
uint64_t argument for objid is redundant. inode's ->i_ino does and
needs to match znode's ->z_id.

zfs_znode_alloc() in FreeBSD and illumos doesn't have this argument
since vnode doesn't have vnode# in VFS (hence ->z_id exists).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@osnexus.com>
Closes #8841
2019-06-05 14:18:46 -07:00
Peter Wirdemo 2ecc2020ef Fixed a small typo in man/man1/raidz_test.1
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Peter Wirdemo <peter.wirdemo@gmail.com>
Closes #8855
2019-06-05 09:09:17 -07:00
Torsten Wörtwein 1ec7eda167 Allow TRIM_UNUSED_KSYM when build as a builtin-module
If ZFS is built with enable_linux_builtin, it seems to be possible
to compile the kernel with TRIM_UNUSED_KSYM.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Torsten Wörtwein <twoertwein@gmail.com>
Closes #8820
2019-06-04 18:12:16 -07:00
Ryan Moeller 1a132f0638 Make Python detection optional and more portable
Previously, --without-python would cause ./configure to fail. Now it is
able to proceed, and the Python scripts will not be built.

Use portable parameter expansion matching instead of nonstandard
substring matching to detect the Python version.  This test is
duplicated in several places, so define a function for it.

Don't assume the full path to binaries, since different platforms do
install things in different places.  Use AC_CHECK_PROGS instead.

When building without Python, also build without pyzfs.

Sponsored by: iXsystems, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Eli Schwartz <eschwartz93@gmail.com>
Signed-off-by: Ryan Moeller <ryan@freqlabs.com>
Closes #8809 
Closes #8731
2019-06-04 18:05:46 -07:00
DeHackEd df24bcf00a Wait in 'S' state when send/recv pipe is blocking
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: DHE <git@dehacked.net>
Closes #8733 
Closes #8752
2019-06-03 20:54:43 -07:00
TulsiJain a3c98d5728 Make zfs_async_block_max_blocks handle zero correctly
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: TulsiJain <tulsi.jain@delphix.com>
Closes #8829
Closes #8289
2019-06-03 09:40:48 -07:00
Brian Behlendorf 2531ce3720 Revert "Report holes when there are only metadata changes"
This reverts commit ec4f9b8f30 which introduced a narrow race which
can lead to lseek(, SEEK_DATA) incorrectly returning ENXIO.  Resolve
the issue by revering this change to restore the previous behavior
which depends solely on checking the dirty list.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8816 
Closes #8834
2019-05-30 17:13:18 -07:00
Tomohiro Kusumi 1608985a41 Add link count test for root inode
Add tests for
97aa3ba44("Fix link count of root inode when snapdir is visible")
as suggested in #8727.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@osnexus.com>
Closes #8732
2019-05-29 16:26:46 -07:00
Tomohiro Kusumi fe0c9f409a Remove vn_set_fs_pwd()/vn_set_pwd() (no need to be at / during insmod)
Per suggestion from @behlendorf in #8777, remove vn_set_fs_pwd() and
vn_set_pwd() which are only used in zfs_ioctl.c:_init() while loading
zfs.ko.

The rest of initialization functions being called here after cwd set
to / don't depend on cwd of the process except for spa_config_load().
spa_config_load() uses a relative path ".//etc/zfs/zpool.cache" when
`rootdir` is non-NULL, which is "/etc/zfs/zpool.cache" given cwd is /,
so just unconditionally use the absolute path without "./", so that
`vn_set_pwd("/")` as well as the entire functions can be removed.
This is also what FreeBSD does.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@osnexus.com>
Closes #8826
2019-05-29 16:18:14 -07:00
Brian Behlendorf 1e724f4f34 Exclude log device ashift from normal class
When opening a log device during import its allocation bias will
not yet have been set by vdev_load().  This results in the log
device's ashift being incorrectly applied to the maximum ashift
of the vdevs in the normal class.  Which in turn prevents the
removal of any top-level devices due to the ashift check in the
spa_vdev_remove_top_check() function.

This issue is resolved by including vdev_islog in the check since
it will be set correctly during vdev_open().

Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8735
2019-05-29 11:35:50 -07:00
madz ec4afd27f1 Fix integer overflow in get_next_chunk()
dn->dn_datablksz type is uint32_t and need to be casted to uint64_t
to avoid an overflow when the record size is greater than 4 MiB.

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olivier Mazouffre <olivier.mazouffre@ims-bordeaux.fr>
Closes #8778 
Closes #8797
2019-05-29 10:17:25 -07:00
Josh Soref 46df7e6cc9 grammar: it is / plural agreement
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
Closes #8818
2019-05-28 15:58:32 -07:00
Tomohiro Kusumi 5691b86ce5 Refactor parent dataset handling in libzfs zfs_rename()
For recursive renaming, simplify the code by moving `zhrp` and
`parentname` to inner scope. `zhrp` is only used to test existence
of a parent dataset for recursive dataset dir scan since ba6a24026c.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@osnexus.com>
Closes #8815
2019-05-28 15:31:38 -07:00
loli10K ab1a9705f8 Double-free of encryption wrapping key due to invalid pool properties
This commits fixes a double-free in zfs_ioc_pool_create() triggered by
specifying an unsupported combination of properties when creating a pool
with encryption enabled.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #8791
2019-05-28 15:19:50 -07:00
Ryan Moeller 227d379385 Update comments to match code
s/get_vdev_spec/make_root_vdev

The former doesn't exist anymore.

Sponsored by: iXsystems, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Ryan Moeller <ryan@freqlabs.com>
Closes #8759
2019-05-28 15:18:31 -07:00
Stoiko Ivanov 10e1a0112e tests: fix cosmetic permission issues during make install
files in dist_*_SCRIPTS get installed with 0755, those in dist_*_DATA
with 0644. This commit moves all .kshlib, .shlib and .cfg files in the
testsuite to dist_pkgdata_DATA, and removes the shebang from
zpool_import.kshlib.

This ensures that the files are installed with appropriate permissions
and silences some warnings from lintian

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
Closes #8803
2019-05-28 15:07:24 -07:00
Stoiko Ivanov 3ba8cd6d0e test-runner.py: change shebang to python3
In commit 6e72a5b9b6 python scripts which
work with python2 and python3 changed the shebang from /usr/bin/python
to /usr/bin/python3. This gets adapted by the build-system on systems
which don't provide python3.
This commit changes test-runner.py to also use /usr/bin/python3,
enabling the change during buildtime and fixing a minor lintian issue
for those Debian packages, which depend on a specific python version
(python3/python2).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
Closes #8803
2019-05-28 15:06:07 -07:00
loli10K 0869b74a1e Endless loop in zpool_do_remove() on platforms with unsigned char
On systems where "char" is an unsigned type the value returned by
getopt() will never be negative (-1), leading to an endless loop:
this issue prevents both 'zpool remove' and 'zstreamdump' for
working on some systems.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #8789
2019-05-28 11:14:58 -07:00
Tomohiro Kusumi 841a7a98fc Update descriptions for vnops
These descriptions are not uptodate with the code.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #8767
2019-05-25 14:29:10 -07:00
Tom Caputi 3b61ca3e57 Fix embedded bp accounting in count_block()
Currently, count_block() does not correctly account for the
possibility that the bp that is passed to it could be embedded.
These blocks shouldn't be counted since the work of scanning
these blocks in already handled when the containing block is
scanned. This patch simply resolves this issue by returning
early in this case.

Reviewed by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Authored-by: Bill Sommerfeld <sommerfeld@alum.mit.edu>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8800 
Closes #8766
2019-05-25 13:52:23 -07:00
Tom Caputi 8e3c3ed1b3 Disable parallel processing for 'zfs mount -l'
Currently, 'zfs mount -a' will always attempt to parallelize
work related to mounting as best it can. Unfortunately, when
the user passes the '-l' option to load keys, this causes
all threads to prompt the user for their keys at once,
resulting in a confusing and racy user experience. This patch
simply disables parallel mounting when using the '-l' flag.

Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8762 
Closes #8811
2019-05-25 13:46:32 -07:00
Tomohiro Kusumi bfd5a709e7 Linux 5.2 compat: Directly call wait_on_page_bit()
wait_on_page_writeback() was made GPL only in torvalds/linux@19343b5bdd.

Directly call wait_on_page_bit() without using wait_on_page_writeback()
interface, given zfs_putpage() is the only caller for now.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@osnexus.com>
Closes #8794
2019-05-25 13:42:09 -07:00
Tomohiro Kusumi 36c110f994 Linux 5.2 compat: Fix config/kernel-shrink.m4 test failure
"whether ->count_objects callback exists" test failed with
"error: error" message for using an incomplete function shrinker_cb().

This is caused by torvalds/linux@83da1bed86. It's configurable,
but we would want to be able to compile with default kbuild setting.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@osnexus.com>
Closes #8776
2019-05-25 13:40:46 -07:00
Tomohiro Kusumi 4bb17ebfe2 Linux 5.2 compat: Remove config/kernel-set-fs-pwd.m4
This failed on 5.2-rc1 with "error: unknown" message, for set_fs_pwd()
not being visible in both const and non-const tests.

This is caused by torvalds/linux@83da1bed86. It's configurable,
but we would want to be able to compile with default kbuild setting.

set_fs_pwd() has never been exported with exception of some distro
kernels, and set_fs_pwd() wasn't used in ZoL to begin with. The test
result was used for a spl function vn_set_fs_pwd().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@osnexus.com>
Closes #8777
2019-05-25 13:28:55 -07:00
Tomohiro Kusumi c3e5907fdb Drop local definition of MOUNT_BUSY
It's accessible via <sys/mntent.h>.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@osnexus.com>
Closes #8765
2019-05-24 16:43:23 -07:00
loli10K 43977948aa zpool: status -t is not documented in help message
This commit adds the undocumented "-t" option to zpool(8) help message.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #8782
2019-05-24 14:16:00 -07:00
loli10K fe609530f2 zfs-tests: fix warnings when packaging some .shlib files
This change prevents the following warning when packaging some zfs-tests
files:

   *** WARNING: ./usr/src/zfs-0.8.0/tests/zfs-tests/include/zpool_script.shlib
   is executable but has empty or no shebang, removing executable bit

Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #8787
2019-05-24 14:12:14 -07:00
loli10K d28b492ab3 VERIFY3P() message is missing a space character
This commit just reintroduces a [space] character inadvertently removed
in a887d653.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #8786
2019-05-24 14:06:53 -07:00
loli10K 2696e86f2b zfs-tests: verify zfs(8) and zpool(8) help message is under 80 columns
This commit updates the ZFS Test Suite to detect incorrect wrapping of
both zfs(8) and zpool(8) help message

Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #8785
2019-05-24 14:04:08 -07:00
loli10K b868525bf7 zfs: don't pretty-print objsetid property
The objsetid property, while being stored as a number, is a dataset
identifier and should not be pretty-printed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #8784
2019-05-24 13:58:12 -07:00
loli10K 62b2152eca zfs: missing newline character in zfs_do_channel_program() error message
This commit simply adds a missing newline ("\n") character to the error
message printed by the zfs command when the provided pool parameter
can't be found.

Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #8783
2019-05-24 13:54:36 -07:00
siv0 0ff5cd0772 Fix ksh-path for random_readwrite_fixed.ksh
The test in zfs-tests/tests/perf/regression/random_readwrite_fixed.ksh
is the only file to use /usr/bin/ksh in the shebang.
Change it to /bin/ksh for consistency.

Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
Closes #8779
2019-05-24 13:17:50 -07:00
Tomohiro Kusumi 8708fd888f Linux 2.6.39 compat: Test if kstrtoul() exists
kstrtoul() exists only after torvalds/linux@33ee3b2e2e in 2.6.39.
Use strict_strtoul() if kstrtoul() doesn't exist.
Note that strict_strtoul() has existed as an alias for kstrtoul()
for a while, but removed in torvalds/linux@3db2e9cdc0.

It looks like RHEL6 (2.6.32 based) has backported kstrtoul(),
and this caused build CI to pass compilation test.
It should fail on vanilla < 2.6.39 kernels or distro kernels without
backport as reported in #8760.

--
 # grep "kstrtoul(" /lib/modules/2.6.32-754.12.1.el6.x86_64/build/ \
 include/linux/kernel.h >/dev/null
 # echo $?
 0

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #8760 
Closes #8761
2019-05-24 12:26:18 -07:00
loli10K 2e8c315fc6 Device removal panics on 32-bit systems
The issue is caused by an incorrect usage of the sizeof() operator
in vdev_obsolete_sm_object(): on 64-bit systems this is not an issue
since both "uint64_t" and "uint64_t*" are 8 bytes in size. However on
32-bit systems pointers are 4 bytes long which is not supported by
zap_lookup_impl(). Trying to remove a top-level vdev on a 32-bit system
will cause the following failure:

VERIFY3(0 == vdev_obsolete_sm_object(vd, &obsolete_sm_object)) failed (0 == 22)
PANIC at vdev_indirect.c:833:vdev_indirect_sync_obsolete()
Showing stack for process 1315
CPU: 6 PID: 1315 Comm: txg_sync Tainted: P           OE   4.4.69+ #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
 c1abc6e7 0ae10898 00000286 d4ac3bc0 c14397bc da4cd7d8 d4ac3bf0 d4ac3bd0
 d790e7ce d7911cc1 00000523 d4ac3d00 d790e7d7 d7911ce4 da4cd7d8 00000341
 da4ce664 da4cd8c0 da33fa6e 49524556 28335946 3d3d2030 65647620 626f5f76
Call Trace:
 [<>] dump_stack+0x58/0x7c
 [<>] spl_dumpstack+0x23/0x27 [spl]
 [<>] spl_panic.cold.0+0x5/0x41 [spl]
 [<>] ? dbuf_rele+0x3e/0x90 [zfs]
 [<>] ? zap_lookup_norm+0xbe/0xe0 [zfs]
 [<>] ? zap_lookup+0x57/0x70 [zfs]
 [<>] ? vdev_obsolete_sm_object+0x102/0x12b [zfs]
 [<>] vdev_indirect_sync_obsolete+0x3e1/0x64d [zfs]
 [<>] ? txg_verify+0x1d/0x160 [zfs]
 [<>] ? dmu_tx_create_dd+0x80/0xc0 [zfs]
 [<>] vdev_sync+0xbf/0x550 [zfs]
 [<>] ? mutex_lock+0x10/0x30
 [<>] ? txg_list_remove+0x9f/0x1a0 [zfs]
 [<>] ? zap_contains+0x4d/0x70 [zfs]
 [<>] spa_sync+0x9f1/0x1b10 [zfs]
 ...
 [<>] ? kthread_stop+0x110/0x110

This commit simply corrects the "integer_size" parameter used to lookup
the vdev's ZAP object.

Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #8790
2019-05-24 12:17:52 -07:00
loli10K e55db32ad0 zpool: trim -p is not a valid option
This commit removes the documented but not handled "-p" option from
zpool(8) help message.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #8781
2019-05-24 09:40:46 -07:00
loli10K 75c09c5060 Fix coverity defects: CID 186143
CID 186143: Memory - illegal accesses (USE_AFTER_FREE)

This patch fixes an use-after-free in spa_import_progress_destroy()
moving the kmem_free() call at the end of the function.

Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #8788
2019-05-23 19:17:00 -07:00
Igor K 2d0077da6d Rename reservation tests from *.sh to *.ksh
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Igor Kozhukhov <igor@dilos.org>
Closes #8729
2019-05-23 15:42:03 -07:00
Rafael Kitover 8b8b44d06f kernel timer API rework
In `config/kernel-timer.m4` refactor slightly to check more generally
for the new `timer_setup()` APIs, but also check the callback signature
because some kernels (notably 4.14) have the new `timer_setup()` API but
use the old callback signature. Also add a check for a `flags` member in
`struct timer_list`, which was added in 4.1-rc8.

Add compatibility shims to `include/spl/sys/timer.h` to allow using the
new timer APIs with the only two caveats being that the callback
argument type must be declared as `spl_timer_list_t` and an explicit
assignment is required to get the timer variable for the `timer_of()`
macro. So the callback would look like this:

```c
__cv_wakeup(spl_timer_list_t t)
{
        struct timer_list *tmr = (struct timer_list *)t;
	struct thing *parent = from_timer(parent, tmr,
		parent_timer_field);
	... /* do stuff with parent */
```

Make some minor changes to `spl-condvar.c` and `spl-taskq.c` to use the
new timer APIs instead of conditional code.

Reviewed-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rafael Kitover <rkitover@gmail.com>
Closes #8647
2019-05-23 14:40:28 -07:00
3925 changed files with 371895 additions and 125941 deletions
+21
View File
@@ -0,0 +1,21 @@
env:
CIRRUS_CLONE_DEPTH: 1
ARCH: amd64
build_task:
matrix:
freebsd_instance:
image_family: freebsd-12-4
freebsd_instance:
image_family: freebsd-13-2
freebsd_instance:
image_family: freebsd-14-0-snap
prepare_script:
- pkg install -y autoconf automake libtool gettext-runtime gmake ksh93 py39-packaging py39-cffi py39-sysctl
configure_script:
- env MAKE=gmake ./autogen.sh
- env MAKE=gmake ./configure --with-config="user" --with-python=3.9
build_script:
- gmake -j `sysctl -n kern.smp.cpus`
install_script:
- gmake install
+10
View File
@@ -0,0 +1,10 @@
root = true
[*]
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true
[*.{c,h}]
tab_width = 8
indent_style = tab
+83 -94
View File
@@ -1,10 +1,12 @@
# Contributing to ZFS on Linux
<p align="center"><img src="http://zfsonlinux.org/images/zfs-linux.png"/></p>
# Contributing to OpenZFS
<p align="center">
<img alt="OpenZFS Logo"
src="https://openzfs.github.io/openzfs-docs/_static/img/logo/480px-Open-ZFS-Secondary-Logo-Colour-halfsize.png"/>
</p>
*First of all, thank you for taking the time to contribute!*
By using the following guidelines, you can help us make ZFS on Linux even
better.
By using the following guidelines, you can help us make OpenZFS even better.
## Table Of Contents
[What should I know before I get
@@ -32,17 +34,17 @@ started?](#what-should-i-know-before-i-get-started)
Helpful resources
* [ZFS on Linux wiki](https://github.com/zfsonlinux/zfs/wiki)
* [OpenZFS Documentation](http://open-zfs.org/wiki/Developer_resources)
* [Git and GitHub for beginners](https://github.com/zfsonlinux/zfs/wiki/Git-and-GitHub-for-beginners)
* [OpenZFS Documentation](https://openzfs.github.io/openzfs-docs/)
* [OpenZFS Developer Resources](http://open-zfs.org/wiki/Developer_resources)
* [Git and GitHub for beginners](https://openzfs.github.io/openzfs-docs/Developer%20Resources/Git%20and%20GitHub%20for%20beginners.html)
## What should I know before I get started?
### Get ZFS
You can build zfs packages by following [these
instructions](https://github.com/zfsonlinux/zfs/wiki/Building-ZFS),
instructions](https://openzfs.github.io/openzfs-docs/Developer%20Resources/Building%20ZFS.html),
or install stable packages from [your distribution's
repository](https://github.com/zfsonlinux/zfs/wiki/Getting-Started).
repository](https://openzfs.github.io/openzfs-docs/Getting%20Started/index.html).
### Debug ZFS
A variety of methods and tools are available to aid ZFS developers.
@@ -51,29 +53,30 @@ configure option should be set. This will enable additional correctness
checks and all the ASSERTs to help quickly catch potential issues.
In addition, there are numerous utilities and debugging files which
provide visibility in to the inner workings of ZFS. The most useful
of these tools are discussed in detail on the [debugging ZFS wiki
page](https://github.com/zfsonlinux/zfs/wiki/Debugging).
provide visibility into the inner workings of ZFS. The most useful
of these tools are discussed in detail on the [Troubleshooting
page](https://openzfs.github.io/openzfs-docs/Basic%20Concepts/Troubleshooting.html).
### Where can I ask for help?
[The zfs-discuss mailing list or IRC](http://list.zfsonlinux.org)
are the best places to ask for help. Please do not file support requests
on the GitHub issue tracker.
The [zfs-discuss mailing
list](https://openzfs.github.io/openzfs-docs/Project%20and%20Community/Mailing%20Lists.html)
or IRC are the best places to ask for help. Please do not file
support requests on the GitHub issue tracker.
## How Can I Contribute?
### Reporting Bugs
*Please* contact us via the [zfs-discuss mailing
list or IRC](http://list.zfsonlinux.org) if you aren't
certain that you are experiencing a bug.
list](https://openzfs.github.io/openzfs-docs/Project%20and%20Community/Mailing%20Lists.html)
or IRC if you aren't certain that you are experiencing a bug.
If you run into an issue, please search our [issue
tracker](https://github.com/zfsonlinux/zfs/issues) *first* to ensure the
tracker](https://github.com/openzfs/zfs/issues) *first* to ensure the
issue hasn't been reported before. Open a new issue only if you haven't
found anything similar to your issue.
You can open a new issue and search existing issues using the public [issue
tracker](https://github.com/zfsonlinux/zfs/issues).
tracker](https://github.com/openzfs/zfs/issues).
#### When opening a new issue, please include the following information at the top of the issue:
* What distribution (with version) you are using.
@@ -105,13 +108,13 @@ information like:
* Stack traces which may be logged to `dmesg`.
### Suggesting Enhancements
ZFS on Linux is a widely deployed production filesystem which is under
active development. The team's primary focus is on fixing known issues,
improving performance, and adding compelling new features.
OpenZFS is a widely deployed production filesystem which is under active
development. The team's primary focus is on fixing known issues, improving
performance, and adding compelling new features.
You can view the list of proposed features
by filtering the issue tracker by the ["Feature"
label](https://github.com/zfsonlinux/zfs/issues?q=is%3Aopen+is%3Aissue+label%3AFeature).
by filtering the issue tracker by the ["Type: Feature"
label](https://github.com/openzfs/zfs/issues?q=is%3Aopen+is%3Aissue+label%3A%22Type%3A+Feature%22).
If you have an idea for a feature first check this list. If your idea already
appears then add a +1 to the top most comment, this helps us gauge interest
in that feature.
@@ -120,8 +123,11 @@ Otherwise, open a new issue and describe your proposed feature. Why is this
feature needed? What problem does it solve?
### Pull Requests
* All pull requests must be based on the current master branch and apply
without conflicts.
#### General
* All pull requests, except backports and releases, must be based on the current master branch
and should apply without conflicts.
* Please attempt to limit pull requests to a single commit which resolves
one specific issue.
* Make sure your commit messages are in the correct format. See the
@@ -133,16 +139,28 @@ logically independent patches which build on each other. This makes large
changes easier to review and approve which speeds up the merging process.
* Try to keep pull requests simple. Simple code with comments is much easier
to review and approve.
* All proposed changes must be approved by an OpenZFS organization member.
* If you have an idea you'd like to discuss or which requires additional testing, consider opening it as a draft pull request.
Once everything is in good shape and the details have been worked out you can remove its draft status.
Any required reviews can then be finalized and the pull request merged.
#### Tests and Benchmarks
* Every pull request will by tested by the buildbot on multiple platforms by running the [zfs-tests.sh and zloop.sh](
https://openzfs.github.io/openzfs-docs/Developer%20Resources/Building%20ZFS.html#running-zloop-sh-and-zfs-tests-sh) test suites.
* To verify your changes conform to the [style guidelines](
https://github.com/openzfs/zfs/blob/master/.github/CONTRIBUTING.md#style-guides
), please run `make checkstyle` and resolve any warnings.
* Static code analysis of each pull request is performed by the buildbot; run `make lint` to check your changes.
* Test cases should be provided when appropriate.
This includes making sure new features have adequate code coverage.
* If your pull request improves performance, please include some benchmarks.
* The pull request must pass all required [ZFS
Buildbot](http://build.zfsonlinux.org/) builders before
being accepted. If you are experiencing intermittent TEST
builder failures, you may be experiencing a [test suite
issue](https://github.com/zfsonlinux/zfs/issues?q=is%3Aissue+is%3Aopen+label%3A%22Test+Suite%22).
There are also various [buildbot options](https://github.com/zfsonlinux/zfs/wiki/Buildbot-Options)
issue](https://github.com/openzfs/zfs/issues?q=is%3Aissue+is%3Aopen+label%3A%22Type%3A+Test+Suite%22).
There are also various [buildbot options](https://openzfs.github.io/openzfs-docs/Developer%20Resources/Buildbot%20Options.html)
to control how changes are tested.
* All proposed changes must be approved by a ZFS on Linux organization member.
### Testing
All help is appreciated! If you're in a position to run the latest code
@@ -152,16 +170,41 @@ range of realistic workloads, configurations and architectures we're better
able quickly identify and resolve potential issues.
Users can also run the [ZFS Test
Suite](https://github.com/zfsonlinux/zfs/tree/master/tests) on their systems
Suite](https://github.com/openzfs/zfs/tree/master/tests) on their systems
to verify ZFS is behaving as intended.
## Style Guides
### Repository Structure
OpenZFS uses a standardised branching structure.
- The "development and main branch", is the branch all development should be based on.
- "Release branches" contain the latest released code for said version.
- "Staging branches" contain selected commits prior to being released.
**Branch Names:**
- Development and Main branch: `master`
- Release branches: `zfs-$VERSION-release`
- Staging branches: `zfs-$VERSION-staging`
`$VERSION` should be replaced with the `major.minor` version number.
_(This is the version number without the `.patch` version at the end)_
### Coding Conventions
We currently use [C Style and Coding Standards for
SunOS](http://www.cis.upenn.edu/%7Elee/06cse480/data/cstyle.ms.pdf) as our
coding convention.
This repository has an `.editorconfig` file. If your editor [supports
editorconfig](https://editorconfig.org/#download), it will
automatically respect most of this project's whitespace preferences.
Additionally, Git can help warn on whitespace problems as well:
```
git config --local core.whitespace trailing-space,space-before-tab,indent-with-non-tab,-tab-in-indent
```
### Commit Message Formats
#### New Changes
Commit messages for new changes must meet the following guidelines:
@@ -187,70 +230,6 @@ attempting to solve.
Signed-off-by: Contributor <contributor@email.com>
```
#### OpenZFS Patch Ports
If you are porting OpenZFS patches, the commit message must meet
the following guidelines:
* The first line must be the summary line from the most important OpenZFS commit being ported.
It must begin with `OpenZFS dddd, dddd - ` where `dddd` are OpenZFS issue numbers.
* Provides a `Authored by:` line to attribute each patch for each original author.
* Provides the `Reviewed by:` and `Approved by:` lines from each original
OpenZFS commit.
* Provides a `Ported-by:` line with the developer's name followed by
their email for each OpenZFS commit.
* Provides a `OpenZFS-issue:` line with link for each original illumos
issue.
* Provides a `OpenZFS-commit:` line with link for each original OpenZFS commit.
* If necessary, provide some porting notes to describe any deviations from
the original OpenZFS commits.
An example OpenZFS patch port commit message for a single patch is provided
below.
```
OpenZFS 1234 - Summary from the original OpenZFS commit
Authored by: Original Author <original@email.com>
Reviewed by: Reviewer One <reviewer1@email.com>
Reviewed by: Reviewer Two <reviewer2@email.com>
Approved by: Approver One <approver1@email.com>
Ported-by: ZFS Contributor <contributor@email.com>
Provide some porting notes here if necessary.
OpenZFS-issue: https://www.illumos.org/issues/1234
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/abcd1234
```
If necessary, multiple OpenZFS patches can be combined in a single port.
This is useful when you are porting a new patch and its subsequent bug
fixes. An example commit message is provided below.
```
OpenZFS 1234, 5678 - Summary of most important OpenZFS commit
1234 Summary from original OpenZFS commit for 1234
Authored by: Original Author <original@email.com>
Reviewed by: Reviewer Two <reviewer2@email.com>
Approved by: Approver One <approver1@email.com>
Ported-by: ZFS Contributor <contributor@email.com>
Provide some porting notes here for 1234 if necessary.
OpenZFS-issue: https://www.illumos.org/issues/1234
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/abcd1234
5678 Summary from original OpenZFS commit for 5678
Authored by: Original Author2 <original2@email.com>
Reviewed by: Reviewer One <reviewer1@email.com>
Approved by: Approver Two <approver2@email.com>
Ported-by: ZFS Contributor <contributor@email.com>
Provide some porting notes here for 5678 if necessary.
OpenZFS-issue: https://www.illumos.org/issues/5678
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/efgh5678
```
#### Coverity Defect Fixes
If you are submitting a fix to a
[Coverity defect](https://scan.coverity.com/projects/zfsonlinux-zfs),
@@ -290,3 +269,13 @@ Git can append the `Signed-off-by` line to your commit messages. Simply
provide the `-s` or `--signoff` option when performing a `git commit`.
For more information about writing commit messages, visit [How to Write
a Git Commit Message](https://chris.beams.io/posts/git-commit/).
#### Co-authored By
If someone else had part in your pull request, please add the following to the commit:
`Co-authored-by: Name <gitregistered@email.address>`
This is useful if their authorship was lost during squashing, rebasing, etc.,
but may be used in any situation where there are co-authors.
The email address used here should be the same as on the GitHub profile of said user.
If said user does not have their email address public, please use the following instead:
`Co-authored-by: Name <[username]@users.noreply.github.com>`
-48
View File
@@ -1,48 +0,0 @@
<!-- Please fill out the following template, which will help other contributors address your issue. -->
<!--
Thank you for reporting an issue.
*IMPORTANT* - Please search our issue tracker *before* making a new issue.
If you cannot find a similar issue, then create a new issue.
https://github.com/zfsonlinux/zfs/issues
*IMPORTANT* - This issue tracker is for *bugs* and *issues* only.
Please search the wiki and the mailing list archives before asking
questions on the mailing list.
https://github.com/zfsonlinux/zfs/wiki/Mailing-Lists
Please fill in as much of the template as possible.
-->
### System information
<!-- add version after "|" character -->
Type | Version/Name
--- | ---
Distribution Name |
Distribution Version |
Linux Kernel |
Architecture |
ZFS Version |
SPL Version |
<!--
Commands to find ZFS/SPL versions:
modinfo zfs | grep -iw version
modinfo spl | grep -iw version
-->
### Describe the problem you're observing
### Describe how to reproduce the problem
### Include any warning/errors/backtraces from the system logs
<!--
*IMPORTANT* - Please mark logs and text output from terminal commands
or else Github will not display them correctly.
An example is provided below.
Example:
```
this is an example how log text should be marked (wrap it with ```)
```
-->
+55
View File
@@ -0,0 +1,55 @@
---
name: Bug report
about: Create a report to help us improve OpenZFS
title: ''
labels: 'Type: Defect'
assignees: ''
---
<!-- Please fill out the following template, which will help other contributors address your issue. -->
<!--
Thank you for reporting an issue.
*IMPORTANT* - Please check our issue tracker before opening a new issue.
Additional valuable information can be found in the OpenZFS documentation
and mailing list archives.
Please fill in as much of the template as possible.
-->
### System information
<!-- add version after "|" character -->
Type | Version/Name
--- | ---
Distribution Name |
Distribution Version |
Kernel Version |
Architecture |
OpenZFS Version |
<!--
Command to find OpenZFS version:
zfs version
Commands to find kernel version:
uname -r # Linux
freebsd-version -r # FreeBSD
-->
### Describe the problem you're observing
### Describe how to reproduce the problem
### Include any warning/errors/backtraces from the system logs
<!--
*IMPORTANT* - Please mark logs and text output from terminal commands
or else Github will not display them correctly.
An example is provided below.
Example:
```
this is an example how log text should be marked (wrap it with ```)
```
-->
+14
View File
@@ -0,0 +1,14 @@
blank_issues_enabled: false
contact_links:
- name: OpenZFS Questions
url: https://github.com/openzfs/zfs/discussions/new
about: Ask the community for help
- name: OpenZFS Community Support Mailing list (Linux)
url: https://zfsonlinux.topicbox.com/groups/zfs-discuss
about: Get community support for OpenZFS on Linux
- name: FreeBSD Community Support Mailing list
url: https://lists.freebsd.org/mailman/listinfo/freebsd-fs
about: Get community support for OpenZFS on FreeBSD
- name: OpenZFS on IRC
url: https://web.libera.chat/#openzfs
about: Use IRC to get community support for OpenZFS
+33
View File
@@ -0,0 +1,33 @@
---
name: Feature request
about: Suggest a feature for OpenZFS
title: ''
labels: 'Type: Feature'
assignees: ''
---
<!--
Thank you for suggesting a feature.
Please check our issue tracker before opening a new feature request.
Filling out the following template will help other contributors better understand your proposed feature.
-->
### Describe the feature would like to see added to OpenZFS
<!--
Provide a clear and concise description of the feature.
-->
### How will this feature improve OpenZFS?
<!--
What problem does this feature solve?
-->
### Additional context
<!--
Any additional information you can add about the proposal?
-->
+8 -6
View File
@@ -4,7 +4,7 @@
<!---
Documentation on ZFS Buildbot options can be found at
https://github.com/zfsonlinux/zfs/wiki/Buildbot-Options
https://openzfs.github.io/openzfs-docs/Developer%20Resources/Buildbot%20Options.html
-->
### Motivation and Context
@@ -19,6 +19,7 @@ https://github.com/zfsonlinux/zfs/wiki/Buildbot-Options
<!--- Include details of your testing environment, and the tests you ran to -->
<!--- see how your change affects other areas of the code, etc. -->
<!--- If your change is a performance enhancement, please provide benchmarks here. -->
<!--- Please think about using the draft PR feature if appropriate -->
### Types of changes
<!--- What types of changes does your code introduce? Put an `x` in all the boxes that apply: -->
@@ -27,14 +28,15 @@ https://github.com/zfsonlinux/zfs/wiki/Buildbot-Options
- [ ] Performance enhancement (non-breaking change which improves efficiency)
- [ ] Code cleanup (non-breaking change which makes code smaller or more readable)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Library ABI change (libzfs, libzfs\_core, libnvpair, libuutil and libzfsbootenv)
- [ ] Documentation (a change to man pages or other documentation)
### Checklist:
<!--- Go over all the following points, and put an `x` in all the boxes that apply. -->
<!--- If you're unsure about any of these, don't hesitate to ask. We're here to help! -->
- [ ] My code follows the ZFS on Linux [code style requirements](https://github.com/zfsonlinux/zfs/blob/master/.github/CONTRIBUTING.md#coding-conventions).
- [ ] My code follows the OpenZFS [code style requirements](https://github.com/openzfs/zfs/blob/master/.github/CONTRIBUTING.md#coding-conventions).
- [ ] I have updated the documentation accordingly.
- [ ] I have read the [**contributing** document](https://github.com/zfsonlinux/zfs/blob/master/.github/CONTRIBUTING.md).
- [ ] I have added [tests](https://github.com/zfsonlinux/zfs/tree/master/tests) to cover my changes.
- [ ] All new and existing tests passed.
- [ ] All commit messages are properly formatted and contain [`Signed-off-by`](https://github.com/zfsonlinux/zfs/blob/master/.github/CONTRIBUTING.md#signed-off-by).
- [ ] I have read the [**contributing** document](https://github.com/openzfs/zfs/blob/master/.github/CONTRIBUTING.md).
- [ ] I have added [tests](https://github.com/openzfs/zfs/tree/master/tests) to cover my changes.
- [ ] I have run the ZFS Test Suite with this change applied.
- [ ] All commit messages are properly formatted and contain [`Signed-off-by`](https://github.com/openzfs/zfs/blob/master/.github/CONTRIBUTING.md#signed-off-by).
+4 -1
View File
@@ -4,7 +4,8 @@ codecov:
after_n_builds: 2 # user and kernel
coverage:
precision: 2 # 2 digits of precision
precision: 0 # 0 decimals of precision
round: nearest # Round to nearest precision point
range: "50...90" # red -> yellow -> green
status:
@@ -20,3 +21,5 @@ comment:
layout: "reach, diff, flags, footer"
behavior: once # update if exists; post new; skip if deleted
require_changes: yes # only post when coverage changes
# ignore: Please place any ignores in config/ax_code_coverage.m4 instead
+4
View File
@@ -0,0 +1,4 @@
name: "Custom CodeQL Analysis"
queries:
- uses: ./.github/codeql/custom-queries/cpp/deprecatedFunctionUsage.ql
+4
View File
@@ -0,0 +1,4 @@
name: "Custom CodeQL Analysis"
paths-ignore:
- tests
@@ -0,0 +1,59 @@
/**
* @name Deprecated function usage detection
* @description Detects functions whose usage is banned from the OpenZFS
* codebase due to QA concerns.
* @kind problem
* @severity error
* @id cpp/deprecated-function-usage
*/
import cpp
predicate isDeprecatedFunction(Function f) {
f.getName() = "strtok" or
f.getName() = "__xpg_basename" or
f.getName() = "basename" or
f.getName() = "dirname" or
f.getName() = "bcopy" or
f.getName() = "bcmp" or
f.getName() = "bzero" or
f.getName() = "asctime" or
f.getName() = "asctime_r" or
f.getName() = "gmtime" or
f.getName() = "localtime" or
f.getName() = "strncpy"
}
string getReplacementMessage(Function f) {
if f.getName() = "strtok" then
result = "Use strtok_r(3) instead!"
else if f.getName() = "__xpg_basename" then
result = "basename(3) is underspecified. Use zfs_basename() instead!"
else if f.getName() = "basename" then
result = "basename(3) is underspecified. Use zfs_basename() instead!"
else if f.getName() = "dirname" then
result = "dirname(3) is underspecified. Use zfs_dirnamelen() instead!"
else if f.getName() = "bcopy" then
result = "bcopy(3) is deprecated. Use memcpy(3)/memmove(3) instead!"
else if f.getName() = "bcmp" then
result = "bcmp(3) is deprecated. Use memcmp(3) instead!"
else if f.getName() = "bzero" then
result = "bzero(3) is deprecated. Use memset(3) instead!"
else if f.getName() = "asctime" then
result = "Use strftime(3) instead!"
else if f.getName() = "asctime_r" then
result = "Use strftime(3) instead!"
else if f.getName() = "gmtime" then
result = "gmtime(3) isn't thread-safe. Use gmtime_r(3) instead!"
else if f.getName() = "localtime" then
result = "localtime(3) isn't thread-safe. Use localtime_r(3) instead!"
else
result = "strncpy(3) is deprecated. Use strlcpy(3) instead!"
}
from FunctionCall fc, Function f
where
fc.getTarget() = f and
isDeprecatedFunction(f)
select fc, getReplacementMessage(f)
@@ -0,0 +1,4 @@
name: openzfs-cpp-queries
version: 0.0.0
libraryPathDependencies: codeql-cpp
suites: openzfs-cpp-suite
+13
View File
@@ -0,0 +1,13 @@
# Configuration for probot-no-response - https://github.com/probot/no-response
# Number of days of inactivity before an Issue is closed for lack of response
daysUntilClose: 31
# Label requiring a response
responseRequiredLabel: "Status: Feedback requested"
# Comment to post when closing an Issue for lack of response. Set to `false` to disable
closeComment: >
This issue has been automatically closed because there has been no response
to our request for more information from the original author. With only the
information that is currently in the issue, we don't have enough information
to take action. Please reach out if you have or find the answers we need so
that we can investigate further.
+26
View File
@@ -0,0 +1,26 @@
# Number of days of inactivity before an issue becomes stale
daysUntilStale: 365
# Number of days of inactivity before a stale issue is closed
daysUntilClose: 90
# Limit to only `issues` or `pulls`
only: issues
# Issues with these labels will never be considered stale
exemptLabels:
- "Type: Feature"
- "Bot: Not Stale"
- "Status: Work in Progress"
# Set to true to ignore issues in a project (defaults to false)
exemptProjects: true
# Set to true to ignore issues in a milestone (defaults to false)
exemptMilestones: true
# Set to true to ignore issues with an assignee (defaults to false)
exemptAssignees: true
# Label to use when marking an issue as stale
staleLabel: "Status: Stale"
# Comment to post when marking an issue as stale. Set to `false` to disable
markComment: >
This issue has been automatically marked as "stale" because it has not had
any activity for a while. It will be closed in 90 days if no further activity occurs.
Thank you for your contributions.
# Limit the number of actions per hour, from 1-30. Default is 30
limitPerRun: 6
-3
View File
@@ -1,3 +0,0 @@
preprocessorErrorDirective:./module/zfs/vdev_raidz_math_avx512f.c:243
preprocessorErrorDirective:./module/zfs/vdev_raidz_math_sse2.c:266
+61
View File
@@ -0,0 +1,61 @@
## The testings are done this way
```mermaid
flowchart TB
subgraph CleanUp and Summary
CleanUp+Summary
end
subgraph Functional Testings
sanity-checks-20.04
zloop-checks-20.04
functional-testing-20.04-->Part1-20.04
functional-testing-20.04-->Part2-20.04
functional-testing-20.04-->Part3-20.04
functional-testing-20.04-->Part4-20.04
functional-testing-22.04-->Part1-22.04
functional-testing-22.04-->Part2-22.04
functional-testing-22.04-->Part3-22.04
functional-testing-22.04-->Part4-22.04
sanity-checks-22.04
zloop-checks-22.04
end
subgraph Code Checking + Building
Build-Ubuntu-20.04
codeql.yml
checkstyle.yml
Build-Ubuntu-22.04
end
Build-Ubuntu-20.04-->sanity-checks-20.04
Build-Ubuntu-20.04-->zloop-checks-20.04
Build-Ubuntu-20.04-->functional-testing-20.04
Build-Ubuntu-22.04-->sanity-checks-22.04
Build-Ubuntu-22.04-->zloop-checks-22.04
Build-Ubuntu-22.04-->functional-testing-22.04
sanity-checks-20.04-->CleanUp+Summary
Part1-20.04-->CleanUp+Summary
Part2-20.04-->CleanUp+Summary
Part3-20.04-->CleanUp+Summary
Part4-20.04-->CleanUp+Summary
Part1-22.04-->CleanUp+Summary
Part2-22.04-->CleanUp+Summary
Part3-22.04-->CleanUp+Summary
Part4-22.04-->CleanUp+Summary
sanity-checks-22.04-->CleanUp+Summary
```
1) build zfs modules for Ubuntu 20.04 and 22.04 (~15m)
2) 2x zloop test (~10m) + 2x sanity test (~25m)
3) 4x functional testings in parts 1..4 (each ~1h)
4) cleanup and create summary
- content of summary depends on the results of the steps
When everything runs fine, the full run should be done in
about 2 hours.
The codeql.yml and checkstyle.yml are not part in this circle.
+57
View File
@@ -0,0 +1,57 @@
acl
alien
attr
autoconf
bc
build-essential
curl
dbench
debhelper-compat
dh-python
dkms
fakeroot
fio
gdb
gdebi
git
ksh
lcov
libacl1-dev
libaio-dev
libattr1-dev
libblkid-dev
libcurl4-openssl-dev
libdevmapper-dev
libelf-dev
libffi-dev
libmount-dev
libpam0g-dev
libselinux1-dev
libssl-dev
libtool
libudev-dev
linux-headers-generic
lsscsi
mdadm
nfs-kernel-server
pamtester
parted
po-debconf
python3
python3-all-dev
python3-cffi
python3-dev
python3-packaging
python3-pip
python3-setuptools
python3-sphinx
rng-tools-debian
rsync
samba
sysstat
uuid-dev
watchdog
wget
xfslibs-dev
xz-utils
zlib1g-dev
@@ -0,0 +1,5 @@
cppcheck
devscripts
mandoc
pax-utils
shellcheck
+59
View File
@@ -0,0 +1,59 @@
name: checkstyle
on:
push:
pull_request:
jobs:
checkstyle:
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v4
with:
ref: ${{ github.event.pull_request.head.sha }}
- name: Install dependencies
run: |
# https://github.com/orgs/community/discussions/47863
sudo apt-mark hold grub-efi-amd64-signed
sudo apt-get update --fix-missing
sudo apt-get upgrade
sudo xargs --arg-file=${{ github.workspace }}/.github/workflows/build-dependencies.txt apt-get install -qq
sudo xargs --arg-file=${{ github.workspace }}/.github/workflows/checkstyle-dependencies.txt apt-get install -qq
sudo python3 -m pip install --quiet flake8
sudo apt-get clean
# confirm that the tools are installed
# the build system doesn't fail when they are not
checkbashisms --version
cppcheck --version
flake8 --version
scanelf --version
shellcheck --version
- name: Prepare
run: |
./autogen.sh
./configure
make -j$(nproc) --no-print-directory --silent
- name: Checkstyle
run: |
make -j$(nproc) --no-print-directory --silent checkstyle
- name: Lint
run: |
make -j$(nproc) --no-print-directory --silent lint
- name: CheckABI
id: CheckABI
run: |
docker run -v $PWD:/source ghcr.io/openzfs/libabigail make -j$(nproc) --no-print-directory --silent checkabi
- name: StoreABI
if: failure() && steps.CheckABI.outcome == 'failure'
run: |
docker run -v $PWD:/source ghcr.io/openzfs/libabigail make -j$(nproc) --no-print-directory --silent storeabi
- name: Prepare artifacts
if: failure() && steps.CheckABI.outcome == 'failure'
run: |
find -name *.abi | tar -cf abi_files.tar -T -
- uses: actions/upload-artifact@v4
if: failure() && steps.CheckABI.outcome == 'failure'
with:
name: New ABI files (use only if you're sure about interface changes)
path: abi_files.tar
+41
View File
@@ -0,0 +1,41 @@
name: "CodeQL"
on:
push:
pull_request:
jobs:
analyze:
name: Analyze
runs-on: ubuntu-latest
permissions:
actions: read
contents: read
security-events: write
strategy:
fail-fast: false
matrix:
language: [ 'cpp', 'python' ]
steps:
- name: Set make jobs
run: |
echo "MAKEFLAGS=-j$(nproc)" >> $GITHUB_ENV
- name: Checkout repository
uses: actions/checkout@v4
- name: Initialize CodeQL
uses: github/codeql-action/init@v2
with:
config-file: .github/codeql-${{ matrix.language }}.yml
languages: ${{ matrix.language }}
- name: Autobuild
uses: github/codeql-action/autobuild@v2
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v2
with:
category: "/language:${{matrix.language}}"
+119
View File
@@ -0,0 +1,119 @@
#!/usr/bin/env bash
# for runtime reasons we split functional testings into N parts
# - use a define to check for missing tarfiles
FUNCTIONAL_PARTS="4"
ZTS_REPORT="tests/test-runner/bin/zts-report.py"
chmod +x $ZTS_REPORT
function output() {
echo -e $* >> Summary.md
}
function error() {
output ":bangbang: $* :bangbang:\n"
}
# this function generates the real summary
# - expects a logfile "log" in current directory
function generate() {
# we issued some error already
test ! -s log && return
# for overview and zts-report
cat log | grep '^Test' > list
# error details
awk '/\[FAIL\]|\[KILLED\]/{ show=1; print; next; }
/\[SKIP\]|\[PASS\]/{ show=0; } show' log > err
# summary of errors
if [ -s err ]; then
output "<pre>"
$ZTS_REPORT --no-maybes ./list >> Summary.md
output "</pre>"
# generate seperate error logfile
ERRLOGS=$((ERRLOGS+1))
errfile="err-$ERRLOGS.md"
echo -e "\n## $headline (debugging)\n" >> $errfile
echo "<details><summary>Error Listing - with dmesg and dbgmsg</summary><pre>" >> $errfile
dd if=err bs=999k count=1 >> $errfile
echo "</pre></details>" >> $errfile
else
output "All tests passed :thumbsup:"
fi
output "<details><summary>Full Listing</summary><pre>"
cat list >> Summary.md
output "</pre></details>"
# remove tmp files
rm -f err list log
}
# check tarfiles and untar
function check_tarfile() {
if [ -f "$1" ]; then
tar xf "$1" || error "Tarfile $1 returns some error"
else
error "Tarfile $1 not found"
fi
}
# check logfile and concatenate test results
function check_logfile() {
if [ -f "$1" ]; then
cat "$1" >> log
else
error "Logfile $1 not found"
fi
}
# sanity
function summarize_s() {
headline="$1"
output "\n## $headline\n"
rm -rf testfiles
check_tarfile "$2/sanity.tar"
check_logfile "testfiles/log"
generate
}
# functional
function summarize_f() {
headline="$1"
output "\n## $headline\n"
rm -rf testfiles
for i in $(seq 1 $FUNCTIONAL_PARTS); do
tarfile="$2-part$i/part$i.tar"
check_tarfile "$tarfile"
check_logfile "testfiles/log"
done
generate
}
# https://docs.github.com/en/enterprise-server@3.6/actions/using-workflows/workflow-commands-for-github-actions#step-isolation-and-limits
# Job summaries are isolated between steps and each step is restricted to a maximum size of 1MiB.
# [ ] can not show all error findings here
# [x] split files into smaller ones and create additional steps
ERRLOGS=0
if [ ! -f Summary/Summary.md ]; then
# first call, we do the default summary (~500k)
echo -n > Summary.md
summarize_s "Sanity Tests Ubuntu 20.04" Logs-20.04-sanity
summarize_s "Sanity Tests Ubuntu 22.04" Logs-22.04-sanity
summarize_f "Functional Tests Ubuntu 20.04" Logs-20.04-functional
summarize_f "Functional Tests Ubuntu 22.04" Logs-22.04-functional
cat Summary.md >> $GITHUB_STEP_SUMMARY
mkdir -p Summary
mv *.md Summary
else
# here we get, when errors where returned in first call
test -f Summary/err-$1.md && cat Summary/err-$1.md >> $GITHUB_STEP_SUMMARY
fi
exit 0
+88
View File
@@ -0,0 +1,88 @@
#!/usr/bin/env bash
set -eu
function prerun() {
echo "::group::Install build dependencies"
# remove snap things, update+upgrade will be faster then
for x in lxd core20 snapd; do sudo snap remove $x; done
sudo apt-get purge snapd google-chrome-stable firefox
# https://github.com/orgs/community/discussions/47863
sudo apt-get remove grub-efi-amd64-bin grub-efi-amd64-signed shim-signed --allow-remove-essential
sudo apt-get update
sudo apt upgrade
sudo xargs --arg-file=.github/workflows/build-dependencies.txt apt-get install -qq
sudo apt-get clean
sudo dmesg -c > /var/tmp/dmesg-prerun
echo "::endgroup::"
}
function mod_build() {
echo "::group::Generate debian packages"
./autogen.sh
./configure --enable-debug --enable-debuginfo --enable-asan --enable-ubsan
make --no-print-directory --silent native-deb-utils native-deb-kmod
mv ../*.deb .
rm ./openzfs-zfs-dracut*.deb ./openzfs-zfs-dkms*.deb
echo "$ImageOS-$ImageVersion" > tests/ImageOS.txt
echo "::endgroup::"
}
function mod_install() {
# install the pre-built module only on the same runner image
MOD=`cat tests/ImageOS.txt`
if [ "$MOD" != "$ImageOS-$ImageVersion" ]; then
rm -f *.deb
mod_build
fi
echo "::group::Install and load modules"
# don't use kernel-shipped zfs modules
sudo sed -i.bak 's/updates/extra updates/' /etc/depmod.d/ubuntu.conf
sudo apt-get install --fix-missing ./*.deb
# Native Debian packages enable and start the services
# Stop zfs-zed daemon, as it may interfere with some ZTS test cases
sudo systemctl stop zfs-zed
sudo depmod -a
sudo modprobe zfs
sudo dmesg
sudo dmesg -c > /var/tmp/dmesg-module-load
echo "::endgroup::"
echo "::group::Report CPU information"
lscpu
cat /proc/spl/kstat/zfs/chksum_bench
echo "::endgroup::"
echo "::group::Optimize storage for ZFS testings"
# remove swap and umount fast storage
# 89GiB -> rootfs + bootfs with ~80MB/s -> don't care
# 64GiB -> /mnt with 420MB/s -> new testing ssd
sudo swapoff -a
# this one is fast and mounted @ /mnt
# -> we reformat with ext4 + move it to /var/tmp
DEV="/dev/disk/azure/resource-part1"
sudo umount /mnt
sudo mkfs.ext4 -O ^has_journal -F $DEV
sudo mount -o noatime,barrier=0 $DEV /var/tmp
sudo chmod 1777 /var/tmp
# disk usage afterwards
sudo df -h /
sudo df -h /var/tmp
sudo fstrim -a
echo "::endgroup::"
}
case "$1" in
build)
prerun
mod_build
;;
tests)
prerun
mod_install
;;
esac
+24
View File
@@ -0,0 +1,24 @@
#!/usr/bin/env bash
set -eu
TDIR="/usr/share/zfs/zfs-tests/tests/functional"
echo -n "TODO="
case "$1" in
part1)
# ~1h 20m
echo "cli_root"
;;
part2)
# ~1h
ls $TDIR|grep '^[a-m]'|grep -v "cli_root"|xargs|tr -s ' ' ','
;;
part3)
# ~1h
ls $TDIR|grep '^[n-qs-z]'|xargs|tr -s ' ' ','
;;
part4)
# ~1h
ls $TDIR|grep '^r'|xargs|tr -s ' ' ','
;;
esac
+124
View File
@@ -0,0 +1,124 @@
name: zfs-linux-tests
on:
workflow_call:
inputs:
os:
description: 'The ubuntu version: 20.02 or 22.04'
required: true
type: string
jobs:
zloop:
runs-on: ubuntu-${{ inputs.os }}
steps:
- uses: actions/checkout@v4
with:
ref: ${{ github.event.pull_request.head.sha }}
- uses: actions/download-artifact@v4
with:
name: modules-${{ inputs.os }}
- name: Install modules
run: |
tar xzf modules-${{ inputs.os }}.tgz
.github/workflows/scripts/setup-dependencies.sh tests
- name: Tests
timeout-minutes: 30
run: |
sudo mkdir -p /var/tmp/zloop
# run for 10 minutes or at most 2 iterations for a maximum runner
# time of 20 minutes.
sudo /usr/share/zfs/zloop.sh -t 600 -I 2 -l -m1 -- -T 120 -P 60
- name: Prepare artifacts
if: failure()
run: |
sudo chmod +r -R /var/tmp/zloop/
- uses: actions/upload-artifact@v4
if: failure()
with:
name: Zpool-logs-${{ inputs.os }}
path: |
/var/tmp/zloop/*/
!/var/tmp/zloop/*/vdev/
retention-days: 14
if-no-files-found: ignore
- uses: actions/upload-artifact@v4
if: failure()
with:
name: Zpool-files-${{ inputs.os }}
path: |
/var/tmp/zloop/*/vdev/
retention-days: 14
if-no-files-found: ignore
sanity:
runs-on: ubuntu-${{ inputs.os }}
steps:
- uses: actions/checkout@v4
with:
ref: ${{ github.event.pull_request.head.sha }}
- uses: actions/download-artifact@v4
with:
name: modules-${{ inputs.os }}
- name: Install modules
run: |
tar xzf modules-${{ inputs.os }}.tgz
.github/workflows/scripts/setup-dependencies.sh tests
- name: Tests
timeout-minutes: 60
shell: bash
run: |
set -o pipefail
/usr/share/zfs/zfs-tests.sh -vKR -s 3G -r sanity | scripts/zfs-tests-color.sh
- name: Prepare artifacts
if: success() || failure()
run: |
RESPATH="/var/tmp/test_results"
mv -f $RESPATH/current $RESPATH/testfiles
tar cf $RESPATH/sanity.tar -h -C $RESPATH testfiles
- uses: actions/upload-artifact@v4
if: success() || failure()
with:
name: Logs-${{ inputs.os }}-sanity
path: /var/tmp/test_results/sanity.tar
if-no-files-found: ignore
functional:
runs-on: ubuntu-${{ inputs.os }}
strategy:
fail-fast: false
matrix:
tests: [ part1, part2, part3, part4 ]
steps:
- uses: actions/checkout@v4
with:
ref: ${{ github.event.pull_request.head.sha }}
- uses: actions/download-artifact@v4
with:
name: modules-${{ inputs.os }}
- name: Install modules
run: |
tar xzf modules-${{ inputs.os }}.tgz
.github/workflows/scripts/setup-dependencies.sh tests
- name: Setup tests
run: |
.github/workflows/scripts/setup-functional.sh ${{ matrix.tests }} >> $GITHUB_ENV
- name: Tests
timeout-minutes: 120
shell: bash
run: |
set -o pipefail
/usr/share/zfs/zfs-tests.sh -vKR -s 3G -T ${{ env.TODO }} | scripts/zfs-tests-color.sh
- name: Prepare artifacts
if: success() || failure()
run: |
RESPATH="/var/tmp/test_results"
mv -f $RESPATH/current $RESPATH/testfiles
tar cf $RESPATH/${{ matrix.tests }}.tar -h -C $RESPATH testfiles
- uses: actions/upload-artifact@v4
if: success() || failure()
with:
name: Logs-${{ inputs.os }}-functional-${{ matrix.tests }}
path: /var/tmp/test_results/${{ matrix.tests }}.tar
if-no-files-found: ignore
+64
View File
@@ -0,0 +1,64 @@
name: zfs-linux
on:
push:
pull_request:
jobs:
build:
name: Build
strategy:
fail-fast: false
matrix:
os: [20.04, 22.04]
runs-on: ubuntu-${{ matrix.os }}
steps:
- uses: actions/checkout@v4
with:
ref: ${{ github.event.pull_request.head.sha }}
- name: Build modules
run: .github/workflows/scripts/setup-dependencies.sh build
- name: Prepare modules upload
run: tar czf modules-${{ matrix.os }}.tgz *.deb .github tests/test-runner tests/ImageOS.txt
- uses: actions/upload-artifact@v4
with:
name: modules-${{ matrix.os }}
path: modules-${{ matrix.os }}.tgz
retention-days: 14
testings:
name: Testing
strategy:
fail-fast: false
matrix:
os: [20.04, 22.04]
needs: build
uses: ./.github/workflows/zfs-linux-tests.yml
with:
os: ${{ matrix.os }}
cleanup:
if: always()
name: Cleanup
runs-on: ubuntu-22.04
needs: testings
steps:
- uses: actions/download-artifact@v4
- name: Generating summary
run: |
tar xzf modules-22.04/modules-22.04.tgz .github tests
.github/workflows/scripts/generate-summary.sh
# up to 4 steps, each can have 1 MiB output (for debugging log files)
- name: Summary for errors #1
run: .github/workflows/scripts/generate-summary.sh 1
- name: Summary for errors #2
run: .github/workflows/scripts/generate-summary.sh 2
- name: Summary for errors #3
run: .github/workflows/scripts/generate-summary.sh 3
- name: Summary for errors #4
run: .github/workflows/scripts/generate-summary.sh 4
- uses: actions/upload-artifact@v4
with:
name: Summary Files
path: Summary/
+56 -31
View File
@@ -1,6 +1,7 @@
#
# N.B.
# This is the toplevel .gitignore file.
# This is the top-level .gitignore file:
# ignore everything except a list of allowed files.
#
# This is not the place for entries that are specific to
# a subdirectory. Instead add those files to the
# .gitignore file in that subdirectory.
@@ -10,6 +11,57 @@
# command after changing this file, to see if there are
# any tracked files which get ignored after the change.
*
!.github
!cmd
!config
!contrib
!etc
!include
!lib
!man
!module
!rpm
!scripts
!tests
!udev
!.github/**
!cmd/**
!config/**
!contrib/**
!etc/**
!include/**
!lib/**
!man/**
!module/**
!rpm/**
!scripts/**
!tests/**
!udev/**
!.editorconfig
!.cirrus.yml
!.gitignore
!.gitmodules
!.mailmap
!AUTHORS
!autogen.sh
!CODE_OF_CONDUCT.md
!configure.ac
!copy-builtin
!COPYRIGHT
!LICENSE
!Makefile.am
!META
!NEWS
!NOTICE
!README.md
!RELEASES.md
!TEST
!zfs.release.in
#
# Normal rules
#
@@ -31,35 +83,8 @@
modules.order
Makefile
Makefile.in
#
# Top level generated files specific to this top level dir
#
/bin
/configure
/config.log
/config.status
/libtool
/zfs_config.h
/zfs_config.h.in
/zfs.release
/stamp-h1
/aclocal.m4
/autom4te.cache
#
# Top level generic files
#
!.gitignore
tags
TAGS
current
cscope.*
*.rpm
*.deb
*.tar.gz
changelog
*.patch
*.orig
*.tmp
*.log
venv
+1 -1
View File
@@ -1,3 +1,3 @@
[submodule "scripts/zfs-images"]
path = scripts/zfs-images
url = https://github.com/zfsonlinux/zfs-images
url = https://github.com/openzfs/zfs-images
+213
View File
@@ -0,0 +1,213 @@
# This file maps the name+email seen in a commit back to a canonical
# name+email. Git will replace the commit name/email with the canonical version
# wherever it sees it.
#
# If there is a commit in the history with a "wrong" name or email, list it
# here. If you regularly commit with an alternate name or email address and
# would like to ensure that you are always listed consistently in the repo, add
# mapping here.
#
# On the other hand, if you use multiple names or email addresses legitimately
# (eg you use a company email address for your paid OpenZFS work, and a
# personal address for your evening side projects), then don't map one to the
# other here.
#
# The most common formats are:
#
# Canonical Name <canonical-email>
# Canonical Name <canonical-email> <commit-email>
# Canonical Name <canonical-email> Commit Name <commit-email>
#
# See https://git-scm.com/docs/gitmailmap for more info.
# These maps are making names consistent where they have varied but the email
# address has never changed. In most cases, the full name is in the
# Signed-off-by of a commit with a matching author.
Ahelenia Ziemiańska <nabijaczleweli@gmail.com>
Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Alex John <alex@stty.io>
Andreas Dilger <adilger@dilger.ca>
Andrew Walker <awalker@ixsystems.com>
Benedikt Neuffer <github@itfriend.de>
Chengfei Zhu <chengfeix.zhu@intel.com>
ChenHao Lu <18302010006@fudan.edu.cn>
Chris Lindee <chris.lindee+github@gmail.com>
Colm Buckley <colm@tuatha.org>
Crag Wang <crag0715@gmail.com>
Damian Szuberski <szuberskidamian@gmail.com>
Daniel Kolesa <daniel@octaforge.org>
Debabrata Banerjee <dbavatar@gmail.com>
Finix Yan <yanchongwen@hotmail.com>
Gaurav Kumar <gauravk.18@gmail.com>
Gionatan Danti <g.danti@assyoma.it>
Glenn Washburn <development@efficientek.com>
Gordan Bobic <gordan.bobic@gmail.com>
Gregory Bartholomew <gregory.lee.bartholomew@gmail.com>
hedong zhang <h_d_zhang@163.com>
Ilkka Sovanto <github@ilkka.kapsi.fi>
InsanePrawn <Insane.Prawny@gmail.com>
Jason Cohen <jwittlincohen@gmail.com>
Jason Harmening <jason.harmening@gmail.com>
Jeremy Faulkner <gldisater@gmail.com>
Jinshan Xiong <jinshan.xiong@gmail.com>
John Poduska <jpoduska@datto.com>
Justin Scholz <git@justinscholz.de>
Ka Ho Ng <khng300@gmail.com>
Kash Pande <github@tripleback.net>
Kay Pedersen <christianpe96@gmail.com>
KernelOfTruth <kerneloftruth@gmail.com>
Liu Hua <liu.hua130@zte.com.cn>
Liu Qing <winglq@gmail.com>
loli10K <ezomori.nozomu@gmail.com>
Mart Frauenlob <allkind@fastest.cc>
Matthias Blankertz <matthias@blankertz.org>
Michael Gmelin <grembo@FreeBSD.org>
Olivier Mazouffre <olivier.mazouffre@ims-bordeaux.fr>
Piotr Kubaj <pkubaj@anongoth.pl>
Quentin Zdanis <zdanisq@gmail.com>
Roberto Ricci <ricci@disroot.org>
Rob Norris <robn@despairlabs.com>
Rob Norris <rob.norris@klarasystems.com>
Sam Lunt <samuel.j.lunt@gmail.com>
Sanjeev Bagewadi <sanjeev.bagewadi@gmail.com>
Stoiko Ivanov <github@nomore.at>
Tamas TEVESZ <ice@extreme.hu>
WHR <msl0000023508@gmail.com>
Yanping Gao <yanping.gao@xtaotech.com>
Youzhong Yang <youzhong@gmail.com>
# Signed-off-by: overriding Author:
Ryan <errornointernet@envs.net> <error.nointernet@gmail.com>
Qiuhao Chen <chenqiuhao1997@gmail.com> <haohao0924@126.com>
Yuxin Wang <yuxinwang9999@gmail.com> <Bi11gates9999@gmail.com>
Zhenlei Huang <zlei@FreeBSD.org> <zlei.huang@gmail.com>
# Commits from strange places, long ago
Brian Behlendorf <behlendorf1@llnl.gov> <behlendo@7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c>
Brian Behlendorf <behlendorf1@llnl.gov> <behlendo@fedora-17-amd64.(none)>
Brian Behlendorf <behlendorf1@llnl.gov> <behlendo@myhost.(none)>
Brian Behlendorf <behlendorf1@llnl.gov> <ubuntu@ip-172-31-16-145.us-west-1.compute.internal>
Brian Behlendorf <behlendorf1@llnl.gov> <ubuntu@ip-172-31-20-6.us-west-1.compute.internal>
Herb Wartens <wartens2@llnl.gov> <wartens2@7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c>
Ned Bass <bass6@llnl.gov> <bass6@zeno1.(none)>
Tulsi Jain <tulsi.jain@delphix.com> <tulsi.jain@Tulsi-Jains-MacBook-Pro.local>
# Mappings from Github no-reply addresses
ajs124 <git@ajs124.de> <ajs124@users.noreply.github.com>
Alek Pinchuk <apinchuk@axcient.com> <alek-p@users.noreply.github.com>
Alexander Lobakin <alobakin@pm.me> <solbjorn@users.noreply.github.com>
Alexey Smirnoff <fling@member.fsf.org> <fling-@users.noreply.github.com>
Allen Holl <allen.m.holl@gmail.com> <65494904+allen-4@users.noreply.github.com>
Alphan Yılmaz <alphanyilmaz@gmail.com> <a1ea321@users.noreply.github.com>
Ameer Hamza <ahamza@ixsystems.com> <106930537+ixhamza@users.noreply.github.com>
Andrew J. Hesford <ajh@sideband.org> <48421688+ahesford@users.noreply.github.com>>
Andrew Sun <me@andrewsun.com> <as-com@users.noreply.github.com>
Aron Xu <happyaron.xu@gmail.com> <happyaron@users.noreply.github.com>
Arun KV <arun.kv@datacore.com> <65647132+arun-kv@users.noreply.github.com>
Ben Wolsieffer <benwolsieffer@gmail.com> <lopsided98@users.noreply.github.com>
bernie1995 <bernie.pikes@gmail.com> <42413912+bernie1995@users.noreply.github.com>
Bojan Novković <bnovkov@FreeBSD.org> <72801811+bnovkov@users.noreply.github.com>
Boris Protopopov <boris.protopopov@actifio.com> <bprotopopov@users.noreply.github.com>
Brad Forschinger <github@bnjf.id.au> <bnjf@users.noreply.github.com>
Brandon Thetford <brandon@dodecatec.com> <dodexahedron@users.noreply.github.com>
buzzingwires <buzzingwires@outlook.com> <131118055+buzzingwires@users.noreply.github.com>
Cedric Maunoury <cedric.maunoury@gmail.com> <38213715+cedricmaunoury@users.noreply.github.com>
Charles Suh <charles.suh@gmail.com> <charlessuh@users.noreply.github.com>
Chris Peredun <chris.peredun@ixsystems.com> <126915832+chrisperedun@users.noreply.github.com>
Dacian Reece-Stremtan <dacianstremtan@gmail.com> <35844628+dacianstremtan@users.noreply.github.com>
Damian Szuberski <szuberskidamian@gmail.com> <30863496+szubersk@users.noreply.github.com>
Daniel Hiepler <d-git@coderdu.de> <32984777+heeplr@users.noreply.github.com>
Daniel Kobras <d.kobras@science-computing.de> <sckobras@users.noreply.github.com>
Daniel Reichelt <hacking@nachtgeist.net> <nachtgeist@users.noreply.github.com>
David Quigley <david.quigley@intel.com> <dpquigl@users.noreply.github.com>
Dennis R. Friedrichsen <dennis.r.friedrichsen@gmail.com> <31087738+dennisfriedrichsen@users.noreply.github.com>
Dex Wood <slash2314@gmail.com> <slash2314@users.noreply.github.com>
DHE <git@dehacked.net> <DeHackEd@users.noreply.github.com>
Dmitri John Ledkov <dimitri.ledkov@canonical.com> <19779+xnox@users.noreply.github.com>
Dries Michiels <driesm.michiels@gmail.com> <32487486+driesmp@users.noreply.github.com>
Edmund Nadolski <edmund.nadolski@ixsystems.com> <137826107+ednadolski-ix@users.noreply.github.com>
Érico Nogueira <erico.erc@gmail.com> <34201958+ericonr@users.noreply.github.com>
Fedor Uporov <fuporov.vstack@gmail.com> <60701163+fuporovvStack@users.noreply.github.com>
Felix Dörre <felix@dogcraft.de> <felixdoerre@users.noreply.github.com>
Felix Neumärker <xdch47@posteo.de> <34678034+xdch47@users.noreply.github.com>
Finix Yan <yancw@info2soft.com> <Finix1979@users.noreply.github.com>
Gaurav Kumar <gauravk.18@gmail.com> <gaurkuma@users.noreply.github.com>
George Gaydarov <git@gg7.io> <gg7@users.noreply.github.com>
Georgy Yakovlev <gyakovlev@gentoo.org> <168902+gyakovlev@users.noreply.github.com>
Gerardwx <gerardw@alum.mit.edu> <Gerardwx@users.noreply.github.com>
Gian-Carlo DeFazio <defazio1@llnl.gov> <defaziogiancarlo@users.noreply.github.com>
Giuseppe Di Natale <dinatale2@llnl.gov> <dinatale2@users.noreply.github.com>
Hajo Möller <dasjoe@gmail.com> <dasjoe@users.noreply.github.com>
Harry Mallon <hjmallon@gmail.com> <1816667+hjmallon@users.noreply.github.com>
Hiếu Lê <leorize+oss@disroot.org> <alaviss@users.noreply.github.com>
Jake Howard <git@theorangeone.net> <RealOrangeOne@users.noreply.github.com>
James Cowgill <james.cowgill@mips.com> <jcowgill@users.noreply.github.com>
Jaron Kent-Dobias <jaron@kent-dobias.com> <kentdobias@users.noreply.github.com>
Jason King <jason.king@joyent.com> <jasonbking@users.noreply.github.com>
Jeff Dike <jdike@akamai.com> <52420226+jdike@users.noreply.github.com>
Jitendra Patidar <jitendra.patidar@nutanix.com> <53164267+jsai20@users.noreply.github.com>
João Carlos Mendes Luís <jonny@jonny.eng.br> <dioni21@users.noreply.github.com>
John Eismeier <john.eismeier@gmail.com> <32205350+jeis2497052@users.noreply.github.com>
John L. Hammond <john.hammond@intel.com> <35266395+jhammond-intel@users.noreply.github.com>
John-Mark Gurney <jmg@funkthat.com> <jmgurney@users.noreply.github.com>
John Ramsden <johnramsden@riseup.net> <johnramsden@users.noreply.github.com>
Jonathon Fernyhough <jonathon@m2x.dev> <559369+jonathonf@users.noreply.github.com>
Jose Luis Duran <jlduran@gmail.com> <jlduran@users.noreply.github.com>
Justin Hibbits <chmeeedalf@gmail.com> <chmeeedalf@users.noreply.github.com>
Kevin Greene <kevin.greene@delphix.com> <104801862+kxgreene@users.noreply.github.com>
Kevin Jin <lostking2008@hotmail.com> <33590050+jxdking@users.noreply.github.com>
Kevin P. Fleming <kevin@km6g.us> <kpfleming@users.noreply.github.com>
Krzysztof Piecuch <piecuch@kpiecuch.pl> <3964215+pikrzysztof@users.noreply.github.com>
Kyle Evans <kevans@FreeBSD.org> <kevans91@users.noreply.github.com>
Laurențiu Nicola <lnicola@dend.ro> <lnicola@users.noreply.github.com>
loli10K <ezomori.nozomu@gmail.com> <loli10K@users.noreply.github.com>
Lorenz Hüdepohl <dev@stellardeath.org> <lhuedepohl@users.noreply.github.com>
Luís Henriques <henrix@camandro.org> <73643340+lumigch@users.noreply.github.com>
Marcin Skarbek <git@skarbek.name> <mskarbek@users.noreply.github.com>
Matt Fiddaman <github@m.fiddaman.uk> <81489167+matt-fidd@users.noreply.github.com>
Maxim Filimonov <che@bein.link> <part1zano@users.noreply.github.com>
Max Zettlmeißl <max@zettlmeissl.de> <6818198+maxz@users.noreply.github.com>
Michael Niewöhner <foss@mniewoehner.de> <c0d3z3r0@users.noreply.github.com>
Michael Zhivich <mzhivich@akamai.com> <33133421+mzhivich@users.noreply.github.com>
MigeljanImeri <ImeriMigel@gmail.com> <78048439+MigeljanImeri@users.noreply.github.com>
Mo Zhou <cdluminate@gmail.com> <5723047+cdluminate@users.noreply.github.com>
Nick Mattis <nickm970@gmail.com> <nmattis@users.noreply.github.com>
omni <omni+vagant@hack.org> <79493359+omnivagant@users.noreply.github.com>
Pablo Correa Gómez <ablocorrea@hotmail.com> <32678034+pablofsf@users.noreply.github.com>
Paul Zuchowski <pzuchowski@datto.com> <31706010+PaulZ-98@users.noreply.github.com>
Peter Ashford <ashford@accs.com> <pashford@users.noreply.github.com>
Peter Dave Hello <hsu@peterdavehello.org> <PeterDaveHello@users.noreply.github.com>
Peter Wirdemo <peter.wirdemo@gmail.com> <4224155+pewo@users.noreply.github.com>
Petros Koutoupis <petros@petroskoutoupis.com> <pkoutoupis@users.noreply.github.com>
Ping Huang <huangping@smartx.com> <101400146+hpingfs@users.noreply.github.com>
Piotr P. Stefaniak <pstef@freebsd.org> <pstef@users.noreply.github.com>
Richard Allen <belperite@gmail.com> <33836503+belperite@users.noreply.github.com>
Rich Ercolani <rincebrain@gmail.com> <214141+rincebrain@users.noreply.github.com>
Rick Macklem <rmacklem@uoguelph.ca> <64620010+rmacklem@users.noreply.github.com>
Rob Wing <rob.wing@klarasystems.com> <98866084+rob-wing@users.noreply.github.com>
Roman Strashkin <roman.strashkin@nexenta.com> <Ramzec@users.noreply.github.com>
Ryan Hirasaki <ryanhirasaki@gmail.com> <4690732+RyanHir@users.noreply.github.com>
Samuel Wycliffe J <samwyc@hpe.com> <115969550+samwyc@users.noreply.github.com>
Samuel Wycliffe <samuelwycliffe@gmail.com> <50765275+npc203@users.noreply.github.com>
Savyasachee Jha <hi@savyasacheejha.com> <savyajha@users.noreply.github.com>
Scott Colby <scott@scolby.com> <scolby33@users.noreply.github.com>
Sean Eric Fagan <kithrup@mac.com> <kithrup@users.noreply.github.com>
Spencer Kinny <spencerkinny1995@gmail.com> <30333052+Spencer-Kinny@users.noreply.github.com>
Srikanth N S <srikanth.nagasubbaraoseetharaman@hpe.com> <75025422+nssrikanth@users.noreply.github.com>
Stefan Lendl <s.lendl@proxmox.com> <1321542+stfl@users.noreply.github.com>
Thomas Bertschinger <bertschinger@lanl.gov> <101425190+bertschinger@users.noreply.github.com>
Thomas Geppert <geppi@digitx.de> <geppi@users.noreply.github.com>
Tim Crawford <tcrawford@datto.com> <crawfxrd@users.noreply.github.com>
Todd Seidelmann <18294602+seidelma@users.noreply.github.com>
Tom Matthews <tom@axiom-partners.com> <tomtastic@users.noreply.github.com>
Tony Perkins <tperkins@datto.com> <62951051+tony-zfs@users.noreply.github.com>
Torsten Wörtwein <twoertwein@gmail.com> <twoertwein@users.noreply.github.com>
Tulsi Jain <tulsi.jain@delphix.com> <TulsiJain@users.noreply.github.com>
Václav Skála <skala@vshosting.cz> <33496485+vaclavskala@users.noreply.github.com>
Vaibhav Bhanawat <vaibhav.bhanawat@delphix.com> <88050553+vaibhav-delphix@users.noreply.github.com>
Violet Purcell <vimproved@inventati.org> <66446404+vimproved@users.noreply.github.com>
Vipin Kumar Verma <vipin.verma@hpe.com> <75025470+vermavipinkumar@users.noreply.github.com>
Wolfgang Bumiller <w.bumiller@proxmox.com> <Blub@users.noreply.github.com>
xtouqh <xtouqh@hotmail.com> <72357159+xtouqh@users.noreply.github.com>
Yuri Pankov <yuripv@FreeBSD.org> <113725409+yuripv@users.noreply.github.com>
Yuri Pankov <yuripv@FreeBSD.org> <82001006+yuripv@users.noreply.github.com>
-38
View File
@@ -1,38 +0,0 @@
language: c
sudo: required
env:
global:
# Travis limits maximum log size, we have to cut tests output
- ZFS_TEST_TRAVIS_LOG_MAX_LENGTH=800
matrix:
# tags are mainly in ascending order
- ZFS_TEST_TAGS='acl,atime,bootfs,cachefile,casenorm,chattr,checksum,clean_mirror,compression,ctime,delegate,devices,events,exec,fault,features,grow_pool,zdb,zfs,zfs_bookmark,zfs_change-key,zfs_clone,zfs_copies,zfs_create,zfs_diff,zfs_get,zfs_inherit,zfs_load-key,zfs_rename'
- ZFS_TEST_TAGS='cache,history,hkdf,inuse,zfs_property,zfs_receive,zfs_reservation,zfs_send,zfs_set,zfs_share,zfs_snapshot,zfs_unload-key,zfs_unmount,zfs_unshare,zfs_upgrade,zpool,zpool_add,zpool_attach,zpool_clear,zpool_create,zpool_destroy,zpool_detach'
- ZFS_TEST_TAGS='grow_replicas,mv_files,cli_user,zfs_mount,zfs_promote,zfs_rollback,zpool_events,zpool_expand,zpool_export,zpool_get,zpool_history,zpool_import,zpool_labelclear,zpool_offline,zpool_online,zpool_remove,zpool_reopen,zpool_replace,zpool_scrub,zpool_set,zpool_status,zpool_sync,zpool_upgrade'
- ZFS_TEST_TAGS='zfs_destroy,large_files,largest_pool,link_count,migration,mmap,mmp,mount,nestedfs,no_space,nopwrite,online_offline,pool_names,poolversion,privilege,quota,raidz,redundancy,rsend'
- ZFS_TEST_TAGS='inheritance,refquota,refreserv,rename_dirs,replacement,reservation,rootpool,scrub_mirror,slog,snapshot,snapused,sparse,threadsappend,tmpfile,truncate,upgrade,userquota,vdev_zaps,write_dirs,xattr,zvol,libzfs'
before_install:
- sudo apt-get -qq update
- sudo apt-get install --yes -qq build-essential autoconf libtool gawk alien fakeroot linux-headers-$(uname -r)
- sudo apt-get install --yes -qq zlib1g-dev uuid-dev libattr1-dev libblkid-dev libselinux-dev libudev-dev libssl-dev
# packages for tests
- sudo apt-get install --yes -qq parted lsscsi ksh attr acl nfs-kernel-server fio
install:
- git clone --depth=1 https://github.com/zfsonlinux/spl
- cd spl
- git checkout master
- sh autogen.sh
- ./configure
- make --no-print-directory -s pkg-utils pkg-kmod
- sudo dpkg -i *.deb
- cd ..
- sh autogen.sh
- ./configure
- make --no-print-directory -s pkg-utils pkg-kmod
- sudo dpkg -i *.deb
script:
- travis_wait 50 /usr/share/zfs/zfs-tests.sh -v -T $ZFS_TEST_TAGS
after_failure:
- find /var/tmp/test_results/current/log -type f -name '*' -printf "%f\n" -exec cut -c -$ZFS_TEST_TRAVIS_LOG_MAX_LENGTH {} \;
after_success:
- find /var/tmp/test_results/current/log -type f -name '*' -printf "%f\n" -exec cut -c -$ZFS_TEST_TRAVIS_LOG_MAX_LENGTH {} \;
+394 -26
View File
@@ -10,295 +10,663 @@ PAST MAINTAINERS:
CONTRIBUTORS:
Aaron Fineman <abyxcos@gmail.com>
Adam D. Moss <c@yotes.com>
Adam Leventhal <ahl@delphix.com>
Adam Stevko <adam.stevko@gmail.com>
adisbladis <adis@blad.is>
Adrian Chadd <adrian@freebsd.org>
Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Ahmed G <ahmedg@delphix.com>
Aidan Harris <me@aidanharr.is>
AJ Jordan <alex@strugee.net>
ajs124 <git@ajs124.de>
Akash Ayare <aayare@delphix.com>
Akash B <akash-b@hpe.com>
Alan Somers <asomers@gmail.com>
Alar Aun <spamtoaun@gmail.com>
Albert Lee <trisk@nexenta.com>
Alec Salazar <alec.j.salazar@gmail.com>
Alejandro Colomar <Colomar.6.4.3@GMail.com>
Alejandro R. Sedeño <asedeno@mit.edu>
Alek Pinchuk <alek@nexenta.com>
Aleksa Sarai <cyphar@cyphar.com>
Alexander Eremin <a.eremin@nexenta.com>
Alexander Lobakin <alobakin@pm.me>
Alexander Motin <mav@freebsd.org>
Alexander Pyhalov <apyhalov@gmail.com>
Alexander Richardson <Alexander.Richardson@cl.cam.ac.uk>
Alexander Stetsenko <ams@nexenta.com>
Alex Braunegg <alex.braunegg@gmail.com>
Alexey Shvetsov <alexxy@gentoo.org>
Alexey Smirnoff <fling@member.fsf.org>
Alex John <alex@stty.io>
Alex McWhirter <alexmcwhirter@triadic.us>
Alex Reece <alex@delphix.com>
Alex Wilson <alex.wilson@joyent.com>
Alex Zhuravlev <alexey.zhuravlev@intel.com>
Alexander Eremin <a.eremin@nexenta.com>
Alexander Motin <mav@freebsd.org>
Alexander Pyhalov <apyhalov@gmail.com>
Alexander Stetsenko <ams@nexenta.com>
Alexey Shvetsov <alexxy@gentoo.org>
Alexey Smirnoff <fling@member.fsf.org>
Allan Jude <allanjude@freebsd.org>
Allen Holl <allen.m.holl@gmail.com>
Alphan Yılmaz <alphanyilmaz@gmail.com>
alteriks <alteriks@gmail.com>
Alyssa Ross <hi@alyssa.is>
Ameer Hamza <ahamza@ixsystems.com>
Anatoly Borodin <anatoly.borodin@gmail.com>
AndCycle <andcycle@andcycle.idv.tw>
Andrea Gelmini <andrea.gelmini@gelma.net>
Andrea Righi <andrea.righi@canonical.com>
Andreas Buschmann <andreas.buschmann@tech.net.de>
Andreas Dilger <adilger@intel.com>
Andreas Vögele <andreas@andreasvoegele.com>
Andrew Barnes <barnes333@gmail.com>
Andrew Hamilton <ahamilto@tjhsst.edu>
Andrew Innes <andrew.c12@gmail.com>
Andrew J. Hesford <ajh@sideband.org>
Andrew Reid <ColdCanuck@nailedtotheperch.com>
Andrew Stormont <andrew.stormont@nexenta.com>
Andrew Sun <me@andrewsun.com>
Andrew Tselischev <andrewtselischev@gmail.com>
Andrew Turner <andrew@fubar.geek.nz>
Andrew Walker <awalker@ixsystems.com>
Andrey Prokopenko <job@terem.fr>
Andrey Vesnovaty <andrey.vesnovaty@gmail.com>
Andriy Gapon <avg@freebsd.org>
Andy Bakun <github@thwartedefforts.org>
Andy Fiddaman <omnios@citrus-it.co.uk>
Aniruddha Shankar <k@191a.net>
Anton Gubarkov <anton.gubarkov@gmail.com>
Antonio Russo <antonio.e.russo@gmail.com>
Arkadiusz Bubała <arkadiusz.bubala@open-e.com>
Armin Wehrfritz <dkxls23@gmail.com>
Arne Jansen <arne@die-jansens.de>
Aron Xu <happyaron.xu@gmail.com>
Arshad Hussain <arshad.hussain@aeoncomputing.com>
Arun KV <arun.kv@datacore.com>
Arvind Sankar <nivedita@alum.mit.edu>
Attila Fülöp <attila@fueloep.org>
Avatat <kontakt@avatat.pl>
Bart Coddens <bart.coddens@gmail.com>
Basil Crow <basil.crow@delphix.com>
Huang Liu <liu.huang@zte.com.cn>
Bassu <bassu@phi9.com>
Ben Allen <bsallen@alcf.anl.gov>
Ben Rubson <ben.rubson@gmail.com>
Ben Cordero <bencord0@condi.me>
Benda Xu <orv@debian.org>
Benedikt Neuffer <github@itfriend.de>
Benjamin Albrecht <git@albrecht.io>
Benjamin Gentil <benjgentil.pro@gmail.com>
Benjamin Sherman <benjamin@holyarmy.org>
Ben McGough <bmcgough@fredhutch.org>
Ben Rubson <ben.rubson@gmail.com>
Ben Wolsieffer <benwolsieffer@gmail.com>
bernie1995 <bernie.pikes@gmail.com>
Bill McGonigle <bill-github.com-public1@bfccomputing.com>
Bill Pijewski <wdp@joyent.com>
Bojan Novković <bnovkov@FreeBSD.org>
Boris Protopopov <boris.protopopov@nexenta.com>
Brad Forschinger <github@bnjf.id.au>
Brad Lewis <brad.lewis@delphix.com>
Brandon Thetford <brandon@dodecatec.com>
Brian Atkinson <bwa@g.clemson.edu>
Brian Behlendorf <behlendorf1@llnl.gov>
Brian J. Murrell <brian@sun.com>
Brooks Davis <brooks@one-eyed-alien.net>
BtbN <btbn@btbn.de>
bunder2015 <omfgbunder@gmail.com>
buzzingwires <buzzingwires@outlook.com>
bzzz77 <bzzz.tomas@gmail.com>
cable2999 <cable2999@users.noreply.github.com>
Caleb James DeLisle <calebdelisle@lavabit.com>
Cameron Harr <harr1@llnl.gov>
Cao Xuewen <cao.xuewen@zte.com.cn>
Carlo Landmeter <clandmeter@gmail.com>
Carlos Alberto Lopez Perez <clopez@igalia.com>
Cedric Maunoury <cedric.maunoury@gmail.com>
Chaoyu Zhang <zhang.chaoyu@zte.com.cn>
Charles Suh <charles.suh@gmail.com>
Chen Can <chen.can2@zte.com.cn>
Chengfei Zhu <chengfeix.zhu@intel.com>
Chen Haiquan <oc@yunify.com>
ChenHao Lu <18302010006@fudan.edu.cn>
Chip Parker <aparker@enthought.com>
Chris Burroughs <chris.burroughs@gmail.com>
Chris Davidson <christopher.davidson@gmail.com>
Chris Dunlap <cdunlap@llnl.gov>
Chris Dunlop <chris@onthe.net.au>
Chris Lindee <chris.lindee+github@gmail.com>
Chris McDonough <chrism@plope.com>
Chris Peredun <chris.peredun@ixsystems.com>
Chris Siden <chris.siden@delphix.com>
Chris Wedgwood <cw@f00f.org>
Chris Williamson <chris.williamson@delphix.com>
Chris Zubrzycki <github@mid-earth.net>
Christ Schlacta <aarcane@aarcane.info>
Chris Siebenmann <cks.github@cs.toronto.edu>
Christer Ekholm <che@chrekh.se>
Christian Kohlschütter <christian@kohlschutter.com>
Christian Neukirchen <chneukirchen@gmail.com>
Christian Schwarz <me@cschwarz.com>
Christopher Voltz <cjunk@voltz.ws>
Christ Schlacta <aarcane@aarcane.info>
Chris Wedgwood <cw@f00f.org>
Chris Williamson <chris.williamson@delphix.com>
Chris Zubrzycki <github@mid-earth.net>
Chuck Tuffli <ctuffli@gmail.com>
Chunwei Chen <david.chen@nutanix.com>
Clemens Fruhwirth <clemens@endorphin.org>
Clemens Lang <cl@clang.name>
Clint Armstrong <clint@clintarmstrong.net>
Coleman Kane <ckane@colemankane.org>
Colin Ian King <colin.king@canonical.com>
Colin Percival <cperciva@tarsnap.com>
Colm Buckley <colm@tuatha.org>
Crag Wang <crag0715@gmail.com>
Craig Loomis <cloomis@astro.princeton.edu>
Craig Sanders <github@taz.net.au>
Cyril Plisko <cyril.plisko@infinidat.com>
DHE <git@dehacked.net>
Cy Schubert <cy@FreeBSD.org>
Cédric Berger <cedric@precidata.com>
Dacian Reece-Stremtan <dacianstremtan@gmail.com>
Dag-Erling Smørgrav <des@FreeBSD.org>
Damiano Albani <damiano.albani@gmail.com>
Damian Szuberski <szuberskidamian@gmail.com>
Damian Wojsław <damian@wojslaw.pl>
Daniel Berlin <dberlin@dberlin.org>
Daniel Hiepler <d-git@coderdu.de>
Daniel Hoffman <dj.hoffman@delphix.com>
Daniel Kobras <d.kobras@science-computing.de>
Daniel Kolesa <daniel@octaforge.org>
Daniel Perry <dtperry@amazon.com>
Daniel Reichelt <hacking@nachtgeist.net>
Daniel Stevenson <bot@dstev.net>
Daniel Verite <daniel@verite.pro>
Daniil Lunev <d.lunev.mail@gmail.com>
Dan Kimmel <dan.kimmel@delphix.com>
Dan McDonald <danmcd@nexenta.com>
Dan Swartzendruber <dswartz@druber.com>
Dan Vatca <dan.vatca@gmail.com>
Daniel Hoffman <dj.hoffman@delphix.com>
Daniel Verite <daniel@verite.pro>
Daniil Lunev <d.lunev.mail@gmail.com>
Darik Horn <dajhorn@vanadac.com>
Dave Eddy <dave@daveeddy.com>
David Hedberg <david@qzx.se>
David Lamparter <equinox@diac24.net>
David Qian <david.qian@intel.com>
David Quigley <david.quigley@intel.com>
Debabrata Banerjee <dbanerje@akamai.com>
D. Ebdrup <debdrup@freebsd.org>
Dennis R. Friedrichsen <dennis.r.friedrichsen@gmail.com>
Denys Rtveliashvili <denys@rtveliashvili.name>
Derek Dai <daiderek@gmail.com>
Derek Schrock <dereks@lifeofadishwasher.com>
Dex Wood <slash2314@gmail.com>
DHE <git@dehacked.net>
Didier Roche <didrocks@ubuntu.com>
Dimitri John Ledkov <xnox@ubuntu.com>
Dimitry Andric <dimitry@andric.com>
Dirkjan Bussink <d.bussink@gmail.com>
Dmitry Khasanov <pik4ez@gmail.com>
Dominic Pearson <dsp@technoanimal.net>
Dominik Hassler <hadfl@omniosce.org>
Dominik Honnef <dominikh@fork-bomb.org>
Don Brady <don.brady@delphix.com>
Doug Rabson <dfr@rabson.org>
Dr. András Korn <korn-github.com@elan.rulez.org>
Dries Michiels <driesm.michiels@gmail.com>
Edmund Nadolski <edmund.nadolski@ixsystems.com>
Eitan Adler <lists@eitanadler.com>
Eli Rosenthal <eli.rosenthal@delphix.com>
Eli Schwartz <eschwartz93@gmail.com>
Eric Desrochers <eric.desrochers@canonical.com>
Eric Dillmann <eric@jave.fr>
Eric Schrock <Eric.Schrock@delphix.com>
Ethan Coe-Renner <coerenner1@llnl.gov>
Etienne Dechamps <etienne@edechamps.fr>
Evan Allrich <eallrich@gmail.com>
Evan Harris <eharris@puremagic.com>
Evan Susarret <evansus@gmail.com>
Fabian Grünbichler <f.gruenbichler@proxmox.com>
Fabio Buso <dev.siroibaf@gmail.com>
Fabio Scaccabarozzi <fsvm88@gmail.com>
Fajar A. Nugraha <github@fajar.net>
Fan Yong <fan.yong@intel.com>
fbynite <fbynite@users.noreply.github.com>
Fedor Uporov <fuporov.vstack@gmail.com>
Felix Dörre <felix@dogcraft.de>
Felix Neumärker <xdch47@posteo.de>
Feng Sun <loyou85@gmail.com>
Finix Yan <yancw@info2soft.com>
Francesco Mazzoli <f@mazzo.li>
Frederik Wessels <wessels147@gmail.com>
Frédéric Vanniere <f.vanniere@planet-work.com>
Gabriel A. Devenyi <gdevenyi@gmail.com>
Garrett D'Amore <garrett@nexenta.com>
Garrett Fields <ghfields@gmail.com>
Garrison Jensen <garrison.jensen@gmail.com>
Gary Mills <gary_mills@fastmail.fm>
Gaurav Kumar <gauravk.18@gmail.com>
GeLiXin <ge.lixin@zte.com.cn>
George Amanakis <g_amanakis@yahoo.com>
George Diamantopoulos <georgediam@gmail.com>
George Gaydarov <git@gg7.io>
George Melikov <mail@gmelikov.ru>
George Wilson <gwilson@delphix.com>
Georgy Yakovlev <ya@sysdump.net>
Gerardwx <gerardw@alum.mit.edu>
Gian-Carlo DeFazio <defazio1@llnl.gov>
Gionatan Danti <g.danti@assyoma.it>
Giuseppe Di Natale <guss80@gmail.com>
Glenn Washburn <development@efficientek.com>
glibg10b <glibg10b@users.noreply.github.com>
gofaster <felix.gofaster@gmail.com>
Gordan Bobic <gordan@redsleeve.org>
Gordon Bergling <gbergling@googlemail.com>
Gordon Ross <gwr@nexenta.com>
Gordon Tetlow <gordon@freebsd.org>
Graham Christensen <graham@grahamc.com>
Graham Perrin <grahamperrin@gmail.com>
Gregor Kopka <gregor@kopka.net>
Gregory Bartholomew <gregory.lee.bartholomew@gmail.com>
grembo <freebsd@grem.de>
Grischa Zengel <github.zfsonlinux@zengel.info>
grodik <pat@litke.dev>
Gunnar Beutner <gunnar@beutner.name>
Gvozden Neskovic <neskovic@gmail.com>
Hajo Möller <dasjoe@gmail.com>
Han Gao <rabenda.cn@gmail.com>
Hans Rosenfeld <hans.rosenfeld@nexenta.com>
Harald van Dijk <harald@gigawatt.nl>
Harry Mallon <hjmallon@gmail.com>
Harry Sintonen <github-piru@kyber.fi>
HC <mmttdebbcc@yahoo.com>
hedong zhang <h_d_zhang@163.com>
Heitor Alves de Siqueira <halves@canonical.com>
Henrik Riomar <henrik.riomar@gmail.com>
Herb Wartens <wartens2@llnl.gov>
Hiếu Lê <leorize+oss@disroot.org>
Huang Liu <liu.huang@zte.com.cn>
Håkan Johansson <f96hajo@chalmers.se>
Igor K <igor@dilos.org>
Igor Kozhukhov <ikozhukhov@gmail.com>
Igor Lvovsky <ilvovsky@gmail.com>
ilbsmart <wgqimut@gmail.com>
Ilkka Sovanto <github@ilkka.kapsi.fi>
illiliti <illiliti@protonmail.com>
ilovezfs <ilovezfs@icloud.com>
InsanePrawn <Insane.Prawny@gmail.com>
Isaac Huang <he.huang@intel.com>
JK Dingwall <james@dingwall.me.uk>
Jacek Fefliński <feflik@gmail.com>
Jacob Adams <tookmund@gmail.com>
Jake Howard <git@theorangeone.net>
James Cowgill <james.cowgill@mips.com>
James H <james@kagisoft.co.uk>
James Lee <jlee@thestaticvoid.com>
James Pan <jiaming.pan@yahoo.com>
James Wah <james@laird-wah.net>
Jan Engelhardt <jengelh@inai.de>
Jan Kryl <jan.kryl@nexenta.com>
Jan Sanislo <oystr@cs.washington.edu>
Jaron Kent-Dobias <jaron@kent-dobias.com>
Jason Cohen <jwittlincohen@gmail.com>
Jason Harmening <jason.harmening@gmail.com>
Jason King <jason.brian.king@gmail.com>
Jason Lee <jasonlee@lanl.gov>
Jason Zaman <jasonzaman@gmail.com>
Javen Wu <wu.javen@gmail.com>
Jean-Baptiste Lallement <jean-baptiste@ubuntu.com>
Jeff Dike <jdike@akamai.com>
Jeremy Faulkner <gldisater@gmail.com>
Jeremy Gill <jgill@parallax-innovations.com>
Jeremy Jones <jeremy@delphix.com>
Jeremy Visser <jeremy.visser@gmail.com>
Jerry Jelinek <jerry.jelinek@joyent.com>
Jessica Clarke <jrtc27@jrtc27.com>
Jinshan Xiong <jinshan.xiong@intel.com>
Jitendra Patidar <jitendra.patidar@nutanix.com>
JK Dingwall <james@dingwall.me.uk>
Joe Stein <joe.stein@delphix.com>
John-Mark Gurney <jmg@funkthat.com>
John Albietz <inthecloud247@gmail.com>
John Eismeier <john.eismeier@gmail.com>
John L. Hammond <john.hammond@intel.com>
John Gallagher <john.gallagher@delphix.com>
John Layman <jlayman@sagecloud.com>
John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
John Wren Kennedy <john.kennedy@delphix.com>
John L. Hammond <john.hammond@intel.com>
John M. Layman <jml@frijid.net>
Johnny Stenback <github@jstenback.com>
John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
John Poduska <jpoduska@datto.com>
John Ramsden <johnramsden@riseup.net>
John Wren Kennedy <john.kennedy@delphix.com>
jokersus <lolivampireslave@gmail.com>
Jonathon Fernyhough <jonathon@m2x.dev>
Jorgen Lundman <lundman@lundman.net>
Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Jose Luis Duran <jlduran@gmail.com>
Josh Soref <jsoref@users.noreply.github.com>
Joshua M. Clulow <josh@sysmgr.org>
José Luis Salvador Rufo <salvador.joseluis@gmail.com>
João Carlos Mendes Luís <jonny@jonny.eng.br>
Julian Brunner <julian.brunner@gmail.com>
Julian Heuking <JulianH@beckhoff.com>
jumbi77 <jumbi77@users.noreply.github.com>
Justin Bedő <cu@cua0.org>
Justin Gottula <justin@jgottula.com>
Justin Hibbits <chmeeedalf@gmail.com>
Justin Keogh <github.com@v6y.net>
Justin Lecher <jlec@gentoo.org>
Justin Scholz <git@justinscholz.de>
Justin T. Gibbs <gibbs@FreeBSD.org>
jyxent <jordanp@gmail.com>
Jörg Thalheim <joerg@higgsboson.tk>
KORN Andras <korn@elan.rulez.org>
ka7 <ka7@la-evento.com>
Ka Ho Ng <khng@FreeBSD.org>
Kamil Domański <kamil@domanski.co>
Karsten Kretschmer <kkretschmer@gmail.com>
Kash Pande <kash@tripleback.net>
Kay Pedersen <christianpe96@gmail.com>
Keith M Wesolowski <wesolows@foobazco.org>
Kent Ross <k@mad.cash>
KernelOfTruth <kerneloftruth@gmail.com>
Kevin Bowling <kevin.bowling@kev009.com>
Kevin Greene <kevin.greene@delphix.com>
Kevin Jin <lostking2008@hotmail.com>
Kevin P. Fleming <kevin@km6g.us>
Kevin Tanguy <kevin.tanguy@ovh.net>
KireinaHoro <i@jsteward.moe>
Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Kleber Tarcísio <klebertarcisio@yahoo.com.br>
Kody A Kantor <kody.kantor@gmail.com>
Kohsuke Kawaguchi <kk@kohsuke.org>
Konstantin Khorenko <khorenko@virtuozzo.com>
KORN Andras <korn@elan.rulez.org>
Kristof Provost <github@sigsegv.be>
Krzysztof Piecuch <piecuch@kpiecuch.pl>
Kyle Blatter <kyleblatter@llnl.gov>
Kyle Evans <kevans@FreeBSD.org>
Kyle Fuller <inbox@kylefuller.co.uk>
Loli <ezomori.nozomu@gmail.com>
Laevos <Laevos@users.noreply.github.com>
Lalufu <Lalufu@users.noreply.github.com>
Lars Johannsen <laj@it.dk>
Laura Hild <lsh@jlab.org>
Laurențiu Nicola <lnicola@dend.ro>
Lauri Tirkkonen <lauri@hacktheplanet.fi>
liaoyuxiangqin <guo.yong33@zte.com.cn>
Li Dongyang <dongyang.li@anu.edu.au>
Liu Hua <liu.hua130@zte.com.cn>
Liu Qing <winglq@gmail.com>
Li Wei <W.Li@Sun.COM>
Loli <ezomori.nozomu@gmail.com>
lorddoskias <lorddoskias@gmail.com>
Lorenz Brun <lorenz@dolansoft.org>
Lorenz Hüdepohl <dev@stellardeath.org>
louwrentius <louwrentius@gmail.com>
Lukas Wunner <lukas@wunner.de>
luozhengzheng <luo.zhengzheng@zte.com.cn>
Luís Henriques <henrix@camandro.org>
Madhav Suresh <madhav.suresh@delphix.com>
manfromafar <jonsonb10@gmail.com>
Manoj Joseph <manoj.joseph@delphix.com>
Manuel Amador (Rudd-O) <rudd-o@rudd-o.com>
Marcel Huber <marcelhuberfoo@gmail.com>
Marcel Menzel <mail@mcl.gg>
Marcel Schilling <marcel.schilling@uni-luebeck.de>
Marcel Telka <marcel.telka@nexenta.com>
Marcel Wysocki <maci.stgn@gmail.com>
Marcin Skarbek <git@skarbek.name>
Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Mark Johnston <markj@FreeBSD.org>
Mark Maybee <mark.maybee@delphix.com>
Mark Roper <markroper@gmail.com>
Mark Shellenbaum <Mark.Shellenbaum@Oracle.COM>
marku89 <mar42@kola.li>
Mark Wright <markwright@internode.on.net>
Mart Frauenlob <allkind@fastest.cc>
Martin Matuska <mm@FreeBSD.org>
Martin Rüegg <martin.rueegg@metaworx.ch>
Martin Wagner <martin.wagner.dev@gmail.com>
Massimo Maggi <me@massimo-maggi.eu>
Matt Johnston <matt@fugro-fsi.com.au>
Matt Kemp <matt@mattikus.com>
Mateusz Guzik <mjguzik@gmail.com>
Mateusz Piotrowski <0mp@FreeBSD.org>
Mathieu Velten <matmaul@gmail.com>
Matt Fiddaman <github@m.fiddaman.uk>
Matthew Ahrens <matt@delphix.com>
Matthew Thode <mthode@mthode.org>
Matthias Blankertz <matthias@blankertz.org>
Matt Johnston <matt@fugro-fsi.com.au>
Matt Kemp <matt@mattikus.com>
Matt Macy <mmacy@freebsd.org>
Matus Kral <matuskral@me.com>
Mauricio Faria de Oliveira <mfo@canonical.com>
Max Grossman <max.grossman@delphix.com>
Maxim Filimonov <che@bein.link>
Maximilian Mehnert <maximilian.mehnert@gmx.de>
Max Zettlmeißl <max@zettlmeissl.de>
Md Islam <mdnahian@outlook.com>
megari <megari@iki.fi>
Michael D Labriola <michael.d.labriola@gmail.com>
Michael Franzl <michael@franzl.name>
Michael Gebetsroither <michael@mgeb.org>
Michael Kjorling <michael@kjorling.se>
Michael Martin <mgmartin.mgm@gmail.com>
Michael Niewöhner <foss@mniewoehner.de>
Michael Zhivich <mzhivich@akamai.com>
Michal Vasilek <michal@vasilek.cz>
MigeljanImeri <ImeriMigel@gmail.com>
Mike Gerdts <mike.gerdts@joyent.com>
Mike Harsch <mike@harschsystems.com>
Mike Leddy <mike.leddy@gmail.com>
Mike Swanson <mikeonthecomputer@gmail.com>
Milan Jurik <milan.jurik@xylab.cz>
Minsoo Choo <minsoochoo0122@proton.me>
Mohamed Tawfik <m_tawfik@aucegypt.edu>
Morgan Jones <mjones@rice.edu>
Moritz Maxeiner <moritz@ucworks.org>
Mo Zhou <cdluminate@gmail.com>
naivekun <naivekun@outlook.com>
nathancheek <myself@nathancheek.com>
Nathaniel Clark <Nathaniel.Clark@misrule.us>
Nathaniel Wesley Filardo <nwf@cs.jhu.edu>
Nathan Lewis <linux.robotdude@gmail.com>
Nav Ravindranath <nav@delphix.com>
Neal Gompa (ニール・ゴンパ) <ngompa13@gmail.com>
Ned Bass <bass6@llnl.gov>
Neependra Khare <neependra@kqinfotech.com>
Neil Stockbridge <neil@dist.ro>
Nick Black <dank@qemfd.net>
Nick Garvey <garvey.nick@gmail.com>
Nick Mattis <nickm970@gmail.com>
Nick Terrell <terrelln@fb.com>
Niklas Haas <github-c6e1c8@haasn.xyz>
Nikolay Borisov <n.borisov.lkml@gmail.com>
nordaux <nordaux@gmail.com>
ofthesun9 <olivier@ofthesun.net>
Olaf Faaland <faaland1@llnl.gov>
Oleg Drokin <green@linuxhacker.ru>
Oleg Stepura <oleg@stepura.com>
Olivier Certner <olce.freebsd@certner.fr>
Olivier Mazouffre <olivier.mazouffre@ims-bordeaux.fr>
omni <omni+vagant@hack.org>
Orivej Desh <orivej@gmx.fr>
Pablo Correa Gómez <ablocorrea@hotmail.com>
Palash Gandhi <pbg4930@rit.edu>
Patrick Mooney <pmooney@pfmooney.com>
Patrik Greco <sikevux@sikevux.se>
Paul B. Henson <henson@acm.org>
Paul Dagnelie <pcd@delphix.com>
Paul Zuchowski <pzuchowski@datto.com>
Pavel Boldin <boldin.pavel@gmail.com>
Pavel Snajdr <snajpa@snajpa.net>
Pavel Zakharov <pavel.zakharov@delphix.com>
Pawel Jakub Dawidek <pjd@FreeBSD.org>
Pedro Giffuni <pfg@freebsd.org>
Peng <peng.hse@xtaotech.com>
Peter Ashford <ashford@accs.com>
Peter Dave Hello <hsu@peterdavehello.org>
Peter Doherty <peterd@acranox.org>
Peter Levine <plevine457@gmail.com>
Peter Wirdemo <peter.wirdemo@gmail.com>
Petros Koutoupis <petros@petroskoutoupis.com>
Philip Pokorny <ppokorny@penguincomputing.com>
Philipp Riederer <pt@philipptoelke.de>
Phil Kauffman <philip@kauffman.me>
Ping Huang <huangping@smartx.com>
Piotr Kubaj <pkubaj@anongoth.pl>
Piotr P. Stefaniak <pstef@freebsd.org>
Prakash Surya <prakash.surya@delphix.com>
Prasad Joshi <prasadjoshi124@gmail.com>
privb0x23 <privb0x23@users.noreply.github.com>
P.SCH <p88@yahoo.com>
Qiuhao Chen <chenqiuhao1997@gmail.com>
Quartz <yyhran@163.com>
Quentin Zdanis <zdanisq@gmail.com>
Rafael Kitover <rkitover@gmail.com>
RageLtMan <sempervictus@users.noreply.github.com>
Ralf Ertzinger <ralf@skytale.net>
Randall Mason <ClashTheBunny@gmail.com>
Remy Blank <remy.blank@pobox.com>
renelson <bnelson@nelsonbe.com>
Reno Reckling <e-github@wthack.de>
Ricardo M. Correia <ricardo.correia@oracle.com>
Rich Ercolani <rincebrain@gmail.com>
Riccardo Schirone <rschirone91@gmail.com>
Richard Allen <belperite@gmail.com>
Richard Elling <Richard.Elling@RichardElling.com>
Richard Kojedzinszky <richard@kojedz.in>
Richard Laager <rlaager@wiktel.com>
Richard Lowe <richlowe@richlowe.net>
Richard Sharpe <rsharpe@samba.org>
Richard Yao <ryao@gentoo.org>
Rich Ercolani <rincebrain@gmail.com>
Rick Macklem <rmacklem@uoguelph.ca>
rilysh <nightquick@proton.me>
Robert Evans <evansr@google.com>
Robert Novak <sailnfool@gmail.com>
Roberto Ricci <ricci@disroot.org>
Rob Norris <robn@despairlabs.com>
Rob Wing <rew@FreeBSD.org>
Rohan Puri <rohan.puri15@gmail.com>
Romain Dolbeau <romain.dolbeau@atos.net>
Roman Strashkin <roman.strashkin@nexenta.com>
Ross Williams <ross@ross-williams.net>
Ruben Kerkhof <ruben@rubenkerkhof.com>
Ryan <errornointernet@envs.net>
Ryan Hirasaki <ryanhirasaki@gmail.com>
Ryan Lahfa <masterancpp@gmail.com>
Ryan Libby <rlibby@FreeBSD.org>
Ryan Moeller <freqlabs@FreeBSD.org>
Sam Atkinson <samatk@amazon.com>
Sam Hathaway <github.com@munkynet.org>
Sam James <sam@gentoo.org>
Sam Lunt <samuel.j.lunt@gmail.com>
Samuel VERSCHELDE <stormi-github@ylix.fr>
Samuel Wycliffe <samuelwycliffe@gmail.com>
Samuel Wycliffe J <samwyc@hpe.com>
Sanjeev Bagewadi <sanjeev.bagewadi@gmail.com>
Sara Hartse <sara.hartse@delphix.com>
Saso Kiselkov <saso.kiselkov@nexenta.com>
Satadru Pramanik <satadru@gmail.com>
Savyasachee Jha <genghizkhan91@hawkradius.com>
Scott Colby <scott@scolby.com>
Scot W. Stevenson <scot.stevenson@gmail.com>
Sean Eric Fagan <sef@ixsystems.com>
Sebastian Gottschall <s.gottschall@dd-wrt.com>
Sebastien Roy <seb@delphix.com>
Sen Haerens <sen@senhaerens.be>
Serapheim Dimitropoulos <serapheim@delphix.com>
Seth Forshee <seth.forshee@canonical.com>
Seth Troisi <sethtroisi@google.com>
Shaan Nobee <sniper111@gmail.com>
Shampavman <sham.pavman@nexenta.com>
Shaun Tancheff <shaun@aeonazure.com>
Shawn Bayern <sbayern@law.fsu.edu>
Shengqi Chen <harry-chen@outlook.com>
Shen Yan <shenyanxxxy@qq.com>
Simon Guest <simon.guest@tesujimath.org>
Simon Klinkert <simon.klinkert@gmail.com>
Sowrabha Gopal <sowrabha.gopal@delphix.com>
Spencer Kinny <spencerkinny1995@gmail.com>
Srikanth N S <srikanth.nagasubbaraoseetharaman@hpe.com>
Stanislav Seletskiy <s.seletskiy@gmail.com>
Stefan Lendl <s.lendl@proxmox.com>
Steffen Müthing <steffen.muething@iwr.uni-heidelberg.de>
Stephen Blinick <stephen.blinick@delphix.com>
sterlingjensen <sterlingjensen@users.noreply.github.com>
Steve Dougherty <sdougherty@barracuda.com>
Steve Mokris <smokris@softpixel.com>
Steven Burgess <sburgess@dattobackup.com>
Steven Hartland <smh@freebsd.org>
Steven Johnson <sjohnson@sakuraindustries.com>
Steven Noonan <steven@uplinklabs.net>
stf <s@ctrlc.hu>
Stian Ellingsen <stian@plaimi.net>
Stoiko Ivanov <github@nomore.at>
Stéphane Lesimple <speed47_github@speed47.net>
Suman Chakravartula <schakrava@gmail.com>
Sydney Vanda <sydney.m.vanda@intel.com>
Sören Tempel <soeren+git@soeren-tempel.net>
Tamas TEVESZ <ice@extreme.hu>
Teodor Spæren <teodor_spaeren@riseup.net>
TerraTech <TerraTech@users.noreply.github.com>
Thijs Cramer <thijs.cramer@gmail.com>
Thomas Bertschinger <bertschinger@lanl.gov>
Thomas Geppert <geppi@digitx.de>
Thomas Lamprecht <guggentom@hotmail.de>
Till Maas <opensource@till.name>
Tim Chase <tim@chase2k.com>
Tim Connors <tconnors@rather.puzzling.org>
Tim Crawford <tcrawford@datto.com>
Tim Haley <Tim.Haley@Sun.COM>
timor <timor.dd@googlemail.com>
Timothy Day <tday141@gmail.com>
Tim Schumacher <timschumi@gmx.de>
Tino Reichardt <milky-zfs@mcmilk.de>
Tobin Harding <me@tobin.cc>
Todd Seidelmann <seidelma@users.noreply.github.com>
Tom Caputi <tcaputi@datto.com>
Tom Matthews <tom@axiom-partners.com>
Tom Prince <tom.prince@ualberta.net>
Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Tom Prince <tom.prince@ualberta.net>
Tony Hutter <hutter2@llnl.gov>
Tony Nguyen <tony.nguyen@delphix.com>
Tony Perkins <tperkins@datto.com>
Toomas Soome <tsoome@me.com>
Torsten Wörtwein <twoertwein@gmail.com>
Toyam Cox <aviator45003@gmail.com>
Trevor Bautista <trevrb@trevrb.net>
Trey Dockendorf <treydock@gmail.com>
Troels Nørgaard <tnn@tradeshift.com>
Tulsi Jain <tulsi.jain@delphix.com>
Turbo Fredriksson <turbo@bayour.com>
Tyler J. Stachecki <stachecki.tyler@gmail.com>
Umer Saleem <usaleem@ixsystems.com>
Vaibhav Bhanawat <vaibhav.bhanawat@delphix.com>
Valmiky Arquissandas <kayvlim@gmail.com>
Val Packett <val@packett.cool>
Vince van Oosten <techhazard@codeforyouand.me>
Violet Purcell <vimproved@inventati.org>
Vipin Kumar Verma <vipin.verma@hpe.com>
Vitaut Bajaryn <vitaut.bayaryn@gmail.com>
Volker Mauel <volkermauel@gmail.com>
Václav Skála <skala@vshosting.cz>
Walter Huf <hufman@gmail.com>
Warner Losh <imp@bsdimp.com>
Weigang Li <weigang.li@intel.com>
WHR <msl0000023508@gmail.com>
Will Andrews <will@freebsd.org>
Will Rouesnel <w.rouesnel@gmail.com>
Windel Bouwman <windel@windel.nl>
Wojciech Małota-Wójcik <outofforest@users.noreply.github.com>
Wolfgang Bumiller <w.bumiller@proxmox.com>
Xin Li <delphij@FreeBSD.org>
Xinliang Liu <xinliang.liu@linaro.org>
xtouqh <xtouqh@hotmail.com>
Yann Collet <cyan@fb.com>
Yanping Gao <yanping.gao@xtaotech.com>
Ying Zhu <casualfisher@gmail.com>
Youzhong Yang <youzhong@gmail.com>
yparitcher <y@paritcher.com>
yuina822 <ayuichi@club.kyutech.ac.jp>
YunQiang Su <syq@debian.org>
Yuri Pankov <yuri.pankov@gmail.com>
Yuxin Wang <yuxinwang9999@gmail.com>
Yuxuan Shui <yshuiv7@gmail.com>
Zachary Bedell <zac@thebedells.org>
Zach Dykstra <dykstra.zachary@gmail.com>
zgock <zgock@nuc.base.zgock-lab.net>
Zhao Yongming <zym@apache.org>
Zhenlei Huang <zlei@FreeBSD.org>
Zhu Chuang <chuang@melty.land>
Érico Nogueira <erico.erc@gmail.com>
Đoàn Trần Công Danh <congdanhqx@gmail.com>
韩朴宇 <w12101111@gmail.com>
+2 -2
View File
@@ -1,2 +1,2 @@
The [OpenZFS Code of Conduct](http://www.open-zfs.org/wiki/Code_of_Conduct)
applies to spaces associated with the ZFS on Linux project, including GitHub.
The [OpenZFS Code of Conduct](https://openzfs.org/wiki/Code_of_Conduct)
applies to spaces associated with the OpenZFS project, including GitHub.
+5 -1
View File
@@ -19,7 +19,11 @@ notable exceptions and their respective licenses include:
* AES Implementation: module/icp/asm-x86_64/aes/THIRDPARTYLICENSE.gladman
* AES Implementation: module/icp/asm-x86_64/aes/THIRDPARTYLICENSE.openssl
* PBKDF2 Implementation: lib/libzfs/THIRDPARTYLICENSE.openssl
* SPL Implementation: module/spl/THIRDPARTYLICENSE.gplv2
* SPL Implementation: module/os/linux/spl/THIRDPARTYLICENSE.gplv2
* GCM Implementation: module/icp/asm-x86_64/modes/THIRDPARTYLICENSE.cryptogams
* GCM Implementation: module/icp/asm-x86_64/modes/THIRDPARTYLICENSE.openssl
* GHASH Implementation: module/icp/asm-x86_64/modes/THIRDPARTYLICENSE.cryptogams
* GHASH Implementation: module/icp/asm-x86_64/modes/THIRDPARTYLICENSE.openssl
This product includes software developed by the OpenSSL Project for use
in the OpenSSL Toolkit (http://www.openssl.org/)
+4 -4
View File
@@ -1,10 +1,10 @@
Meta: 1
Name: zfs
Branch: 1.0
Version: 0.8.0
Version: 2.2.5
Release: 1
Release-Tags: relext
License: CDDL
Author: OpenZFS on Linux
Linux-Maximum: 5.1
Linux-Minimum: 2.6.32
Author: OpenZFS
Linux-Maximum: 6.9
Linux-Minimum: 3.10
+137 -96
View File
@@ -1,46 +1,83 @@
CLEANFILES =
dist_noinst_DATA =
INSTALL_DATA_HOOKS =
ALL_LOCAL =
CLEAN_LOCAL =
CHECKS = shellcheck checkbashisms
include $(top_srcdir)/config/Rules.am
include $(top_srcdir)/config/CppCheck.am
include $(top_srcdir)/config/Shellcheck.am
include $(top_srcdir)/config/Substfiles.am
include $(top_srcdir)/scripts/Makefile.am
ACLOCAL_AMFLAGS = -I config
include config/rpm.am
include config/deb.am
include config/tgz.am
SUBDIRS = include rpm
if CONFIG_USER
SUBDIRS += udev etc man scripts lib tests cmd contrib
SUBDIRS = include
if BUILD_LINUX
include $(srcdir)/%D%/rpm/Makefile.am
endif
if CONFIG_USER
include $(srcdir)/%D%/cmd/Makefile.am
include $(srcdir)/%D%/contrib/Makefile.am
include $(srcdir)/%D%/etc/Makefile.am
include $(srcdir)/%D%/lib/Makefile.am
include $(srcdir)/%D%/man/Makefile.am
include $(srcdir)/%D%/tests/Makefile.am
if BUILD_LINUX
include $(srcdir)/%D%/udev/Makefile.am
endif
endif
CPPCHECKDIRS += module
if CONFIG_KERNEL
SUBDIRS += module
extradir = $(prefix)/src/zfs-$(VERSION)
extra_HEADERS = zfs.release.in zfs_config.h.in
kerneldir = $(prefix)/src/zfs-$(VERSION)/$(LINUX_VERSION)
nodist_kernel_HEADERS = zfs.release zfs_config.h module/$(LINUX_SYMBOLS)
endif
AUTOMAKE_OPTIONS = foreign
EXTRA_DIST = autogen.sh copy-builtin
EXTRA_DIST += config/config.awk config/rpm.am config/deb.am config/tgz.am
EXTRA_DIST += META AUTHORS COPYRIGHT LICENSE NEWS NOTICE README.md
EXTRA_DIST += CODE_OF_CONDUCT.md
dist_noinst_DATA += autogen.sh copy-builtin
dist_noinst_DATA += AUTHORS CODE_OF_CONDUCT.md COPYRIGHT LICENSE META NEWS NOTICE
dist_noinst_DATA += README.md RELEASES.md
dist_noinst_DATA += module/lua/README.zfs module/os/linux/spl/README.md
# Include all the extra licensing information for modules
EXTRA_DIST += module/icp/algs/skein/THIRDPARTYLICENSE module/icp/algs/skein/THIRDPARTYLICENSE.descrip
EXTRA_DIST += module/icp/asm-x86_64/aes/THIRDPARTYLICENSE.gladman module/icp/asm-x86_64/aes/THIRDPARTYLICENSE.gladman.descrip
EXTRA_DIST += module/icp/asm-x86_64/aes/THIRDPARTYLICENSE.openssl module/icp/asm-x86_64/aes/THIRDPARTYLICENSE.openssl.descrip
EXTRA_DIST += module/spl/THIRDPARTYLICENSE.gplv2 module/spl/THIRDPARTYLICENSE.gplv2.descrip
EXTRA_DIST += module/zfs/THIRDPARTYLICENSE.cityhash module/zfs/THIRDPARTYLICENSE.cityhash.descrip
dist_noinst_DATA += module/icp/algs/skein/THIRDPARTYLICENSE
dist_noinst_DATA += module/icp/algs/skein/THIRDPARTYLICENSE.descrip
dist_noinst_DATA += module/icp/asm-x86_64/aes/THIRDPARTYLICENSE.gladman
dist_noinst_DATA += module/icp/asm-x86_64/aes/THIRDPARTYLICENSE.gladman.descrip
dist_noinst_DATA += module/icp/asm-x86_64/aes/THIRDPARTYLICENSE.openssl
dist_noinst_DATA += module/icp/asm-x86_64/aes/THIRDPARTYLICENSE.openssl.descrip
dist_noinst_DATA += module/icp/asm-x86_64/modes/THIRDPARTYLICENSE.cryptogams
dist_noinst_DATA += module/icp/asm-x86_64/modes/THIRDPARTYLICENSE.cryptogams.descrip
dist_noinst_DATA += module/icp/asm-x86_64/modes/THIRDPARTYLICENSE.openssl
dist_noinst_DATA += module/icp/asm-x86_64/modes/THIRDPARTYLICENSE.openssl.descrip
dist_noinst_DATA += module/os/linux/spl/THIRDPARTYLICENSE.gplv2
dist_noinst_DATA += module/os/linux/spl/THIRDPARTYLICENSE.gplv2.descrip
dist_noinst_DATA += module/zfs/THIRDPARTYLICENSE.cityhash
dist_noinst_DATA += module/zfs/THIRDPARTYLICENSE.cityhash.descrip
@CODE_COVERAGE_RULES@
.PHONY: gitrev
GITREV = include/zfs_gitrev.h
CLEANFILES += $(GITREV)
PHONY += gitrev
gitrev:
-${top_srcdir}/scripts/make_gitrev.sh
$(AM_V_GEN)$(top_srcdir)/scripts/make_gitrev.sh $(GITREV)
BUILT_SOURCES = gitrev
all: gitrev
distclean-local::
-$(RM) -R autom4te*.cache
PHONY += install-data-hook $(INSTALL_DATA_HOOKS)
install-data-hook: $(INSTALL_DATA_HOOKS)
PHONY += maintainer-clean-local
maintainer-clean-local:
-$(RM) $(GITREV)
PHONY += distclean-local
distclean-local:
-$(RM) -R autom4te*.cache build
-find . \( -name SCCS -o -name BitKeeper -o -name .svn -o -name CVS \
-o -name .pc -o -name .hg -o -name .git \) -prune -o \
\( -name '*.orig' -o -name '*.rej' -o -name '*~' \
@@ -49,118 +86,122 @@ distclean-local::
-o -name 'core' -o -name 'Makefile' -o -name 'Module.symvers' \
-o -name '*.order' -o -name '*.markers' -o -name '*.gcda' \
-o -name '*.gcno' \) \
-type f -print | xargs $(RM)
-type f -delete
all-local:
-${top_srcdir}/scripts/zfs-tests.sh -c
PHONY += $(CLEAN_LOCAL)
clean-local: $(CLEAN_LOCAL)
dist-hook: gitrev
cp ${top_srcdir}/include/zfs_gitrev.h $(distdir)/include; \
sed -i 's/Release:[[:print:]]*/Release: $(RELEASE)/' \
$(distdir)/META
PHONY += $(ALL_LOCAL)
all-local: $(ALL_LOCAL)
# For compatibility, create a matching spl-x.y.z directly which contains
# symlinks to the updated header and object file locations. These
# compatibility links will be removed in the next major release.
if CONFIG_KERNEL
install-data-hook:
rm -rf $(DESTDIR)$(prefix)/src/spl-$(VERSION) && \
mkdir $(DESTDIR)$(prefix)/src/spl-$(VERSION) && \
cd $(DESTDIR)$(prefix)/src/spl-$(VERSION) && \
ln -s ../zfs-$(VERSION)/include/spl include && \
ln -s ../zfs-$(VERSION)/$(LINUX_VERSION) $(LINUX_VERSION) && \
ln -s ../zfs-$(VERSION)/zfs_config.h.in spl_config.h.in && \
ln -s ../zfs-$(VERSION)/zfs.release.in spl.release.in && \
cd $(DESTDIR)$(prefix)/src/zfs-$(VERSION)/$(LINUX_VERSION) && \
ln -fs zfs_config.h spl_config.h && \
ln -fs zfs.release spl.release
endif
dist-hook:
$(top_srcdir)/scripts/make_gitrev.sh -D $(distdir) $(GITREV)
$(SED) $(ac_inplace) 's/\(Release:[[:space:]]*\).*/\1$(RELEASE)/' $(distdir)/META
codecheck: cstyle shellcheck flake8 mancheck testscheck vcscheck
PHONY += codecheck $(CHECKS)
codecheck: $(CHECKS)
SHELLCHECKSCRIPTS += autogen.sh
PHONY += checkstyle
checkstyle: codecheck commitcheck
PHONY += commitcheck
commitcheck:
@if git rev-parse --git-dir > /dev/null 2>&1; then \
$(AM_V_at)if git rev-parse --git-dir > /dev/null 2>&1; then \
${top_srcdir}/scripts/commitcheck.sh; \
fi
if HAVE_PARALLEL
cstyle_line = -print0 | parallel -X0 ${top_srcdir}/scripts/cstyle.pl -cpP {}
else
cstyle_line = -exec ${top_srcdir}/scripts/cstyle.pl -cpP {} +
endif
CHECKS += cstyle
cstyle:
@find ${top_srcdir} -name '*.[hc]' ! -name 'zfs_config.*' \
! -name '*.mod.c' -type f \
-exec ${top_srcdir}/scripts/cstyle.pl -cpP {} \+
shellcheck:
@if type shellcheck > /dev/null 2>&1; then \
shellcheck --exclude=SC1090 --format=gcc \
$$(find ${top_srcdir}/scripts/*.sh -type f) \
$$(find ${top_srcdir}/cmd/zed/zed.d/*.sh -type f) \
$$(find ${top_srcdir}/cmd/zpool/zpool.d/* -executable); \
else \
echo "skipping shellcheck because shellcheck is not installed"; \
fi
mancheck:
@if type mandoc > /dev/null 2>&1; then \
find ${top_srcdir}/man/man8 -type f -name 'zfs.8' \
-o -name 'zpool.8' -o -name 'zdb.8' \
-o -name 'zgenhostid.8' | \
xargs mandoc -Tlint -Werror; \
else \
echo "skipping mancheck because mandoc is not installed"; \
fi
$(AM_V_at)find $(top_srcdir) -name build -prune \
-o -type f -name '*.[hc]' \
! -name 'zfs_config.*' ! -name '*.mod.c' \
! -name 'opt_global.h' ! -name '*_if*.h' \
! -name 'zstd_compat_wrapper.h' \
! -path './module/zstd/lib/*' \
! -path './include/sys/lua/*' \
! -path './module/lua/l*.[ch]' \
! -path './module/zfs/lz4.c' \
$(cstyle_line)
filter_executable = -exec test -x '{}' \; -print
CHECKS += testscheck
testscheck:
@find ${top_srcdir}/tests/zfs-tests/tests -type f \
\( -name '*.ksh' -not -executable \) -o \
\( -name '*.kshlib' -executable \) -o \
\( -name '*.cfg' -executable \) | \
xargs -r stat -c '%A %n' | \
awk '{c++; print} END {if(c>0) exit 1}'
$(AM_V_at)[ $$(find $(top_srcdir)/tests/zfs-tests -type f \
\( -name '*.ksh' -not $(filter_executable) \) -o \
\( -name '*.kshlib' $(filter_executable) \) -o \
\( -name '*.shlib' $(filter_executable) \) -o \
\( -name '*.cfg' $(filter_executable) \) | \
tee /dev/stderr | wc -l) -eq 0 ]
CHECKS += vcscheck
vcscheck:
@if git rev-parse --git-dir > /dev/null 2>&1; then \
$(AM_V_at)if git rev-parse --git-dir > /dev/null 2>&1; then \
git ls-files . --exclude-standard --others | \
awk '{c++; print} END {if(c>0) exit 1}' ; \
fi
CHECKS += zstdcheck
zstdcheck:
@$(MAKE) -C module check-zstd-symbols
PHONY += lint
lint: cppcheck paxcheck
cppcheck:
@if type cppcheck > /dev/null 2>&1; then \
cppcheck --quiet --force --error-exitcode=2 --inline-suppr \
--suppressions-list=.github/suppressions.txt \
-UHAVE_SSE2 -UHAVE_AVX512F -UHAVE_UIO_ZEROCOPY \
${top_srcdir}; \
else \
echo "skipping cppcheck because cppcheck is not installed"; \
fi
PHONY += paxcheck
paxcheck:
@if type scanelf > /dev/null 2>&1; then \
${top_srcdir}/scripts/paxcheck.sh ${top_srcdir}; \
$(AM_V_at)if type scanelf > /dev/null 2>&1; then \
$(top_srcdir)/scripts/paxcheck.sh $(top_builddir); \
else \
echo "skipping paxcheck because scanelf is not installed"; \
fi
CHECKS += flake8
flake8:
@if type flake8 > /dev/null 2>&1; then \
flake8 ${top_srcdir}; \
$(AM_V_at)if type flake8 > /dev/null 2>&1; then \
flake8 $(top_srcdir); \
else \
echo "skipping flake8 because flake8 is not installed"; \
fi
PHONY += regen-tests
regen-tests:
@$(MAKE) -C tests/zfs-tests/tests regen
PHONY += ctags
ctags:
$(RM) tags
find $(top_srcdir) -name .git -prune -o -name '*.[hc]' | xargs ctags
find $(top_srcdir) -name '.?*' -prune \
-o -type f -name '*.[hcS]' -exec ctags -a {} +
PHONY += etags
etags:
$(RM) TAGS
find $(top_srcdir) -name .pc -prune -o -name '*.[hc]' | xargs etags -a
find $(top_srcdir) -name '.?*' -prune \
-o -type f -name '*.[hcS]' -exec etags -a {} +
PHONY += cscopelist
cscopelist:
find $(top_srcdir) -name '.?*' -prune \
-o -type f -name '*.[hc]' -print >cscope.files
PHONY += tags
tags: ctags etags
PHONY += pkg pkg-dkms pkg-kmod pkg-utils
pkg: @DEFAULT_PACKAGE@
pkg-dkms: @DEFAULT_PACKAGE@-dkms
pkg-kmod: @DEFAULT_PACKAGE@-kmod
pkg-utils: @DEFAULT_PACKAGE@-utils
include config/rpm.am
include config/deb.am
include config/tgz.am
.PHONY: $(PHONY)
+1 -1
View File
@@ -1,3 +1,3 @@
Descriptions of all releases can be found on github:
https://github.com/zfsonlinux/zfs/releases
https://github.com/openzfs/zfs/releases
+16 -12
View File
@@ -1,31 +1,35 @@
![img](http://zfsonlinux.org/images/zfs-linux.png)
![img](https://openzfs.github.io/openzfs-docs/_static/img/logo/480px-Open-ZFS-Secondary-Logo-Colour-halfsize.png)
ZFS on Linux is an advanced file system and volume manager which was originally
OpenZFS is an advanced file system and volume manager which was originally
developed for Solaris and is now maintained by the OpenZFS community.
This repository contains the code for running OpenZFS on Linux and FreeBSD.
[![codecov](https://codecov.io/gh/zfsonlinux/zfs/branch/master/graph/badge.svg)](https://codecov.io/gh/zfsonlinux/zfs)
[![coverity](https://scan.coverity.com/projects/1973/badge.svg)](https://scan.coverity.com/projects/zfsonlinux-zfs)
[![codecov](https://codecov.io/gh/openzfs/zfs/branch/master/graph/badge.svg)](https://codecov.io/gh/openzfs/zfs)
[![coverity](https://scan.coverity.com/projects/1973/badge.svg)](https://scan.coverity.com/projects/openzfs-zfs)
# Official Resources
* [Site](http://zfsonlinux.org)
* [Wiki](https://github.com/zfsonlinux/zfs/wiki)
* [Mailing lists](https://github.com/zfsonlinux/zfs/wiki/Mailing-Lists)
* [OpenZFS site](http://open-zfs.org/)
* [Documentation](https://openzfs.github.io/openzfs-docs/) - for using and developing this repo
* [ZoL Site](https://zfsonlinux.org) - Linux release info & links
* [Mailing lists](https://openzfs.github.io/openzfs-docs/Project%20and%20Community/Mailing%20Lists.html)
* [OpenZFS site](https://openzfs.org/) - for conference videos and info on other platforms (illumos, OSX, Windows, etc)
# Installation
Full documentation for installing ZoL on your favorite Linux distribution can
be found at [our site](http://zfsonlinux.org/).
Full documentation for installing OpenZFS on your favorite operating system can
be found at the [Getting Started Page](https://openzfs.github.io/openzfs-docs/Getting%20Started/index.html).
# Contribute & Develop
We have a separate document with [contribution guidelines](./.github/CONTRIBUTING.md).
We have a [Code of Conduct](./CODE_OF_CONDUCT.md).
# Release
ZFS on Linux is released under a CDDL license.
OpenZFS is released under a CDDL license.
For more details see the NOTICE, LICENSE and COPYRIGHT files; `UCRL-CODE-235197`
# Supported Kernels
* The `META` file contains the officially recognized supported kernel versions.
* The `META` file contains the officially recognized supported Linux kernel versions.
* Supported FreeBSD versions are any supported branches and releases starting from 12.4-RELEASE.
+37
View File
@@ -0,0 +1,37 @@
OpenZFS uses the MAJOR.MINOR.PATCH versioning scheme described here:
* MAJOR - Incremented at the discretion of the OpenZFS developers to indicate
a particularly noteworthy feature or change. An increase in MAJOR number
does not indicate any incompatible on-disk format change. The ability
to import a ZFS pool is controlled by the feature flags enabled on the
pool and the feature flags supported by the installed OpenZFS version.
Increasing the MAJOR version is expected to be an infrequent occurrence.
* MINOR - Incremented to indicate new functionality such as a new feature
flag, pool/dataset property, zfs/zpool sub-command, new user/kernel
interface, etc. MINOR releases may introduce incompatible changes to the
user space library APIs (libzfs.so). Existing user/kernel interfaces are
considered to be stable to maximize compatibility between OpenZFS releases.
Additions to the user/kernel interface are backwards compatible.
* PATCH - Incremented when applying documentation updates, important bug
fixes, minor performance improvements, and kernel compatibility patches.
The user space library APIs and user/kernel interface are considered to
be stable. PATCH releases for a MAJOR.MINOR are published as needed.
Two release branches are maintained for OpenZFS, they are:
* OpenZFS LTS - A designated MAJOR.MINOR release with periodic PATCH
releases that incorporate important changes backported from newer OpenZFS
releases. This branch is intended for use in environments using an
LTS, enterprise, or similarly managed kernel (RHEL, Ubuntu LTS, Debian).
Minor changes to support these distribution kernels will be applied as
needed. New kernel versions released after the OpenZFS LTS release are
not supported. LTS releases will receive patches for at least 2 years.
The current LTS release is OpenZFS 2.1.
* OpenZFS current - Tracks the newest MAJOR.MINOR release. This branch
includes support for the latest OpenZFS features and recently releases
kernels. When a new MINOR release is tagged the previous MINOR release
will no longer be maintained (unless it is an LTS release). New MINOR
releases are planned to occur roughly annually.
-61
View File
@@ -48,64 +48,3 @@
#TEST_ZFSSTRESS_VDEV="/var/tmp/vdev"
#TEST_ZFSSTRESS_DIR="/$TEST_ZFSSTRESS_POOL/$TEST_ZFSSTRESS_FS"
#TEST_ZFSSTRESS_OPTIONS=""
### per-builder customization
#
# BB_NAME=builder-name <distribution-version-architecture-type>
# - distribution=Amazon,Debian,Fedora,RHEL,SUSE,Ubuntu
# - version=x.y
# - architecture=x86_64,i686,arm,aarch64
# - type=build,test
#
case "$BB_NAME" in
Amazon*)
# ZFS enabled xfstests fails to build
TEST_XFSTESTS_SKIP="yes"
;;
CentOS-7*)
# ZFS enabled xfstests fails to build
TEST_XFSTESTS_SKIP="yes"
;;
CentOS-6*)
;;
Debian*)
;;
Fedora*)
;;
RHEL*)
;;
SUSE*)
;;
Ubuntu-16.04*)
# ZFS enabled xfstests fails to build
TEST_XFSTESTS_SKIP="yes"
;;
Ubuntu*)
;;
*)
;;
esac
###
#
# Run ztest longer on the "coverage" builders to gain more code coverage
# data out of ztest, libzpool, etc.
#
case "$BB_NAME" in
*coverage*)
TEST_ZTEST_TIMEOUT=3600
;;
*)
TEST_ZTEST_TIMEOUT=900
;;
esac
###
#
# Disable the following test suites on 32-bit systems.
#
if [ $(getconf LONG_BIT) = "32" ]; then
TEST_ZTEST_SKIP="yes"
TEST_XFSTESTS_SKIP="yes"
TEST_ZFSSTRESS_SKIP="yes"
fi
+60 -2
View File
@@ -1,4 +1,62 @@
#!/bin/sh
[ "${0%/*}" = "$0" ] || cd "${0%/*}" || exit
autoreconf -fiv || exit 1
rm -Rf autom4te.cache
# %reldir%/%canon_reldir% (%D%/%C%) only appeared in automake 1.14, but RHEL/CentOS 7 has 1.13.4
# This is an (overly) simplistic preprocessor that papers around this for the duration of the generation step,
# and can be removed once support for CentOS 7 is dropped
automake --version | awk '{print $NF; exit}' | (
IFS=. read -r AM_MAJ AM_MIN _
[ "$AM_MAJ" -gt 1 ] || [ "$AM_MIN" -ge 14 ]
) || {
process_root() {
root="$1"; shift
grep -q '%[CD]%' "$root/Makefile.am" || return
find "$root" -name Makefile.am "$@" | while read -r dir; do
dir="${dir%/Makefile.am}"
grep -q '%[CD]%' "$dir/Makefile.am" || continue
reldir="${dir#"$root"}"
reldir="${reldir#/}"
canon_reldir="$(printf '%s' "$reldir" | tr -C 'a-zA-Z0-9@_' '_')"
reldir_slash="$reldir/"
canon_reldir_slash="${canon_reldir}_"
[ -z "$reldir" ] && reldir_slash=
[ -z "$reldir" ] && canon_reldir_slash=
echo "$dir/Makefile.am" >&3
sed -i~ -e "s:%D%/:$reldir_slash:g" -e "s:%D%:$reldir:g" \
-e "s:%C%_:$canon_reldir_slash:g" -e "s:%C%:$canon_reldir:g" "$dir/Makefile.am"
done 3>>"$substituted_files"
}
rollback() {
while read -r f; do
mv "$f~" "$f"
done < "$substituted_files"
rm -f "$substituted_files"
}
echo "Automake <1.14; papering over missing %reldir%/%canon_reldir% support" >&2
substituted_files="$(mktemp)"
trap rollback EXIT
roots="$(sed '/Makefile$/!d;/module/d;s:^\s*:./:;s:/Makefile::;/^\.$/d' configure.ac)"
IFS="
"
for root in $roots; do
root="${root#./}"
process_root "$root"
done
set -f
# shellcheck disable=SC2086,SC2046
process_root . $(printf '!\n-path\n%s/*\n' $roots)
}
autoreconf -fiv && rm -rf autom4te.cache
+116 -3
View File
@@ -1,3 +1,116 @@
SUBDIRS = zfs zpool zdb zhack zinject zstreamdump ztest
SUBDIRS += mount_zfs fsck_zfs zvol_id vdev_id arcstat dbufstat zed
SUBDIRS += arc_summary raidz_test zgenhostid
bin_SCRIPTS =
bin_PROGRAMS =
sbin_SCRIPTS =
sbin_PROGRAMS =
dist_bin_SCRIPTS =
zfsexec_PROGRAMS =
mounthelper_PROGRAMS =
sbin_SCRIPTS += fsck.zfs
SHELLCHECKSCRIPTS += fsck.zfs
CLEANFILES += fsck.zfs
dist_noinst_DATA += %D%/fsck.zfs.in
$(call SUBST,fsck.zfs,%D%/)
sbin_PROGRAMS += zfs_ids_to_path
CPPCHECKTARGETS += zfs_ids_to_path
zfs_ids_to_path_SOURCES = \
%D%/zfs_ids_to_path.c
zfs_ids_to_path_LDADD = \
libzfs.la
zhack_CPPFLAGS = $(AM_CPPFLAGS) $(FORCEDEBUG_CPPFLAGS)
sbin_PROGRAMS += zhack
CPPCHECKTARGETS += zhack
zhack_SOURCES = \
%D%/zhack.c
zhack_LDADD = \
libzpool.la \
libzfs_core.la \
libnvpair.la
ztest_CFLAGS = $(AM_CFLAGS) $(KERNEL_CFLAGS)
# Get rid of compiler warning for unchecked truncating snprintfs on gcc 7.1.1
ztest_CFLAGS += $(NO_FORMAT_TRUNCATION)
ztest_CPPFLAGS = $(AM_CPPFLAGS) $(FORCEDEBUG_CPPFLAGS)
sbin_PROGRAMS += ztest
CPPCHECKTARGETS += ztest
ztest_SOURCES = \
%D%/ztest.c
ztest_LDADD = \
libzpool.la \
libzfs_core.la \
libnvpair.la
ztest_LDADD += -lm
ztest_LDFLAGS = -pthread
include $(srcdir)/%D%/raidz_test/Makefile.am
include $(srcdir)/%D%/zdb/Makefile.am
include $(srcdir)/%D%/zfs/Makefile.am
include $(srcdir)/%D%/zinject/Makefile.am
include $(srcdir)/%D%/zpool/Makefile.am
include $(srcdir)/%D%/zpool_influxdb/Makefile.am
include $(srcdir)/%D%/zstream/Makefile.am
if BUILD_LINUX
mounthelper_PROGRAMS += mount.zfs
CPPCHECKTARGETS += mount.zfs
mount_zfs_SOURCES = \
%D%/mount_zfs.c
mount_zfs_LDADD = \
libzfs.la \
libzfs_core.la \
libnvpair.la
mount_zfs_LDADD += $(LTLIBINTL)
CPPCHECKTARGETS += raidz_test
sbin_PROGRAMS += zgenhostid
CPPCHECKTARGETS += zgenhostid
zgenhostid_SOURCES = \
%D%/zgenhostid.c
dist_bin_SCRIPTS += %D%/zvol_wait
SHELLCHECKSCRIPTS += %D%/zvol_wait
include $(srcdir)/%D%/zed/Makefile.am
endif
if USING_PYTHON
bin_SCRIPTS += arc_summary arcstat dbufstat zilstat
CLEANFILES += arc_summary arcstat dbufstat zilstat
dist_noinst_DATA += %D%/arc_summary %D%/arcstat.in %D%/dbufstat.in %D%/zilstat.in
$(call SUBST,arcstat,%D%/)
$(call SUBST,dbufstat,%D%/)
$(call SUBST,zilstat,%D%/)
arc_summary: %D%/arc_summary
$(AM_V_at)cp $< $@
endif
PHONY += cmd
cmd: $(bin_SCRIPTS) $(bin_PROGRAMS) $(sbin_SCRIPTS) $(sbin_PROGRAMS) $(dist_bin_SCRIPTS) $(zfsexec_PROGRAMS) $(mounthelper_PROGRAMS)
+425 -257
View File
@@ -1,4 +1,4 @@
#!/usr/bin/python3
#!/usr/bin/env python3
#
# Copyright (c) 2008 Ben Rockwood <benr@cuddletech.com>,
# Copyright (c) 2010 Martin Matuska <mm@FreeBSD.org>,
@@ -32,7 +32,7 @@
Provides basic information on the ARC, its efficiency, the L2ARC (if present),
the Data Management Unit (DMU), Virtual Devices (VDEVs), and tunables. See
the in-source documentation and code at
https://github.com/zfsonlinux/zfs/blob/master/module/zfs/arc.c for details.
https://github.com/openzfs/zfs/blob/master/module/zfs/arc.c for details.
The original introduction to arc_summary can be found at
http://cuddletech.com/?p=454
"""
@@ -42,13 +42,17 @@ import os
import subprocess
import sys
import time
import errno
DECRIPTION = 'Print ARC and other statistics for ZFS on Linux'
# We can't use env -S portably, and we need python3 -u to handle pipes in
# the shell abruptly closing the way we want to, so...
import io
if isinstance(sys.__stderr__.buffer, io.BufferedWriter):
os.execv(sys.executable, [sys.executable, "-u"] + sys.argv)
DESCRIPTION = 'Print ARC and other statistics for OpenZFS'
INDENT = ' '*8
LINE_LENGTH = 72
PROC_PATH = '/proc/spl/kstat/zfs/'
SPL_PATH = '/sys/module/spl/parameters/'
TUNABLES_PATH = '/sys/module/zfs/parameters/'
DATE_FORMAT = '%a %b %d %H:%M:%S %Y'
TITLE = 'ZFS Subsystem Report'
@@ -60,12 +64,10 @@ SECTION_HELP = 'print info from one section ('+' '.join(SECTIONS)+')'
SECTION_PATHS = {'arc': 'arcstats',
'dmu': 'dmu_tx',
'l2arc': 'arcstats', # L2ARC stuff lives in arcstats
'vdev': 'vdev_cache_stats',
'xuio': 'xuio_stats',
'zfetch': 'zfetchstats',
'zil': 'zil'}
parser = argparse.ArgumentParser(description=DECRIPTION)
parser = argparse.ArgumentParser(description=DESCRIPTION)
parser.add_argument('-a', '--alternate', action='store_true', default=False,
help='use alternate formatting for tunables and SPL',
dest='alt')
@@ -83,6 +85,160 @@ parser.add_argument('-s', '--section', dest='section', help=SECTION_HELP)
ARGS = parser.parse_args()
if sys.platform.startswith('freebsd'):
# Requires py36-sysctl on FreeBSD
import sysctl
def is_value(ctl):
return ctl.type != sysctl.CTLTYPE_NODE
def namefmt(ctl, base='vfs.zfs.'):
# base is removed from the name
cut = len(base)
return ctl.name[cut:]
def load_kstats(section):
base = 'kstat.zfs.misc.{section}.'.format(section=section)
fmt = lambda kstat: '{name} : {value}'.format(name=namefmt(kstat, base),
value=kstat.value)
kstats = sysctl.filter(base)
return [fmt(kstat) for kstat in kstats if is_value(kstat)]
def get_params(base):
ctls = sysctl.filter(base)
return {namefmt(ctl): str(ctl.value) for ctl in ctls if is_value(ctl)}
def get_tunable_params():
return get_params('vfs.zfs')
def get_vdev_params():
return get_params('vfs.zfs.vdev')
def get_version_impl(request):
# FreeBSD reports versions for zpl and spa instead of zfs and spl.
name = {'zfs': 'zpl',
'spl': 'spa'}[request]
mib = 'vfs.zfs.version.{}'.format(name)
version = sysctl.filter(mib)[0].value
return '{} version {}'.format(name, version)
def get_descriptions(_request):
ctls = sysctl.filter('vfs.zfs')
return {namefmt(ctl): ctl.description for ctl in ctls if is_value(ctl)}
elif sys.platform.startswith('linux'):
KSTAT_PATH = '/proc/spl/kstat/zfs'
SPL_PATH = '/sys/module/spl/parameters'
TUNABLES_PATH = '/sys/module/zfs/parameters'
def load_kstats(section):
path = os.path.join(KSTAT_PATH, section)
with open(path) as f:
return list(f)[2:] # Get rid of header
def get_params(basepath):
"""Collect information on the Solaris Porting Layer (SPL) or the
tunables, depending on the PATH given. Does not check if PATH is
legal.
"""
result = {}
for name in os.listdir(basepath):
path = os.path.join(basepath, name)
with open(path) as f:
value = f.read()
result[name] = value.strip()
return result
def get_spl_params():
return get_params(SPL_PATH)
def get_tunable_params():
return get_params(TUNABLES_PATH)
def get_vdev_params():
return get_params(TUNABLES_PATH)
def get_version_impl(request):
# The original arc_summary called /sbin/modinfo/{spl,zfs} to get
# the version information. We switch to /sys/module/{spl,zfs}/version
# to make sure we get what is really loaded in the kernel
try:
with open("/sys/module/{}/version".format(request)) as f:
return f.read().strip()
except:
return "(unknown)"
def get_descriptions(request):
"""Get the descriptions of the Solaris Porting Layer (SPL) or the
tunables, return with minimal formatting.
"""
if request not in ('spl', 'zfs'):
print('ERROR: description of "{0}" requested)'.format(request))
sys.exit(1)
descs = {}
target_prefix = 'parm:'
# We would prefer to do this with /sys/modules -- see the discussion at
# get_version() -- but there isn't a way to get the descriptions from
# there, so we fall back on modinfo
command = ["/sbin/modinfo", request, "-0"]
info = ''
try:
info = subprocess.run(command, stdout=subprocess.PIPE,
check=True, universal_newlines=True)
raw_output = info.stdout.split('\0')
except subprocess.CalledProcessError:
print("Error: Descriptions not available",
"(can't access kernel module)")
sys.exit(1)
for line in raw_output:
if not line.startswith(target_prefix):
continue
line = line[len(target_prefix):].strip()
name, raw_desc = line.split(':', 1)
desc = raw_desc.rsplit('(', 1)[0]
if desc == '':
desc = '(No description found)'
descs[name.strip()] = desc.strip()
return descs
def handle_unraisableException(exc_type, exc_value=None, exc_traceback=None,
err_msg=None, object=None):
handle_Exception(exc_type, object, exc_traceback)
def handle_Exception(ex_cls, ex, tb):
if ex_cls is KeyboardInterrupt:
sys.exit()
if ex_cls is BrokenPipeError:
# It turns out that while sys.exit() triggers an exception
# not handled message on Python 3.8+, os._exit() does not.
os._exit(0)
if ex_cls is OSError:
if ex.errno == errno.ENOTCONN:
sys.exit()
raise ex
if hasattr(sys,'unraisablehook'): # Python 3.8+
sys.unraisablehook = handle_unraisableException
sys.excepthook = handle_Exception
def cleanup_line(single_line):
"""Format a raw line of data from /proc and isolate the name value
part, returning a tuple with each. Currently, this gets rid of the
@@ -109,16 +265,14 @@ def draw_graph(kstats_dict):
arc_perc = f_perc(arc_stats['size'], arc_stats['c_max'])
mfu_size = f_bytes(arc_stats['mfu_size'])
mru_size = f_bytes(arc_stats['mru_size'])
meta_limit = f_bytes(arc_stats['arc_meta_limit'])
meta_size = f_bytes(arc_stats['arc_meta_used'])
dnode_limit = f_bytes(arc_stats['arc_dnode_limit'])
dnode_size = f_bytes(arc_stats['dnode_size'])
info_form = ('ARC: {0} ({1}) MFU: {2} MRU: {3} META: {4} ({5}) '
'DNODE {6} ({7})')
info_form = ('ARC: {0} ({1}) MFU: {2} MRU: {3} META: {4} '
'DNODE {5} ({6})')
info_line = info_form.format(arc_size, arc_perc, mfu_size, mru_size,
meta_size, meta_limit, dnode_size,
dnode_limit)
meta_size, dnode_size, dnode_limit)
info_spc = ' '*int((GRAPH_WIDTH-len(info_line))/2)
info_line = GRAPH_INDENT+info_spc+info_line
@@ -238,139 +392,48 @@ def format_raw_line(name, value):
if ARGS.alt:
result = '{0}{1}={2}'.format(INDENT, name, value)
else:
spc = LINE_LENGTH-(len(INDENT)+len(value))
result = '{0}{1:<{spc}}{2}'.format(INDENT, name, value, spc=spc)
# Right-align the value within the line length if it fits,
# otherwise just separate it from the name by a single space.
fit = LINE_LENGTH - len(INDENT) - len(name)
overflow = len(value) + 1
w = max(fit, overflow)
result = '{0}{1}{2:>{w}}'.format(INDENT, name, value, w=w)
return result
def get_kstats():
"""Collect information on the ZFS subsystem from the /proc Linux virtual
file system. The step does not perform any further processing, giving us
the option to only work on what is actually needed. The name "kstat" is a
holdover from the Solaris utility of the same name.
"""Collect information on the ZFS subsystem. The step does not perform any
further processing, giving us the option to only work on what is actually
needed. The name "kstat" is a holdover from the Solaris utility of the same
name.
"""
result = {}
secs = SECTION_PATHS.values()
for section in secs:
with open(PROC_PATH+section, 'r') as proc_location:
lines = [line for line in proc_location]
del lines[0:2] # Get rid of header
result[section] = lines
for section in SECTION_PATHS.values():
if section not in result:
result[section] = load_kstats(section)
return result
def get_spl_tunables(PATH):
"""Collect information on the Solaris Porting Layer (SPL) or the
tunables, depending on the PATH given. Does not check if PATH is
legal.
"""
result = {}
parameters = os.listdir(PATH)
for name in parameters:
with open(PATH+name, 'r') as para_file:
value = para_file.read()
result[name] = value.strip()
return result
def get_descriptions(request):
"""Get the decriptions of the Solaris Porting Layer (SPL) or the
tunables, return with minimal formatting.
"""
if request not in ('spl', 'zfs'):
print('ERROR: description of "{0}" requested)'.format(request))
sys.exit(1)
descs = {}
target_prefix = 'parm:'
# We would prefer to do this with /sys/modules -- see the discussion at
# get_version() -- but there isn't a way to get the descriptions from
# there, so we fall back on modinfo
command = ["/sbin/modinfo", request, "-0"]
# The recommended way to do this is with subprocess.run(). However,
# some installed versions of Python are < 3.5, so we offer them
# the option of doing it the old way (for now)
info = ''
try:
if 'run' in dir(subprocess):
info = subprocess.run(command, stdout=subprocess.PIPE,
universal_newlines=True)
raw_output = info.stdout.split('\0')
else:
info = subprocess.check_output(command, universal_newlines=True)
raw_output = info.split('\0')
except subprocess.CalledProcessError:
print("Error: Descriptions not available (can't access kernel module)")
sys.exit(1)
for line in raw_output:
if not line.startswith(target_prefix):
continue
line = line[len(target_prefix):].strip()
name, raw_desc = line.split(':', 1)
desc = raw_desc.rsplit('(', 1)[0]
if desc == '':
desc = '(No description found)'
descs[name.strip()] = desc.strip()
return descs
def get_version(request):
"""Get the version number of ZFS or SPL on this machine for header.
Returns an error string, but does not raise an error, if we can't
get the ZFS/SPL version via modinfo.
get the ZFS/SPL version.
"""
if request not in ('spl', 'zfs'):
error_msg = '(ERROR: "{0}" requested)'.format(request)
return error_msg
# The original arc_summary called /sbin/modinfo/{spl,zfs} to get
# the version information. We switch to /sys/module/{spl,zfs}/version
# to make sure we get what is really loaded in the kernel
command = ["cat", "/sys/module/{0}/version".format(request)]
req = request.upper()
version = "(Can't get {0} version)".format(req)
# The recommended way to do this is with subprocess.run(). However,
# some installed versions of Python are < 3.5, so we offer them
# the option of doing it the old way (for now)
info = ''
if 'run' in dir(subprocess):
info = subprocess.run(command, stdout=subprocess.PIPE,
universal_newlines=True)
version = info.stdout.strip()
else:
info = subprocess.check_output(command, universal_newlines=True)
version = info.strip()
return version
return get_version_impl(request)
def print_header():
"""Print the initial heading with date and time as well as info on the
Linux and ZFS versions. This is not called for the graph.
kernel and ZFS versions. This is not called for the graph.
"""
# datetime is now recommended over time but we keep the exact formatting
@@ -488,12 +551,28 @@ def section_arc(kstats_dict):
arc_target_size = arc_stats['c']
arc_max = arc_stats['c_max']
arc_min = arc_stats['c_min']
mfu_size = arc_stats['mfu_size']
mru_size = arc_stats['mru_size']
meta_limit = arc_stats['arc_meta_limit']
meta_size = arc_stats['arc_meta_used']
meta = arc_stats['meta']
pd = arc_stats['pd']
pm = arc_stats['pm']
anon_data = arc_stats['anon_data']
anon_metadata = arc_stats['anon_metadata']
mfu_data = arc_stats['mfu_data']
mfu_metadata = arc_stats['mfu_metadata']
mru_data = arc_stats['mru_data']
mru_metadata = arc_stats['mru_metadata']
mfug_data = arc_stats['mfu_ghost_data']
mfug_metadata = arc_stats['mfu_ghost_metadata']
mrug_data = arc_stats['mru_ghost_data']
mrug_metadata = arc_stats['mru_ghost_metadata']
unc_data = arc_stats['uncached_data']
unc_metadata = arc_stats['uncached_metadata']
bonus_size = arc_stats['bonus_size']
dnode_limit = arc_stats['arc_dnode_limit']
dnode_size = arc_stats['dnode_size']
dbuf_size = arc_stats['dbuf_size']
hdr_size = arc_stats['hdr_size']
l2_hdr_size = arc_stats['l2_hdr_size']
abd_chunk_waste_size = arc_stats['abd_chunk_waste_size']
target_size_ratio = '{0}:1'.format(int(arc_max) // int(arc_min))
prt_2('ARC size (current):',
@@ -504,19 +583,56 @@ def section_arc(kstats_dict):
f_perc(arc_min, arc_max), f_bytes(arc_min))
prt_i2('Max size (high water):',
target_size_ratio, f_bytes(arc_max))
caches_size = int(mfu_size)+int(mru_size)
prt_i2('Most Frequently Used (MFU) cache size:',
f_perc(mfu_size, caches_size), f_bytes(mfu_size))
prt_i2('Most Recently Used (MRU) cache size:',
f_perc(mru_size, caches_size), f_bytes(mru_size))
prt_i2('Metadata cache size (hard limit):',
f_perc(meta_limit, arc_max), f_bytes(meta_limit))
prt_i2('Metadata cache size (current):',
f_perc(meta_size, meta_limit), f_bytes(meta_size))
prt_i2('Dnode cache size (hard limit):',
f_perc(dnode_limit, meta_limit), f_bytes(dnode_limit))
prt_i2('Dnode cache size (current):',
caches_size = int(anon_data)+int(anon_metadata)+\
int(mfu_data)+int(mfu_metadata)+int(mru_data)+int(mru_metadata)+\
int(unc_data)+int(unc_metadata)
prt_i2('Anonymous data size:',
f_perc(anon_data, caches_size), f_bytes(anon_data))
prt_i2('Anonymous metadata size:',
f_perc(anon_metadata, caches_size), f_bytes(anon_metadata))
s = 4294967296
v = (s-int(pd))*(s-int(meta))/s
prt_i2('MFU data target:', f_perc(v, s),
f_bytes(v / 65536 * caches_size / 65536))
prt_i2('MFU data size:',
f_perc(mfu_data, caches_size), f_bytes(mfu_data))
prt_i1('MFU ghost data size:', f_bytes(mfug_data))
v = (s-int(pm))*int(meta)/s
prt_i2('MFU metadata target:', f_perc(v, s),
f_bytes(v / 65536 * caches_size / 65536))
prt_i2('MFU metadata size:',
f_perc(mfu_metadata, caches_size), f_bytes(mfu_metadata))
prt_i1('MFU ghost metadata size:', f_bytes(mfug_metadata))
v = int(pd)*(s-int(meta))/s
prt_i2('MRU data target:', f_perc(v, s),
f_bytes(v / 65536 * caches_size / 65536))
prt_i2('MRU data size:',
f_perc(mru_data, caches_size), f_bytes(mru_data))
prt_i1('MRU ghost data size:', f_bytes(mrug_data))
v = int(pm)*int(meta)/s
prt_i2('MRU metadata target:', f_perc(v, s),
f_bytes(v / 65536 * caches_size / 65536))
prt_i2('MRU metadata size:',
f_perc(mru_metadata, caches_size), f_bytes(mru_metadata))
prt_i1('MRU ghost metadata size:', f_bytes(mrug_metadata))
prt_i2('Uncached data size:',
f_perc(unc_data, caches_size), f_bytes(unc_data))
prt_i2('Uncached metadata size:',
f_perc(unc_metadata, caches_size), f_bytes(unc_metadata))
prt_i2('Bonus size:',
f_perc(bonus_size, arc_size), f_bytes(bonus_size))
prt_i2('Dnode cache target:',
f_perc(dnode_limit, arc_max), f_bytes(dnode_limit))
prt_i2('Dnode cache size:',
f_perc(dnode_size, dnode_limit), f_bytes(dnode_size))
prt_i2('Dbuf size:',
f_perc(dbuf_size, arc_size), f_bytes(dbuf_size))
prt_i2('Header size:',
f_perc(hdr_size, arc_size), f_bytes(hdr_size))
prt_i2('L2 header size:',
f_perc(l2_hdr_size, arc_size), f_bytes(l2_hdr_size))
prt_i2('ABD chunk waste size:',
f_perc(abd_chunk_waste_size, arc_size), f_bytes(abd_chunk_waste_size))
print()
print('ARC hash breakdown:')
@@ -534,6 +650,20 @@ def section_arc(kstats_dict):
prt_i1('Deleted:', f_hits(arc_stats['deleted']))
prt_i1('Mutex misses:', f_hits(arc_stats['mutex_miss']))
prt_i1('Eviction skips:', f_hits(arc_stats['evict_skip']))
prt_i1('Eviction skips due to L2 writes:',
f_hits(arc_stats['evict_l2_skip']))
prt_i1('L2 cached evictions:', f_bytes(arc_stats['evict_l2_cached']))
prt_i1('L2 eligible evictions:', f_bytes(arc_stats['evict_l2_eligible']))
prt_i2('L2 eligible MFU evictions:',
f_perc(arc_stats['evict_l2_eligible_mfu'],
arc_stats['evict_l2_eligible']),
f_bytes(arc_stats['evict_l2_eligible_mfu']))
prt_i2('L2 eligible MRU evictions:',
f_perc(arc_stats['evict_l2_eligible_mru'],
arc_stats['evict_l2_eligible']),
f_bytes(arc_stats['evict_l2_eligible_mru']))
prt_i1('L2 ineligible evictions:',
f_bytes(arc_stats['evict_l2_ineligible']))
print()
@@ -542,78 +672,119 @@ def section_archits(kstats_dict):
"""
arc_stats = isolate_section('arcstats', kstats_dict)
all_accesses = int(arc_stats['hits'])+int(arc_stats['misses'])
actual_hits = int(arc_stats['mfu_hits'])+int(arc_stats['mru_hits'])
prt_1('ARC total accesses (hits + misses):', f_hits(all_accesses))
ta_todo = (('Cache hit ratio:', arc_stats['hits']),
('Cache miss ratio:', arc_stats['misses']),
('Actual hit ratio (MFU + MRU hits):', actual_hits))
all_accesses = int(arc_stats['hits'])+int(arc_stats['iohits'])+\
int(arc_stats['misses'])
prt_1('ARC total accesses:', f_hits(all_accesses))
ta_todo = (('Total hits:', arc_stats['hits']),
('Total I/O hits:', arc_stats['iohits']),
('Total misses:', arc_stats['misses']))
for title, value in ta_todo:
prt_i2(title, f_perc(value, all_accesses), f_hits(value))
print()
dd_total = int(arc_stats['demand_data_hits']) +\
int(arc_stats['demand_data_iohits']) +\
int(arc_stats['demand_data_misses'])
prt_i2('Data demand efficiency:',
f_perc(arc_stats['demand_data_hits'], dd_total),
f_hits(dd_total))
dp_total = int(arc_stats['prefetch_data_hits']) +\
int(arc_stats['prefetch_data_misses'])
prt_i2('Data prefetch efficiency:',
f_perc(arc_stats['prefetch_data_hits'], dp_total),
f_hits(dp_total))
known_hits = int(arc_stats['mfu_hits']) +\
int(arc_stats['mru_hits']) +\
int(arc_stats['mfu_ghost_hits']) +\
int(arc_stats['mru_ghost_hits'])
anon_hits = int(arc_stats['hits'])-known_hits
prt_2('ARC demand data accesses:', f_perc(dd_total, all_accesses),
f_hits(dd_total))
dd_todo = (('Demand data hits:', arc_stats['demand_data_hits']),
('Demand data I/O hits:', arc_stats['demand_data_iohits']),
('Demand data misses:', arc_stats['demand_data_misses']))
for title, value in dd_todo:
prt_i2(title, f_perc(value, dd_total), f_hits(value))
print()
print('Cache hits by cache type:')
dm_total = int(arc_stats['demand_metadata_hits']) +\
int(arc_stats['demand_metadata_iohits']) +\
int(arc_stats['demand_metadata_misses'])
prt_2('ARC demand metadata accesses:', f_perc(dm_total, all_accesses),
f_hits(dm_total))
dm_todo = (('Demand metadata hits:', arc_stats['demand_metadata_hits']),
('Demand metadata I/O hits:',
arc_stats['demand_metadata_iohits']),
('Demand metadata misses:', arc_stats['demand_metadata_misses']))
for title, value in dm_todo:
prt_i2(title, f_perc(value, dm_total), f_hits(value))
print()
pd_total = int(arc_stats['prefetch_data_hits']) +\
int(arc_stats['prefetch_data_iohits']) +\
int(arc_stats['prefetch_data_misses'])
prt_2('ARC prefetch data accesses:', f_perc(pd_total, all_accesses),
f_hits(pd_total))
pd_todo = (('Prefetch data hits:', arc_stats['prefetch_data_hits']),
('Prefetch data I/O hits:', arc_stats['prefetch_data_iohits']),
('Prefetch data misses:', arc_stats['prefetch_data_misses']))
for title, value in pd_todo:
prt_i2(title, f_perc(value, pd_total), f_hits(value))
print()
pm_total = int(arc_stats['prefetch_metadata_hits']) +\
int(arc_stats['prefetch_metadata_iohits']) +\
int(arc_stats['prefetch_metadata_misses'])
prt_2('ARC prefetch metadata accesses:', f_perc(pm_total, all_accesses),
f_hits(pm_total))
pm_todo = (('Prefetch metadata hits:',
arc_stats['prefetch_metadata_hits']),
('Prefetch metadata I/O hits:',
arc_stats['prefetch_metadata_iohits']),
('Prefetch metadata misses:',
arc_stats['prefetch_metadata_misses']))
for title, value in pm_todo:
prt_i2(title, f_perc(value, pm_total), f_hits(value))
print()
all_prefetches = int(arc_stats['predictive_prefetch'])+\
int(arc_stats['prescient_prefetch'])
prt_2('ARC predictive prefetches:',
f_perc(arc_stats['predictive_prefetch'], all_prefetches),
f_hits(arc_stats['predictive_prefetch']))
prt_i2('Demand hits after predictive:',
f_perc(arc_stats['demand_hit_predictive_prefetch'],
arc_stats['predictive_prefetch']),
f_hits(arc_stats['demand_hit_predictive_prefetch']))
prt_i2('Demand I/O hits after predictive:',
f_perc(arc_stats['demand_iohit_predictive_prefetch'],
arc_stats['predictive_prefetch']),
f_hits(arc_stats['demand_iohit_predictive_prefetch']))
never = int(arc_stats['predictive_prefetch']) -\
int(arc_stats['demand_hit_predictive_prefetch']) -\
int(arc_stats['demand_iohit_predictive_prefetch'])
prt_i2('Never demanded after predictive:',
f_perc(never, arc_stats['predictive_prefetch']),
f_hits(never))
print()
prt_2('ARC prescient prefetches:',
f_perc(arc_stats['prescient_prefetch'], all_prefetches),
f_hits(arc_stats['prescient_prefetch']))
prt_i2('Demand hits after prescient:',
f_perc(arc_stats['demand_hit_prescient_prefetch'],
arc_stats['prescient_prefetch']),
f_hits(arc_stats['demand_hit_prescient_prefetch']))
prt_i2('Demand I/O hits after prescient:',
f_perc(arc_stats['demand_iohit_prescient_prefetch'],
arc_stats['prescient_prefetch']),
f_hits(arc_stats['demand_iohit_prescient_prefetch']))
never = int(arc_stats['prescient_prefetch'])-\
int(arc_stats['demand_hit_prescient_prefetch'])-\
int(arc_stats['demand_iohit_prescient_prefetch'])
prt_i2('Never demanded after prescient:',
f_perc(never, arc_stats['prescient_prefetch']),
f_hits(never))
print()
print('ARC states hits of all accesses:')
cl_todo = (('Most frequently used (MFU):', arc_stats['mfu_hits']),
('Most recently used (MRU):', arc_stats['mru_hits']),
('Most frequently used (MFU) ghost:',
arc_stats['mfu_ghost_hits']),
('Most recently used (MRU) ghost:',
arc_stats['mru_ghost_hits']))
arc_stats['mru_ghost_hits']),
('Uncached:', arc_stats['uncached_hits']))
for title, value in cl_todo:
prt_i2(title, f_perc(value, arc_stats['hits']), f_hits(value))
# For some reason, anon_hits can turn negative, which is weird. Until we
# have figured out why this happens, we just hide the problem, following
# the behavior of the original arc_summary.
if anon_hits >= 0:
prt_i2('Anonymously used:',
f_perc(anon_hits, arc_stats['hits']), f_hits(anon_hits))
print()
print('Cache hits by data type:')
dt_todo = (('Demand data:', arc_stats['demand_data_hits']),
('Demand prefetch data:', arc_stats['prefetch_data_hits']),
('Demand metadata:', arc_stats['demand_metadata_hits']),
('Demand prefetch metadata:',
arc_stats['prefetch_metadata_hits']))
for title, value in dt_todo:
prt_i2(title, f_perc(value, arc_stats['hits']), f_hits(value))
print()
print('Cache misses by data type:')
dm_todo = (('Demand data:', arc_stats['demand_data_misses']),
('Demand prefetch data:',
arc_stats['prefetch_data_misses']),
('Demand metadata:', arc_stats['demand_metadata_misses']),
('Demand prefetch metadata:',
arc_stats['prefetch_metadata_misses']))
for title, value in dm_todo:
prt_i2(title, f_perc(value, arc_stats['misses']), f_hits(value))
prt_i2(title, f_perc(value, all_accesses), f_hits(value))
print()
@@ -622,13 +793,28 @@ def section_dmu(kstats_dict):
zfetch_stats = isolate_section('zfetchstats', kstats_dict)
zfetch_access_total = int(zfetch_stats['hits'])+int(zfetch_stats['misses'])
zfetch_access_total = int(zfetch_stats['hits']) +\
int(zfetch_stats['future']) + int(zfetch_stats['stride']) +\
int(zfetch_stats['past']) + int(zfetch_stats['misses'])
prt_1('DMU prefetch efficiency:', f_hits(zfetch_access_total))
prt_i2('Hit ratio:', f_perc(zfetch_stats['hits'], zfetch_access_total),
prt_1('DMU predictive prefetcher calls:', f_hits(zfetch_access_total))
prt_i2('Stream hits:',
f_perc(zfetch_stats['hits'], zfetch_access_total),
f_hits(zfetch_stats['hits']))
prt_i2('Miss ratio:', f_perc(zfetch_stats['misses'], zfetch_access_total),
future = int(zfetch_stats['future']) + int(zfetch_stats['stride'])
prt_i2('Hits ahead of stream:', f_perc(future, zfetch_access_total),
f_hits(future))
prt_i2('Hits behind stream:',
f_perc(zfetch_stats['past'], zfetch_access_total),
f_hits(zfetch_stats['past']))
prt_i2('Stream misses:',
f_perc(zfetch_stats['misses'], zfetch_access_total),
f_hits(zfetch_stats['misses']))
prt_i2('Streams limit reached:',
f_perc(zfetch_stats['max_streams'], zfetch_stats['misses']),
f_hits(zfetch_stats['max_streams']))
prt_i1('Stream strides:', f_hits(zfetch_stats['stride']))
prt_i1('Prefetches issued', f_hits(zfetch_stats['io_issued']))
print()
@@ -660,7 +846,8 @@ def section_l2arc(kstats_dict):
('Free on write:', 'l2_free_on_write'),
('R/W clashes:', 'l2_rw_clash'),
('Bad checksums:', 'l2_cksum_bad'),
('I/O errors:', 'l2_io_error'))
('Read errors:', 'l2_io_error'),
('Write errors:', 'l2_writes_error'))
for title, value in l2_todo:
prt_i1(title, f_hits(arc_stats[value]))
@@ -672,46 +859,57 @@ def section_l2arc(kstats_dict):
prt_i2('Header size:',
f_perc(arc_stats['l2_hdr_size'], arc_stats['l2_size']),
f_bytes(arc_stats['l2_hdr_size']))
prt_i2('MFU allocated size:',
f_perc(arc_stats['l2_mfu_asize'], arc_stats['l2_asize']),
f_bytes(arc_stats['l2_mfu_asize']))
prt_i2('MRU allocated size:',
f_perc(arc_stats['l2_mru_asize'], arc_stats['l2_asize']),
f_bytes(arc_stats['l2_mru_asize']))
prt_i2('Prefetch allocated size:',
f_perc(arc_stats['l2_prefetch_asize'], arc_stats['l2_asize']),
f_bytes(arc_stats['l2_prefetch_asize']))
prt_i2('Data (buffer content) allocated size:',
f_perc(arc_stats['l2_bufc_data_asize'], arc_stats['l2_asize']),
f_bytes(arc_stats['l2_bufc_data_asize']))
prt_i2('Metadata (buffer content) allocated size:',
f_perc(arc_stats['l2_bufc_metadata_asize'], arc_stats['l2_asize']),
f_bytes(arc_stats['l2_bufc_metadata_asize']))
print()
prt_1('L2ARC breakdown:', f_hits(l2_access_total))
prt_i2('Hit ratio:',
f_perc(arc_stats['l2_hits'], l2_access_total),
f_bytes(arc_stats['l2_hits']))
f_hits(arc_stats['l2_hits']))
prt_i2('Miss ratio:',
f_perc(arc_stats['l2_misses'], l2_access_total),
f_bytes(arc_stats['l2_misses']))
prt_i1('Feeds:', f_hits(arc_stats['l2_feeds']))
f_hits(arc_stats['l2_misses']))
print()
print('L2ARC writes:')
if arc_stats['l2_writes_done'] != arc_stats['l2_writes_sent']:
prt_i2('Writes sent:', 'FAULTED', f_hits(arc_stats['l2_writes_sent']))
prt_i2('Done ratio:',
f_perc(arc_stats['l2_writes_done'],
arc_stats['l2_writes_sent']),
f_bytes(arc_stats['l2_writes_done']))
prt_i2('Error ratio:',
f_perc(arc_stats['l2_writes_error'],
arc_stats['l2_writes_sent']),
f_bytes(arc_stats['l2_writes_error']))
else:
prt_i2('Writes sent:', '100 %', f_bytes(arc_stats['l2_writes_sent']))
print('L2ARC I/O:')
prt_i2('Reads:',
f_bytes(arc_stats['l2_read_bytes']),
f_hits(arc_stats['l2_hits']))
prt_i2('Writes:',
f_bytes(arc_stats['l2_write_bytes']),
f_hits(arc_stats['l2_writes_sent']))
print()
print('L2ARC evicts:')
prt_i1('Lock retries:', f_hits(arc_stats['l2_evict_lock_retry']))
prt_i1('Upon reading:', f_hits(arc_stats['l2_evict_reading']))
prt_i1('L1 cached:', f_hits(arc_stats['l2_evict_l1cached']))
prt_i1('While reading:', f_hits(arc_stats['l2_evict_reading']))
print()
def section_spl(*_):
"""Print the SPL parameters, if requested with alternative format
and/or decriptions. This does not use kstats.
and/or descriptions. This does not use kstats.
"""
spls = get_spl_tunables(SPL_PATH)
if sys.platform.startswith('freebsd'):
# No SPL support in FreeBSD
return
spls = get_spl_params()
keylist = sorted(spls.keys())
print('Solaris Porting Layer (SPL):')
@@ -725,7 +923,7 @@ def section_spl(*_):
try:
print(INDENT+'#', descriptions[key])
except KeyError:
print(INDENT+'# (No decription found)') # paranoid
print(INDENT+'# (No description found)') # paranoid
print(format_raw_line(key, value))
@@ -734,10 +932,10 @@ def section_spl(*_):
def section_tunables(*_):
"""Print the tunables, if requested with alternative format and/or
decriptions. This does not use kstasts.
descriptions. This does not use kstasts.
"""
tunables = get_spl_tunables(TUNABLES_PATH)
tunables = get_tunable_params()
keylist = sorted(tunables.keys())
print('Tunables:')
@@ -751,45 +949,16 @@ def section_tunables(*_):
try:
print(INDENT+'#', descriptions[key])
except KeyError:
print(INDENT+'# (No decription found)') # paranoid
print(INDENT+'# (No description found)') # paranoid
print(format_raw_line(key, value))
print()
def section_vdev(kstats_dict):
"""Collect information on VDEV caches"""
# Currently [Nov 2017] the VDEV cache is disabled, because it is actually
# harmful. When this is the case, we just skip the whole entry. See
# https://github.com/zfsonlinux/zfs/blob/master/module/zfs/vdev_cache.c
# for details
tunables = get_spl_tunables(TUNABLES_PATH)
if tunables['zfs_vdev_cache_size'] == '0':
print('VDEV cache disabled, skipping section\n')
return
vdev_stats = isolate_section('vdev_cache_stats', kstats_dict)
vdev_cache_total = int(vdev_stats['hits']) +\
int(vdev_stats['misses']) +\
int(vdev_stats['delegations'])
prt_1('VDEV cache summary:', f_hits(vdev_cache_total))
prt_i2('Hit ratio:', f_perc(vdev_stats['hits'], vdev_cache_total),
f_hits(vdev_stats['hits']))
prt_i2('Miss ratio:', f_perc(vdev_stats['misses'], vdev_cache_total),
f_hits(vdev_stats['misses']))
prt_i2('Delegations:', f_perc(vdev_stats['delegations'], vdev_cache_total),
f_hits(vdev_stats['delegations']))
print()
def section_zil(kstats_dict):
"""Collect information on the ZFS Intent Log. Some of the information
taken from https://github.com/zfsonlinux/zfs/blob/master/include/sys/zil.h
taken from https://github.com/openzfs/zfs/blob/master/include/sys/zil.h
"""
zil_stats = isolate_section('zil', kstats_dict)
@@ -814,7 +983,6 @@ section_calls = {'arc': section_arc,
'l2arc': section_l2arc,
'spl': section_spl,
'tunables': section_tunables,
'vdev': section_vdev,
'zil': section_zil}
-13
View File
@@ -1,13 +0,0 @@
EXTRA_DIST = arc_summary2 arc_summary3
if USING_PYTHON_2
dist_bin_SCRIPTS = arc_summary2
install-exec-hook:
mv $(DESTDIR)$(bindir)/arc_summary2 $(DESTDIR)$(bindir)/arc_summary
endif
if USING_PYTHON_3
dist_bin_SCRIPTS = arc_summary3
install-exec-hook:
mv $(DESTDIR)$(bindir)/arc_summary3 $(DESTDIR)$(bindir)/arc_summary
endif
File diff suppressed because it is too large Load Diff
Executable
+680
View File
@@ -0,0 +1,680 @@
#!/usr/bin/env @PYTHON_SHEBANG@
#
# Print out ZFS ARC Statistics exported via kstat(1)
# For a definition of fields, or usage, use arcstat -v
#
# This script was originally a fork of the original arcstat.pl (0.1)
# by Neelakanth Nadgir, originally published on his Sun blog on
# 09/18/2007
# http://blogs.sun.com/realneel/entry/zfs_arc_statistics
#
# A new version aimed to improve upon the original by adding features
# and fixing bugs as needed. This version was maintained by Mike
# Harsch and was hosted in a public open source repository:
# http://github.com/mharsch/arcstat
#
# but has since moved to the illumos-gate repository.
#
# This Python port was written by John Hixson for FreeNAS, introduced
# in commit e2c29f:
# https://github.com/freenas/freenas
#
# and has been improved by many people since.
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License, Version 1.0 only
# (the "License"). You may not use this file except in compliance
# with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or https://opensource.org/licenses/CDDL-1.0.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
#
# Fields have a fixed width. Every interval, we fill the "v"
# hash with its corresponding value (v[field]=value) using calculate().
# @hdr is the array of fields that needs to be printed, so we
# just iterate over this array and print the values using our pretty printer.
#
# This script must remain compatible with Python 3.6+.
#
import sys
import time
import getopt
import re
import copy
from signal import signal, SIGINT, SIGWINCH, SIG_DFL
cols = {
# HDR: [Size, Scale, Description]
"time": [8, -1, "Time"],
"hits": [4, 1000, "ARC hits per second"],
"iohs": [4, 1000, "ARC I/O hits per second"],
"miss": [4, 1000, "ARC misses per second"],
"read": [4, 1000, "Total ARC accesses per second"],
"hit%": [4, 100, "ARC hit percentage"],
"ioh%": [4, 100, "ARC I/O hit percentage"],
"miss%": [5, 100, "ARC miss percentage"],
"dhit": [4, 1000, "Demand hits per second"],
"dioh": [4, 1000, "Demand I/O hits per second"],
"dmis": [4, 1000, "Demand misses per second"],
"dh%": [3, 100, "Demand hit percentage"],
"di%": [3, 100, "Demand I/O hit percentage"],
"dm%": [3, 100, "Demand miss percentage"],
"ddhit": [5, 1000, "Demand data hits per second"],
"ddioh": [5, 1000, "Demand data I/O hits per second"],
"ddmis": [5, 1000, "Demand data misses per second"],
"ddh%": [4, 100, "Demand data hit percentage"],
"ddi%": [4, 100, "Demand data I/O hit percentage"],
"ddm%": [4, 100, "Demand data miss percentage"],
"dmhit": [5, 1000, "Demand metadata hits per second"],
"dmioh": [5, 1000, "Demand metadata I/O hits per second"],
"dmmis": [5, 1000, "Demand metadata misses per second"],
"dmh%": [4, 100, "Demand metadata hit percentage"],
"dmi%": [4, 100, "Demand metadata I/O hit percentage"],
"dmm%": [4, 100, "Demand metadata miss percentage"],
"phit": [4, 1000, "Prefetch hits per second"],
"pioh": [4, 1000, "Prefetch I/O hits per second"],
"pmis": [4, 1000, "Prefetch misses per second"],
"ph%": [3, 100, "Prefetch hits percentage"],
"pi%": [3, 100, "Prefetch I/O hits percentage"],
"pm%": [3, 100, "Prefetch miss percentage"],
"pdhit": [5, 1000, "Prefetch data hits per second"],
"pdioh": [5, 1000, "Prefetch data I/O hits per second"],
"pdmis": [5, 1000, "Prefetch data misses per second"],
"pdh%": [4, 100, "Prefetch data hits percentage"],
"pdi%": [4, 100, "Prefetch data I/O hits percentage"],
"pdm%": [4, 100, "Prefetch data miss percentage"],
"pmhit": [5, 1000, "Prefetch metadata hits per second"],
"pmioh": [5, 1000, "Prefetch metadata I/O hits per second"],
"pmmis": [5, 1000, "Prefetch metadata misses per second"],
"pmh%": [4, 100, "Prefetch metadata hits percentage"],
"pmi%": [4, 100, "Prefetch metadata I/O hits percentage"],
"pmm%": [4, 100, "Prefetch metadata miss percentage"],
"mhit": [4, 1000, "Metadata hits per second"],
"mioh": [4, 1000, "Metadata I/O hits per second"],
"mmis": [4, 1000, "Metadata misses per second"],
"mread": [5, 1000, "Metadata accesses per second"],
"mh%": [3, 100, "Metadata hit percentage"],
"mi%": [3, 100, "Metadata I/O hit percentage"],
"mm%": [3, 100, "Metadata miss percentage"],
"arcsz": [5, 1024, "ARC size"],
"size": [5, 1024, "ARC size"],
"c": [5, 1024, "ARC target size"],
"mfu": [4, 1000, "MFU list hits per second"],
"mru": [4, 1000, "MRU list hits per second"],
"mfug": [4, 1000, "MFU ghost list hits per second"],
"mrug": [4, 1000, "MRU ghost list hits per second"],
"unc": [4, 1000, "Uncached list hits per second"],
"eskip": [5, 1000, "evict_skip per second"],
"el2skip": [7, 1000, "evict skip, due to l2 writes, per second"],
"el2cach": [7, 1024, "Size of L2 cached evictions per second"],
"el2el": [5, 1024, "Size of L2 eligible evictions per second"],
"el2mfu": [6, 1024, "Size of L2 eligible MFU evictions per second"],
"el2mru": [6, 1024, "Size of L2 eligible MRU evictions per second"],
"el2inel": [7, 1024, "Size of L2 ineligible evictions per second"],
"mtxmis": [6, 1000, "mutex_miss per second"],
"dread": [5, 1000, "Demand accesses per second"],
"ddread": [6, 1000, "Demand data accesses per second"],
"dmread": [6, 1000, "Demand metadata accesses per second"],
"pread": [5, 1000, "Prefetch accesses per second"],
"pdread": [6, 1000, "Prefetch data accesses per second"],
"pmread": [6, 1000, "Prefetch metadata accesses per second"],
"l2hits": [6, 1000, "L2ARC hits per second"],
"l2miss": [6, 1000, "L2ARC misses per second"],
"l2read": [6, 1000, "Total L2ARC accesses per second"],
"l2hit%": [6, 100, "L2ARC access hit percentage"],
"l2miss%": [7, 100, "L2ARC access miss percentage"],
"l2pref": [6, 1024, "L2ARC prefetch allocated size"],
"l2mfu": [5, 1024, "L2ARC MFU allocated size"],
"l2mru": [5, 1024, "L2ARC MRU allocated size"],
"l2data": [6, 1024, "L2ARC data allocated size"],
"l2meta": [6, 1024, "L2ARC metadata allocated size"],
"l2pref%": [7, 100, "L2ARC prefetch percentage"],
"l2mfu%": [6, 100, "L2ARC MFU percentage"],
"l2mru%": [6, 100, "L2ARC MRU percentage"],
"l2data%": [7, 100, "L2ARC data percentage"],
"l2meta%": [7, 100, "L2ARC metadata percentage"],
"l2asize": [7, 1024, "Actual (compressed) size of the L2ARC"],
"l2size": [6, 1024, "Size of the L2ARC"],
"l2bytes": [7, 1024, "Bytes read per second from the L2ARC"],
"grow": [4, 1000, "ARC grow disabled"],
"need": [5, 1024, "ARC reclaim need"],
"free": [5, 1024, "ARC free memory"],
"avail": [5, 1024, "ARC available memory"],
"waste": [5, 1024, "Wasted memory due to round up to pagesize"],
"ztotal": [6, 1000, "zfetch total prefetcher calls per second"],
"zhits": [5, 1000, "zfetch stream hits per second"],
"zahead": [6, 1000, "zfetch hits ahead of streams per second"],
"zpast": [5, 1000, "zfetch hits behind streams per second"],
"zmisses": [7, 1000, "zfetch stream misses per second"],
"zmax": [4, 1000, "zfetch limit reached per second"],
"zfuture": [7, 1000, "zfetch stream future per second"],
"zstride": [7, 1000, "zfetch stream strides per second"],
"zissued": [7, 1000, "zfetch prefetches issued per second"],
"zactive": [7, 1000, "zfetch prefetches active per second"],
}
v = {}
hdr = ["time", "read", "ddread", "ddh%", "dmread", "dmh%", "pread", "ph%",
"size", "c", "avail"]
xhdr = ["time", "mfu", "mru", "mfug", "mrug", "unc", "eskip", "mtxmis",
"dread", "pread", "read"]
zhdr = ["time", "ztotal", "zhits", "zahead", "zpast", "zmisses", "zmax",
"zfuture", "zstride", "zissued", "zactive"]
sint = 1 # Default interval is 1 second
count = 1 # Default count is 1
hdr_intr = 20 # Print header every 20 lines of output
opfile = None
sep = " " # Default separator is 2 spaces
l2exist = False
cmd = ("Usage: arcstat [-havxp] [-f fields] [-o file] [-s string] [interval "
"[count]]\n")
cur = {}
d = {}
out = None
kstat = None
pretty_print = True
if sys.platform.startswith('freebsd'):
# Requires py-sysctl on FreeBSD
import sysctl
def kstat_update():
global kstat
k = [ctl for ctl in sysctl.filter('kstat.zfs.misc.arcstats')
if ctl.type != sysctl.CTLTYPE_NODE]
k += [ctl for ctl in sysctl.filter('kstat.zfs.misc.zfetchstats')
if ctl.type != sysctl.CTLTYPE_NODE]
if not k:
sys.exit(1)
kstat = {}
for s in k:
if not s:
continue
name, value = s.name, s.value
if "arcstats" in name:
# Trims 'kstat.zfs.misc.arcstats' from the name
kstat[name[24:]] = int(value)
else:
kstat["zfetch_" + name[27:]] = int(value)
elif sys.platform.startswith('linux'):
def kstat_update():
global kstat
k1 = [line.strip() for line in open('/proc/spl/kstat/zfs/arcstats')]
k2 = ["zfetch_" + line.strip() for line in
open('/proc/spl/kstat/zfs/zfetchstats')]
if k1 is None or k2 is None:
sys.exit(1)
del k1[0:2]
del k2[0:2]
k = k1 + k2
kstat = {}
for s in k:
if not s:
continue
name, unused, value = s.split()
kstat[name] = int(value)
def detailed_usage():
sys.stderr.write("%s\n" % cmd)
sys.stderr.write("Field definitions are as follows:\n")
for key in cols:
sys.stderr.write("%11s : %s\n" % (key, cols[key][2]))
sys.stderr.write("\n")
sys.exit(0)
def usage():
sys.stderr.write("%s\n" % cmd)
sys.stderr.write("\t -h : Print this help message\n")
sys.stderr.write("\t -a : Print all possible stats\n")
sys.stderr.write("\t -v : List all possible field headers and definitions"
"\n")
sys.stderr.write("\t -x : Print extended stats\n")
sys.stderr.write("\t -z : Print zfetch stats\n")
sys.stderr.write("\t -f : Specify specific fields to print (see -v)\n")
sys.stderr.write("\t -o : Redirect output to the specified file\n")
sys.stderr.write("\t -s : Override default field separator with custom "
"character or string\n")
sys.stderr.write("\t -p : Disable auto-scaling of numerical fields\n")
sys.stderr.write("\nExamples:\n")
sys.stderr.write("\tarcstat -o /tmp/a.log 2 10\n")
sys.stderr.write("\tarcstat -s \",\" -o /tmp/a.log 2 10\n")
sys.stderr.write("\tarcstat -v\n")
sys.stderr.write("\tarcstat -f time,hit%,dh%,ph%,mh% 1\n")
sys.stderr.write("\n")
sys.exit(1)
def snap_stats():
global cur
global kstat
prev = copy.deepcopy(cur)
kstat_update()
cur = kstat
for key in cur:
if re.match(key, "class"):
continue
if key in prev:
d[key] = cur[key] - prev[key]
else:
d[key] = cur[key]
def prettynum(sz, scale, num=0):
suffix = [' ', 'K', 'M', 'G', 'T', 'P', 'E', 'Z']
index = 0
save = 0
# Special case for date field
if scale == -1:
return "%s" % num
# Rounding error, return 0
elif 0 < num < 1:
num = 0
while abs(num) > scale and index < 5:
save = num
num = num / scale
index += 1
if index == 0:
return "%*d" % (sz, num)
if abs(save / scale) < 10:
return "%*.1f%s" % (sz - 1, num, suffix[index])
else:
return "%*d%s" % (sz - 1, num, suffix[index])
def print_values():
global hdr
global sep
global v
global pretty_print
if pretty_print:
fmt = lambda col: prettynum(cols[col][0], cols[col][1], v[col])
else:
fmt = lambda col: str(v[col])
sys.stdout.write(sep.join(fmt(col) for col in hdr))
sys.stdout.write("\n")
sys.stdout.flush()
def print_header():
global hdr
global sep
global pretty_print
if pretty_print:
fmt = lambda col: "%*s" % (cols[col][0], col)
else:
fmt = lambda col: col
sys.stdout.write(sep.join(fmt(col) for col in hdr))
sys.stdout.write("\n")
def get_terminal_lines():
try:
import fcntl
import termios
import struct
data = fcntl.ioctl(sys.stdout.fileno(), termios.TIOCGWINSZ, '1234')
sz = struct.unpack('hh', data)
return sz[0]
except Exception:
pass
def update_hdr_intr():
global hdr_intr
lines = get_terminal_lines()
if lines and lines > 3:
hdr_intr = lines - 3
def resize_handler(signum, frame):
update_hdr_intr()
def init():
global sint
global count
global hdr
global xhdr
global zhdr
global opfile
global sep
global out
global l2exist
global pretty_print
desired_cols = None
aflag = False
xflag = False
hflag = False
vflag = False
zflag = False
i = 1
try:
opts, args = getopt.getopt(
sys.argv[1:],
"axzo:hvs:f:p",
[
"all",
"extended",
"zfetch",
"outfile",
"help",
"verbose",
"separator",
"columns",
"parsable"
]
)
except getopt.error as msg:
sys.stderr.write("Error: %s\n" % str(msg))
usage()
opts = None
for opt, arg in opts:
if opt in ('-a', '--all'):
aflag = True
if opt in ('-x', '--extended'):
xflag = True
if opt in ('-o', '--outfile'):
opfile = arg
i += 1
if opt in ('-h', '--help'):
hflag = True
if opt in ('-v', '--verbose'):
vflag = True
if opt in ('-s', '--separator'):
sep = arg
i += 1
if opt in ('-f', '--columns'):
desired_cols = arg
i += 1
if opt in ('-p', '--parsable'):
pretty_print = False
if opt in ('-z', '--zfetch'):
zflag = True
i += 1
argv = sys.argv[i:]
sint = int(argv[0]) if argv else sint
count = int(argv[1]) if len(argv) > 1 else (0 if len(argv) > 0 else 1)
if hflag or (xflag and zflag) or ((zflag or xflag) and desired_cols):
usage()
if vflag:
detailed_usage()
if xflag:
hdr = xhdr
if zflag:
hdr = zhdr
update_hdr_intr()
# check if L2ARC exists
snap_stats()
l2_size = cur.get("l2_size")
if l2_size:
l2exist = True
if desired_cols:
hdr = desired_cols.split(",")
invalid = []
incompat = []
for ele in hdr:
if ele not in cols:
invalid.append(ele)
elif not l2exist and ele.startswith("l2"):
sys.stdout.write("No L2ARC Here\n%s\n" % ele)
incompat.append(ele)
if len(invalid) > 0:
sys.stderr.write("Invalid column definition! -- %s\n" % invalid)
usage()
if len(incompat) > 0:
sys.stderr.write("Incompatible field specified! -- %s\n" %
incompat)
usage()
if aflag:
if l2exist:
hdr = cols.keys()
else:
hdr = [col for col in cols.keys() if not col.startswith("l2")]
if opfile:
try:
out = open(opfile, "w")
sys.stdout = out
except IOError:
sys.stderr.write("Cannot open %s for writing\n" % opfile)
sys.exit(1)
def calculate():
global d
global v
global l2exist
v = dict()
v["time"] = time.strftime("%H:%M:%S", time.localtime())
v["hits"] = d["hits"] // sint
v["iohs"] = d["iohits"] // sint
v["miss"] = d["misses"] // sint
v["read"] = v["hits"] + v["iohs"] + v["miss"]
v["hit%"] = 100 * v["hits"] // v["read"] if v["read"] > 0 else 0
v["ioh%"] = 100 * v["iohs"] // v["read"] if v["read"] > 0 else 0
v["miss%"] = 100 - v["hit%"] - v["ioh%"] if v["read"] > 0 else 0
v["dhit"] = (d["demand_data_hits"] + d["demand_metadata_hits"]) // sint
v["dioh"] = (d["demand_data_iohits"] + d["demand_metadata_iohits"]) // sint
v["dmis"] = (d["demand_data_misses"] + d["demand_metadata_misses"]) // sint
v["dread"] = v["dhit"] + v["dioh"] + v["dmis"]
v["dh%"] = 100 * v["dhit"] // v["dread"] if v["dread"] > 0 else 0
v["di%"] = 100 * v["dioh"] // v["dread"] if v["dread"] > 0 else 0
v["dm%"] = 100 - v["dh%"] - v["di%"] if v["dread"] > 0 else 0
v["ddhit"] = d["demand_data_hits"] // sint
v["ddioh"] = d["demand_data_iohits"] // sint
v["ddmis"] = d["demand_data_misses"] // sint
v["ddread"] = v["ddhit"] + v["ddioh"] + v["ddmis"]
v["ddh%"] = 100 * v["ddhit"] // v["ddread"] if v["ddread"] > 0 else 0
v["ddi%"] = 100 * v["ddioh"] // v["ddread"] if v["ddread"] > 0 else 0
v["ddm%"] = 100 - v["ddh%"] - v["ddi%"] if v["ddread"] > 0 else 0
v["dmhit"] = d["demand_metadata_hits"] // sint
v["dmioh"] = d["demand_metadata_iohits"] // sint
v["dmmis"] = d["demand_metadata_misses"] // sint
v["dmread"] = v["dmhit"] + v["dmioh"] + v["dmmis"]
v["dmh%"] = 100 * v["dmhit"] // v["dmread"] if v["dmread"] > 0 else 0
v["dmi%"] = 100 * v["dmioh"] // v["dmread"] if v["dmread"] > 0 else 0
v["dmm%"] = 100 - v["dmh%"] - v["dmi%"] if v["dmread"] > 0 else 0
v["phit"] = (d["prefetch_data_hits"] + d["prefetch_metadata_hits"]) // sint
v["pioh"] = (d["prefetch_data_iohits"] +
d["prefetch_metadata_iohits"]) // sint
v["pmis"] = (d["prefetch_data_misses"] +
d["prefetch_metadata_misses"]) // sint
v["pread"] = v["phit"] + v["pioh"] + v["pmis"]
v["ph%"] = 100 * v["phit"] // v["pread"] if v["pread"] > 0 else 0
v["pi%"] = 100 * v["pioh"] // v["pread"] if v["pread"] > 0 else 0
v["pm%"] = 100 - v["ph%"] - v["pi%"] if v["pread"] > 0 else 0
v["pdhit"] = d["prefetch_data_hits"] // sint
v["pdioh"] = d["prefetch_data_iohits"] // sint
v["pdmis"] = d["prefetch_data_misses"] // sint
v["pdread"] = v["pdhit"] + v["pdioh"] + v["pdmis"]
v["pdh%"] = 100 * v["pdhit"] // v["pdread"] if v["pdread"] > 0 else 0
v["pdi%"] = 100 * v["pdioh"] // v["pdread"] if v["pdread"] > 0 else 0
v["pdm%"] = 100 - v["pdh%"] - v["pdi%"] if v["pdread"] > 0 else 0
v["pmhit"] = d["prefetch_metadata_hits"] // sint
v["pmioh"] = d["prefetch_metadata_iohits"] // sint
v["pmmis"] = d["prefetch_metadata_misses"] // sint
v["pmread"] = v["pmhit"] + v["pmioh"] + v["pmmis"]
v["pmh%"] = 100 * v["pmhit"] // v["pmread"] if v["pmread"] > 0 else 0
v["pmi%"] = 100 * v["pmioh"] // v["pmread"] if v["pmread"] > 0 else 0
v["pmm%"] = 100 - v["pmh%"] - v["pmi%"] if v["pmread"] > 0 else 0
v["mhit"] = (d["prefetch_metadata_hits"] +
d["demand_metadata_hits"]) // sint
v["mioh"] = (d["prefetch_metadata_iohits"] +
d["demand_metadata_iohits"]) // sint
v["mmis"] = (d["prefetch_metadata_misses"] +
d["demand_metadata_misses"]) // sint
v["mread"] = v["mhit"] + v["mioh"] + v["mmis"]
v["mh%"] = 100 * v["mhit"] // v["mread"] if v["mread"] > 0 else 0
v["mi%"] = 100 * v["mioh"] // v["mread"] if v["mread"] > 0 else 0
v["mm%"] = 100 - v["mh%"] - v["mi%"] if v["mread"] > 0 else 0
v["arcsz"] = cur["size"]
v["size"] = cur["size"]
v["c"] = cur["c"]
v["mfu"] = d["mfu_hits"] // sint
v["mru"] = d["mru_hits"] // sint
v["mrug"] = d["mru_ghost_hits"] // sint
v["mfug"] = d["mfu_ghost_hits"] // sint
v["unc"] = d["uncached_hits"] // sint
v["eskip"] = d["evict_skip"] // sint
v["el2skip"] = d["evict_l2_skip"] // sint
v["el2cach"] = d["evict_l2_cached"] // sint
v["el2el"] = d["evict_l2_eligible"] // sint
v["el2mfu"] = d["evict_l2_eligible_mfu"] // sint
v["el2mru"] = d["evict_l2_eligible_mru"] // sint
v["el2inel"] = d["evict_l2_ineligible"] // sint
v["mtxmis"] = d["mutex_miss"] // sint
v["ztotal"] = (d["zfetch_hits"] + d["zfetch_future"] + d["zfetch_stride"] +
d["zfetch_past"] + d["zfetch_misses"]) // sint
v["zhits"] = d["zfetch_hits"] // sint
v["zahead"] = (d["zfetch_future"] + d["zfetch_stride"]) // sint
v["zpast"] = d["zfetch_past"] // sint
v["zmisses"] = d["zfetch_misses"] // sint
v["zmax"] = d["zfetch_max_streams"] // sint
v["zfuture"] = d["zfetch_future"] // sint
v["zstride"] = d["zfetch_stride"] // sint
v["zissued"] = d["zfetch_io_issued"] // sint
v["zactive"] = d["zfetch_io_active"] // sint
if l2exist:
v["l2hits"] = d["l2_hits"] // sint
v["l2miss"] = d["l2_misses"] // sint
v["l2read"] = v["l2hits"] + v["l2miss"]
v["l2hit%"] = 100 * v["l2hits"] // v["l2read"] if v["l2read"] > 0 else 0
v["l2miss%"] = 100 - v["l2hit%"] if v["l2read"] > 0 else 0
v["l2asize"] = cur["l2_asize"]
v["l2size"] = cur["l2_size"]
v["l2bytes"] = d["l2_read_bytes"] // sint
v["l2pref"] = cur["l2_prefetch_asize"]
v["l2mfu"] = cur["l2_mfu_asize"]
v["l2mru"] = cur["l2_mru_asize"]
v["l2data"] = cur["l2_bufc_data_asize"]
v["l2meta"] = cur["l2_bufc_metadata_asize"]
v["l2pref%"] = 100 * v["l2pref"] // v["l2asize"]
v["l2mfu%"] = 100 * v["l2mfu"] // v["l2asize"]
v["l2mru%"] = 100 * v["l2mru"] // v["l2asize"]
v["l2data%"] = 100 * v["l2data"] // v["l2asize"]
v["l2meta%"] = 100 * v["l2meta"] // v["l2asize"]
v["grow"] = 0 if cur["arc_no_grow"] else 1
v["need"] = cur["arc_need_free"]
v["free"] = cur["memory_free_bytes"]
v["avail"] = cur["memory_available_bytes"]
v["waste"] = cur["abd_chunk_waste_size"]
def main():
global sint
global count
global hdr_intr
i = 0
count_flag = 0
init()
if count > 0:
count_flag = 1
signal(SIGINT, SIG_DFL)
signal(SIGWINCH, resize_handler)
while True:
if i == 0:
print_header()
snap_stats()
calculate()
print_values()
if count_flag == 1:
if count <= 1:
break
count -= 1
i = 0 if i >= hdr_intr else i + 1
time.sleep(sint)
if out:
out.close()
if __name__ == '__main__':
main()
-13
View File
@@ -1,13 +0,0 @@
dist_bin_SCRIPTS = arcstat
#
# The arcstat script is compatibile with both Python 2.6 and 3.4.
# As such the python 3 shebang can be replaced at install time when
# targeting a python 2 system. This allows us to maintain a single
# version of the source.
#
if USING_PYTHON_2
install-exec-hook:
sed --in-place 's|^#!/usr/bin/python3|#!/usr/bin/python2|' \
$(DESTDIR)$(bindir)/arcstat
endif
-470
View File
@@ -1,470 +0,0 @@
#!/usr/bin/python3
#
# Print out ZFS ARC Statistics exported via kstat(1)
# For a definition of fields, or usage, use arctstat.pl -v
#
# This script is a fork of the original arcstat.pl (0.1) by
# Neelakanth Nadgir, originally published on his Sun blog on
# 09/18/2007
# http://blogs.sun.com/realneel/entry/zfs_arc_statistics
#
# This version aims to improve upon the original by adding features
# and fixing bugs as needed. This version is maintained by
# Mike Harsch and is hosted in a public open source repository:
# http://github.com/mharsch/arcstat
#
# Comments, Questions, or Suggestions are always welcome.
# Contact the maintainer at ( mike at harschsystems dot com )
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License, Version 1.0 only
# (the "License"). You may not use this file except in compliance
# with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or http://www.opensolaris.org/os/licensing.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
#
# Fields have a fixed width. Every interval, we fill the "v"
# hash with its corresponding value (v[field]=value) using calculate().
# @hdr is the array of fields that needs to be printed, so we
# just iterate over this array and print the values using our pretty printer.
#
# This script must remain compatible with Python 2.6+ and Python 3.4+.
#
import sys
import time
import getopt
import re
import copy
from decimal import Decimal
from signal import signal, SIGINT, SIGWINCH, SIG_DFL
cols = {
# HDR: [Size, Scale, Description]
"time": [8, -1, "Time"],
"hits": [4, 1000, "ARC reads per second"],
"miss": [4, 1000, "ARC misses per second"],
"read": [4, 1000, "Total ARC accesses per second"],
"hit%": [4, 100, "ARC Hit percentage"],
"miss%": [5, 100, "ARC miss percentage"],
"dhit": [4, 1000, "Demand hits per second"],
"dmis": [4, 1000, "Demand misses per second"],
"dh%": [3, 100, "Demand hit percentage"],
"dm%": [3, 100, "Demand miss percentage"],
"phit": [4, 1000, "Prefetch hits per second"],
"pmis": [4, 1000, "Prefetch misses per second"],
"ph%": [3, 100, "Prefetch hits percentage"],
"pm%": [3, 100, "Prefetch miss percentage"],
"mhit": [4, 1000, "Metadata hits per second"],
"mmis": [4, 1000, "Metadata misses per second"],
"mread": [5, 1000, "Metadata accesses per second"],
"mh%": [3, 100, "Metadata hit percentage"],
"mm%": [3, 100, "Metadata miss percentage"],
"arcsz": [5, 1024, "ARC Size"],
"c": [4, 1024, "ARC Target Size"],
"mfu": [4, 1000, "MFU List hits per second"],
"mru": [4, 1000, "MRU List hits per second"],
"mfug": [4, 1000, "MFU Ghost List hits per second"],
"mrug": [4, 1000, "MRU Ghost List hits per second"],
"eskip": [5, 1000, "evict_skip per second"],
"mtxmis": [6, 1000, "mutex_miss per second"],
"dread": [5, 1000, "Demand accesses per second"],
"pread": [5, 1000, "Prefetch accesses per second"],
"l2hits": [6, 1000, "L2ARC hits per second"],
"l2miss": [6, 1000, "L2ARC misses per second"],
"l2read": [6, 1000, "Total L2ARC accesses per second"],
"l2hit%": [6, 100, "L2ARC access hit percentage"],
"l2miss%": [7, 100, "L2ARC access miss percentage"],
"l2asize": [7, 1024, "Actual (compressed) size of the L2ARC"],
"l2size": [6, 1024, "Size of the L2ARC"],
"l2bytes": [7, 1024, "bytes read per second from the L2ARC"],
"grow": [4, 1000, "ARC Grow disabled"],
"need": [4, 1024, "ARC Reclaim need"],
"free": [4, 1024, "ARC Free memory"],
}
v = {}
hdr = ["time", "read", "miss", "miss%", "dmis", "dm%", "pmis", "pm%", "mmis",
"mm%", "arcsz", "c"]
xhdr = ["time", "mfu", "mru", "mfug", "mrug", "eskip", "mtxmis", "dread",
"pread", "read"]
sint = 1 # Default interval is 1 second
count = 1 # Default count is 1
hdr_intr = 20 # Print header every 20 lines of output
opfile = None
sep = " " # Default separator is 2 spaces
version = "0.4"
l2exist = False
cmd = ("Usage: arcstat [-hvx] [-f fields] [-o file] [-s string] [interval "
"[count]]\n")
cur = {}
d = {}
out = None
kstat = None
def detailed_usage():
sys.stderr.write("%s\n" % cmd)
sys.stderr.write("Field definitions are as follows:\n")
for key in cols:
sys.stderr.write("%11s : %s\n" % (key, cols[key][2]))
sys.stderr.write("\n")
sys.exit(0)
def usage():
sys.stderr.write("%s\n" % cmd)
sys.stderr.write("\t -h : Print this help message\n")
sys.stderr.write("\t -v : List all possible field headers and definitions"
"\n")
sys.stderr.write("\t -x : Print extended stats\n")
sys.stderr.write("\t -f : Specify specific fields to print (see -v)\n")
sys.stderr.write("\t -o : Redirect output to the specified file\n")
sys.stderr.write("\t -s : Override default field separator with custom "
"character or string\n")
sys.stderr.write("\nExamples:\n")
sys.stderr.write("\tarcstat -o /tmp/a.log 2 10\n")
sys.stderr.write("\tarcstat -s \",\" -o /tmp/a.log 2 10\n")
sys.stderr.write("\tarcstat -v\n")
sys.stderr.write("\tarcstat -f time,hit%,dh%,ph%,mh% 1\n")
sys.stderr.write("\n")
sys.exit(1)
def kstat_update():
global kstat
k = [line.strip() for line in open('/proc/spl/kstat/zfs/arcstats')]
if not k:
sys.exit(1)
del k[0:2]
kstat = {}
for s in k:
if not s:
continue
name, unused, value = s.split()
kstat[name] = Decimal(value)
def snap_stats():
global cur
global kstat
prev = copy.deepcopy(cur)
kstat_update()
cur = kstat
for key in cur:
if re.match(key, "class"):
continue
if key in prev:
d[key] = cur[key] - prev[key]
else:
d[key] = cur[key]
def prettynum(sz, scale, num=0):
suffix = [' ', 'K', 'M', 'G', 'T', 'P', 'E', 'Z']
index = 0
save = 0
# Special case for date field
if scale == -1:
return "%s" % num
# Rounding error, return 0
elif 0 < num < 1:
num = 0
while num > scale and index < 5:
save = num
num = num / scale
index += 1
if index == 0:
return "%*d" % (sz, num)
if (save / scale) < 10:
return "%*.1f%s" % (sz - 1, num, suffix[index])
else:
return "%*d%s" % (sz - 1, num, suffix[index])
def print_values():
global hdr
global sep
global v
for col in hdr:
sys.stdout.write("%s%s" % (
prettynum(cols[col][0], cols[col][1], v[col]),
sep
))
sys.stdout.write("\n")
sys.stdout.flush()
def print_header():
global hdr
global sep
for col in hdr:
sys.stdout.write("%*s%s" % (cols[col][0], col, sep))
sys.stdout.write("\n")
def get_terminal_lines():
try:
import fcntl
import termios
import struct
data = fcntl.ioctl(sys.stdout.fileno(), termios.TIOCGWINSZ, '1234')
sz = struct.unpack('hh', data)
return sz[0]
except Exception:
pass
def update_hdr_intr():
global hdr_intr
lines = get_terminal_lines()
if lines and lines > 3:
hdr_intr = lines - 3
def resize_handler(signum, frame):
update_hdr_intr()
def init():
global sint
global count
global hdr
global xhdr
global opfile
global sep
global out
global l2exist
desired_cols = None
xflag = False
hflag = False
vflag = False
i = 1
try:
opts, args = getopt.getopt(
sys.argv[1:],
"xo:hvs:f:",
[
"extended",
"outfile",
"help",
"verbose",
"separator",
"columns"
]
)
except getopt.error as msg:
sys.stderr.write("Error: %s\n" % str(msg))
usage()
opts = None
for opt, arg in opts:
if opt in ('-x', '--extended'):
xflag = True
if opt in ('-o', '--outfile'):
opfile = arg
i += 1
if opt in ('-h', '--help'):
hflag = True
if opt in ('-v', '--verbose'):
vflag = True
if opt in ('-s', '--separator'):
sep = arg
i += 1
if opt in ('-f', '--columns'):
desired_cols = arg
i += 1
i += 1
argv = sys.argv[i:]
sint = Decimal(argv[0]) if argv else sint
count = int(argv[1]) if len(argv) > 1 else count
if len(argv) > 1:
sint = Decimal(argv[0])
count = int(argv[1])
elif len(argv) > 0:
sint = Decimal(argv[0])
count = 0
if hflag or (xflag and desired_cols):
usage()
if vflag:
detailed_usage()
if xflag:
hdr = xhdr
update_hdr_intr()
# check if L2ARC exists
snap_stats()
l2_size = cur.get("l2_size")
if l2_size:
l2exist = True
if desired_cols:
hdr = desired_cols.split(",")
invalid = []
incompat = []
for ele in hdr:
if ele not in cols:
invalid.append(ele)
elif not l2exist and ele.startswith("l2"):
sys.stdout.write("No L2ARC Here\n%s\n" % ele)
incompat.append(ele)
if len(invalid) > 0:
sys.stderr.write("Invalid column definition! -- %s\n" % invalid)
usage()
if len(incompat) > 0:
sys.stderr.write("Incompatible field specified! -- %s\n" %
incompat)
usage()
if opfile:
try:
out = open(opfile, "w")
sys.stdout = out
except IOError:
sys.stderr.write("Cannot open %s for writing\n" % opfile)
sys.exit(1)
def calculate():
global d
global v
global l2exist
v = dict()
v["time"] = time.strftime("%H:%M:%S", time.localtime())
v["hits"] = d["hits"] / sint
v["miss"] = d["misses"] / sint
v["read"] = v["hits"] + v["miss"]
v["hit%"] = 100 * v["hits"] / v["read"] if v["read"] > 0 else 0
v["miss%"] = 100 - v["hit%"] if v["read"] > 0 else 0
v["dhit"] = (d["demand_data_hits"] + d["demand_metadata_hits"]) / sint
v["dmis"] = (d["demand_data_misses"] + d["demand_metadata_misses"]) / sint
v["dread"] = v["dhit"] + v["dmis"]
v["dh%"] = 100 * v["dhit"] / v["dread"] if v["dread"] > 0 else 0
v["dm%"] = 100 - v["dh%"] if v["dread"] > 0 else 0
v["phit"] = (d["prefetch_data_hits"] + d["prefetch_metadata_hits"]) / sint
v["pmis"] = (d["prefetch_data_misses"] +
d["prefetch_metadata_misses"]) / sint
v["pread"] = v["phit"] + v["pmis"]
v["ph%"] = 100 * v["phit"] / v["pread"] if v["pread"] > 0 else 0
v["pm%"] = 100 - v["ph%"] if v["pread"] > 0 else 0
v["mhit"] = (d["prefetch_metadata_hits"] +
d["demand_metadata_hits"]) / sint
v["mmis"] = (d["prefetch_metadata_misses"] +
d["demand_metadata_misses"]) / sint
v["mread"] = v["mhit"] + v["mmis"]
v["mh%"] = 100 * v["mhit"] / v["mread"] if v["mread"] > 0 else 0
v["mm%"] = 100 - v["mh%"] if v["mread"] > 0 else 0
v["arcsz"] = cur["size"]
v["c"] = cur["c"]
v["mfu"] = d["mfu_hits"] / sint
v["mru"] = d["mru_hits"] / sint
v["mrug"] = d["mru_ghost_hits"] / sint
v["mfug"] = d["mfu_ghost_hits"] / sint
v["eskip"] = d["evict_skip"] / sint
v["mtxmis"] = d["mutex_miss"] / sint
if l2exist:
v["l2hits"] = d["l2_hits"] / sint
v["l2miss"] = d["l2_misses"] / sint
v["l2read"] = v["l2hits"] + v["l2miss"]
v["l2hit%"] = 100 * v["l2hits"] / v["l2read"] if v["l2read"] > 0 else 0
v["l2miss%"] = 100 - v["l2hit%"] if v["l2read"] > 0 else 0
v["l2asize"] = cur["l2_asize"]
v["l2size"] = cur["l2_size"]
v["l2bytes"] = d["l2_read_bytes"] / sint
v["grow"] = 0 if cur["arc_no_grow"] else 1
v["need"] = cur["arc_need_free"]
v["free"] = cur["arc_sys_free"]
def main():
global sint
global count
global hdr_intr
i = 0
count_flag = 0
init()
if count > 0:
count_flag = 1
signal(SIGINT, SIG_DFL)
signal(SIGWINCH, resize_handler)
while True:
if i == 0:
print_header()
snap_stats()
calculate()
print_values()
if count_flag == 1:
if count <= 1:
break
count -= 1
i = 0 if i >= hdr_intr else i + 1
time.sleep(sint)
if out:
out.close()
if __name__ == '__main__':
main()
+22 -7
View File
@@ -1,4 +1,4 @@
#!/usr/bin/python3
#!/usr/bin/env @PYTHON_SHEBANG@
#
# Print out statistics for all cached dmu buffers. This information
# is available through the dbufs kstat and may be post-processed as
@@ -12,7 +12,7 @@
# with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or http://www.opensolaris.org/os/licensing.
# or https://opensource.org/licenses/CDDL-1.0.
# See the License for the specific language governing permissions
# and limitations under the License.
#
@@ -27,7 +27,7 @@
# Copyright (C) 2013 Lawrence Livermore National Security, LLC.
# Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
#
# This script must remain compatible with Python 2.6+ and Python 3.4+.
# This script must remain compatible with and Python 3.6+.
#
import sys
@@ -113,10 +113,25 @@ cmd = ("Usage: dbufstat [-bdhnrtvx] [-i file] [-f fields] [-o file] "
raw = 0
if sys.platform.startswith("freebsd"):
import io
# Requires py-sysctl on FreeBSD
import sysctl
def default_ifile():
dbufs = sysctl.filter("kstat.zfs.misc.dbufs")[0].value
sys.stdin = io.StringIO(dbufs)
return "-"
elif sys.platform.startswith("linux"):
def default_ifile():
return "/proc/spl/kstat/zfs/dbufs"
def print_incompat_helper(incompat):
cnt = 0
for key in sorted(incompat):
if cnt is 0:
if cnt == 0:
sys.stderr.write("\t")
elif cnt > 8:
sys.stderr.write(",\n\t")
@@ -343,7 +358,7 @@ def get_compstring(c):
"ZIO_COMPRESS_GZIP_6", "ZIO_COMPRESS_GZIP_7",
"ZIO_COMPRESS_GZIP_8", "ZIO_COMPRESS_GZIP_9",
"ZIO_COMPRESS_ZLE", "ZIO_COMPRESS_LZ4",
"ZIO_COMPRESS_FUNCTION"]
"ZIO_COMPRESS_ZSTD", "ZIO_COMPRESS_FUNCTION"]
# If "-rr" option is used, don't convert to string representation
if raw > 1:
@@ -645,9 +660,9 @@ def main():
sys.exit(1)
if not ifile:
ifile = '/proc/spl/kstat/zfs/dbufs'
ifile = default_ifile()
if ifile is not "-":
if ifile != "-":
try:
tmp = open(ifile, "r")
sys.stdin = tmp
-13
View File
@@ -1,13 +0,0 @@
dist_bin_SCRIPTS = dbufstat
#
# The dbufstat script is compatibile with both Python 2.6 and 3.4.
# As such the python 3 shebang can be replaced at install time when
# targeting a python 2 system. This allows us to maintain a single
# version of the source.
#
if USING_PYTHON_2
install-exec-hook:
sed --in-place 's|^#!/usr/bin/python3|#!/usr/bin/python2|' \
$(DESTDIR)$(bindir)/dbufstat
endif
+44
View File
@@ -0,0 +1,44 @@
#!/bin/sh
#
# fsck.zfs: A fsck helper to accommodate distributions that expect
# to be able to execute a fsck on all filesystem types.
#
# This script simply bubbles up some already-known-about errors,
# see fsck.zfs(8)
#
if [ $# -eq 0 ]; then
echo "Usage: $0 [options] dataset…" >&2
exit 16
fi
ret=0
for dataset; do
case "$dataset" in
-*)
continue
;;
*)
;;
esac
pool="${dataset%%/*}"
case "$(@sbindir@/zpool list -Ho health "$pool")" in
DEGRADED)
ret=$(( ret | 4 ))
;;
FAULTED)
awk '!/^([[:space:]]*#.*)?$/ && $1 == "'"$dataset"'" && $3 == "zfs" {exit 1}' /etc/fstab || \
ret=$(( ret | 8 ))
;;
"")
# Pool not found, error printed by zpool(8)
ret=$(( ret | 8 ))
;;
*)
;;
esac
done
exit "$ret"
-1
View File
@@ -1 +0,0 @@
dist_sbin_SCRIPTS = fsck.zfs
-9
View File
@@ -1,9 +0,0 @@
#!/bin/sh
#
# fsck.zfs: A fsck helper to accomidate distributions that expect
# to be able to execute a fsck on all filesystem types. Currently
# this script does nothing but it could be extended to act as a
# compatibility wrapper for 'zpool scrub'.
#
exit 0
+87 -317
View File
@@ -6,7 +6,7 @@
* You may not use this file except in compliance with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or http://www.opensolaris.org/os/licensing.
* or https://opensource.org/licenses/CDDL-1.0.
* See the License for the specific language governing permissions
* and limitations under the License.
*
@@ -42,247 +42,46 @@
libzfs_handle_t *g_zfs;
typedef struct option_map {
const char *name;
unsigned long mntmask;
unsigned long zfsmask;
} option_map_t;
static const option_map_t option_map[] = {
/* Canonicalized filesystem independent options from mount(8) */
{ MNTOPT_NOAUTO, MS_COMMENT, ZS_COMMENT },
{ MNTOPT_DEFAULTS, MS_COMMENT, ZS_COMMENT },
{ MNTOPT_NODEVICES, MS_NODEV, ZS_COMMENT },
{ MNTOPT_DIRSYNC, MS_DIRSYNC, ZS_COMMENT },
{ MNTOPT_NOEXEC, MS_NOEXEC, ZS_COMMENT },
{ MNTOPT_GROUP, MS_GROUP, ZS_COMMENT },
{ MNTOPT_NETDEV, MS_COMMENT, ZS_COMMENT },
{ MNTOPT_NOFAIL, MS_COMMENT, ZS_COMMENT },
{ MNTOPT_NOSUID, MS_NOSUID, ZS_COMMENT },
{ MNTOPT_OWNER, MS_OWNER, ZS_COMMENT },
{ MNTOPT_REMOUNT, MS_REMOUNT, ZS_COMMENT },
{ MNTOPT_RO, MS_RDONLY, ZS_COMMENT },
{ MNTOPT_RW, MS_COMMENT, ZS_COMMENT },
{ MNTOPT_SYNC, MS_SYNCHRONOUS, ZS_COMMENT },
{ MNTOPT_USER, MS_USERS, ZS_COMMENT },
{ MNTOPT_USERS, MS_USERS, ZS_COMMENT },
/* acl flags passed with util-linux-2.24 mount command */
{ MNTOPT_ACL, MS_POSIXACL, ZS_COMMENT },
{ MNTOPT_NOACL, MS_COMMENT, ZS_COMMENT },
{ MNTOPT_POSIXACL, MS_POSIXACL, ZS_COMMENT },
#ifdef MS_NOATIME
{ MNTOPT_NOATIME, MS_NOATIME, ZS_COMMENT },
#endif
#ifdef MS_NODIRATIME
{ MNTOPT_NODIRATIME, MS_NODIRATIME, ZS_COMMENT },
#endif
#ifdef MS_RELATIME
{ MNTOPT_RELATIME, MS_RELATIME, ZS_COMMENT },
#endif
#ifdef MS_STRICTATIME
{ MNTOPT_STRICTATIME, MS_STRICTATIME, ZS_COMMENT },
#endif
#ifdef MS_LAZYTIME
{ MNTOPT_LAZYTIME, MS_LAZYTIME, ZS_COMMENT },
#endif
{ MNTOPT_CONTEXT, MS_COMMENT, ZS_COMMENT },
{ MNTOPT_FSCONTEXT, MS_COMMENT, ZS_COMMENT },
{ MNTOPT_DEFCONTEXT, MS_COMMENT, ZS_COMMENT },
{ MNTOPT_ROOTCONTEXT, MS_COMMENT, ZS_COMMENT },
#ifdef MS_I_VERSION
{ MNTOPT_IVERSION, MS_I_VERSION, ZS_COMMENT },
#endif
#ifdef MS_MANDLOCK
{ MNTOPT_NBMAND, MS_MANDLOCK, ZS_COMMENT },
#endif
/* Valid options not found in mount(8) */
{ MNTOPT_BIND, MS_BIND, ZS_COMMENT },
#ifdef MS_REC
{ MNTOPT_RBIND, MS_BIND|MS_REC, ZS_COMMENT },
#endif
{ MNTOPT_COMMENT, MS_COMMENT, ZS_COMMENT },
#ifdef MS_NOSUB
{ MNTOPT_NOSUB, MS_NOSUB, ZS_COMMENT },
#endif
#ifdef MS_SILENT
{ MNTOPT_QUIET, MS_SILENT, ZS_COMMENT },
#endif
/* Custom zfs options */
{ MNTOPT_XATTR, MS_COMMENT, ZS_COMMENT },
{ MNTOPT_NOXATTR, MS_COMMENT, ZS_COMMENT },
{ MNTOPT_ZFSUTIL, MS_COMMENT, ZS_ZFSUTIL },
{ NULL, 0, 0 } };
/*
* Break the mount option in to a name/value pair. The name is
* validated against the option map and mount flags set accordingly.
* Opportunistically convert a target string into a pool name. If the
* string does not represent a block device with a valid zfs label
* then it is passed through without modification.
*/
static int
parse_option(char *mntopt, unsigned long *mntflags,
unsigned long *zfsflags, int sloppy)
static void
parse_dataset(const char *target, char **dataset)
{
const option_map_t *opt;
char *ptr, *name, *value = NULL;
int error = 0;
name = strdup(mntopt);
if (name == NULL)
return (ENOMEM);
for (ptr = name; ptr && *ptr; ptr++) {
if (*ptr == '=') {
*ptr = '\0';
value = ptr+1;
VERIFY3P(value, !=, NULL);
break;
}
}
for (opt = option_map; opt->name != NULL; opt++) {
if (strncmp(name, opt->name, strlen(name)) == 0) {
*mntflags |= opt->mntmask;
*zfsflags |= opt->zfsmask;
error = 0;
goto out;
}
}
if (!sloppy)
error = ENOENT;
out:
/* If required further process on the value may be done here */
free(name);
return (error);
}
/*
* Translate the mount option string in to MS_* mount flags for the
* kernel vfs. When sloppy is non-zero unknown options will be ignored
* otherwise they are considered fatal are copied in to badopt.
*/
static int
parse_options(char *mntopts, unsigned long *mntflags, unsigned long *zfsflags,
int sloppy, char *badopt, char *mtabopt)
{
int error = 0, quote = 0, flag = 0, count = 0;
char *ptr, *opt, *opts;
opts = strdup(mntopts);
if (opts == NULL)
return (ENOMEM);
*mntflags = 0;
opt = NULL;
/*
* Scan through all mount options which must be comma delimited.
* We must be careful to notice regions which are double quoted
* and skip commas in these regions. Each option is then checked
* to determine if it is a known option.
* Prior to util-linux 2.36.2, if a file or directory in the
* current working directory was named 'dataset' then mount(8)
* would prepend the current working directory to the dataset.
* Check for it and strip the prepended path when it is added.
*/
for (ptr = opts; ptr && !flag; ptr++) {
if (opt == NULL)
opt = ptr;
if (*ptr == '"')
quote = !quote;
if (quote)
continue;
if (*ptr == '\0')
flag = 1;
if ((*ptr == ',') || (*ptr == '\0')) {
*ptr = '\0';
error = parse_option(opt, mntflags, zfsflags, sloppy);
if (error) {
strcpy(badopt, opt);
goto out;
}
if (!(*mntflags & MS_REMOUNT) &&
!(*zfsflags & ZS_ZFSUTIL)) {
if (count > 0)
strlcat(mtabopt, ",", MNT_LINE_MAX);
strlcat(mtabopt, opt, MNT_LINE_MAX);
count++;
}
opt = NULL;
}
}
out:
free(opts);
return (error);
}
/*
* Return the pool/dataset to mount given the name passed to mount. This
* is expected to be of the form pool/dataset, however may also refer to
* a block device if that device contains a valid zfs label.
*/
static char *
parse_dataset(char *dataset)
{
char cwd[PATH_MAX];
struct stat64 statbuf;
int error;
int len;
/*
* We expect a pool/dataset to be provided, however if we're
* given a device which is a member of a zpool we attempt to
* extract the pool name stored in the label. Given the pool
* name we can mount the root dataset.
*/
error = stat64(dataset, &statbuf);
if (error == 0) {
nvlist_t *config;
char *name;
int fd;
fd = open(dataset, O_RDONLY);
if (fd < 0)
goto out;
error = zpool_read_label(fd, &config, NULL);
(void) close(fd);
if (error)
goto out;
error = nvlist_lookup_string(config,
ZPOOL_CONFIG_POOL_NAME, &name);
if (error) {
nvlist_free(config);
} else {
dataset = strdup(name);
nvlist_free(config);
return (dataset);
}
if (getcwd(cwd, PATH_MAX) == NULL) {
perror("getcwd");
return;
}
out:
/*
* If a file or directory in your current working directory is
* named 'dataset' then mount(8) will prepend your current working
* directory to the dataset. There is no way to prevent this
* behavior so we simply check for it and strip the prepended
* patch when it is added.
*/
if (getcwd(cwd, PATH_MAX) == NULL)
return (dataset);
int len = strlen(cwd);
if (strncmp(cwd, target, len) == 0)
target += len;
len = strlen(cwd);
/* Assume pool/dataset is more likely */
strlcpy(*dataset, target, PATH_MAX);
/* Do not add one when cwd already ends in a trailing '/' */
if (strncmp(cwd, dataset, len) == 0)
return (dataset + len + (cwd[len-1] != '/'));
int fd = open(target, O_RDONLY | O_CLOEXEC);
if (fd < 0)
return;
return (dataset);
nvlist_t *cfg = NULL;
if (zpool_read_label(fd, &cfg, NULL) == 0) {
const char *nm = NULL;
if (!nvlist_lookup_string(cfg, ZPOOL_CONFIG_POOL_NAME, &nm))
strlcpy(*dataset, nm, PATH_MAX);
nvlist_free(cfg);
}
if (close(fd))
perror("close");
}
/*
@@ -309,25 +108,26 @@ mtab_is_writeable(void)
}
static int
mtab_update(char *dataset, char *mntpoint, char *type, char *mntopts)
mtab_update(const char *dataset, const char *mntpoint, const char *type,
const char *mntopts)
{
struct mntent mnt;
FILE *fp;
int error;
mnt.mnt_fsname = dataset;
mnt.mnt_dir = mntpoint;
mnt.mnt_type = type;
mnt.mnt_opts = mntopts ? mntopts : "";
mnt.mnt_fsname = (char *)dataset;
mnt.mnt_dir = (char *)mntpoint;
mnt.mnt_type = (char *)type;
mnt.mnt_opts = (char *)(mntopts ?: "");
mnt.mnt_freq = 0;
mnt.mnt_passno = 0;
fp = setmntent("/etc/mtab", "a+");
fp = setmntent("/etc/mtab", "a+e");
if (!fp) {
(void) fprintf(stderr, gettext(
"filesystem '%s' was mounted, but /etc/mtab "
"could not be opened due to error %d\n"),
dataset, errno);
"could not be opened due to error: %s\n"),
dataset, strerror(errno));
return (MOUNT_FILEIO);
}
@@ -335,8 +135,8 @@ mtab_update(char *dataset, char *mntpoint, char *type, char *mntopts)
if (error) {
(void) fprintf(stderr, gettext(
"filesystem '%s' was mounted, but /etc/mtab "
"could not be updated due to error %d\n"),
dataset, errno);
"could not be updated due to error: %s\n"),
dataset, strerror(errno));
return (MOUNT_FILEIO);
}
@@ -345,34 +145,6 @@ mtab_update(char *dataset, char *mntpoint, char *type, char *mntopts)
return (MOUNT_SUCCESS);
}
static void
append_mntopt(const char *name, const char *val, char *mntopts,
char *mtabopt, boolean_t quote)
{
char tmp[MNT_LINE_MAX];
snprintf(tmp, MNT_LINE_MAX, quote ? ",%s=\"%s\"" : ",%s=%s", name, val);
if (mntopts)
strlcat(mntopts, tmp, MNT_LINE_MAX);
if (mtabopt)
strlcat(mtabopt, tmp, MNT_LINE_MAX);
}
static void
zfs_selinux_setcontext(zfs_handle_t *zhp, zfs_prop_t zpt, const char *name,
char *mntopts, char *mtabopt)
{
char context[ZFS_MAXPROPLEN];
if (zfs_prop_get(zhp, zpt, context, sizeof (context),
NULL, NULL, 0, B_FALSE) == 0) {
if (strcmp(context, "none") != 0)
append_mntopt(name, context, mntopts, mtabopt, B_TRUE);
}
}
int
main(int argc, char **argv)
{
@@ -383,12 +155,13 @@ main(int argc, char **argv)
char badopt[MNT_LINE_MAX] = { '\0' };
char mtabopt[MNT_LINE_MAX] = { '\0' };
char mntpoint[PATH_MAX];
char *dataset;
char dataset[PATH_MAX], *pdataset = dataset;
unsigned long mntflags = 0, zfsflags = 0, remount = 0;
int sloppy = 0, fake = 0, verbose = 0, nomtab = 0, zfsutil = 0;
int error, c;
(void) setlocale(LC_ALL, "");
(void) setlocale(LC_NUMERIC, "C");
(void) textdomain(TEXT_DOMAIN);
opterr = 0;
@@ -413,10 +186,11 @@ main(int argc, char **argv)
break;
case 'h':
case '?':
(void) fprintf(stderr, gettext("Invalid option '%c'\n"),
optopt);
if (optopt)
(void) fprintf(stderr,
gettext("Invalid option '%c'\n"), optopt);
(void) fprintf(stderr, gettext("Usage: mount.zfs "
"[-sfnv] [-o options] <dataset> <mountpoint>\n"));
"[-sfnvh] [-o options] <dataset> <mountpoint>\n"));
return (MOUNT_USAGE);
}
}
@@ -438,18 +212,18 @@ main(int argc, char **argv)
return (MOUNT_USAGE);
}
dataset = parse_dataset(argv[0]);
parse_dataset(argv[0], &pdataset);
/* canonicalize the mount point */
if (realpath(argv[1], mntpoint) == NULL) {
(void) fprintf(stderr, gettext("filesystem '%s' cannot be "
"mounted at '%s' due to canonicalization error %d.\n"),
dataset, argv[1], errno);
"mounted at '%s' due to canonicalization error: %s\n"),
dataset, argv[1], strerror(errno));
return (MOUNT_SYSERR);
}
/* validate mount options and set mntflags */
error = parse_options(mntopts, &mntflags, &zfsflags, sloppy,
error = zfs_parse_mount_options(mntopts, &mntflags, &zfsflags, sloppy,
badopt, mtabopt);
if (error) {
switch (error) {
@@ -473,13 +247,6 @@ main(int argc, char **argv)
}
}
if (verbose)
(void) fprintf(stdout, gettext("mount.zfs:\n"
" dataset: \"%s\"\n mountpoint: \"%s\"\n"
" mountflags: 0x%lx\n zfsflags: 0x%lx\n"
" mountopts: \"%s\"\n mtabopts: \"%s\"\n"),
dataset, mntpoint, mntflags, zfsflags, mntopts, mtabopt);
if (mntflags & MS_REMOUNT) {
nomtab = 1;
remount = 1;
@@ -489,7 +256,7 @@ main(int argc, char **argv)
zfsutil = 1;
if ((g_zfs = libzfs_init()) == NULL) {
(void) fprintf(stderr, "%s", libzfs_error_init(errno));
(void) fprintf(stderr, "%s\n", libzfs_error_init(errno));
return (MOUNT_SYSERR);
}
@@ -502,33 +269,11 @@ main(int argc, char **argv)
return (MOUNT_USAGE);
}
/*
* Checks to see if the ZFS_PROP_SELINUX_CONTEXT exists
* if it does, create a tmp variable in case it's needed
* checks to see if the selinux context is set to the default
* if it is, allow the setting of the other context properties
* this is needed because the 'context' property overrides others
* if it is not the default, set the 'context' property
*/
if (zfs_prop_get(zhp, ZFS_PROP_SELINUX_CONTEXT, prop, sizeof (prop),
NULL, NULL, 0, B_FALSE) == 0) {
if (strcmp(prop, "none") == 0) {
zfs_selinux_setcontext(zhp, ZFS_PROP_SELINUX_FSCONTEXT,
MNTOPT_FSCONTEXT, mntopts, mtabopt);
zfs_selinux_setcontext(zhp, ZFS_PROP_SELINUX_DEFCONTEXT,
MNTOPT_DEFCONTEXT, mntopts, mtabopt);
zfs_selinux_setcontext(zhp,
ZFS_PROP_SELINUX_ROOTCONTEXT, MNTOPT_ROOTCONTEXT,
mntopts, mtabopt);
} else {
append_mntopt(MNTOPT_CONTEXT, prop,
mntopts, mtabopt, B_TRUE);
}
if (!zfsutil || sloppy ||
libzfs_envvar_is_set("ZFS_MOUNT_HELPER")) {
zfs_adjust_mount_options(zhp, mntpoint, mntopts, mtabopt);
}
/* A hint used to determine an auto-mounted snapshot mount point */
append_mntopt(MNTOPT_MNTPOINT, mntpoint, mntopts, NULL, B_FALSE);
/* treat all snapshots as legacy mount points */
if (zfs_get_type(zhp) == ZFS_TYPE_SNAPSHOT)
(void) strlcpy(prop, ZFS_MOUNTPOINT_LEGACY, ZFS_MAXPROPLEN);
@@ -545,12 +290,11 @@ main(int argc, char **argv)
if (zfs_version == 0) {
fprintf(stderr, gettext("unable to fetch "
"ZFS version for filesystem '%s'\n"), dataset);
zfs_close(zhp);
libzfs_fini(g_zfs);
return (MOUNT_SYSERR);
}
zfs_close(zhp);
libzfs_fini(g_zfs);
/*
* Legacy mount points may only be mounted using 'mount', never using
* 'zfs mount'. However, since 'zfs mount' actually invokes 'mount'
@@ -568,6 +312,8 @@ main(int argc, char **argv)
"Use 'zfs set mountpoint=%s' or 'mount -t zfs %s %s'.\n"
"See zfs(8) for more information.\n"),
dataset, mntpoint, dataset, mntpoint);
zfs_close(zhp);
libzfs_fini(g_zfs);
return (MOUNT_USAGE);
}
@@ -578,14 +324,38 @@ main(int argc, char **argv)
"Use 'zfs set mountpoint=%s' or 'zfs mount %s'.\n"
"See zfs(8) for more information.\n"),
dataset, "legacy", dataset);
zfs_close(zhp);
libzfs_fini(g_zfs);
return (MOUNT_USAGE);
}
if (verbose)
(void) fprintf(stdout, gettext("mount.zfs:\n"
" dataset: \"%s\"\n mountpoint: \"%s\"\n"
" mountflags: 0x%lx\n zfsflags: 0x%lx\n"
" mountopts: \"%s\"\n mtabopts: \"%s\"\n"),
dataset, mntpoint, mntflags, zfsflags, mntopts, mtabopt);
if (!fake) {
error = mount(dataset, mntpoint, MNTTYPE_ZFS,
mntflags, mntopts);
if (zfsutil && !sloppy &&
!libzfs_envvar_is_set("ZFS_MOUNT_HELPER")) {
error = zfs_mount_at(zhp, mntopts, mntflags, mntpoint);
if (error) {
(void) fprintf(stderr, "zfs_mount_at() failed: "
"%s", libzfs_error_description(g_zfs));
zfs_close(zhp);
libzfs_fini(g_zfs);
return (MOUNT_SYSERR);
}
} else {
error = mount(dataset, mntpoint, MNTTYPE_ZFS,
mntflags, mntopts);
}
}
zfs_close(zhp);
libzfs_fini(g_zfs);
if (error) {
switch (errno) {
case ENOENT:
@@ -620,8 +390,8 @@ main(int argc, char **argv)
"mount the filesystem again.\n"), dataset);
return (MOUNT_SYSERR);
}
/* fallthru */
#endif
zfs_fallthrough;
default:
(void) fprintf(stderr, gettext("filesystem "
"'%s' can not be mounted: %s\n"), dataset,
-1
View File
@@ -1 +0,0 @@
mount.zfs
-21
View File
@@ -1,21 +0,0 @@
include $(top_srcdir)/config/Rules.am
DEFAULT_INCLUDES += \
-I$(top_srcdir)/include \
-I$(top_srcdir)/lib/libspl/include
#
# Ignore the prefix for the mount helper. It must be installed in /sbin/
# because this path is hardcoded in the mount(8) for security reasons.
# However, if needed, the configure option --with-mounthelperdir= can be used
# to override the default install location.
#
sbindir=$(mounthelperdir)
sbin_PROGRAMS = mount.zfs
mount_zfs_SOURCES = \
mount_zfs.c
mount_zfs_LDADD = \
$(top_builddir)/lib/libnvpair/libnvpair.la \
$(top_builddir)/lib/libzfs/libzfs.la
-1
View File
@@ -1 +0,0 @@
/raidz_test
+10 -17
View File
@@ -1,23 +1,16 @@
include $(top_srcdir)/config/Rules.am
raidz_test_CFLAGS = $(AM_CFLAGS) $(KERNEL_CFLAGS)
raidz_test_CPPFLAGS = $(AM_CPPFLAGS) $(FORCEDEBUG_CPPFLAGS)
# Includes kernel code, generate warnings for large stack frames
AM_CFLAGS += $(FRAME_LARGER_THAN)
# Unconditionally enable ASSERTs
AM_CPPFLAGS += -DDEBUG -UNDEBUG
DEFAULT_INCLUDES += \
-I$(top_srcdir)/include \
-I$(top_srcdir)/lib/libspl/include
bin_PROGRAMS = raidz_test
bin_PROGRAMS += raidz_test
CPPCHECKTARGETS += raidz_test
raidz_test_SOURCES = \
raidz_test.h \
raidz_test.c \
raidz_bench.c
%D%/raidz_bench.c \
%D%/raidz_test.c \
%D%/raidz_test.h
raidz_test_LDADD = \
$(top_builddir)/lib/libzpool/libzpool.la
libzpool.la \
libzfs_core.la
raidz_test_LDADD += -lm -ldl
raidz_test_LDADD += -lm
+25 -10
View File
@@ -6,7 +6,7 @@
* You may not use this file except in compliance with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or http://www.opensolaris.org/os/licensing.
* or https://opensource.org/licenses/CDDL-1.0.
* See the License for the specific language governing permissions
* and limitations under the License.
*
@@ -31,8 +31,6 @@
#include <sys/vdev_raidz_impl.h>
#include <stdio.h>
#include <sys/time.h>
#include "raidz_test.h"
#define GEN_BENCH_MEMORY (((uint64_t)1ULL)<<32)
@@ -65,7 +63,7 @@ bench_fini_raidz_maps(void)
{
/* tear down golden zio */
raidz_free(zio_bench.io_abd, max_data_size);
bzero(&zio_bench, sizeof (zio_t));
memset(&zio_bench, 0, sizeof (zio_t));
}
static inline void
@@ -83,8 +81,17 @@ run_gen_bench_impl(const char *impl)
/* create suitable raidz_map */
ncols = rto_opts.rto_dcols + fn + 1;
zio_bench.io_size = 1ULL << ds;
rm_bench = vdev_raidz_map_alloc(&zio_bench,
BENCH_ASHIFT, ncols, fn+1);
if (rto_opts.rto_expand) {
rm_bench = vdev_raidz_map_alloc_expanded(
zio_bench.io_abd,
zio_bench.io_size, zio_bench.io_offset,
rto_opts.rto_ashift, ncols+1, ncols,
fn+1, rto_opts.rto_expand_offset);
} else {
rm_bench = vdev_raidz_map_alloc(&zio_bench,
BENCH_ASHIFT, ncols, fn+1);
}
/* estimate iteration count */
iter_cnt = GEN_BENCH_MEMORY;
@@ -113,7 +120,7 @@ run_gen_bench_impl(const char *impl)
}
}
void
static void
run_gen_bench(void)
{
char **impl_name;
@@ -163,8 +170,16 @@ run_rec_bench_impl(const char *impl)
(1ULL << BENCH_ASHIFT))
continue;
rm_bench = vdev_raidz_map_alloc(&zio_bench,
BENCH_ASHIFT, ncols, PARITY_PQR);
if (rto_opts.rto_expand) {
rm_bench = vdev_raidz_map_alloc_expanded(
zio_bench.io_abd,
zio_bench.io_size, zio_bench.io_offset,
BENCH_ASHIFT, ncols+1, ncols,
PARITY_PQR, rto_opts.rto_expand_offset);
} else {
rm_bench = vdev_raidz_map_alloc(&zio_bench,
BENCH_ASHIFT, ncols, PARITY_PQR);
}
/* estimate iteration count */
iter_cnt = (REC_BENCH_MEMORY);
@@ -197,7 +212,7 @@ run_rec_bench_impl(const char *impl)
}
}
void
static void
run_rec_bench(void)
{
char **impl_name;
+312 -70
View File
@@ -6,7 +6,7 @@
* You may not use this file except in compliance with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or http://www.opensolaris.org/os/licensing.
* or https://opensource.org/licenses/CDDL-1.0.
* See the License for the specific language governing permissions
* and limitations under the License.
*
@@ -37,11 +37,11 @@
static int *rand_data;
raidz_test_opts_t rto_opts;
static char gdb[256];
static const char gdb_tmpl[] = "gdb -ex \"set pagination 0\" -p %d";
static char pid_s[16];
static void sig_handler(int signo)
{
int old_errno = errno;
struct sigaction action;
/*
* Restore default action and re-raise signal so SIGSEGV and
@@ -52,22 +52,32 @@ static void sig_handler(int signo)
action.sa_flags = 0;
(void) sigaction(signo, &action, NULL);
if (rto_opts.rto_gdb)
if (system(gdb)) { }
if (rto_opts.rto_gdb) {
pid_t pid = fork();
if (pid == 0) {
execlp("gdb", "gdb", "-ex", "set pagination 0",
"-p", pid_s, NULL);
_exit(-1);
} else if (pid > 0)
while (waitpid(pid, NULL, 0) == -1 && errno == EINTR)
;
}
raise(signo);
errno = old_errno;
}
static void print_opts(raidz_test_opts_t *opts, boolean_t force)
{
char *verbose;
const char *verbose;
switch (opts->rto_v) {
case 0:
case D_ALL:
verbose = "no";
break;
case 1:
case D_INFO:
verbose = "info";
break;
case D_DEBUG:
default:
verbose = "debug";
break;
@@ -77,16 +87,20 @@ static void print_opts(raidz_test_opts_t *opts, boolean_t force)
(void) fprintf(stdout, DBLSEP "Running with options:\n"
" (-a) zio ashift : %zu\n"
" (-o) zio offset : 1 << %zu\n"
" (-e) expanded map : %s\n"
" (-r) reflow offset : %llx\n"
" (-d) number of raidz data columns : %zu\n"
" (-s) size of DATA : 1 << %zu\n"
" (-S) sweep parameters : %s \n"
" (-v) verbose : %s \n\n",
opts->rto_ashift, /* -a */
ilog2(opts->rto_offset), /* -o */
opts->rto_dcols, /* -d */
ilog2(opts->rto_dsize), /* -s */
opts->rto_sweep ? "yes" : "no", /* -S */
verbose); /* -v */
opts->rto_ashift, /* -a */
ilog2(opts->rto_offset), /* -o */
opts->rto_expand ? "yes" : "no", /* -e */
(u_longlong_t)opts->rto_expand_offset, /* -r */
opts->rto_dcols, /* -d */
ilog2(opts->rto_dsize), /* -s */
opts->rto_sweep ? "yes" : "no", /* -S */
verbose); /* -v */
}
}
@@ -104,7 +118,9 @@ static void usage(boolean_t requested)
"\t[-S parameter sweep (default: %s)]\n"
"\t[-t timeout for parameter sweep test]\n"
"\t[-B benchmark all raidz implementations]\n"
"\t[-v increase verbosity (default: %zu)]\n"
"\t[-e use expanded raidz map (default: %s)]\n"
"\t[-r expanded raidz map reflow offset (default: %llx)]\n"
"\t[-v increase verbosity (default: %d)]\n"
"\t[-h (print help)]\n"
"\t[-T test the test, see if failure would be detected]\n"
"\t[-D debug (attach gdb on SIGSEGV)]\n"
@@ -114,7 +130,9 @@ static void usage(boolean_t requested)
o->rto_dcols, /* -d */
ilog2(o->rto_dsize), /* -s */
rto_opts.rto_sweep ? "yes" : "no", /* -S */
o->rto_v); /* -d */
rto_opts.rto_expand ? "yes" : "no", /* -e */
(u_longlong_t)o->rto_expand_offset, /* -r */
o->rto_v); /* -v */
exit(requested ? 0 : 1);
}
@@ -123,19 +141,22 @@ static void process_options(int argc, char **argv)
{
size_t value;
int opt;
raidz_test_opts_t *o = &rto_opts;
bcopy(&rto_opts_defaults, o, sizeof (*o));
while ((opt = getopt(argc, argv, "TDBSvha:o:d:s:t:")) != -1) {
value = 0;
memcpy(o, &rto_opts_defaults, sizeof (*o));
while ((opt = getopt(argc, argv, "TDBSvha:er:o:d:s:t:")) != -1) {
switch (opt) {
case 'a':
value = strtoull(optarg, NULL, 0);
o->rto_ashift = MIN(13, MAX(9, value));
break;
case 'e':
o->rto_expand = 1;
break;
case 'r':
o->rto_expand_offset = strtoull(optarg, NULL, 0);
break;
case 'o':
value = strtoull(optarg, NULL, 0);
o->rto_offset = ((1ULL << MIN(12, value)) >> 9) << 9;
@@ -179,25 +200,34 @@ static void process_options(int argc, char **argv)
}
}
#define DATA_COL(rm, i) ((rm)->rm_col[raidz_parity(rm) + (i)].rc_abd)
#define DATA_COL_SIZE(rm, i) ((rm)->rm_col[raidz_parity(rm) + (i)].rc_size)
#define DATA_COL(rr, i) ((rr)->rr_col[rr->rr_firstdatacol + (i)].rc_abd)
#define DATA_COL_SIZE(rr, i) ((rr)->rr_col[rr->rr_firstdatacol + (i)].rc_size)
#define CODE_COL(rm, i) ((rm)->rm_col[(i)].rc_abd)
#define CODE_COL_SIZE(rm, i) ((rm)->rm_col[(i)].rc_size)
#define CODE_COL(rr, i) ((rr)->rr_col[(i)].rc_abd)
#define CODE_COL_SIZE(rr, i) ((rr)->rr_col[(i)].rc_size)
static int
cmp_code(raidz_test_opts_t *opts, const raidz_map_t *rm, const int parity)
{
int i, ret = 0;
int r, i, ret = 0;
VERIFY(parity >= 1 && parity <= 3);
for (i = 0; i < parity; i++) {
if (abd_cmp(CODE_COL(rm, i), CODE_COL(opts->rm_golden, i))
!= 0) {
ret++;
LOG_OPT(D_DEBUG, opts,
"\nParity block [%d] different!\n", i);
for (r = 0; r < rm->rm_nrows; r++) {
raidz_row_t * const rr = rm->rm_row[r];
raidz_row_t * const rrg = opts->rm_golden->rm_row[r];
for (i = 0; i < parity; i++) {
if (CODE_COL_SIZE(rrg, i) == 0) {
VERIFY0(CODE_COL_SIZE(rr, i));
continue;
}
if (abd_cmp(CODE_COL(rr, i),
CODE_COL(rrg, i)) != 0) {
ret++;
LOG_OPT(D_DEBUG, opts,
"\nParity block [%d] different!\n", i);
}
}
}
return (ret);
@@ -206,16 +236,26 @@ cmp_code(raidz_test_opts_t *opts, const raidz_map_t *rm, const int parity)
static int
cmp_data(raidz_test_opts_t *opts, raidz_map_t *rm)
{
int i, ret = 0;
int dcols = opts->rm_golden->rm_cols - raidz_parity(opts->rm_golden);
int r, i, dcols, ret = 0;
for (i = 0; i < dcols; i++) {
if (abd_cmp(DATA_COL(opts->rm_golden, i), DATA_COL(rm, i))
!= 0) {
ret++;
for (r = 0; r < rm->rm_nrows; r++) {
raidz_row_t *rr = rm->rm_row[r];
raidz_row_t *rrg = opts->rm_golden->rm_row[r];
dcols = opts->rm_golden->rm_row[0]->rr_cols -
raidz_parity(opts->rm_golden);
for (i = 0; i < dcols; i++) {
if (DATA_COL_SIZE(rrg, i) == 0) {
VERIFY0(DATA_COL_SIZE(rr, i));
continue;
}
LOG_OPT(D_DEBUG, opts,
"\nData block [%d] different!\n", i);
if (abd_cmp(DATA_COL(rrg, i),
DATA_COL(rr, i)) != 0) {
ret++;
LOG_OPT(D_DEBUG, opts,
"\nData block [%d] different!\n", i);
}
}
}
return (ret);
@@ -224,24 +264,21 @@ cmp_data(raidz_test_opts_t *opts, raidz_map_t *rm)
static int
init_rand(void *data, size_t size, void *private)
{
int i;
int *dst = (int *)data;
for (i = 0; i < size / sizeof (int); i++)
dst[i] = rand_data[i];
(void) private;
memcpy(data, rand_data, size);
return (0);
}
static void
corrupt_colums(raidz_map_t *rm, const int *tgts, const int cnt)
{
int i;
raidz_col_t *col;
for (i = 0; i < cnt; i++) {
col = &rm->rm_col[tgts[i]];
abd_iterate_func(col->rc_abd, 0, col->rc_size, init_rand, NULL);
for (int r = 0; r < rm->rm_nrows; r++) {
raidz_row_t *rr = rm->rm_row[r];
for (int i = 0; i < cnt; i++) {
raidz_col_t *col = &rr->rr_col[tgts[i]];
abd_iterate_func(col->rc_abd, 0, col->rc_size,
init_rand, NULL);
}
}
}
@@ -288,10 +325,22 @@ init_raidz_golden_map(raidz_test_opts_t *opts, const int parity)
VERIFY0(vdev_raidz_impl_set("original"));
opts->rm_golden = vdev_raidz_map_alloc(opts->zio_golden,
opts->rto_ashift, total_ncols, parity);
rm_test = vdev_raidz_map_alloc(zio_test,
opts->rto_ashift, total_ncols, parity);
if (opts->rto_expand) {
opts->rm_golden =
vdev_raidz_map_alloc_expanded(opts->zio_golden->io_abd,
opts->zio_golden->io_size, opts->zio_golden->io_offset,
opts->rto_ashift, total_ncols+1, total_ncols,
parity, opts->rto_expand_offset);
rm_test = vdev_raidz_map_alloc_expanded(zio_test->io_abd,
zio_test->io_size, zio_test->io_offset,
opts->rto_ashift, total_ncols+1, total_ncols,
parity, opts->rto_expand_offset);
} else {
opts->rm_golden = vdev_raidz_map_alloc(opts->zio_golden,
opts->rto_ashift, total_ncols, parity);
rm_test = vdev_raidz_map_alloc(zio_test,
opts->rto_ashift, total_ncols, parity);
}
VERIFY(opts->zio_golden);
VERIFY(opts->rm_golden);
@@ -312,6 +361,187 @@ init_raidz_golden_map(raidz_test_opts_t *opts, const int parity)
return (err);
}
/*
* If reflow is not in progress, reflow_offset should be UINT64_MAX.
* For each row, if the row is entirely before reflow_offset, it will
* come from the new location. Otherwise this row will come from the
* old location. Therefore, rows that straddle the reflow_offset will
* come from the old location.
*
* NOTE: Until raidz expansion is implemented this function is only
* needed by raidz_test.c to the multi-row raid_map_t functionality.
*/
raidz_map_t *
vdev_raidz_map_alloc_expanded(abd_t *abd, uint64_t size, uint64_t offset,
uint64_t ashift, uint64_t physical_cols, uint64_t logical_cols,
uint64_t nparity, uint64_t reflow_offset)
{
/* The zio's size in units of the vdev's minimum sector size. */
uint64_t s = size >> ashift;
uint64_t q, r, bc, devidx, asize = 0, tot;
/*
* "Quotient": The number of data sectors for this stripe on all but
* the "big column" child vdevs that also contain "remainder" data.
* AKA "full rows"
*/
q = s / (logical_cols - nparity);
/*
* "Remainder": The number of partial stripe data sectors in this I/O.
* This will add a sector to some, but not all, child vdevs.
*/
r = s - q * (logical_cols - nparity);
/* The number of "big columns" - those which contain remainder data. */
bc = (r == 0 ? 0 : r + nparity);
/*
* The total number of data and parity sectors associated with
* this I/O.
*/
tot = s + nparity * (q + (r == 0 ? 0 : 1));
/* How many rows contain data (not skip) */
uint64_t rows = howmany(tot, logical_cols);
int cols = MIN(tot, logical_cols);
raidz_map_t *rm = kmem_zalloc(offsetof(raidz_map_t, rm_row[rows]),
KM_SLEEP);
rm->rm_nrows = rows;
for (uint64_t row = 0; row < rows; row++) {
raidz_row_t *rr = kmem_alloc(offsetof(raidz_row_t,
rr_col[cols]), KM_SLEEP);
rm->rm_row[row] = rr;
/* The starting RAIDZ (parent) vdev sector of the row. */
uint64_t b = (offset >> ashift) + row * logical_cols;
/*
* If we are in the middle of a reflow, and any part of this
* row has not been copied, then use the old location of
* this row.
*/
int row_phys_cols = physical_cols;
if (b + (logical_cols - nparity) > reflow_offset >> ashift)
row_phys_cols--;
/* starting child of this row */
uint64_t child_id = b % row_phys_cols;
/* The starting byte offset on each child vdev. */
uint64_t child_offset = (b / row_phys_cols) << ashift;
/*
* We set cols to the entire width of the block, even
* if this row is shorter. This is needed because parity
* generation (for Q and R) needs to know the entire width,
* because it treats the short row as though it was
* full-width (and the "phantom" sectors were zero-filled).
*
* Another approach to this would be to set cols shorter
* (to just the number of columns that we might do i/o to)
* and have another mechanism to tell the parity generation
* about the "entire width". Reconstruction (at least
* vdev_raidz_reconstruct_general()) would also need to
* know about the "entire width".
*/
rr->rr_cols = cols;
rr->rr_bigcols = bc;
rr->rr_missingdata = 0;
rr->rr_missingparity = 0;
rr->rr_firstdatacol = nparity;
rr->rr_abd_empty = NULL;
rr->rr_nempty = 0;
for (int c = 0; c < rr->rr_cols; c++, child_id++) {
if (child_id >= row_phys_cols) {
child_id -= row_phys_cols;
child_offset += 1ULL << ashift;
}
rr->rr_col[c].rc_devidx = child_id;
rr->rr_col[c].rc_offset = child_offset;
rr->rr_col[c].rc_orig_data = NULL;
rr->rr_col[c].rc_error = 0;
rr->rr_col[c].rc_tried = 0;
rr->rr_col[c].rc_skipped = 0;
rr->rr_col[c].rc_need_orig_restore = B_FALSE;
uint64_t dc = c - rr->rr_firstdatacol;
if (c < rr->rr_firstdatacol) {
rr->rr_col[c].rc_size = 1ULL << ashift;
rr->rr_col[c].rc_abd =
abd_alloc_linear(rr->rr_col[c].rc_size,
B_TRUE);
} else if (row == rows - 1 && bc != 0 && c >= bc) {
/*
* Past the end, this for parity generation.
*/
rr->rr_col[c].rc_size = 0;
rr->rr_col[c].rc_abd = NULL;
} else {
/*
* "data column" (col excluding parity)
* Add an ASCII art diagram here
*/
uint64_t off;
if (c < bc || r == 0) {
off = dc * rows + row;
} else {
off = r * rows +
(dc - r) * (rows - 1) + row;
}
rr->rr_col[c].rc_size = 1ULL << ashift;
rr->rr_col[c].rc_abd = abd_get_offset_struct(
&rr->rr_col[c].rc_abdstruct,
abd, off << ashift, 1 << ashift);
}
asize += rr->rr_col[c].rc_size;
}
/*
* If all data stored spans all columns, there's a danger that
* parity will always be on the same device and, since parity
* isn't read during normal operation, that that device's I/O
* bandwidth won't be used effectively. We therefore switch
* the parity every 1MB.
*
* ...at least that was, ostensibly, the theory. As a practical
* matter unless we juggle the parity between all devices
* evenly, we won't see any benefit. Further, occasional writes
* that aren't a multiple of the LCM of the number of children
* and the minimum stripe width are sufficient to avoid pessimal
* behavior. Unfortunately, this decision created an implicit
* on-disk format requirement that we need to support for all
* eternity, but only for single-parity RAID-Z.
*
* If we intend to skip a sector in the zeroth column for
* padding we must make sure to note this swap. We will never
* intend to skip the first column since at least one data and
* one parity column must appear in each row.
*/
if (rr->rr_firstdatacol == 1 && rr->rr_cols > 1 &&
(offset & (1ULL << 20))) {
ASSERT(rr->rr_cols >= 2);
ASSERT(rr->rr_col[0].rc_size == rr->rr_col[1].rc_size);
devidx = rr->rr_col[0].rc_devidx;
uint64_t o = rr->rr_col[0].rc_offset;
rr->rr_col[0].rc_devidx = rr->rr_col[1].rc_devidx;
rr->rr_col[0].rc_offset = rr->rr_col[1].rc_offset;
rr->rr_col[1].rc_devidx = devidx;
rr->rr_col[1].rc_offset = o;
}
}
ASSERT3U(asize, ==, tot << ashift);
/* init RAIDZ parity ops */
rm->rm_ops = vdev_raidz_math_get_ops();
return (rm);
}
static raidz_map_t *
init_raidz_map(raidz_test_opts_t *opts, zio_t **zio, const int parity)
{
@@ -330,8 +560,15 @@ init_raidz_map(raidz_test_opts_t *opts, zio_t **zio, const int parity)
(*zio)->io_abd = raidz_alloc(alloc_dsize);
init_zio_abd(*zio);
rm = vdev_raidz_map_alloc(*zio, opts->rto_ashift,
total_ncols, parity);
if (opts->rto_expand) {
rm = vdev_raidz_map_alloc_expanded((*zio)->io_abd,
(*zio)->io_size, (*zio)->io_offset,
opts->rto_ashift, total_ncols+1, total_ncols,
parity, opts->rto_expand_offset);
} else {
rm = vdev_raidz_map_alloc(*zio, opts->rto_ashift,
total_ncols, parity);
}
VERIFY(rm);
/* Make sure code columns are destroyed */
@@ -420,7 +657,7 @@ run_rec_check_impl(raidz_test_opts_t *opts, raidz_map_t *rm, const int fn)
if (fn < RAIDZ_REC_PQ) {
/* can reconstruct 1 failed data disk */
for (x0 = 0; x0 < opts->rto_dcols; x0++) {
if (x0 >= rm->rm_cols - raidz_parity(rm))
if (x0 >= rm->rm_row[0]->rr_cols - raidz_parity(rm))
continue;
/* Check if should stop */
@@ -445,10 +682,11 @@ run_rec_check_impl(raidz_test_opts_t *opts, raidz_map_t *rm, const int fn)
} else if (fn < RAIDZ_REC_PQR) {
/* can reconstruct 2 failed data disk */
for (x0 = 0; x0 < opts->rto_dcols; x0++) {
if (x0 >= rm->rm_cols - raidz_parity(rm))
if (x0 >= rm->rm_row[0]->rr_cols - raidz_parity(rm))
continue;
for (x1 = x0 + 1; x1 < opts->rto_dcols; x1++) {
if (x1 >= rm->rm_cols - raidz_parity(rm))
if (x1 >= rm->rm_row[0]->rr_cols -
raidz_parity(rm))
continue;
/* Check if should stop */
@@ -475,14 +713,15 @@ run_rec_check_impl(raidz_test_opts_t *opts, raidz_map_t *rm, const int fn)
} else {
/* can reconstruct 3 failed data disk */
for (x0 = 0; x0 < opts->rto_dcols; x0++) {
if (x0 >= rm->rm_cols - raidz_parity(rm))
if (x0 >= rm->rm_row[0]->rr_cols - raidz_parity(rm))
continue;
for (x1 = x0 + 1; x1 < opts->rto_dcols; x1++) {
if (x1 >= rm->rm_cols - raidz_parity(rm))
if (x1 >= rm->rm_row[0]->rr_cols -
raidz_parity(rm))
continue;
for (x2 = x1 + 1; x2 < opts->rto_dcols; x2++) {
if (x2 >=
rm->rm_cols - raidz_parity(rm))
if (x2 >= rm->rm_row[0]->rr_cols -
raidz_parity(rm))
continue;
/* Check if should stop */
@@ -598,7 +837,7 @@ static kcondvar_t sem_cv;
static int max_free_slots;
static int free_slots;
static void
static __attribute__((noreturn)) void
sweep_thread(void *arg)
{
int err = 0;
@@ -698,8 +937,10 @@ run_sweep(void)
opts = umem_zalloc(sizeof (raidz_test_opts_t), UMEM_NOFAIL);
opts->rto_ashift = ashift_v[a];
opts->rto_dcols = dcols_v[d];
opts->rto_offset = (1 << ashift_v[a]) * rand();
opts->rto_offset = (1ULL << ashift_v[a]) * rand();
opts->rto_dsize = size_v[s];
opts->rto_expand = rto_opts.rto_expand;
opts->rto_expand_offset = rto_opts.rto_expand_offset;
opts->rto_v = 0; /* be quiet */
VERIFY3P(thread_create(NULL, 0, sweep_thread, (void *) opts,
@@ -732,6 +973,7 @@ exit:
return (sweep_state == SWEEP_ERROR ? SWEEP_ERROR : 0);
}
int
main(int argc, char **argv)
{
@@ -739,8 +981,8 @@ main(int argc, char **argv)
struct sigaction action;
int err = 0;
/* init gdb string early */
(void) sprintf(gdb, gdb_tmpl, getpid());
/* init gdb pid string early */
(void) sprintf(pid_s, "%d", getpid());
action.sa_handler = sig_handler;
sigemptyset(&action.sa_mask);
@@ -757,7 +999,7 @@ main(int argc, char **argv)
process_options(argc, argv);
kernel_init(FREAD);
kernel_init(SPA_MODE_READ);
/* setup random data because rand() is not reentrant */
rand_data = (int *)umem_alloc(SPA_MAXBLOCKSIZE, UMEM_NOFAIL);
+24 -14
View File
@@ -6,7 +6,7 @@
* You may not use this file except in compliance with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or http://www.opensolaris.org/os/licensing.
* or https://opensource.org/licenses/CDDL-1.0.
* See the License for the specific language governing permissions
* and limitations under the License.
*
@@ -28,7 +28,7 @@
#include <sys/spa.h>
static const char *raidz_impl_names[] = {
static const char *const raidz_impl_names[] = {
"original",
"scalar",
"sse2",
@@ -38,18 +38,27 @@ static const char *raidz_impl_names[] = {
"avx512bw",
"aarch64_neon",
"aarch64_neonx2",
"powerpc_altivec",
NULL
};
enum raidz_verbosity {
D_ALL,
D_INFO,
D_DEBUG,
};
typedef struct raidz_test_opts {
size_t rto_ashift;
size_t rto_offset;
uint64_t rto_offset;
size_t rto_dcols;
size_t rto_dsize;
size_t rto_v;
enum raidz_verbosity rto_v;
size_t rto_sweep;
size_t rto_sweep_timeout;
size_t rto_benchmark;
size_t rto_expand;
uint64_t rto_expand_offset;
size_t rto_sanity;
size_t rto_gdb;
@@ -65,9 +74,11 @@ static const raidz_test_opts_t rto_opts_defaults = {
.rto_offset = 1ULL << 0,
.rto_dcols = 8,
.rto_dsize = 1<<19,
.rto_v = 0,
.rto_v = D_ALL,
.rto_sweep = 0,
.rto_benchmark = 0,
.rto_expand = 0,
.rto_expand_offset = -1ULL,
.rto_sanity = 0,
.rto_gdb = 0,
.rto_should_stop = B_FALSE
@@ -81,23 +92,19 @@ static inline size_t ilog2(size_t a)
}
#define D_ALL 0
#define D_INFO 1
#define D_DEBUG 2
#define LOG(lvl, a...) \
#define LOG(lvl, ...) \
{ \
if (rto_opts.rto_v >= lvl) \
(void) fprintf(stdout, a); \
(void) fprintf(stdout, __VA_ARGS__); \
} \
#define LOG_OPT(lvl, opt, a...) \
#define LOG_OPT(lvl, opt, ...) \
{ \
if (opt->rto_v >= lvl) \
(void) fprintf(stdout, a); \
(void) fprintf(stdout, __VA_ARGS__); \
} \
#define ERR(a...) (void) fprintf(stderr, a)
#define ERR(...) (void) fprintf(stderr, __VA_ARGS__)
#define DBLSEP "================\n"
@@ -112,4 +119,7 @@ void init_zio_abd(zio_t *zio);
void run_raidz_benchmark(void);
struct raidz_map *vdev_raidz_map_alloc_expanded(abd_t *, uint64_t, uint64_t,
uint64_t, uint64_t, uint64_t, uint64_t, uint64_t);
#endif /* RAIDZ_TEST_H */
-1
View File
@@ -1 +0,0 @@
dist_udev_SCRIPTS = vdev_id
-1
View File
@@ -1 +0,0 @@
/zdb
+12 -14
View File
@@ -1,19 +1,17 @@
include $(top_srcdir)/config/Rules.am
zdb_CPPFLAGS = $(AM_CPPFLAGS) $(FORCEDEBUG_CPPFLAGS)
zdb_CFLAGS = $(AM_CFLAGS) $(LIBCRYPTO_CFLAGS)
# Unconditionally enable debugging for zdb
AM_CPPFLAGS += -DDEBUG -UNDEBUG
DEFAULT_INCLUDES += \
-I$(top_srcdir)/include \
-I$(top_srcdir)/lib/libspl/include
sbin_PROGRAMS = zdb
sbin_PROGRAMS += zdb
CPPCHECKTARGETS += zdb
zdb_SOURCES = \
zdb.c \
zdb_il.c \
zdb.h
%D%/zdb.c \
%D%/zdb.h \
%D%/zdb_il.c
zdb_LDADD = \
$(top_builddir)/lib/libnvpair/libnvpair.la \
$(top_builddir)/lib/libzpool/libzpool.la
libzpool.la \
libzfs_core.la \
libnvpair.la
zdb_LDADD += $(LIBCRYPTO_LIBS)
+3948 -607
View File
File diff suppressed because it is too large Load Diff
+1 -1
View File
@@ -6,7 +6,7 @@
* You may not use this file except in compliance with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or http://www.opensolaris.org/os/licensing.
* or https://opensource.org/licenses/CDDL-1.0.
* See the License for the specific language governing permissions
* and limitations under the License.
*
+145 -35
View File
@@ -6,7 +6,7 @@
* You may not use this file except in compliance with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or http://www.opensolaris.org/os/licensing.
* or https://opensource.org/licenses/CDDL-1.0.
* See the License for the specific language governing permissions
* and limitations under the License.
*
@@ -60,11 +60,11 @@ print_log_bp(const blkptr_t *bp, const char *prefix)
(void) printf("%s%s\n", prefix, blkbuf);
}
/* ARGSUSED */
static void
zil_prt_rec_create(zilog_t *zilog, int txtype, void *arg)
zil_prt_rec_create(zilog_t *zilog, int txtype, const void *arg)
{
lr_create_t *lr = arg;
(void) zilog;
const lr_create_t *lr = arg;
time_t crtime = lr->lr_crtime[0];
char *name, *link;
lr_attr_t *lrattr;
@@ -96,44 +96,52 @@ zil_prt_rec_create(zilog_t *zilog, int txtype, void *arg)
(u_longlong_t)lr->lr_gen, (u_longlong_t)lr->lr_rdev);
}
/* ARGSUSED */
static void
zil_prt_rec_remove(zilog_t *zilog, int txtype, void *arg)
zil_prt_rec_remove(zilog_t *zilog, int txtype, const void *arg)
{
lr_remove_t *lr = arg;
(void) zilog, (void) txtype;
const lr_remove_t *lr = arg;
(void) printf("%sdoid %llu, name %s\n", tab_prefix,
(u_longlong_t)lr->lr_doid, (char *)(lr + 1));
}
/* ARGSUSED */
static void
zil_prt_rec_link(zilog_t *zilog, int txtype, void *arg)
zil_prt_rec_link(zilog_t *zilog, int txtype, const void *arg)
{
lr_link_t *lr = arg;
(void) zilog, (void) txtype;
const lr_link_t *lr = arg;
(void) printf("%sdoid %llu, link_obj %llu, name %s\n", tab_prefix,
(u_longlong_t)lr->lr_doid, (u_longlong_t)lr->lr_link_obj,
(char *)(lr + 1));
}
/* ARGSUSED */
static void
zil_prt_rec_rename(zilog_t *zilog, int txtype, void *arg)
zil_prt_rec_rename(zilog_t *zilog, int txtype, const void *arg)
{
lr_rename_t *lr = arg;
(void) zilog, (void) txtype;
const lr_rename_t *lr = arg;
char *snm = (char *)(lr + 1);
char *tnm = snm + strlen(snm) + 1;
(void) printf("%ssdoid %llu, tdoid %llu\n", tab_prefix,
(u_longlong_t)lr->lr_sdoid, (u_longlong_t)lr->lr_tdoid);
(void) printf("%ssrc %s tgt %s\n", tab_prefix, snm, tnm);
switch (txtype) {
case TX_RENAME_EXCHANGE:
(void) printf("%sflags RENAME_EXCHANGE\n", tab_prefix);
break;
case TX_RENAME_WHITEOUT:
(void) printf("%sflags RENAME_WHITEOUT\n", tab_prefix);
break;
}
}
/* ARGSUSED */
static int
zil_prt_rec_write_cb(void *data, size_t len, void *unused)
{
(void) unused;
char *cdata = data;
for (size_t i = 0; i < len; i++) {
@@ -146,13 +154,12 @@ zil_prt_rec_write_cb(void *data, size_t len, void *unused)
return (0);
}
/* ARGSUSED */
static void
zil_prt_rec_write(zilog_t *zilog, int txtype, void *arg)
zil_prt_rec_write(zilog_t *zilog, int txtype, const void *arg)
{
lr_write_t *lr = arg;
const lr_write_t *lr = arg;
abd_t *data;
blkptr_t *bp = &lr->lr_blkptr;
const blkptr_t *bp = &lr->lr_blkptr;
zbookmark_phys_t zb;
int verbose = MAX(dump_opt['d'], dump_opt['i']);
int error;
@@ -161,7 +168,7 @@ zil_prt_rec_write(zilog_t *zilog, int txtype, void *arg)
(u_longlong_t)lr->lr_foid, (u_longlong_t)lr->lr_offset,
(u_longlong_t)lr->lr_length);
if (txtype == TX_WRITE2 || verbose < 5)
if (txtype == TX_WRITE2 || verbose < 4)
return;
if (lr->lr_common.lrc_reclen == sizeof (lr_write_t)) {
@@ -171,6 +178,8 @@ zil_prt_rec_write(zilog_t *zilog, int txtype, void *arg)
"will claim" : "won't claim");
print_log_bp(bp, tab_prefix);
if (verbose < 5)
return;
if (BP_IS_HOLE(bp)) {
(void) printf("\t\t\tLSIZE 0x%llx\n",
(u_longlong_t)BP_GET_LSIZE(bp));
@@ -183,6 +192,7 @@ zil_prt_rec_write(zilog_t *zilog, int txtype, void *arg)
return;
}
ASSERT3U(BP_GET_LSIZE(bp), !=, 0);
SET_BOOKMARK(&zb, dmu_objset_id(zilog->zl_os),
lr->lr_foid, ZB_ZIL_LEVEL,
lr->lr_offset / BP_GET_LSIZE(bp));
@@ -194,6 +204,9 @@ zil_prt_rec_write(zilog_t *zilog, int txtype, void *arg)
if (error)
goto out;
} else {
if (verbose < 5)
return;
/* data is stored after the end of the lr_write record */
data = abd_alloc(lr->lr_length, B_FALSE);
abd_copy_from_buf(data, lr + 1, lr->lr_length);
@@ -209,22 +222,44 @@ out:
abd_free(data);
}
/* ARGSUSED */
static void
zil_prt_rec_truncate(zilog_t *zilog, int txtype, void *arg)
zil_prt_rec_write_enc(zilog_t *zilog, int txtype, const void *arg)
{
lr_truncate_t *lr = arg;
(void) txtype;
const lr_write_t *lr = arg;
const blkptr_t *bp = &lr->lr_blkptr;
int verbose = MAX(dump_opt['d'], dump_opt['i']);
(void) printf("%s(encrypted)\n", tab_prefix);
if (verbose < 4)
return;
if (lr->lr_common.lrc_reclen == sizeof (lr_write_t)) {
(void) printf("%shas blkptr, %s\n", tab_prefix,
!BP_IS_HOLE(bp) &&
bp->blk_birth >= spa_min_claim_txg(zilog->zl_spa) ?
"will claim" : "won't claim");
print_log_bp(bp, tab_prefix);
}
}
static void
zil_prt_rec_truncate(zilog_t *zilog, int txtype, const void *arg)
{
(void) zilog, (void) txtype;
const lr_truncate_t *lr = arg;
(void) printf("%sfoid %llu, offset 0x%llx, length 0x%llx\n", tab_prefix,
(u_longlong_t)lr->lr_foid, (longlong_t)lr->lr_offset,
(u_longlong_t)lr->lr_length);
}
/* ARGSUSED */
static void
zil_prt_rec_setattr(zilog_t *zilog, int txtype, void *arg)
zil_prt_rec_setattr(zilog_t *zilog, int txtype, const void *arg)
{
lr_setattr_t *lr = arg;
(void) zilog, (void) txtype;
const lr_setattr_t *lr = arg;
time_t atime = (time_t)lr->lr_atime[0];
time_t mtime = (time_t)lr->lr_mtime[0];
@@ -266,19 +301,83 @@ zil_prt_rec_setattr(zilog_t *zilog, int txtype, void *arg)
}
}
/* ARGSUSED */
static void
zil_prt_rec_acl(zilog_t *zilog, int txtype, void *arg)
zil_prt_rec_setsaxattr(zilog_t *zilog, int txtype, const void *arg)
{
lr_acl_t *lr = arg;
(void) zilog, (void) txtype;
const lr_setsaxattr_t *lr = arg;
char *name = (char *)(lr + 1);
(void) printf("%sfoid %llu\n", tab_prefix,
(u_longlong_t)lr->lr_foid);
(void) printf("%sXAT_NAME %s\n", tab_prefix, name);
if (lr->lr_size == 0) {
(void) printf("%sXAT_VALUE NULL\n", tab_prefix);
} else {
(void) printf("%sXAT_VALUE ", tab_prefix);
char *val = name + (strlen(name) + 1);
for (int i = 0; i < lr->lr_size; i++) {
(void) printf("%c", *val);
val++;
}
}
}
static void
zil_prt_rec_acl(zilog_t *zilog, int txtype, const void *arg)
{
(void) zilog, (void) txtype;
const lr_acl_t *lr = arg;
(void) printf("%sfoid %llu, aclcnt %llu\n", tab_prefix,
(u_longlong_t)lr->lr_foid, (u_longlong_t)lr->lr_aclcnt);
}
typedef void (*zil_prt_rec_func_t)(zilog_t *, int, void *);
static void
zil_prt_rec_clone_range(zilog_t *zilog, int txtype, const void *arg)
{
(void) zilog, (void) txtype;
const lr_clone_range_t *lr = arg;
int verbose = MAX(dump_opt['d'], dump_opt['i']);
(void) printf("%sfoid %llu, offset %llx, length %llx, blksize %llx\n",
tab_prefix, (u_longlong_t)lr->lr_foid, (u_longlong_t)lr->lr_offset,
(u_longlong_t)lr->lr_length, (u_longlong_t)lr->lr_blksz);
if (verbose < 4)
return;
for (unsigned int i = 0; i < lr->lr_nbps; i++) {
(void) printf("%s[%u/%llu] ", tab_prefix, i + 1,
(u_longlong_t)lr->lr_nbps);
print_log_bp(&lr->lr_bps[i], "");
}
}
static void
zil_prt_rec_clone_range_enc(zilog_t *zilog, int txtype, const void *arg)
{
(void) zilog, (void) txtype;
const lr_clone_range_t *lr = arg;
int verbose = MAX(dump_opt['d'], dump_opt['i']);
(void) printf("%s(encrypted)\n", tab_prefix);
if (verbose < 4)
return;
for (unsigned int i = 0; i < lr->lr_nbps; i++) {
(void) printf("%s[%u/%llu] ", tab_prefix, i + 1,
(u_longlong_t)lr->lr_nbps);
print_log_bp(&lr->lr_bps[i], "");
}
}
typedef void (*zil_prt_rec_func_t)(zilog_t *, int, const void *);
typedef struct zil_rec_info {
zil_prt_rec_func_t zri_print;
zil_prt_rec_func_t zri_print_enc;
const char *zri_name;
uint64_t zri_count;
} zil_rec_info_t;
@@ -293,7 +392,9 @@ static zil_rec_info_t zil_rec_info[TX_MAX_TYPE] = {
{.zri_print = zil_prt_rec_remove, .zri_name = "TX_RMDIR "},
{.zri_print = zil_prt_rec_link, .zri_name = "TX_LINK "},
{.zri_print = zil_prt_rec_rename, .zri_name = "TX_RENAME "},
{.zri_print = zil_prt_rec_write, .zri_name = "TX_WRITE "},
{.zri_print = zil_prt_rec_write,
.zri_print_enc = zil_prt_rec_write_enc,
.zri_name = "TX_WRITE "},
{.zri_print = zil_prt_rec_truncate, .zri_name = "TX_TRUNCATE "},
{.zri_print = zil_prt_rec_setattr, .zri_name = "TX_SETATTR "},
{.zri_print = zil_prt_rec_acl, .zri_name = "TX_ACL_V0 "},
@@ -305,12 +406,19 @@ static zil_rec_info_t zil_rec_info[TX_MAX_TYPE] = {
{.zri_print = zil_prt_rec_create, .zri_name = "TX_MKDIR_ATTR "},
{.zri_print = zil_prt_rec_create, .zri_name = "TX_MKDIR_ACL_ATTR "},
{.zri_print = zil_prt_rec_write, .zri_name = "TX_WRITE2 "},
{.zri_print = zil_prt_rec_setsaxattr,
.zri_name = "TX_SETSAXATTR "},
{.zri_print = zil_prt_rec_rename, .zri_name = "TX_RENAME_EXCHANGE "},
{.zri_print = zil_prt_rec_rename, .zri_name = "TX_RENAME_WHITEOUT "},
{.zri_print = zil_prt_rec_clone_range,
.zri_print_enc = zil_prt_rec_clone_range_enc,
.zri_name = "TX_CLONE_RANGE "},
};
/* ARGSUSED */
static int
print_log_record(zilog_t *zilog, lr_t *lr, void *arg, uint64_t claim_txg)
print_log_record(zilog_t *zilog, const lr_t *lr, void *arg, uint64_t claim_txg)
{
(void) arg, (void) claim_txg;
int txtype;
int verbose = MAX(dump_opt['d'], dump_opt['i']);
@@ -330,6 +438,8 @@ print_log_record(zilog_t *zilog, lr_t *lr, void *arg, uint64_t claim_txg)
if (txtype && verbose >= 3) {
if (!zilog->zl_os->os_encrypted) {
zil_rec_info[txtype].zri_print(zilog, txtype, lr);
} else if (zil_rec_info[txtype].zri_print_enc) {
zil_rec_info[txtype].zri_print_enc(zilog, txtype, lr);
} else {
(void) printf("%s(encrypted)\n", tab_prefix);
}
@@ -341,10 +451,11 @@ print_log_record(zilog_t *zilog, lr_t *lr, void *arg, uint64_t claim_txg)
return (0);
}
/* ARGSUSED */
static int
print_log_block(zilog_t *zilog, blkptr_t *bp, void *arg, uint64_t claim_txg)
print_log_block(zilog_t *zilog, const blkptr_t *bp, void *arg,
uint64_t claim_txg)
{
(void) arg;
char blkbuf[BP_SPRINTF_LEN + 10];
int verbose = MAX(dump_opt['d'], dump_opt['i']);
const char *claim;
@@ -395,7 +506,6 @@ print_log_stats(int verbose)
(void) printf("\n");
}
/* ARGSUSED */
void
dump_intent_log(zilog_t *zilog)
{
+38 -93
View File
@@ -1,101 +1,46 @@
include $(top_srcdir)/config/Rules.am
include $(srcdir)/%D%/zed.d/Makefile.am
DEFAULT_INCLUDES += \
-I$(top_srcdir)/include \
-I$(top_srcdir)/lib/libspl/include
zed_CFLAGS = $(AM_CFLAGS)
zed_CFLAGS += $(LIBUDEV_CFLAGS) $(LIBUUID_CFLAGS)
EXTRA_DIST = zed.d/README \
zed.d/history_event-zfs-list-cacher.sh.in
sbin_PROGRAMS += zed
CPPCHECKTARGETS += zed
sbin_PROGRAMS = zed
ZED_SRC = \
zed.c \
zed.h \
zed_conf.c \
zed_conf.h \
zed_disk_event.c \
zed_disk_event.h \
zed_event.c \
zed_event.h \
zed_exec.c \
zed_exec.h \
zed_file.c \
zed_file.h \
zed_log.c \
zed_log.h \
zed_strings.c \
zed_strings.h
FMA_SRC = \
agents/zfs_agents.c \
agents/zfs_agents.h \
agents/zfs_diagnosis.c \
agents/zfs_mod.c \
agents/zfs_retire.c \
agents/fmd_api.c \
agents/fmd_api.h \
agents/fmd_serd.c \
agents/fmd_serd.h
zed_SOURCES = $(ZED_SRC) $(FMA_SRC)
zed_SOURCES = \
%D%/zed.c \
%D%/zed.h \
%D%/zed_conf.c \
%D%/zed_conf.h \
%D%/zed_disk_event.c \
%D%/zed_disk_event.h \
%D%/zed_event.c \
%D%/zed_event.h \
%D%/zed_exec.c \
%D%/zed_exec.h \
%D%/zed_file.c \
%D%/zed_file.h \
%D%/zed_log.c \
%D%/zed_log.h \
%D%/zed_strings.c \
%D%/zed_strings.h \
\
%D%/agents/fmd_api.c \
%D%/agents/fmd_api.h \
%D%/agents/fmd_serd.c \
%D%/agents/fmd_serd.h \
%D%/agents/zfs_agents.c \
%D%/agents/zfs_agents.h \
%D%/agents/zfs_diagnosis.c \
%D%/agents/zfs_mod.c \
%D%/agents/zfs_retire.c
zed_LDADD = \
$(top_builddir)/lib/libnvpair/libnvpair.la \
$(top_builddir)/lib/libuutil/libuutil.la \
$(top_builddir)/lib/libzfs/libzfs.la
libzfs.la \
libzfs_core.la \
libnvpair.la \
libuutil.la
zed_LDADD += -lrt
zed_LDADD += -lrt $(LIBATOMIC_LIBS) $(LIBUDEV_LIBS) $(LIBUUID_LIBS)
zed_LDFLAGS = -pthread
zedconfdir = $(sysconfdir)/zfs/zed.d
dist_zedconf_DATA = \
zed.d/zed-functions.sh \
zed.d/zed.rc
zedexecdir = $(zfsexecdir)/zed.d
dist_zedexec_SCRIPTS = \
zed.d/all-debug.sh \
zed.d/all-syslog.sh \
zed.d/data-notify.sh \
zed.d/generic-notify.sh \
zed.d/resilver_finish-notify.sh \
zed.d/scrub_finish-notify.sh \
zed.d/statechange-led.sh \
zed.d/statechange-notify.sh \
zed.d/vdev_clear-led.sh \
zed.d/vdev_attach-led.sh \
zed.d/pool_import-led.sh \
zed.d/resilver_finish-start-scrub.sh
nodist_zedexec_SCRIPTS = zed.d/history_event-zfs-list-cacher.sh
$(nodist_zedexec_SCRIPTS): %: %.in
-$(SED) -e 's,@bindir\@,$(bindir),g' \
-e 's,@runstatedir\@,$(runstatedir),g' \
-e 's,@sbindir\@,$(sbindir),g' \
-e 's,@sysconfdir\@,$(sysconfdir),g' \
$< >'$@'
zedconfdefaults = \
all-syslog.sh \
data-notify.sh \
resilver_finish-notify.sh \
scrub_finish-notify.sh \
statechange-led.sh \
statechange-notify.sh \
vdev_clear-led.sh \
vdev_attach-led.sh \
pool_import-led.sh \
resilver_finish-start-scrub.sh
install-data-hook:
$(MKDIR_P) "$(DESTDIR)$(zedconfdir)"
for f in $(zedconfdefaults); do \
test -f "$(DESTDIR)$(zedconfdir)/$${f}" -o \
-L "$(DESTDIR)$(zedconfdir)/$${f}" || \
ln -s "$(zedexecdir)/$${f}" "$(DESTDIR)$(zedconfdir)"; \
done
chmod 0600 "$(DESTDIR)$(zedconfdir)/zed.rc"
dist_noinst_DATA += %D%/agents/README.md
+62 -46
View File
@@ -6,7 +6,7 @@
* You may not use this file except in compliance with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or http://www.opensolaris.org/os/licensing.
* or https://opensource.org/licenses/CDDL-1.0.
* See the License for the specific language governing permissions
* and limitations under the License.
*
@@ -22,10 +22,11 @@
* Copyright (c) 2004, 2010, Oracle and/or its affiliates. All rights reserved.
*
* Copyright (c) 2016, Intel Corporation.
* Copyright (c) 2023, Klara Inc.
*/
/*
* This file imlements the minimal FMD module API required to support the
* This file implements the minimal FMD module API required to support the
* fault logic modules in ZED. This support includes module registration,
* memory allocation, module property accessors, basic case management,
* one-shot timers and SERD engines.
@@ -38,7 +39,7 @@
#include <sys/fm/protocol.h>
#include <uuid/uuid.h>
#include <signal.h>
#include <strings.h>
#include <string.h>
#include <time.h>
#include "fmd_api.h"
@@ -97,6 +98,7 @@ _umem_logging_init(void)
int
fmd_hdl_register(fmd_hdl_t *hdl, int version, const fmd_hdl_info_t *mip)
{
(void) version;
fmd_module_t *mp = (fmd_module_t *)hdl;
mp->mod_info = mip;
@@ -179,18 +181,21 @@ fmd_hdl_getspecific(fmd_hdl_t *hdl)
void *
fmd_hdl_alloc(fmd_hdl_t *hdl, size_t size, int flags)
{
(void) hdl;
return (umem_alloc(size, flags));
}
void *
fmd_hdl_zalloc(fmd_hdl_t *hdl, size_t size, int flags)
{
(void) hdl;
return (umem_zalloc(size, flags));
}
void
fmd_hdl_free(fmd_hdl_t *hdl, void *data, size_t size)
{
(void) hdl;
umem_free(data, size);
}
@@ -217,6 +222,8 @@ fmd_hdl_debug(fmd_hdl_t *hdl, const char *format, ...)
int32_t
fmd_prop_get_int32(fmd_hdl_t *hdl, const char *name)
{
(void) hdl;
/*
* These can be looked up in mp->modinfo->fmdi_props
* For now we just hard code for phase 2. In the
@@ -225,26 +232,6 @@ fmd_prop_get_int32(fmd_hdl_t *hdl, const char *name)
if (strcmp(name, "spare_on_remove") == 0)
return (1);
if (strcmp(name, "io_N") == 0 || strcmp(name, "checksum_N") == 0)
return (10); /* N = 10 events */
return (0);
}
int64_t
fmd_prop_get_int64(fmd_hdl_t *hdl, const char *name)
{
/*
* These can be looked up in mp->modinfo->fmdi_props
* For now we just hard code for phase 2. In the
* future, there can be a ZED based override.
*/
if (strcmp(name, "remove_timeout") == 0)
return (15ULL * 1000ULL * 1000ULL * 1000ULL); /* 15 sec */
if (strcmp(name, "io_T") == 0 || strcmp(name, "checksum_T") == 0)
return (1000ULL * 1000ULL * 1000ULL * 600ULL); /* 10 min */
return (0);
}
@@ -334,22 +321,24 @@ fmd_case_uuresolved(fmd_hdl_t *hdl, const char *uuid)
fmd_hdl_debug(hdl, "case resolved by uuid (%s)", uuid);
}
int
boolean_t
fmd_case_solved(fmd_hdl_t *hdl, fmd_case_t *cp)
{
return ((cp->ci_state >= FMD_CASE_SOLVED) ? FMD_B_TRUE : FMD_B_FALSE);
(void) hdl;
return (cp->ci_state >= FMD_CASE_SOLVED);
}
void
fmd_case_add_ereport(fmd_hdl_t *hdl, fmd_case_t *cp, fmd_event_t *ep)
{
(void) hdl, (void) cp, (void) ep;
}
static void
zed_log_fault(nvlist_t *nvl, const char *uuid, const char *code)
{
nvlist_t *rsrc;
char *strval;
const char *strval;
uint64_t guid;
uint8_t byte;
@@ -362,7 +351,7 @@ zed_log_fault(nvlist_t *nvl, const char *uuid, const char *code)
if (code != NULL)
zed_log_msg(LOG_INFO, "\t%s: %s", FM_SUSPECT_DIAG_CODE, code);
if (nvlist_lookup_uint8(nvl, FM_FAULT_CERTAINTY, &byte) == 0)
zed_log_msg(LOG_INFO, "\t%s: %llu", FM_FAULT_CERTAINTY, byte);
zed_log_msg(LOG_INFO, "\t%s: %hhu", FM_FAULT_CERTAINTY, byte);
if (nvlist_lookup_nvlist(nvl, FM_FAULT_RESOURCE, &rsrc) == 0) {
if (nvlist_lookup_string(rsrc, FM_FMRI_SCHEME, &strval) == 0)
zed_log_msg(LOG_INFO, "\t%s: %s", FM_FMRI_SCHEME,
@@ -379,7 +368,8 @@ zed_log_fault(nvlist_t *nvl, const char *uuid, const char *code)
static const char *
fmd_fault_mkcode(nvlist_t *fault)
{
char *class, *code = "-";
const char *class;
const char *code = "-";
/*
* Note: message codes come from: openzfs/usr/src/cmd/fm/dicts/ZFS.po
@@ -428,7 +418,8 @@ fmd_case_add_suspect(fmd_hdl_t *hdl, fmd_case_t *cp, nvlist_t *fault)
err |= nvlist_add_string(nvl, FM_SUSPECT_DIAG_CODE, code);
err |= nvlist_add_int64_array(nvl, FM_SUSPECT_DIAG_TIME, tod, 2);
err |= nvlist_add_uint32(nvl, FM_SUSPECT_FAULT_SZ, 1);
err |= nvlist_add_nvlist_array(nvl, FM_SUSPECT_FAULT_LIST, &fault, 1);
err |= nvlist_add_nvlist_array(nvl, FM_SUSPECT_FAULT_LIST,
(const nvlist_t **)&fault, 1);
if (err)
zed_log_die("failed to populate nvlist");
@@ -443,19 +434,21 @@ fmd_case_add_suspect(fmd_hdl_t *hdl, fmd_case_t *cp, nvlist_t *fault)
void
fmd_case_setspecific(fmd_hdl_t *hdl, fmd_case_t *cp, void *data)
{
(void) hdl;
cp->ci_data = data;
}
void *
fmd_case_getspecific(fmd_hdl_t *hdl, fmd_case_t *cp)
{
(void) hdl;
return (cp->ci_data);
}
void
fmd_buf_create(fmd_hdl_t *hdl, fmd_case_t *cp, const char *name, size_t size)
{
assert(strcmp(name, "data") == 0);
assert(strcmp(name, "data") == 0), (void) name;
assert(cp->ci_bufptr == NULL);
assert(size < (1024 * 1024));
@@ -467,22 +460,24 @@ void
fmd_buf_read(fmd_hdl_t *hdl, fmd_case_t *cp,
const char *name, void *buf, size_t size)
{
assert(strcmp(name, "data") == 0);
(void) hdl;
assert(strcmp(name, "data") == 0), (void) name;
assert(cp->ci_bufptr != NULL);
assert(size <= cp->ci_bufsiz);
bcopy(cp->ci_bufptr, buf, size);
memcpy(buf, cp->ci_bufptr, size);
}
void
fmd_buf_write(fmd_hdl_t *hdl, fmd_case_t *cp,
const char *name, const void *buf, size_t size)
{
assert(strcmp(name, "data") == 0);
(void) hdl;
assert(strcmp(name, "data") == 0), (void) name;
assert(cp->ci_bufptr != NULL);
assert(cp->ci_bufsiz >= size);
bcopy(buf, cp->ci_bufptr, size);
memcpy(cp->ci_bufptr, buf, size);
}
/* SERD Engines */
@@ -519,6 +514,19 @@ fmd_serd_exists(fmd_hdl_t *hdl, const char *name)
return (fmd_serd_eng_lookup(&mp->mod_serds, name) != NULL);
}
int
fmd_serd_active(fmd_hdl_t *hdl, const char *name)
{
fmd_module_t *mp = (fmd_module_t *)hdl;
fmd_serd_eng_t *sgp;
if ((sgp = fmd_serd_eng_lookup(&mp->mod_serds, name)) == NULL) {
zed_log_msg(LOG_ERR, "serd engine '%s' does not exist", name);
return (0);
}
return (fmd_serd_eng_fired(sgp) || !fmd_serd_eng_empty(sgp));
}
void
fmd_serd_reset(fmd_hdl_t *hdl, const char *name)
{
@@ -527,12 +535,10 @@ fmd_serd_reset(fmd_hdl_t *hdl, const char *name)
if ((sgp = fmd_serd_eng_lookup(&mp->mod_serds, name)) == NULL) {
zed_log_msg(LOG_ERR, "serd engine '%s' does not exist", name);
return;
} else {
fmd_serd_eng_reset(sgp);
fmd_hdl_debug(hdl, "serd_reset %s", name);
}
fmd_serd_eng_reset(sgp);
fmd_hdl_debug(hdl, "serd_reset %s", name);
}
int
@@ -540,16 +546,21 @@ fmd_serd_record(fmd_hdl_t *hdl, const char *name, fmd_event_t *ep)
{
fmd_module_t *mp = (fmd_module_t *)hdl;
fmd_serd_eng_t *sgp;
int err;
if ((sgp = fmd_serd_eng_lookup(&mp->mod_serds, name)) == NULL) {
zed_log_msg(LOG_ERR, "failed to add record to SERD engine '%s'",
name);
return (FMD_B_FALSE);
return (0);
}
err = fmd_serd_eng_record(sgp, ep->ev_hrt);
return (fmd_serd_eng_record(sgp, ep->ev_hrt));
}
return (err);
void
fmd_serd_gc(fmd_hdl_t *hdl)
{
fmd_module_t *mp = (fmd_module_t *)hdl;
fmd_serd_hash_apply(&mp->mod_serds, fmd_serd_eng_gc, NULL);
}
/* FMD Timers */
@@ -563,10 +574,10 @@ _timer_notify(union sigval sv)
const fmd_hdl_ops_t *ops = mp->mod_info->fmdi_ops;
struct itimerspec its;
fmd_hdl_debug(hdl, "timer fired (%p)", ftp->ft_tid);
fmd_hdl_debug(hdl, "%s timer fired (%p)", mp->mod_name, ftp->ft_tid);
/* disarm the timer */
bzero(&its, sizeof (struct itimerspec));
memset(&its, 0, sizeof (struct itimerspec));
timer_settime(ftp->ft_tid, 0, &its, NULL);
/* Note that the fmdo_timeout can remove this timer */
@@ -582,6 +593,7 @@ _timer_notify(union sigval sv)
fmd_timer_t *
fmd_timer_install(fmd_hdl_t *hdl, void *arg, fmd_event_t *ep, hrtime_t delta)
{
(void) ep;
struct sigevent sev;
struct itimerspec its;
fmd_timer_t *ftp;
@@ -599,6 +611,7 @@ fmd_timer_install(fmd_hdl_t *hdl, void *arg, fmd_event_t *ep, hrtime_t delta)
sev.sigev_notify_function = _timer_notify;
sev.sigev_notify_attributes = NULL;
sev.sigev_value.sival_ptr = ftp;
sev.sigev_signo = 0;
timer_create(CLOCK_REALTIME, &sev, &ftp->ft_tid);
timer_settime(ftp->ft_tid, 0, &its, NULL);
@@ -625,6 +638,7 @@ nvlist_t *
fmd_nvl_create_fault(fmd_hdl_t *hdl, const char *class, uint8_t certainty,
nvlist_t *asru, nvlist_t *fru, nvlist_t *resource)
{
(void) hdl;
nvlist_t *nvl;
int err = 0;
@@ -688,7 +702,8 @@ fmd_strmatch(const char *s, const char *p)
int
fmd_nvl_class_match(fmd_hdl_t *hdl, nvlist_t *nvl, const char *pattern)
{
char *class;
(void) hdl;
const char *class;
return (nvl != NULL &&
nvlist_lookup_string(nvl, FM_CLASS, &class) == 0 &&
@@ -698,6 +713,7 @@ fmd_nvl_class_match(fmd_hdl_t *hdl, nvlist_t *nvl, const char *pattern)
nvlist_t *
fmd_nvl_alloc(fmd_hdl_t *hdl, int flags)
{
(void) hdl, (void) flags;
nvlist_t *nvl = NULL;
if (nvlist_alloc(&nvl, NV_UNIQUE_NAME, 0) != 0)
+4 -8
View File
@@ -6,7 +6,7 @@
* You may not use this file except in compliance with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or http://www.opensolaris.org/os/licensing.
* or https://opensource.org/licenses/CDDL-1.0.
* See the License for the specific language governing permissions
* and limitations under the License.
*
@@ -72,10 +72,6 @@ typedef struct fmd_case {
} fmd_case_t;
#define FMD_B_FALSE 0 /* false value for booleans as int */
#define FMD_B_TRUE 1 /* true value for booleans as int */
#define FMD_CASE_UNSOLVED 0 /* case is not yet solved (waiting) */
#define FMD_CASE_SOLVED 1 /* case is solved (suspects added) */
#define FMD_CASE_CLOSE_WAIT 2 /* case is executing fmdo_close() */
@@ -155,7 +151,6 @@ extern void fmd_hdl_vdebug(fmd_hdl_t *, const char *, va_list);
extern void fmd_hdl_debug(fmd_hdl_t *, const char *, ...);
extern int32_t fmd_prop_get_int32(fmd_hdl_t *, const char *);
extern int64_t fmd_prop_get_int64(fmd_hdl_t *, const char *);
#define FMD_STAT_NOALLOC 0x0 /* fmd should use caller's memory */
#define FMD_STAT_ALLOC 0x1 /* fmd should allocate stats memory */
@@ -176,8 +171,7 @@ extern int fmd_case_uuclosed(fmd_hdl_t *, const char *);
extern int fmd_case_uuisresolved(fmd_hdl_t *, const char *);
extern void fmd_case_uuresolved(fmd_hdl_t *, const char *);
extern int fmd_case_solved(fmd_hdl_t *, fmd_case_t *);
extern int fmd_case_closed(fmd_hdl_t *, fmd_case_t *);
extern boolean_t fmd_case_solved(fmd_hdl_t *, fmd_case_t *);
extern void fmd_case_add_ereport(fmd_hdl_t *, fmd_case_t *, fmd_event_t *);
extern void fmd_case_add_serd(fmd_hdl_t *, fmd_case_t *, const char *);
@@ -200,10 +194,12 @@ extern size_t fmd_buf_size(fmd_hdl_t *, fmd_case_t *, const char *);
extern void fmd_serd_create(fmd_hdl_t *, const char *, uint_t, hrtime_t);
extern void fmd_serd_destroy(fmd_hdl_t *, const char *);
extern int fmd_serd_exists(fmd_hdl_t *, const char *);
extern int fmd_serd_active(fmd_hdl_t *, const char *);
extern void fmd_serd_reset(fmd_hdl_t *, const char *);
extern int fmd_serd_record(fmd_hdl_t *, const char *, fmd_event_t *);
extern int fmd_serd_fired(fmd_hdl_t *, const char *);
extern int fmd_serd_empty(fmd_hdl_t *, const char *);
extern void fmd_serd_gc(fmd_hdl_t *);
extern id_t fmd_timer_install(fmd_hdl_t *, void *, fmd_event_t *, hrtime_t);
extern void fmd_timer_remove(fmd_hdl_t *, id_t);
+29 -9
View File
@@ -7,7 +7,7 @@
* with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or http://www.opensolaris.org/os/licensing.
* or https://opensource.org/licenses/CDDL-1.0.
* See the License for the specific language governing permissions
* and limitations under the License.
*
@@ -29,7 +29,7 @@
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <strings.h>
#include <string.h>
#include <sys/list.h>
#include <sys/time.h>
@@ -74,9 +74,18 @@ fmd_serd_eng_alloc(const char *name, uint64_t n, hrtime_t t)
fmd_serd_eng_t *sgp;
sgp = malloc(sizeof (fmd_serd_eng_t));
bzero(sgp, sizeof (fmd_serd_eng_t));
if (sgp == NULL) {
perror("malloc");
exit(EXIT_FAILURE);
}
memset(sgp, 0, sizeof (fmd_serd_eng_t));
sgp->sg_name = strdup(name);
if (sgp->sg_name == NULL) {
perror("strdup");
exit(EXIT_FAILURE);
}
sgp->sg_flags = FMD_SERD_DIRTY;
sgp->sg_n = n;
sgp->sg_t = t;
@@ -123,6 +132,12 @@ fmd_serd_hash_create(fmd_serd_hash_t *shp)
shp->sh_hashlen = FMD_STR_BUCKETS;
shp->sh_hash = calloc(shp->sh_hashlen, sizeof (void *));
shp->sh_count = 0;
if (shp->sh_hash == NULL) {
perror("calloc");
exit(EXIT_FAILURE);
}
}
void
@@ -139,7 +154,7 @@ fmd_serd_hash_destroy(fmd_serd_hash_t *shp)
}
free(shp->sh_hash);
bzero(shp, sizeof (fmd_serd_hash_t));
memset(shp, 0, sizeof (fmd_serd_hash_t));
}
void
@@ -234,13 +249,17 @@ fmd_serd_eng_record(fmd_serd_eng_t *sgp, hrtime_t hrt)
if (sgp->sg_flags & FMD_SERD_FIRED) {
serd_log_msg(" SERD Engine: record %s already fired!",
sgp->sg_name);
return (FMD_B_FALSE);
return (B_FALSE);
}
while (sgp->sg_count >= sgp->sg_n)
fmd_serd_eng_discard(sgp, list_tail(&sgp->sg_list));
sep = malloc(sizeof (fmd_serd_elem_t));
if (sep == NULL) {
perror("malloc");
exit(EXIT_FAILURE);
}
sep->se_hrt = hrt;
list_insert_head(&sgp->sg_list, sep);
@@ -259,11 +278,11 @@ fmd_serd_eng_record(fmd_serd_eng_t *sgp, hrtime_t hrt)
fmd_event_delta(oep->se_hrt, sep->se_hrt) <= sgp->sg_t) {
sgp->sg_flags |= FMD_SERD_FIRED | FMD_SERD_DIRTY;
serd_log_msg(" SERD Engine: fired %s", sgp->sg_name);
return (FMD_B_TRUE);
return (B_TRUE);
}
sgp->sg_flags |= FMD_SERD_DIRTY;
return (FMD_B_FALSE);
return (B_FALSE);
}
int
@@ -281,7 +300,7 @@ fmd_serd_eng_empty(fmd_serd_eng_t *sgp)
void
fmd_serd_eng_reset(fmd_serd_eng_t *sgp)
{
serd_log_msg(" SERD Engine: reseting %s", sgp->sg_name);
serd_log_msg(" SERD Engine: resetting %s", sgp->sg_name);
while (sgp->sg_count != 0)
fmd_serd_eng_discard(sgp, list_head(&sgp->sg_list));
@@ -291,8 +310,9 @@ fmd_serd_eng_reset(fmd_serd_eng_t *sgp)
}
void
fmd_serd_eng_gc(fmd_serd_eng_t *sgp)
fmd_serd_eng_gc(fmd_serd_eng_t *sgp, void *arg)
{
(void) arg;
fmd_serd_elem_t *sep, *nep;
hrtime_t hrt;
+2 -2
View File
@@ -7,7 +7,7 @@
* with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or http://www.opensolaris.org/os/licensing.
* or https://opensource.org/licenses/CDDL-1.0.
* See the License for the specific language governing permissions
* and limitations under the License.
*
@@ -77,7 +77,7 @@ extern int fmd_serd_eng_fired(fmd_serd_eng_t *);
extern int fmd_serd_eng_empty(fmd_serd_eng_t *);
extern void fmd_serd_eng_reset(fmd_serd_eng_t *);
extern void fmd_serd_eng_gc(fmd_serd_eng_t *);
extern void fmd_serd_eng_gc(fmd_serd_eng_t *, void *);
#ifdef __cplusplus
}
+56 -22
View File
@@ -13,6 +13,7 @@
/*
* Copyright (c) 2016, Intel Corporation.
* Copyright (c) 2018, loli10K <ezomori.nozomu@gmail.com>
* Copyright (c) 2021 Hewlett Packard Enterprise Development LP
*/
#include <libnvpair.h>
@@ -63,7 +64,7 @@ typedef enum device_type {
typedef struct guid_search {
uint64_t gs_pool_guid;
uint64_t gs_vdev_guid;
char *gs_devid;
const char *gs_devid;
device_type_t gs_vdev_type;
uint64_t gs_vdev_expandtime; /* vdev expansion time */
} guid_search_t;
@@ -76,9 +77,10 @@ static boolean_t
zfs_agent_iter_vdev(zpool_handle_t *zhp, nvlist_t *nvl, void *arg)
{
guid_search_t *gsp = arg;
char *path = NULL;
const char *path = NULL;
uint_t c, children;
nvlist_t **child;
uint64_t vdev_guid;
/*
* First iterate over any children.
@@ -99,7 +101,7 @@ zfs_agent_iter_vdev(zpool_handle_t *zhp, nvlist_t *nvl, void *arg)
&child, &children) == 0) {
for (c = 0; c < children; c++) {
if (zfs_agent_iter_vdev(zhp, child[c], gsp)) {
gsp->gs_vdev_type = DEVICE_TYPE_L2ARC;
gsp->gs_vdev_type = DEVICE_TYPE_SPARE;
return (B_TRUE);
}
}
@@ -108,7 +110,7 @@ zfs_agent_iter_vdev(zpool_handle_t *zhp, nvlist_t *nvl, void *arg)
&child, &children) == 0) {
for (c = 0; c < children; c++) {
if (zfs_agent_iter_vdev(zhp, child[c], gsp)) {
gsp->gs_vdev_type = DEVICE_TYPE_SPARE;
gsp->gs_vdev_type = DEVICE_TYPE_L2ARC;
return (B_TRUE);
}
}
@@ -116,7 +118,8 @@ zfs_agent_iter_vdev(zpool_handle_t *zhp, nvlist_t *nvl, void *arg)
/*
* On a devid match, grab the vdev guid and expansion time, if any.
*/
if ((nvlist_lookup_string(nvl, ZPOOL_CONFIG_DEVID, &path) == 0) &&
if (gsp->gs_devid != NULL &&
(nvlist_lookup_string(nvl, ZPOOL_CONFIG_DEVID, &path) == 0) &&
(strcmp(gsp->gs_devid, path) == 0)) {
(void) nvlist_lookup_uint64(nvl, ZPOOL_CONFIG_GUID,
&gsp->gs_vdev_guid);
@@ -124,6 +127,21 @@ zfs_agent_iter_vdev(zpool_handle_t *zhp, nvlist_t *nvl, void *arg)
&gsp->gs_vdev_expandtime);
return (B_TRUE);
}
/*
* Otherwise, on a vdev guid match, grab the devid and expansion
* time. The devid might be missing on removal since its not part
* of blkid cache and L2ARC VDEV does not contain pool guid in its
* blkid, so this is a special case for L2ARC VDEV.
*/
else if (gsp->gs_vdev_guid != 0 && gsp->gs_devid == NULL &&
nvlist_lookup_uint64(nvl, ZPOOL_CONFIG_GUID, &vdev_guid) == 0 &&
gsp->gs_vdev_guid == vdev_guid) {
(void) nvlist_lookup_string(nvl, ZPOOL_CONFIG_DEVID,
&gsp->gs_devid);
(void) nvlist_lookup_uint64(nvl, ZPOOL_CONFIG_EXPANSION_TIME,
&gsp->gs_vdev_expandtime);
return (B_TRUE);
}
return (B_FALSE);
}
@@ -146,13 +164,13 @@ zfs_agent_iter_pool(zpool_handle_t *zhp, void *arg)
/*
* if a match was found then grab the pool guid
*/
if (gsp->gs_vdev_guid) {
if (gsp->gs_vdev_guid && gsp->gs_devid) {
(void) nvlist_lookup_uint64(config, ZPOOL_CONFIG_POOL_GUID,
&gsp->gs_pool_guid);
}
zpool_close(zhp);
return (gsp->gs_vdev_guid != 0);
return (gsp->gs_devid != NULL && gsp->gs_vdev_guid != 0);
}
void
@@ -176,10 +194,12 @@ zfs_agent_post_event(const char *class, const char *subclass, nvlist_t *nvl)
}
/*
* On ZFS on Linux, we don't get the expected FM_RESOURCE_REMOVED
* ereport from vdev_disk layer after a hot unplug. Fortunately we
* get a EC_DEV_REMOVE from our disk monitor and it is a suitable
* On Linux, we don't get the expected FM_RESOURCE_REMOVED ereport
* from the vdev_disk layer after a hot unplug. Fortunately we do
* get an EC_DEV_REMOVE from our disk monitor and it is a suitable
* proxy so we remap it here for the benefit of the diagnosis engine.
* Starting in OpenZFS 2.0, we do get FM_RESOURCE_REMOVED from the spa
* layer. Processing multiple FM_RESOURCE_REMOVED events is not harmful.
*/
if ((strcmp(class, EC_DEV_REMOVE) == 0) &&
(strcmp(subclass, ESC_DISK) == 0) &&
@@ -191,11 +211,13 @@ zfs_agent_post_event(const char *class, const char *subclass, nvlist_t *nvl)
uint64_t pool_guid = 0, vdev_guid = 0;
guid_search_t search = { 0 };
device_type_t devtype = DEVICE_TYPE_PRIMARY;
const char *devid = NULL;
class = "resource.fs.zfs.removed";
subclass = "";
(void) nvlist_add_string(payload, FM_CLASS, class);
(void) nvlist_lookup_string(nvl, DEV_IDENTIFIER, &devid);
(void) nvlist_lookup_uint64(nvl, ZFS_EV_POOL_GUID, &pool_guid);
(void) nvlist_lookup_uint64(nvl, ZFS_EV_VDEV_GUID, &vdev_guid);
@@ -205,15 +227,25 @@ zfs_agent_post_event(const char *class, const char *subclass, nvlist_t *nvl)
(void) nvlist_add_int64_array(payload, FM_EREPORT_TIME, tod, 2);
/*
* If devid is missing but vdev_guid is available, find devid
* and pool_guid from vdev_guid.
* For multipath, spare and l2arc devices ZFS_EV_VDEV_GUID or
* ZFS_EV_POOL_GUID may be missing so find them.
*/
(void) nvlist_lookup_string(nvl, DEV_IDENTIFIER,
&search.gs_devid);
(void) zpool_iter(g_zfs_hdl, zfs_agent_iter_pool, &search);
pool_guid = search.gs_pool_guid;
vdev_guid = search.gs_vdev_guid;
devtype = search.gs_vdev_type;
if (devid == NULL || pool_guid == 0 || vdev_guid == 0) {
if (devid == NULL)
search.gs_vdev_guid = vdev_guid;
else
search.gs_devid = devid;
zpool_iter(g_zfs_hdl, zfs_agent_iter_pool, &search);
if (devid == NULL)
devid = search.gs_devid;
if (pool_guid == 0)
pool_guid = search.gs_pool_guid;
if (vdev_guid == 0)
vdev_guid = search.gs_vdev_guid;
devtype = search.gs_vdev_type;
}
/*
* We want to avoid reporting "remove" events coming from
@@ -225,7 +257,9 @@ zfs_agent_post_event(const char *class, const char *subclass, nvlist_t *nvl)
search.gs_vdev_expandtime + 10 > tv.tv_sec) {
zed_log_msg(LOG_INFO, "agent post event: ignoring '%s' "
"for recently expanded device '%s'", EC_DEV_REMOVE,
search.gs_devid);
devid);
fnvlist_free(payload);
free(event);
goto out;
}
@@ -317,6 +351,8 @@ zfs_agent_dispatch(const char *class, const char *subclass, nvlist_t *nvl)
static void *
zfs_agent_consumer_thread(void *arg)
{
(void) arg;
for (;;) {
agent_event_t *event;
@@ -333,9 +369,7 @@ zfs_agent_consumer_thread(void *arg)
return (NULL);
}
if ((event = (list_head(&agent_events))) != NULL) {
list_remove(&agent_events, event);
if ((event = list_remove_head(&agent_events)) != NULL) {
(void) pthread_mutex_unlock(&agent_lock);
/* dispatch to all event subscribers */
@@ -382,6 +416,7 @@ zfs_agent_init(libzfs_handle_t *zfs_hdl)
list_destroy(&agent_events);
zed_log_die("Failed to initialize agents");
}
pthread_setname_np(g_agents_tid, "agents");
}
void
@@ -397,8 +432,7 @@ zfs_agent_fini(void)
(void) pthread_join(g_agents_tid, NULL);
/* drain any pending events */
while ((event = (list_head(&agent_events))) != NULL) {
list_remove(&agent_events, event);
while ((event = list_remove_head(&agent_events)) != NULL) {
nvlist_free(event->ae_nvl);
free(event);
}
+192 -54
View File
@@ -6,7 +6,7 @@
* You may not use this file except in compliance with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or http://www.opensolaris.org/os/licensing.
* or https://opensource.org/licenses/CDDL-1.0.
* See the License for the specific language governing permissions
* and limitations under the License.
*
@@ -23,11 +23,11 @@
* Copyright (c) 2010, Oracle and/or its affiliates. All rights reserved.
* Copyright 2015 Nexenta Systems, Inc. All rights reserved.
* Copyright (c) 2016, Intel Corporation.
* Copyright (c) 2023, Klara Inc.
*/
#include <stddef.h>
#include <string.h>
#include <strings.h>
#include <libuutil.h>
#include <libzfs.h>
#include <sys/types.h>
@@ -35,14 +35,29 @@
#include <sys/fs/zfs.h>
#include <sys/fm/protocol.h>
#include <sys/fm/fs/zfs.h>
#include <sys/zio.h>
#include "zfs_agents.h"
#include "fmd_api.h"
/*
* Our serd engines are named 'zfs_<pool_guid>_<vdev_guid>_{checksum,io}'. This
* #define reserves enough space for two 64-bit hex values plus the length of
* the longest string.
* Default values for the serd engine when processing checksum or io errors. The
* semantics are N <events> in T <seconds>.
*/
#define DEFAULT_CHECKSUM_N 10 /* events */
#define DEFAULT_CHECKSUM_T 600 /* seconds */
#define DEFAULT_IO_N 10 /* events */
#define DEFAULT_IO_T 600 /* seconds */
#define DEFAULT_SLOW_IO_N 10 /* events */
#define DEFAULT_SLOW_IO_T 30 /* seconds */
#define CASE_GC_TIMEOUT_SECS 43200 /* 12 hours */
/*
* Our serd engines are named in the following format:
* 'zfs_<pool_guid>_<vdev_guid>_{checksum,io,slow_io}'
* This #define reserves enough space for two 64-bit hex values plus the
* length of the longest string.
*/
#define MAX_SERDLEN (16 * 2 + sizeof ("zfs___checksum"))
@@ -59,6 +74,7 @@ typedef struct zfs_case_data {
int zc_pool_state;
char zc_serd_checksum[MAX_SERDLEN];
char zc_serd_io[MAX_SERDLEN];
char zc_serd_slow_io[MAX_SERDLEN];
int zc_has_remove_timer;
} zfs_case_data_t;
@@ -105,7 +121,8 @@ zfs_de_stats_t zfs_stats = {
{ "resource_drops", FMD_TYPE_UINT64, "resource related ereports" }
};
static hrtime_t zfs_remove_timeout;
/* wait 15 seconds after a removal */
static hrtime_t zfs_remove_timeout = SEC2NSEC(15);
uu_list_pool_t *zfs_case_pool;
uu_list_t *zfs_cases;
@@ -115,11 +132,13 @@ uu_list_t *zfs_cases;
#define ZFS_MAKE_EREPORT(type) \
FM_EREPORT_CLASS "." ZFS_ERROR_CLASS "." type
static void zfs_purge_cases(fmd_hdl_t *hdl);
/*
* Write out the persistent representation of an active case.
*/
static void
zfs_case_serialize(fmd_hdl_t *hdl, zfs_case_t *zcp)
zfs_case_serialize(zfs_case_t *zcp)
{
zcp->zc_data.zc_version = CASE_DATA_VERSION_SERD;
}
@@ -161,6 +180,42 @@ zfs_case_unserialize(fmd_hdl_t *hdl, fmd_case_t *cp)
return (zcp);
}
/*
* count other unique slow-io cases in a pool
*/
static uint_t
zfs_other_slow_cases(fmd_hdl_t *hdl, const zfs_case_data_t *zfs_case)
{
zfs_case_t *zcp;
uint_t cases = 0;
static hrtime_t next_check = 0;
/*
* Note that plumbing in some external GC would require adding locking,
* since most of this module code is not thread safe and assumes there
* is only one thread running against the module. So we perform GC here
* inline periodically so that future delay induced faults will be
* possible once the issue causing multiple vdev delays is resolved.
*/
if (gethrestime_sec() > next_check) {
/* Periodically purge old SERD entries and stale cases */
fmd_serd_gc(hdl);
zfs_purge_cases(hdl);
next_check = gethrestime_sec() + CASE_GC_TIMEOUT_SECS;
}
for (zcp = uu_list_first(zfs_cases); zcp != NULL;
zcp = uu_list_next(zfs_cases, zcp)) {
if (zcp->zc_data.zc_pool_guid == zfs_case->zc_pool_guid &&
zcp->zc_data.zc_vdev_guid != zfs_case->zc_vdev_guid &&
zcp->zc_data.zc_serd_slow_io[0] != '\0' &&
fmd_serd_active(hdl, zcp->zc_data.zc_serd_slow_io)) {
cases++;
}
}
return (cases);
}
/*
* Iterate over any active cases. If any cases are associated with a pool or
* vdev which is no longer present on the system, close the associated case.
@@ -209,10 +264,10 @@ zfs_mark_vdev(uint64_t pool_guid, nvlist_t *vd, er_timeval_t *loaded)
}
}
/*ARGSUSED*/
static int
zfs_mark_pool(zpool_handle_t *zhp, void *unused)
{
(void) unused;
zfs_case_t *zcp;
uint64_t pool_guid;
uint64_t *tod;
@@ -367,14 +422,21 @@ zfs_serd_name(char *buf, uint64_t pool_guid, uint64_t vdev_guid,
(long long unsigned int)vdev_guid, type);
}
static void
zfs_case_retire(fmd_hdl_t *hdl, zfs_case_t *zcp)
{
fmd_hdl_debug(hdl, "retiring case");
fmd_case_close(hdl, zcp->zc_case);
}
/*
* Solve a given ZFS case. This first checks to make sure the diagnosis is
* still valid, as well as cleaning up any pending timer associated with the
* case.
*/
static void
zfs_case_solve(fmd_hdl_t *hdl, zfs_case_t *zcp, const char *faultname,
boolean_t checkunusable)
zfs_case_solve(fmd_hdl_t *hdl, zfs_case_t *zcp, const char *faultname)
{
nvlist_t *detector, *fault;
boolean_t serialize;
@@ -412,7 +474,7 @@ zfs_case_solve(fmd_hdl_t *hdl, zfs_case_t *zcp, const char *faultname,
serialize = B_TRUE;
}
if (serialize)
zfs_case_serialize(hdl, zcp);
zfs_case_serialize(zcp);
nvlist_free(detector);
}
@@ -424,10 +486,10 @@ timeval_earlier(er_timeval_t *a, er_timeval_t *b)
(a->ertv_sec == b->ertv_sec && a->ertv_nsec < b->ertv_nsec));
}
/*ARGSUSED*/
static void
zfs_ereport_when(fmd_hdl_t *hdl, nvlist_t *nvl, er_timeval_t *when)
{
(void) hdl;
int64_t *tod;
uint_t nelem;
@@ -443,19 +505,20 @@ zfs_ereport_when(fmd_hdl_t *hdl, nvlist_t *nvl, er_timeval_t *when)
/*
* Main fmd entry point.
*/
/*ARGSUSED*/
static void
zfs_fm_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl, const char *class)
{
zfs_case_t *zcp, *dcp;
int32_t pool_state;
uint64_t ena, pool_guid, vdev_guid;
uint64_t checksum_n, checksum_t;
uint64_t io_n, io_t;
er_timeval_t pool_load;
er_timeval_t er_when;
nvlist_t *detector;
boolean_t pool_found = B_FALSE;
boolean_t isresource;
char *type;
const char *type;
/*
* We subscribe to notifications for vdev or pool removal. In these
@@ -623,9 +686,7 @@ zfs_fm_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl, const char *class)
if (strcmp(class,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_DATA)) == 0 ||
strcmp(class,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_CONFIG_CACHE_WRITE)) == 0 ||
strcmp(class,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_DELAY)) == 0) {
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_CONFIG_CACHE_WRITE)) == 0) {
zfs_stats.resource_drops.fmds_value.ui64++;
return;
}
@@ -686,13 +747,16 @@ zfs_fm_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl, const char *class)
if (zcp->zc_data.zc_has_remove_timer) {
fmd_timer_remove(hdl, zcp->zc_remove_timer);
zcp->zc_data.zc_has_remove_timer = 0;
zfs_case_serialize(hdl, zcp);
zfs_case_serialize(zcp);
}
if (zcp->zc_data.zc_serd_io[0] != '\0')
fmd_serd_reset(hdl, zcp->zc_data.zc_serd_io);
if (zcp->zc_data.zc_serd_checksum[0] != '\0')
fmd_serd_reset(hdl,
zcp->zc_data.zc_serd_checksum);
if (zcp->zc_data.zc_serd_slow_io[0] != '\0')
fmd_serd_reset(hdl,
zcp->zc_data.zc_serd_slow_io);
} else if (fmd_nvl_class_match(hdl, nvl,
ZFS_MAKE_RSRC(FM_RESOURCE_STATECHANGE))) {
uint64_t state = 0;
@@ -721,7 +785,11 @@ zfs_fm_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl, const char *class)
if (fmd_case_solved(hdl, zcp->zc_case))
return;
fmd_hdl_debug(hdl, "error event '%s'", class);
if (vdev_guid)
fmd_hdl_debug(hdl, "error event '%s', vdev %llu", class,
vdev_guid);
else
fmd_hdl_debug(hdl, "error event '%s'", class);
/*
* Determine if we should solve the case and generate a fault. We solve
@@ -751,18 +819,18 @@ zfs_fm_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl, const char *class)
fmd_case_close(hdl, dcp->zc_case);
}
zfs_case_solve(hdl, zcp, "fault.fs.zfs.pool", B_TRUE);
zfs_case_solve(hdl, zcp, "fault.fs.zfs.pool");
} else if (fmd_nvl_class_match(hdl, nvl,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_LOG_REPLAY))) {
/*
* Pool level fault for reading the intent logs.
*/
zfs_case_solve(hdl, zcp, "fault.fs.zfs.log_replay", B_TRUE);
zfs_case_solve(hdl, zcp, "fault.fs.zfs.log_replay");
} else if (fmd_nvl_class_match(hdl, nvl, "ereport.fs.zfs.vdev.*")) {
/*
* Device fault.
*/
zfs_case_solve(hdl, zcp, "fault.fs.zfs.device", B_TRUE);
zfs_case_solve(hdl, zcp, "fault.fs.zfs.device");
} else if (fmd_nvl_class_match(hdl, nvl,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_IO)) ||
fmd_nvl_class_match(hdl, nvl,
@@ -770,9 +838,13 @@ zfs_fm_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl, const char *class)
fmd_nvl_class_match(hdl, nvl,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_IO_FAILURE)) ||
fmd_nvl_class_match(hdl, nvl,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_DELAY)) ||
fmd_nvl_class_match(hdl, nvl,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_PROBE_FAILURE))) {
char *failmode = NULL;
const char *failmode = NULL;
boolean_t checkremove = B_FALSE;
uint32_t pri = 0;
int32_t flags = 0;
/*
* If this is a checksum or I/O error, then toss it into the
@@ -784,30 +856,113 @@ zfs_fm_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl, const char *class)
if (fmd_nvl_class_match(hdl, nvl,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_IO))) {
if (zcp->zc_data.zc_serd_io[0] == '\0') {
if (nvlist_lookup_uint64(nvl,
FM_EREPORT_PAYLOAD_ZFS_VDEV_IO_N,
&io_n) != 0) {
io_n = DEFAULT_IO_N;
}
if (nvlist_lookup_uint64(nvl,
FM_EREPORT_PAYLOAD_ZFS_VDEV_IO_T,
&io_t) != 0) {
io_t = DEFAULT_IO_T;
}
zfs_serd_name(zcp->zc_data.zc_serd_io,
pool_guid, vdev_guid, "io");
fmd_serd_create(hdl, zcp->zc_data.zc_serd_io,
fmd_prop_get_int32(hdl, "io_N"),
fmd_prop_get_int64(hdl, "io_T"));
zfs_case_serialize(hdl, zcp);
io_n,
SEC2NSEC(io_t));
zfs_case_serialize(zcp);
}
if (fmd_serd_record(hdl, zcp->zc_data.zc_serd_io, ep))
checkremove = B_TRUE;
} else if (fmd_nvl_class_match(hdl, nvl,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_DELAY))) {
uint64_t slow_io_n, slow_io_t;
/*
* Create a slow io SERD engine when the VDEV has the
* 'vdev_slow_io_n' and 'vdev_slow_io_n' properties.
*/
if (zcp->zc_data.zc_serd_slow_io[0] == '\0' &&
nvlist_lookup_uint64(nvl,
FM_EREPORT_PAYLOAD_ZFS_VDEV_SLOW_IO_N,
&slow_io_n) == 0 &&
nvlist_lookup_uint64(nvl,
FM_EREPORT_PAYLOAD_ZFS_VDEV_SLOW_IO_T,
&slow_io_t) == 0) {
zfs_serd_name(zcp->zc_data.zc_serd_slow_io,
pool_guid, vdev_guid, "slow_io");
fmd_serd_create(hdl,
zcp->zc_data.zc_serd_slow_io,
slow_io_n,
SEC2NSEC(slow_io_t));
zfs_case_serialize(zcp);
}
/* Pass event to SERD engine and see if this triggers */
if (zcp->zc_data.zc_serd_slow_io[0] != '\0' &&
fmd_serd_record(hdl, zcp->zc_data.zc_serd_slow_io,
ep)) {
/*
* Ignore a slow io diagnosis when other
* VDEVs in the pool show signs of being slow.
*/
if (zfs_other_slow_cases(hdl, &zcp->zc_data)) {
zfs_case_retire(hdl, zcp);
fmd_hdl_debug(hdl, "pool %llu has "
"multiple slow io cases -- skip "
"degrading vdev %llu",
(u_longlong_t)
zcp->zc_data.zc_pool_guid,
(u_longlong_t)
zcp->zc_data.zc_vdev_guid);
} else {
zfs_case_solve(hdl, zcp,
"fault.fs.zfs.vdev.slow_io");
}
}
} else if (fmd_nvl_class_match(hdl, nvl,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_CHECKSUM))) {
/*
* We ignore ereports for checksum errors generated by
* scrub/resilver I/O to avoid potentially further
* degrading the pool while it's being repaired.
*/
if (((nvlist_lookup_uint32(nvl,
FM_EREPORT_PAYLOAD_ZFS_ZIO_PRIORITY, &pri) == 0) &&
(pri == ZIO_PRIORITY_SCRUB ||
pri == ZIO_PRIORITY_REBUILD)) ||
((nvlist_lookup_int32(nvl,
FM_EREPORT_PAYLOAD_ZFS_ZIO_FLAGS, &flags) == 0) &&
(flags & (ZIO_FLAG_SCRUB | ZIO_FLAG_RESILVER)))) {
fmd_hdl_debug(hdl, "ignoring '%s' for "
"scrub/resilver I/O", class);
return;
}
if (zcp->zc_data.zc_serd_checksum[0] == '\0') {
if (nvlist_lookup_uint64(nvl,
FM_EREPORT_PAYLOAD_ZFS_VDEV_CKSUM_N,
&checksum_n) != 0) {
checksum_n = DEFAULT_CHECKSUM_N;
}
if (nvlist_lookup_uint64(nvl,
FM_EREPORT_PAYLOAD_ZFS_VDEV_CKSUM_T,
&checksum_t) != 0) {
checksum_t = DEFAULT_CHECKSUM_T;
}
zfs_serd_name(zcp->zc_data.zc_serd_checksum,
pool_guid, vdev_guid, "checksum");
fmd_serd_create(hdl,
zcp->zc_data.zc_serd_checksum,
fmd_prop_get_int32(hdl, "checksum_N"),
fmd_prop_get_int64(hdl, "checksum_T"));
zfs_case_serialize(hdl, zcp);
checksum_n,
SEC2NSEC(checksum_t));
zfs_case_serialize(zcp);
}
if (fmd_serd_record(hdl,
zcp->zc_data.zc_serd_checksum, ep)) {
zfs_case_solve(hdl, zcp,
"fault.fs.zfs.vdev.checksum", B_FALSE);
"fault.fs.zfs.vdev.checksum");
}
} else if (fmd_nvl_class_match(hdl, nvl,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_IO_FAILURE)) &&
@@ -817,12 +972,11 @@ zfs_fm_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl, const char *class)
if (strncmp(failmode, FM_EREPORT_FAILMODE_CONTINUE,
strlen(FM_EREPORT_FAILMODE_CONTINUE)) == 0) {
zfs_case_solve(hdl, zcp,
"fault.fs.zfs.io_failure_continue",
B_FALSE);
"fault.fs.zfs.io_failure_continue");
} else if (strncmp(failmode, FM_EREPORT_FAILMODE_WAIT,
strlen(FM_EREPORT_FAILMODE_WAIT)) == 0) {
zfs_case_solve(hdl, zcp,
"fault.fs.zfs.io_failure_wait", B_FALSE);
"fault.fs.zfs.io_failure_wait");
}
} else if (fmd_nvl_class_match(hdl, nvl,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_PROBE_FAILURE))) {
@@ -844,7 +998,7 @@ zfs_fm_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl, const char *class)
zfs_remove_timeout);
if (!zcp->zc_data.zc_has_remove_timer) {
zcp->zc_data.zc_has_remove_timer = 1;
zfs_case_serialize(hdl, zcp);
zfs_case_serialize(zcp);
}
}
}
@@ -854,14 +1008,13 @@ zfs_fm_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl, const char *class)
* The timeout is fired when we diagnosed an I/O error, and it was not due to
* device removal (which would cause the timeout to be cancelled).
*/
/* ARGSUSED */
static void
zfs_fm_timeout(fmd_hdl_t *hdl, id_t id, void *data)
{
zfs_case_t *zcp = data;
if (id == zcp->zc_remove_timer)
zfs_case_solve(hdl, zcp, "fault.fs.zfs.vdev.io", B_FALSE);
zfs_case_solve(hdl, zcp, "fault.fs.zfs.vdev.io");
}
/*
@@ -877,6 +1030,8 @@ zfs_fm_close(fmd_hdl_t *hdl, fmd_case_t *cs)
fmd_serd_destroy(hdl, zcp->zc_data.zc_serd_checksum);
if (zcp->zc_data.zc_serd_io[0] != '\0')
fmd_serd_destroy(hdl, zcp->zc_data.zc_serd_io);
if (zcp->zc_data.zc_serd_slow_io[0] != '\0')
fmd_serd_destroy(hdl, zcp->zc_data.zc_serd_slow_io);
if (zcp->zc_data.zc_has_remove_timer)
fmd_timer_remove(hdl, zcp->zc_remove_timer);
@@ -885,30 +1040,15 @@ zfs_fm_close(fmd_hdl_t *hdl, fmd_case_t *cs)
fmd_hdl_free(hdl, zcp, sizeof (zfs_case_t));
}
/*
* We use the fmd gc entry point to look for old cases that no longer apply.
* This allows us to keep our set of case data small in a long running system.
*/
static void
zfs_fm_gc(fmd_hdl_t *hdl)
{
zfs_purge_cases(hdl);
}
static const fmd_hdl_ops_t fmd_ops = {
zfs_fm_recv, /* fmdo_recv */
zfs_fm_timeout, /* fmdo_timeout */
zfs_fm_close, /* fmdo_close */
NULL, /* fmdo_stats */
zfs_fm_gc, /* fmdo_gc */
NULL, /* fmdo_gc */
};
static const fmd_prop_t fmd_props[] = {
{ "checksum_N", FMD_TYPE_UINT32, "10" },
{ "checksum_T", FMD_TYPE_TIME, "10min" },
{ "io_N", FMD_TYPE_UINT32, "10" },
{ "io_T", FMD_TYPE_TIME, "10min" },
{ "remove_timeout", FMD_TYPE_TIME, "15sec" },
{ NULL, 0, NULL }
};
@@ -949,8 +1089,6 @@ _zfs_diagnosis_init(fmd_hdl_t *hdl)
(void) fmd_stat_create(hdl, FMD_STAT_NOALLOC, sizeof (zfs_stats) /
sizeof (fmd_stat_t), (fmd_stat_t *)&zfs_stats);
zfs_remove_timeout = fmd_prop_get_int64(hdl, "remove_timeout");
}
void
+514 -101
View File
File diff suppressed because it is too large Load Diff
+143 -27
View File
@@ -6,7 +6,7 @@
* You may not use this file except in compliance with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or http://www.opensolaris.org/os/licensing.
* or https://opensource.org/licenses/CDDL-1.0.
* See the License for the specific language governing permissions
* and limitations under the License.
*
@@ -38,8 +38,10 @@
#include <sys/fs/zfs.h>
#include <sys/fm/protocol.h>
#include <sys/fm/fs/zfs.h>
#include <libzutil.h>
#include <libzfs.h>
#include <string.h>
#include <libgen.h>
#include "zfs_agents.h"
#include "fmd_api.h"
@@ -74,6 +76,8 @@ typedef struct find_cbdata {
uint64_t cb_guid;
zpool_handle_t *cb_zhp;
nvlist_t *cb_vdev;
uint64_t cb_vdev_guid;
uint64_t cb_num_spares;
} find_cbdata_t;
static int
@@ -139,6 +143,64 @@ find_vdev(libzfs_handle_t *zhdl, nvlist_t *nv, uint64_t search_guid)
return (NULL);
}
static int
remove_spares(zpool_handle_t *zhp, void *data)
{
nvlist_t *config, *nvroot;
nvlist_t **spares;
uint_t nspares;
char *devname;
find_cbdata_t *cbp = data;
uint64_t spareguid = 0;
vdev_stat_t *vs;
unsigned int c;
config = zpool_get_config(zhp, NULL);
if (nvlist_lookup_nvlist(config,
ZPOOL_CONFIG_VDEV_TREE, &nvroot) != 0) {
zpool_close(zhp);
return (0);
}
if (nvlist_lookup_nvlist_array(nvroot, ZPOOL_CONFIG_SPARES,
&spares, &nspares) != 0) {
zpool_close(zhp);
return (0);
}
for (int i = 0; i < nspares; i++) {
if (nvlist_lookup_uint64(spares[i], ZPOOL_CONFIG_GUID,
&spareguid) == 0 && spareguid == cbp->cb_vdev_guid) {
devname = zpool_vdev_name(NULL, zhp, spares[i],
B_FALSE);
nvlist_lookup_uint64_array(spares[i],
ZPOOL_CONFIG_VDEV_STATS, (uint64_t **)&vs, &c);
if (vs->vs_state != VDEV_STATE_REMOVED &&
zpool_vdev_remove_wanted(zhp, devname) == 0)
cbp->cb_num_spares++;
break;
}
}
zpool_close(zhp);
return (0);
}
/*
* Given a vdev guid, find and remove all spares associated with it.
*/
static int
find_and_remove_spares(libzfs_handle_t *zhdl, uint64_t vdev_guid)
{
find_cbdata_t cb;
cb.cb_num_spares = 0;
cb.cb_vdev_guid = vdev_guid;
zpool_iter(zhdl, remove_spares, &cb);
return (cb.cb_num_spares);
}
/*
* Given a (pool, vdev) GUID pair, find the matching pool and vdev.
*/
@@ -219,25 +281,31 @@ replace_with_spare(fmd_hdl_t *hdl, zpool_handle_t *zhp, nvlist_t *vdev)
* replace it.
*/
for (s = 0; s < nspares; s++) {
char *spare_name;
boolean_t rebuild = B_FALSE;
const char *spare_name, *type;
if (nvlist_lookup_string(spares[s], ZPOOL_CONFIG_PATH,
&spare_name) != 0)
continue;
/* prefer sequential resilvering for distributed spares */
if ((nvlist_lookup_string(spares[s], ZPOOL_CONFIG_TYPE,
&type) == 0) && strcmp(type, VDEV_TYPE_DRAID_SPARE) == 0)
rebuild = B_TRUE;
/* if set, add the "ashift" pool property to the spare nvlist */
if (source != ZPROP_SRC_DEFAULT)
(void) nvlist_add_uint64(spares[s],
ZPOOL_CONFIG_ASHIFT, ashift);
(void) nvlist_add_nvlist_array(replacement,
ZPOOL_CONFIG_CHILDREN, &spares[s], 1);
ZPOOL_CONFIG_CHILDREN, (const nvlist_t **)&spares[s], 1);
fmd_hdl_debug(hdl, "zpool_vdev_replace '%s' with spare '%s'",
dev_name, basename(spare_name));
dev_name, zfs_basename(spare_name));
if (zpool_vdev_attach(zhp, dev_name, spare_name,
replacement, B_TRUE) == 0) {
replacement, B_TRUE, rebuild) == 0) {
free(dev_name);
nvlist_free(replacement);
return (B_TRUE);
@@ -255,7 +323,6 @@ replace_with_spare(fmd_hdl_t *hdl, zpool_handle_t *zhp, nvlist_t *vdev)
* ASRU is now usable. ZFS has found the device to be present and
* functioning.
*/
/*ARGSUSED*/
static void
zfs_vdev_repair(fmd_hdl_t *hdl, nvlist_t *nvl)
{
@@ -294,11 +361,11 @@ zfs_vdev_repair(fmd_hdl_t *hdl, nvlist_t *nvl)
vdev_guid, pool_guid);
}
/*ARGSUSED*/
static void
zfs_retire_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl,
const char *class)
{
(void) ep;
uint64_t pool_guid, vdev_guid;
zpool_handle_t *zhp;
nvlist_t *resource, *fault;
@@ -308,30 +375,60 @@ zfs_retire_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl,
libzfs_handle_t *zhdl = zdp->zrd_hdl;
boolean_t fault_device, degrade_device;
boolean_t is_repair;
char *scheme;
boolean_t l2arc = B_FALSE;
boolean_t spare = B_FALSE;
const char *scheme;
nvlist_t *vdev = NULL;
char *uuid;
const char *uuid;
int repair_done = 0;
boolean_t retire;
boolean_t is_disk;
vdev_aux_t aux;
uint64_t state = 0;
vdev_stat_t *vs;
unsigned int c;
fmd_hdl_debug(hdl, "zfs_retire_recv: '%s'", class);
(void) nvlist_lookup_uint64(nvl, FM_EREPORT_PAYLOAD_ZFS_VDEV_STATE,
&state);
/*
* If this is a resource notifying us of device removal then simply
* check for an available spare and continue unless the device is a
* l2arc vdev, in which case we just offline it.
*/
if (strcmp(class, "resource.fs.zfs.removed") == 0) {
char *devtype;
if (strcmp(class, "resource.fs.zfs.removed") == 0 ||
(strcmp(class, "resource.fs.zfs.statechange") == 0 &&
(state == VDEV_STATE_REMOVED || state == VDEV_STATE_FAULTED))) {
const char *devtype;
char *devname;
if (nvlist_lookup_string(nvl, FM_EREPORT_PAYLOAD_ZFS_VDEV_TYPE,
&devtype) == 0) {
if (strcmp(devtype, VDEV_TYPE_SPARE) == 0)
spare = B_TRUE;
else if (strcmp(devtype, VDEV_TYPE_L2CACHE) == 0)
l2arc = B_TRUE;
}
if (nvlist_lookup_uint64(nvl,
FM_EREPORT_PAYLOAD_ZFS_VDEV_GUID, &vdev_guid) != 0)
return;
if (vdev_guid == 0) {
fmd_hdl_debug(hdl, "Got a zero GUID");
return;
}
if (spare) {
int nspares = find_and_remove_spares(zhdl, vdev_guid);
fmd_hdl_debug(hdl, "%d spares removed", nspares);
return;
}
if (nvlist_lookup_uint64(nvl, FM_EREPORT_PAYLOAD_ZFS_POOL_GUID,
&pool_guid) != 0 ||
nvlist_lookup_uint64(nvl, FM_EREPORT_PAYLOAD_ZFS_VDEV_GUID,
&vdev_guid) != 0)
&pool_guid) != 0)
return;
if ((zhp = find_by_guid(zhdl, pool_guid, vdev_guid,
@@ -340,16 +437,32 @@ zfs_retire_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl,
devname = zpool_vdev_name(NULL, zhp, vdev, B_FALSE);
/* Can't replace l2arc with a spare: offline the device */
if (nvlist_lookup_string(nvl, FM_EREPORT_PAYLOAD_ZFS_VDEV_TYPE,
&devtype) == 0 && strcmp(devtype, VDEV_TYPE_L2CACHE) == 0) {
fmd_hdl_debug(hdl, "zpool_vdev_offline '%s'", devname);
zpool_vdev_offline(zhp, devname, B_TRUE);
} else if (!fmd_prop_get_int32(hdl, "spare_on_remove") ||
replace_with_spare(hdl, zhp, vdev) == B_FALSE) {
/* Could not handle with spare: offline the device */
fmd_hdl_debug(hdl, "zpool_vdev_offline '%s'", devname);
zpool_vdev_offline(zhp, devname, B_TRUE);
nvlist_lookup_uint64_array(vdev, ZPOOL_CONFIG_VDEV_STATS,
(uint64_t **)&vs, &c);
/*
* If state removed is requested for already removed vdev,
* its a loopback event from spa_async_remove(). Just
* ignore it.
*/
if (vs->vs_state == VDEV_STATE_REMOVED &&
state == VDEV_STATE_REMOVED)
return;
/* Remove the vdev since device is unplugged */
int remove_status = 0;
if (l2arc || (strcmp(class, "resource.fs.zfs.removed") == 0)) {
remove_status = zpool_vdev_remove_wanted(zhp, devname);
fmd_hdl_debug(hdl, "zpool_vdev_remove_wanted '%s'"
", err:%d", devname, libzfs_errno(zhdl));
}
/* Replace the vdev with a spare if its not a l2arc */
if (!l2arc && !remove_status &&
(!fmd_prop_get_int32(hdl, "spare_on_remove") ||
replace_with_spare(hdl, zhp, vdev) == B_FALSE)) {
/* Could not handle with spare */
fmd_hdl_debug(hdl, "no spare for '%s'", devname);
}
free(devname);
@@ -361,12 +474,11 @@ zfs_retire_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl,
return;
/*
* Note: on zfsonlinux statechange events are more than just
* Note: on Linux statechange events are more than just
* healthy ones so we need to confirm the actual state value.
*/
if (strcmp(class, "resource.fs.zfs.statechange") == 0 &&
nvlist_lookup_uint64(nvl, FM_EREPORT_PAYLOAD_ZFS_VDEV_STATE,
&state) == 0 && state == VDEV_STATE_HEALTHY) {
state == VDEV_STATE_HEALTHY) {
zfs_vdev_repair(hdl, nvl);
return;
}
@@ -411,6 +523,9 @@ zfs_retire_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl,
} else if (fmd_nvl_class_match(hdl, fault,
"fault.fs.zfs.vdev.checksum")) {
degrade_device = B_TRUE;
} else if (fmd_nvl_class_match(hdl, fault,
"fault.fs.zfs.vdev.slow_io")) {
degrade_device = B_TRUE;
} else if (fmd_nvl_class_match(hdl, fault,
"fault.fs.zfs.device")) {
fault_device = B_FALSE;
@@ -497,6 +612,7 @@ zfs_retire_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl,
* Attempt to substitute a hot spare.
*/
(void) replace_with_spare(hdl, zhp, vdev);
zpool_close(zhp);
}
+52 -23
View File
@@ -1,9 +1,9 @@
/*
* This file is part of the ZFS Event Daemon (ZED)
* for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
* This file is part of the ZFS Event Daemon (ZED).
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the ZoL git commit log for authoritative copyright attribution.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -36,6 +36,7 @@ static volatile sig_atomic_t _got_hup = 0;
static void
_exit_handler(int signum)
{
(void) signum;
_got_exit = 1;
}
@@ -45,6 +46,7 @@ _exit_handler(int signum)
static void
_hup_handler(int signum)
{
(void) signum;
_got_hup = 1;
}
@@ -60,8 +62,8 @@ _setup_sig_handlers(void)
zed_log_die("Failed to initialize sigset");
sa.sa_flags = SA_RESTART;
sa.sa_handler = SIG_IGN;
sa.sa_handler = SIG_IGN;
if (sigaction(SIGPIPE, &sa, NULL) < 0)
zed_log_die("Failed to ignore SIGPIPE");
@@ -75,6 +77,10 @@ _setup_sig_handlers(void)
sa.sa_handler = _hup_handler;
if (sigaction(SIGHUP, &sa, NULL) < 0)
zed_log_die("Failed to register SIGHUP handler");
(void) sigaddset(&sa.sa_mask, SIGCHLD);
if (pthread_sigmask(SIG_BLOCK, &sa.sa_mask, NULL) < 0)
zed_log_die("Failed to block SIGCHLD");
}
/*
@@ -212,22 +218,20 @@ _finish_daemonize(void)
int
main(int argc, char *argv[])
{
struct zed_conf *zcp;
struct zed_conf zcp;
uint64_t saved_eid;
int64_t saved_etime[2];
zed_log_init(argv[0]);
zed_log_stderr_open(LOG_NOTICE);
zcp = zed_conf_create();
zed_conf_parse_opts(zcp, argc, argv);
if (zcp->do_verbose)
zed_conf_init(&zcp);
zed_conf_parse_opts(&zcp, argc, argv);
if (zcp.do_verbose)
zed_log_stderr_open(LOG_INFO);
if (geteuid() != 0)
zed_log_die("Must be run as root");
zed_conf_parse_file(zcp);
zed_file_close_from(STDERR_FILENO + 1);
(void) umask(0);
@@ -235,47 +239,72 @@ main(int argc, char *argv[])
if (chdir("/") < 0)
zed_log_die("Failed to change to root directory");
if (zed_conf_scan_dir(zcp) < 0)
if (zed_conf_scan_dir(&zcp) < 0)
exit(EXIT_FAILURE);
if (!zcp->do_foreground) {
if (!zcp.do_foreground) {
_start_daemonize();
zed_log_syslog_open(LOG_DAEMON);
}
_setup_sig_handlers();
if (zcp->do_memlock)
if (zcp.do_memlock)
_lock_memory();
if ((zed_conf_write_pid(zcp) < 0) && (!zcp->do_force))
if ((zed_conf_write_pid(&zcp) < 0) && (!zcp.do_force))
exit(EXIT_FAILURE);
if (!zcp->do_foreground)
if (!zcp.do_foreground)
_finish_daemonize();
zed_log_msg(LOG_NOTICE,
"ZFS Event Daemon %s-%s (PID %d)",
ZFS_META_VERSION, ZFS_META_RELEASE, (int)getpid());
if (zed_conf_open_state(zcp) < 0)
if (zed_conf_open_state(&zcp) < 0)
exit(EXIT_FAILURE);
if (zed_conf_read_state(zcp, &saved_eid, saved_etime) < 0)
if (zed_conf_read_state(&zcp, &saved_eid, saved_etime) < 0)
exit(EXIT_FAILURE);
zed_event_init(zcp);
zed_event_seek(zcp, saved_eid, saved_etime);
idle:
/*
* If -I is specified, attempt to open /dev/zfs repeatedly until
* successful.
*/
do {
if (!zed_event_init(&zcp))
break;
/* Wait for some time and try again. tunable? */
sleep(30);
} while (!_got_exit && zcp.do_idle);
if (_got_exit)
goto out;
zed_event_seek(&zcp, saved_eid, saved_etime);
while (!_got_exit) {
int rv;
if (_got_hup) {
_got_hup = 0;
(void) zed_conf_scan_dir(zcp);
(void) zed_conf_scan_dir(&zcp);
}
zed_event_service(zcp);
rv = zed_event_service(&zcp);
/* ENODEV: When kernel module is unloaded (osx) */
if (rv != 0)
break;
}
zed_log_msg(LOG_NOTICE, "Exiting");
zed_event_fini(zcp);
zed_conf_destroy(zcp);
zed_event_fini(&zcp);
if (zcp.do_idle && !_got_exit)
goto idle;
out:
zed_conf_destroy(&zcp);
zed_log_fini();
exit(EXIT_SUCCESS);
}
+57
View File
@@ -0,0 +1,57 @@
zedconfdir = $(sysconfdir)/zfs/zed.d
dist_zedconf_DATA = \
%D%/zed-functions.sh \
%D%/zed.rc
zedexecdir = $(zfsexecdir)/zed.d
dist_zedexec_SCRIPTS = \
%D%/all-debug.sh \
%D%/all-syslog.sh \
%D%/data-notify.sh \
%D%/generic-notify.sh \
%D%/pool_import-led.sh \
%D%/resilver_finish-notify.sh \
%D%/resilver_finish-start-scrub.sh \
%D%/scrub_finish-notify.sh \
%D%/statechange-led.sh \
%D%/statechange-notify.sh \
%D%/statechange-slot_off.sh \
%D%/trim_finish-notify.sh \
%D%/vdev_attach-led.sh \
%D%/vdev_clear-led.sh
nodist_zedexec_SCRIPTS = \
%D%/history_event-zfs-list-cacher.sh
SUBSTFILES += $(nodist_zedexec_SCRIPTS)
zedconfdefaults = \
all-syslog.sh \
data-notify.sh \
history_event-zfs-list-cacher.sh \
pool_import-led.sh \
resilver_finish-notify.sh \
resilver_finish-start-scrub.sh \
scrub_finish-notify.sh \
statechange-led.sh \
statechange-notify.sh \
statechange-slot_off.sh \
vdev_attach-led.sh \
vdev_clear-led.sh
dist_noinst_DATA += %D%/README
INSTALL_DATA_HOOKS += zed-install-data-hook
zed-install-data-hook:
$(MKDIR_P) "$(DESTDIR)$(zedconfdir)"
set -x; for f in $(zedconfdefaults); do \
[ -f "$(DESTDIR)$(zedconfdir)/$${f}" ] ||\
[ -L "$(DESTDIR)$(zedconfdir)/$${f}" ] || \
$(LN_S) "$(zedexecdir)/$${f}" "$(DESTDIR)$(zedconfdir)"; \
done
SHELLCHECKSCRIPTS += $(dist_zedconf_DATA) $(dist_zedexec_SCRIPTS) $(nodist_zedexec_SCRIPTS)
$(call SHELLCHECK_OPTS,$(dist_zedconf_DATA) $(dist_zedexec_SCRIPTS) $(nodist_zedexec_SCRIPTS)): SHELLCHECK_SHELL = sh
# False positive: 1>&"${ZED_FLOCK_FD}" looks suspiciously similar to a >&filename bash extension
$(call SHELLCHECK_OPTS,$(dist_zedconf_DATA) $(dist_zedexec_SCRIPTS) $(nodist_zedexec_SCRIPTS)): CHECKBASHISMS_IGNORE = -e 'should be >word 2>&1' -e '&"$${ZED_FLOCK_FD}"'
+7 -10
View File
@@ -1,4 +1,5 @@
#!/bin/sh
# shellcheck disable=SC2154
#
# Log all environment variables to ZED_DEBUG_LOG.
#
@@ -12,15 +13,11 @@
zed_exit_if_ignoring_this_event
lockfile="$(basename -- "${ZED_DEBUG_LOG}").lock"
zed_lock "${ZED_DEBUG_LOG}"
{
env | sort
echo
} 1>&"${ZED_FLOCK_FD}"
zed_unlock "${ZED_DEBUG_LOG}"
umask 077
zed_lock "${lockfile}"
exec >> "${ZED_DEBUG_LOG}"
printenv | sort
echo
exec >&-
zed_unlock "${lockfile}"
exit 0
+42 -4
View File
@@ -1,14 +1,52 @@
#!/bin/sh
# shellcheck disable=SC2154
#
# Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
# Copyright (c) 2020 by Delphix. All rights reserved.
#
#
# Log the zevent via syslog.
#
[ -f "${ZED_ZEDLET_DIR}/zed.rc" ] && . "${ZED_ZEDLET_DIR}/zed.rc"
. "${ZED_ZEDLET_DIR}/zed-functions.sh"
zed_exit_if_ignoring_this_event
zed_log_msg "eid=${ZEVENT_EID}" "class=${ZEVENT_SUBCLASS}" \
"${ZEVENT_POOL_GUID:+"pool_guid=${ZEVENT_POOL_GUID}"}" \
"${ZEVENT_VDEV_PATH:+"vdev_path=${ZEVENT_VDEV_PATH}"}" \
"${ZEVENT_VDEV_STATE_STR:+"vdev_state=${ZEVENT_VDEV_STATE_STR}"}"
# build a string of name=value pairs for this event
msg="eid=${ZEVENT_EID} class=${ZEVENT_SUBCLASS}"
if [ "${ZED_SYSLOG_DISPLAY_GUIDS}" = "1" ]; then
[ -n "${ZEVENT_POOL_GUID}" ] && msg="${msg} pool_guid=${ZEVENT_POOL_GUID}"
[ -n "${ZEVENT_VDEV_GUID}" ] && msg="${msg} vdev_guid=${ZEVENT_VDEV_GUID}"
else
[ -n "${ZEVENT_POOL}" ] && msg="${msg} pool='${ZEVENT_POOL}'"
[ -n "${ZEVENT_VDEV_PATH}" ] && msg="${msg} vdev=${ZEVENT_VDEV_PATH##*/}"
fi
# log pool state if state is anything other than 'ACTIVE'
[ -n "${ZEVENT_POOL_STATE_STR}" ] && [ "$ZEVENT_POOL_STATE" -ne 0 ] && \
msg="${msg} pool_state=${ZEVENT_POOL_STATE_STR}"
# Log the following payload nvpairs if they are present
[ -n "${ZEVENT_VDEV_STATE_STR}" ] && msg="${msg} vdev_state=${ZEVENT_VDEV_STATE_STR}"
[ -n "${ZEVENT_CKSUM_ALGORITHM}" ] && msg="${msg} algorithm=${ZEVENT_CKSUM_ALGORITHM}"
[ -n "${ZEVENT_ZIO_SIZE}" ] && msg="${msg} size=${ZEVENT_ZIO_SIZE}"
[ -n "${ZEVENT_ZIO_OFFSET}" ] && msg="${msg} offset=${ZEVENT_ZIO_OFFSET}"
[ -n "${ZEVENT_ZIO_PRIORITY}" ] && msg="${msg} priority=${ZEVENT_ZIO_PRIORITY}"
[ -n "${ZEVENT_ZIO_ERR}" ] && msg="${msg} err=${ZEVENT_ZIO_ERR}"
[ -n "${ZEVENT_ZIO_FLAGS}" ] && msg="${msg} flags=$(printf '0x%x' "${ZEVENT_ZIO_FLAGS}")"
# log delays that are >= 10 milisec
[ -n "${ZEVENT_ZIO_DELAY}" ] && [ "$ZEVENT_ZIO_DELAY" -gt 10000000 ] && \
msg="${msg} delay=$((ZEVENT_ZIO_DELAY / 1000000))ms"
# list the bookmark data together
# shellcheck disable=SC2153
[ -n "${ZEVENT_ZIO_OBJSET}" ] && \
msg="${msg} bookmark=${ZEVENT_ZIO_OBJSET}:${ZEVENT_ZIO_OBJECT}:${ZEVENT_ZIO_LEVEL}:${ZEVENT_ZIO_BLKID}"
zed_log_msg "${msg}"
exit 0
+2 -1
View File
@@ -1,4 +1,5 @@
#!/bin/sh
# shellcheck disable=SC2154
#
# Send notification in response to a DATA error.
#
@@ -25,7 +26,7 @@ zed_rate_limit "${rate_limit_tag}" || exit 3
umask 077
note_subject="ZFS ${ZEVENT_SUBCLASS} error for ${ZEVENT_POOL} on $(hostname)"
note_pathname="${TMPDIR:="/tmp"}/$(basename -- "$0").${ZEVENT_EID}.$$"
note_pathname="$(mktemp)"
{
echo "ZFS has detected a data error:"
echo
+3 -2
View File
@@ -1,4 +1,5 @@
#!/bin/sh
# shellcheck disable=SC2154
#
# Send notification in response to a given zevent.
#
@@ -23,7 +24,7 @@
# Rate-limit the notification based in part on the filename.
#
rate_limit_tag="${ZEVENT_POOL};${ZEVENT_SUBCLASS};$(basename -- "$0")"
rate_limit_tag="${ZEVENT_POOL};${ZEVENT_SUBCLASS};${0##*/}"
rate_limit_interval="${ZED_NOTIFY_INTERVAL_SECS}"
zed_rate_limit "${rate_limit_tag}" "${rate_limit_interval}" || exit 3
@@ -31,7 +32,7 @@ umask 077
pool_str="${ZEVENT_POOL:+" for ${ZEVENT_POOL}"}"
host_str=" on $(hostname)"
note_subject="ZFS ${ZEVENT_SUBCLASS} event${pool_str}${host_str}"
note_pathname="${TMPDIR:="/tmp"}/$(basename -- "$0").${ZEVENT_EID}.$$"
note_pathname="$(mktemp)"
{
echo "ZFS has posted the following event:"
echo
@@ -1,11 +1,11 @@
#!/bin/sh
# shellcheck disable=SC2154
#
# Track changes to enumerated pools for use in early-boot
set -ef
FSLIST_DIR="@sysconfdir@/zfs/zfs-list.cache"
FSLIST_TMP="@runstatedir@/zfs-list.cache.new"
FSLIST="${FSLIST_DIR}/${ZEVENT_POOL}"
FSLIST="@sysconfdir@/zfs/zfs-list.cache/${ZEVENT_POOL}"
FSLIST_TMP="@runstatedir@/zfs-list.cache@${ZEVENT_POOL}"
# If the pool specific cache file is not writeable, abort
[ -w "${FSLIST}" ] || exit 0
@@ -13,21 +13,21 @@ FSLIST="${FSLIST_DIR}/${ZEVENT_POOL}"
[ -f "${ZED_ZEDLET_DIR}/zed.rc" ] && . "${ZED_ZEDLET_DIR}/zed.rc"
. "${ZED_ZEDLET_DIR}/zed-functions.sh"
zed_exit_if_ignoring_this_event
zed_check_cmd "${ZFS}" sort diff grep
[ "$ZEVENT_SUBCLASS" != "history_event" ] && exit 0
zed_check_cmd "${ZFS}" sort diff
# If we are acting on a snapshot, we have nothing to do
printf '%s' "${ZEVENT_HISTORY_DSNAME}" | grep '@' && exit 0
[ "${ZEVENT_HISTORY_DSNAME%@*}" = "${ZEVENT_HISTORY_DSNAME}" ] || exit 0
# We obtain a lock on zfs-list to avoid any simultaneous writes.
# We lock the output file to avoid simultaneous writes.
# If we run into trouble, log and drop the lock
abort_alter() {
zed_log_msg "Error updating zfs-list.cache!"
zed_unlock zfs-list
zed_log_msg "Error updating zfs-list.cache for ${ZEVENT_POOL}!"
zed_unlock "${FSLIST}"
}
finished() {
zed_unlock zfs-list
zed_unlock "${FSLIST}"
trap - EXIT
exit 0
}
@@ -37,7 +37,7 @@ case "${ZEVENT_HISTORY_INTERNAL_NAME}" in
;;
export)
zed_lock zfs-list
zed_lock "${FSLIST}"
trap abort_alter EXIT
echo > "${FSLIST}"
finished
@@ -46,8 +46,13 @@ case "${ZEVENT_HISTORY_INTERNAL_NAME}" in
set|inherit)
# Only act if one of the tracked properties is altered.
case "${ZEVENT_HISTORY_INTERNAL_STR%%=*}" in
canmount|mountpoint|atime|relatime|devices|exec| \
readonly|setuid|nbmand) ;;
canmount|mountpoint|atime|relatime|devices|exec|readonly| \
setuid|nbmand|encroot|keylocation|org.openzfs.systemd:requires| \
org.openzfs.systemd:requires-mounts-for| \
org.openzfs.systemd:before|org.openzfs.systemd:after| \
org.openzfs.systemd:wanted-by|org.openzfs.systemd:required-by| \
org.openzfs.systemd:nofail|org.openzfs.systemd:ignore \
) ;;
*) exit 0 ;;
esac
;;
@@ -58,19 +63,23 @@ case "${ZEVENT_HISTORY_INTERNAL_NAME}" in
;;
esac
zed_lock zfs-list
zed_lock "${FSLIST}"
trap abort_alter EXIT
PROPS="name,mountpoint,canmount,atime,relatime,devices,exec,readonly"
PROPS="${PROPS},setuid,nbmand"
PROPS="name,mountpoint,canmount,atime,relatime,devices,exec\
,readonly,setuid,nbmand,encroot,keylocation\
,org.openzfs.systemd:requires,org.openzfs.systemd:requires-mounts-for\
,org.openzfs.systemd:before,org.openzfs.systemd:after\
,org.openzfs.systemd:wanted-by,org.openzfs.systemd:required-by\
,org.openzfs.systemd:nofail,org.openzfs.systemd:ignore"
"${ZFS}" list -H -t filesystem -o $PROPS -r "${ZEVENT_POOL}" > "${FSLIST_TMP}"
"${ZFS}" list -H -t filesystem -o "${PROPS}" -r "${ZEVENT_POOL}" > "${FSLIST_TMP}"
# Sort the output so that it is stable
sort "${FSLIST_TMP}" -o "${FSLIST_TMP}"
# Don't modify the file if it hasn't changed
diff -q "${FSLIST_TMP}" "${FSLIST}" || mv "${FSLIST_TMP}" "${FSLIST}"
diff -q "${FSLIST_TMP}" "${FSLIST}" || cat "${FSLIST_TMP}" > "${FSLIST}"
rm -f "${FSLIST_TMP}"
finished
@@ -1,14 +1,17 @@
#!/bin/sh
# shellcheck disable=SC2154
# resilver_finish-start-scrub.sh
# Run a scrub after a resilver
#
# Exit codes:
# 1: Internal error
# 2: Script wasn't enabled in zed.rc
# 3: Scrubs are automatically started for sequential resilvers
[ -f "${ZED_ZEDLET_DIR}/zed.rc" ] && . "${ZED_ZEDLET_DIR}/zed.rc"
. "${ZED_ZEDLET_DIR}/zed-functions.sh"
[ "${ZED_SCRUB_AFTER_RESILVER}" = "1" ] || exit 2
[ "${ZEVENT_RESILVER_TYPE}" != "sequential" ] || exit 3
[ -n "${ZEVENT_POOL}" ] || exit 1
[ -n "${ZEVENT_SUBCLASS}" ] || exit 1
zed_check_cmd "${ZPOOL}" || exit 1
+2 -1
View File
@@ -1,4 +1,5 @@
#!/bin/sh
# shellcheck disable=SC2154
#
# Send notification in response to a RESILVER_FINISH or SCRUB_FINISH.
#
@@ -41,7 +42,7 @@ fi
umask 077
note_subject="ZFS ${ZEVENT_SUBCLASS} event for ${ZEVENT_POOL} on $(hostname)"
note_pathname="${TMPDIR:="/tmp"}/$(basename -- "$0").${ZEVENT_EID}.$$"
note_pathname="$(mktemp)"
{
echo "ZFS has finished a ${action}:"
echo
+117 -52
View File
@@ -1,26 +1,27 @@
#!/bin/sh
# shellcheck disable=SC2154
#
# Turn off/on the VDEV's enclosure fault LEDs when the pool's state changes.
# Turn off/on vdevs' enclosure fault LEDs when their pool's state changes.
#
# Turn the VDEV's fault LED on if it becomes FAULTED, DEGRADED or UNAVAIL.
# Turn the LED off when it's back ONLINE again.
# Turn a vdev's fault LED on if it becomes FAULTED, DEGRADED or UNAVAIL.
# Turn its LED off when it's back ONLINE again.
#
# This script run in two basic modes:
#
# 1. If $ZEVENT_VDEV_ENC_SYSFS_PATH and $ZEVENT_VDEV_STATE_STR are set, then
# only set the LED for that particular VDEV. This is the case for statechange
# only set the LED for that particular vdev. This is the case for statechange
# events and some vdev_* events.
#
# 2. If those vars are not set, then check the state of all VDEVs in the pool
# 2. If those vars are not set, then check the state of all vdevs in the pool
# and set the LEDs accordingly. This is the case for pool_import events.
#
# Note that this script requires that your enclosure be supported by the
# Linux SCSI enclosure services (ses) driver. The script will do nothing
# Linux SCSI Enclosure services (SES) driver. The script will do nothing
# if you have no enclosure, or if your enclosure isn't supported.
#
# Exit codes:
# 0: enclosure led successfully set
# 1: enclosure leds not not available
# 1: enclosure leds not available
# 2: enclosure leds administratively disabled
# 3: The led sysfs path passed from ZFS does not exist
# 4: $ZPOOL not set
@@ -29,7 +30,8 @@
[ -f "${ZED_ZEDLET_DIR}/zed.rc" ] && . "${ZED_ZEDLET_DIR}/zed.rc"
. "${ZED_ZEDLET_DIR}/zed-functions.sh"
if [ ! -d /sys/class/enclosure ] ; then
if [ ! -d /sys/class/enclosure ] && [ ! -d /sys/bus/pci/slots ] ; then
# No JBOD enclosure or NVMe slots
exit 1
fi
@@ -59,6 +61,10 @@ check_and_set_led()
file="$1"
val="$2"
if [ -z "$val" ]; then
return 0
fi
if [ ! -e "$file" ] ; then
return 3
fi
@@ -66,11 +72,11 @@ check_and_set_led()
# If another process is accessing the LED when we attempt to update it,
# the update will be lost so retry until the LED actually changes or we
# timeout.
for _ in $(seq 1 5); do
for _ in 1 2 3 4 5; do
# We want to check the current state first, since writing to the
# 'fault' entry always always causes a SES command, even if the
# 'fault' entry always causes a SES command, even if the
# current state is already what you want.
current=$(cat "${file}")
read -r current < "${file}"
# On some enclosures if you write 1 to fault, and read it back,
# it will return 2. Treat all non-zero values as 1 for
@@ -85,27 +91,85 @@ check_and_set_led()
else
break
fi
done
done
}
# Fault LEDs for JBODs and NVMe drives are handled a little differently.
#
# On JBODs the fault LED is called 'fault' and on a path like this:
#
# /sys/class/enclosure/0:0:1:0/SLOT 10/fault
#
# On NVMe it's called 'attention' and on a path like this:
#
# /sys/bus/pci/slot/0/attention
#
# This function returns the full path to the fault LED file for a given
# enclosure/slot directory.
#
path_to_led()
{
dir=$1
if [ -f "$dir/fault" ] ; then
echo "$dir/fault"
elif [ -f "$dir/attention" ] ; then
echo "$dir/attention"
fi
}
state_to_val()
{
state="$1"
if [ "$state" = "FAULTED" ] || [ "$state" = "DEGRADED" ] || \
[ "$state" = "UNAVAIL" ] ; then
echo 1
elif [ "$state" = "ONLINE" ] ; then
echo 0
fi
case "$state" in
FAULTED|DEGRADED|UNAVAIL|REMOVED)
echo 1
;;
ONLINE)
echo 0
;;
*)
echo "invalid state: $state"
;;
esac
}
# process_pool ([pool])
#
# Iterate through a pool (or pools) and set the VDEV's enclosure slot LEDs to
# the VDEV's state.
# Given a nvme name like 'nvme0n1', pass back its slot directory
# like "/sys/bus/pci/slots/0"
#
nvme_dev_to_slot()
{
dev="$1"
# Get the address "0000:01:00.0"
read -r address < "/sys/class/block/$dev/device/address"
find /sys/bus/pci/slots -regex '.*/[0-9]+/address$' | \
while read -r sys_addr; do
read -r this_address < "$sys_addr"
# The format of address is a little different between
# /sys/class/block/$dev/device/address and
# /sys/bus/pci/slots/
#
# address= "0000:01:00.0"
# this_address = "0000:01:00"
#
if echo "$address" | grep -Eq ^"$this_address" ; then
echo "${sys_addr%/*}"
break
fi
done
}
# process_pool (pool)
#
# Iterate through a pool and set the vdevs' enclosure slot LEDs to
# those vdevs' state.
#
# Arguments
# pool: Optional pool name. If not specified, iterate though all pools.
# pool: Pool name.
#
# Return
# 0 on success, 3 on missing sysfs path
@@ -113,19 +177,27 @@ state_to_val()
process_pool()
{
pool="$1"
# The output will be the vdevs only (from "grep '/dev/'"):
#
# U45 ONLINE 0 0 0 /dev/sdk 0
# U46 ONLINE 0 0 0 /dev/sdm 0
# U47 ONLINE 0 0 0 /dev/sdn 0
# U50 ONLINE 0 0 0 /dev/sdbn 0
#
ZPOOL_SCRIPTS_AS_ROOT=1 $ZPOOL status -c upath,fault_led "$pool" | grep '/dev/' | (
rc=0
# Lookup all the current LED values and paths in parallel
#shellcheck disable=SC2016
cmd='echo led_token=$(cat "$VDEV_ENC_SYSFS_PATH/fault"),"$VDEV_ENC_SYSFS_PATH",'
out=$($ZPOOL status -vc "$cmd" "$pool" | grep 'led_token=')
#shellcheck disable=SC2034
echo "$out" | while read -r vdev state read write chksum therest; do
while read -r vdev state _ _ _ therest; do
# Read out current LED value and path
tmp=$(echo "$therest" | sed 's/^.*led_token=//g')
vdev_enc_sysfs_path=$(echo "$tmp" | awk -F ',' '{print $2}')
current_val=$(echo "$tmp" | awk -F ',' '{print $1}')
# Get dev name (like 'sda')
dev=$(basename "$(echo "$therest" | awk '{print $(NF-1)}')")
vdev_enc_sysfs_path=$(realpath "/sys/class/block/$dev/device/enclosure_device"*)
if [ ! -d "$vdev_enc_sysfs_path" ] ; then
# This is not a JBOD disk, but it could be a PCI NVMe drive
vdev_enc_sysfs_path=$(nvme_dev_to_slot "$dev")
fi
current_val=$(echo "$therest" | awk '{print $NF}')
if [ "$current_val" != "0" ] ; then
current_val=1
@@ -136,40 +208,33 @@ process_pool()
continue
fi
if [ ! -e "$vdev_enc_sysfs_path/fault" ] ; then
#shellcheck disable=SC2030
rc=1
zed_log_msg "vdev $vdev '$file/fault' doesn't exist"
continue;
led_path=$(path_to_led "$vdev_enc_sysfs_path")
if [ ! -e "$led_path" ] ; then
rc=3
zed_log_msg "vdev $vdev '$led_path' doesn't exist"
continue
fi
val=$(state_to_val "$state")
if [ "$current_val" = "$val" ] ; then
# LED is already set correctly
continue;
continue
fi
if ! check_and_set_led "$vdev_enc_sysfs_path/fault" "$val"; then
rc=1
if ! check_and_set_led "$led_path" "$val"; then
rc=3
fi
done
#shellcheck disable=SC2031
if [ "$rc" = "0" ] ; then
return 0
else
# We didn't see a sysfs entry that we wanted to set
return 3
fi
exit "$rc"; )
}
if [ -n "$ZEVENT_VDEV_ENC_SYSFS_PATH" ] && [ -n "$ZEVENT_VDEV_STATE_STR" ] ; then
# Got a statechange for an individual VDEV
# Got a statechange for an individual vdev
val=$(state_to_val "$ZEVENT_VDEV_STATE_STR")
vdev=$(basename "$ZEVENT_VDEV_PATH")
check_and_set_led "$ZEVENT_VDEV_ENC_SYSFS_PATH/fault" "$val"
ledpath=$(path_to_led "$ZEVENT_VDEV_ENC_SYSFS_PATH")
check_and_set_led "$ledpath" "$val"
else
# Process the entire pool
poolname=$(zed_guid_to_pool "$ZEVENT_POOL_GUID")
+7 -5
View File
@@ -1,4 +1,5 @@
#!/bin/sh
# shellcheck disable=SC2154
#
# CDDL HEADER START
#
@@ -15,7 +16,7 @@
# Send notification in response to a fault induced statechange
#
# ZEVENT_SUBCLASS: 'statechange'
# ZEVENT_VDEV_STATE_STR: 'DEGRADED', 'FAULTED' or 'REMOVED'
# ZEVENT_VDEV_STATE_STR: 'DEGRADED', 'FAULTED', 'REMOVED', or 'UNAVAIL'
#
# Exit codes:
# 0: notification sent
@@ -31,13 +32,14 @@
if [ "${ZEVENT_VDEV_STATE_STR}" != "FAULTED" ] \
&& [ "${ZEVENT_VDEV_STATE_STR}" != "DEGRADED" ] \
&& [ "${ZEVENT_VDEV_STATE_STR}" != "REMOVED" ]; then
&& [ "${ZEVENT_VDEV_STATE_STR}" != "REMOVED" ] \
&& [ "${ZEVENT_VDEV_STATE_STR}" != "UNAVAIL" ]; then
exit 3
fi
umask 077
note_subject="ZFS device fault for pool ${ZEVENT_POOL_GUID} on $(hostname)"
note_pathname="${TMPDIR:="/tmp"}/$(basename -- "$0").${ZEVENT_EID}.$$"
note_subject="ZFS device fault for pool ${ZEVENT_POOL} on $(hostname)"
note_pathname="$(mktemp)"
{
if [ "${ZEVENT_VDEV_STATE_STR}" = "FAULTED" ] ; then
echo "The number of I/O errors associated with a ZFS device exceeded"
@@ -64,7 +66,7 @@ note_pathname="${TMPDIR:="/tmp"}/$(basename -- "$0").${ZEVENT_EID}.$$"
[ -n "${ZEVENT_VDEV_GUID}" ] && echo " vguid: ${ZEVENT_VDEV_GUID}"
[ -n "${ZEVENT_VDEV_DEVID}" ] && echo " devid: ${ZEVENT_VDEV_DEVID}"
echo " pool: ${ZEVENT_POOL_GUID}"
echo " pool: ${ZEVENT_POOL} (${ZEVENT_POOL_GUID})"
} > "${note_pathname}"
+64
View File
@@ -0,0 +1,64 @@
#!/bin/sh
# shellcheck disable=SC3014,SC2154,SC2086,SC2034
#
# Turn off disk's enclosure slot if it becomes FAULTED.
#
# Bad SCSI disks can often "disappear and reappear" causing all sorts of chaos
# as they flip between FAULTED and ONLINE. If
# ZED_POWER_OFF_ENCLOSURE_SLOT_ON_FAULT is set in zed.rc, and the disk gets
# FAULTED, then power down the slot via sysfs:
#
# /sys/class/enclosure/<enclosure>/<slot>/power_status
#
# We assume the user will be responsible for turning the slot back on again.
#
# Note that this script requires that your enclosure be supported by the
# Linux SCSI Enclosure services (SES) driver. The script will do nothing
# if you have no enclosure, or if your enclosure isn't supported.
#
# Exit codes:
# 0: slot successfully powered off
# 1: enclosure not available
# 2: ZED_POWER_OFF_ENCLOSURE_SLOT_ON_FAULT disabled
# 3: vdev was not FAULTED
# 4: The enclosure sysfs path passed from ZFS does not exist
# 5: Enclosure slot didn't actually turn off after we told it to
[ -f "${ZED_ZEDLET_DIR}/zed.rc" ] && . "${ZED_ZEDLET_DIR}/zed.rc"
. "${ZED_ZEDLET_DIR}/zed-functions.sh"
if [ ! -d /sys/class/enclosure ] ; then
# No JBOD enclosure or NVMe slots
exit 1
fi
if [ "${ZED_POWER_OFF_ENCLOSURE_SLOT_ON_FAULT}" != "1" ] ; then
exit 2
fi
if [ "$ZEVENT_VDEV_STATE_STR" != "FAULTED" ] ; then
exit 3
fi
if [ ! -f "$ZEVENT_VDEV_ENC_SYSFS_PATH/power_status" ] ; then
exit 4
fi
# Turn off the slot and wait for sysfs to report that the slot is off.
# It can take ~400ms on some enclosures and multiple retries may be needed.
for i in $(seq 1 20) ; do
echo "off" | tee "$ZEVENT_VDEV_ENC_SYSFS_PATH/power_status"
for j in $(seq 1 5) ; do
if [ "$(cat $ZEVENT_VDEV_ENC_SYSFS_PATH/power_status)" == "off" ] ; then
break 2
fi
sleep 0.1
done
done
if [ "$(cat $ZEVENT_VDEV_ENC_SYSFS_PATH/power_status)" != "off" ] ; then
exit 5
fi
zed_log_msg "powered down slot $ZEVENT_VDEV_ENC_SYSFS_PATH for $ZEVENT_VDEV_PATH"
+38
View File
@@ -0,0 +1,38 @@
#!/bin/sh
# shellcheck disable=SC2154
#
# Send notification in response to a TRIM_FINISH. The event
# will be received for each vdev in the pool which was trimmed.
#
# Exit codes:
# 0: notification sent
# 1: notification failed
# 2: notification not configured
# 9: internal error
[ -f "${ZED_ZEDLET_DIR}/zed.rc" ] && . "${ZED_ZEDLET_DIR}/zed.rc"
. "${ZED_ZEDLET_DIR}/zed-functions.sh"
[ -n "${ZEVENT_POOL}" ] || exit 9
[ -n "${ZEVENT_SUBCLASS}" ] || exit 9
zed_check_cmd "${ZPOOL}" || exit 9
umask 077
note_subject="ZFS ${ZEVENT_SUBCLASS} event for ${ZEVENT_POOL} on $(hostname)"
note_pathname="$(mktemp)"
{
echo "ZFS has finished a trim:"
echo
echo " eid: ${ZEVENT_EID}"
echo " class: ${ZEVENT_SUBCLASS}"
echo " host: $(hostname)"
echo " time: ${ZEVENT_TIME_STRING}"
"${ZPOOL}" status -t "${ZEVENT_POOL}"
} > "${note_pathname}"
zed_notify "${note_subject}" "${note_pathname}"; rv=$?
rm -f "${note_pathname}"
exit "${rv}"
+284 -20
View File
@@ -1,5 +1,5 @@
#!/bin/sh
# shellcheck disable=SC2039
# shellcheck disable=SC2154,SC3043
# zed-functions.sh
#
# ZED helper functions for use in ZEDLETs
@@ -76,8 +76,7 @@ zed_log_msg()
#
zed_log_err()
{
logger -p "${ZED_SYSLOG_PRIORITY}" -t "${ZED_SYSLOG_TAG}" -- "error:" \
"$(basename -- "$0"):""${ZEVENT_EID:+" eid=${ZEVENT_EID}:"}" "$@"
zed_log_msg "error: ${0##*/}:""${ZEVENT_EID:+" eid=${ZEVENT_EID}:"}" "$@"
}
@@ -126,10 +125,8 @@ zed_lock()
# Obtain a lock on the file bound to the given file descriptor.
#
eval "exec ${fd}> '${lockfile}'"
err="$(flock --exclusive "${fd}" 2>&1)"
# shellcheck disable=SC2181
if [ $? -ne 0 ]; then
eval "exec ${fd}>> '${lockfile}'"
if ! err="$(flock --exclusive "${fd}" 2>&1)"; then
zed_log_err "failed to lock \"${lockfile}\": ${err}"
fi
@@ -165,9 +162,7 @@ zed_unlock()
fi
# Release the lock and close the file descriptor.
err="$(flock --unlock "${fd}" 2>&1)"
# shellcheck disable=SC2181
if [ $? -ne 0 ]; then
if ! err="$(flock --unlock "${fd}" 2>&1)"; then
zed_log_err "failed to unlock \"${lockfile}\": ${err}"
fi
eval "exec ${fd}>&-"
@@ -202,6 +197,18 @@ zed_notify()
[ "${rv}" -eq 0 ] && num_success=$((num_success + 1))
[ "${rv}" -eq 1 ] && num_failure=$((num_failure + 1))
zed_notify_slack_webhook "${subject}" "${pathname}"; rv=$?
[ "${rv}" -eq 0 ] && num_success=$((num_success + 1))
[ "${rv}" -eq 1 ] && num_failure=$((num_failure + 1))
zed_notify_pushover "${subject}" "${pathname}"; rv=$?
[ "${rv}" -eq 0 ] && num_success=$((num_success + 1))
[ "${rv}" -eq 1 ] && num_failure=$((num_failure + 1))
zed_notify_ntfy "${subject}" "${pathname}"; rv=$?
[ "${rv}" -eq 0 ] && num_success=$((num_success + 1))
[ "${rv}" -eq 1 ] && num_failure=$((num_failure + 1))
[ "${num_success}" -gt 0 ] && return 0
[ "${num_failure}" -gt 0 ] && return 1
return 2
@@ -220,6 +227,8 @@ zed_notify()
# ZED_EMAIL_OPTS. This undergoes the following keyword substitutions:
# - @ADDRESS@ is replaced with the space-delimited recipient email address(es)
# - @SUBJECT@ is replaced with the notification subject
# If @SUBJECT@ was omited here, a "Subject: ..." header will be added to notification
#
#
# Arguments
# subject: notification subject
@@ -237,7 +246,7 @@ zed_notify()
#
zed_notify_email()
{
local subject="$1"
local subject="${1:-"ZED notification"}"
local pathname="${2:-"/dev/null"}"
: "${ZED_EMAIL_PROG:="mail"}"
@@ -254,19 +263,30 @@ zed_notify_email()
[ -n "${subject}" ] || return 1
if [ ! -r "${pathname}" ]; then
zed_log_err \
"$(basename "${ZED_EMAIL_PROG}") cannot read \"${pathname}\""
"${ZED_EMAIL_PROG##*/} cannot read \"${pathname}\""
return 1
fi
ZED_EMAIL_OPTS="$(echo "${ZED_EMAIL_OPTS}" \
# construct cmdline options
ZED_EMAIL_OPTS_PARSED="$(echo "${ZED_EMAIL_OPTS}" \
| sed -e "s/@ADDRESS@/${ZED_EMAIL_ADDR}/g" \
-e "s/@SUBJECT@/${subject}/g")"
# shellcheck disable=SC2086
eval "${ZED_EMAIL_PROG}" ${ZED_EMAIL_OPTS} < "${pathname}" >/dev/null 2>&1
# pipe message to email prog
# shellcheck disable=SC2086,SC2248
{
# no subject passed as option?
if [ "${ZED_EMAIL_OPTS%@SUBJECT@*}" = "${ZED_EMAIL_OPTS}" ] ; then
# inject subject header
printf "Subject: %s\n" "${subject}"
fi
# output message
cat "${pathname}"
} |
eval ${ZED_EMAIL_PROG} ${ZED_EMAIL_OPTS_PARSED} >/dev/null 2>&1
rv=$?
if [ "${rv}" -ne 0 ]; then
zed_log_err "$(basename "${ZED_EMAIL_PROG}") exit=${rv}"
zed_log_err "${ZED_EMAIL_PROG##*/} exit=${rv}"
return 1
fi
return 0
@@ -359,6 +379,252 @@ zed_notify_pushbullet()
}
# zed_notify_slack_webhook (subject, pathname)
#
# Notification via Slack Webhook <https://api.slack.com/incoming-webhooks>.
# The Webhook URL (ZED_SLACK_WEBHOOK_URL) identifies this client to the
# Slack channel.
#
# Requires awk, curl, and sed executables to be installed in the standard PATH.
#
# References
# https://api.slack.com/incoming-webhooks
#
# Arguments
# subject: notification subject
# pathname: pathname containing the notification message (OPTIONAL)
#
# Globals
# ZED_SLACK_WEBHOOK_URL
#
# Return
# 0: notification sent
# 1: notification failed
# 2: not configured
#
zed_notify_slack_webhook()
{
[ -n "${ZED_SLACK_WEBHOOK_URL}" ] || return 2
local subject="$1"
local pathname="${2:-"/dev/null"}"
local msg_body
local msg_tag
local msg_json
local msg_out
local msg_err
local url="${ZED_SLACK_WEBHOOK_URL}"
[ -n "${subject}" ] || return 1
if [ ! -r "${pathname}" ]; then
zed_log_err "slack webhook cannot read \"${pathname}\""
return 1
fi
zed_check_cmd "awk" "curl" "sed" || return 1
# Escape the following characters in the message body for JSON:
# newline, backslash, double quote, horizontal tab, vertical tab,
# and carriage return.
#
msg_body="$(awk '{ ORS="\\n" } { gsub(/\\/, "\\\\"); gsub(/"/, "\\\"");
gsub(/\t/, "\\t"); gsub(/\f/, "\\f"); gsub(/\r/, "\\r"); print }' \
"${pathname}")"
# Construct the JSON message for posting.
#
msg_json="$(printf '{"text": "*%s*\\n%s"}' "${subject}" "${msg_body}" )"
# Send the POST request and check for errors.
#
msg_out="$(curl -X POST "${url}" \
--header "Content-Type: application/json" --data-binary "${msg_json}" \
2>/dev/null)"; rv=$?
if [ "${rv}" -ne 0 ]; then
zed_log_err "curl exit=${rv}"
return 1
fi
msg_err="$(echo "${msg_out}" \
| sed -n -e 's/.*"error" *:.*"message" *: *"\([^"]*\)".*/\1/p')"
if [ -n "${msg_err}" ]; then
zed_log_err "slack webhook \"${msg_err}"\"
return 1
fi
return 0
}
# zed_notify_pushover (subject, pathname)
#
# Send a notification via Pushover <https://pushover.net/>.
# The access token (ZED_PUSHOVER_TOKEN) identifies this client to the
# Pushover server. The user token (ZED_PUSHOVER_USER) defines the user or
# group to which the notification will be sent.
#
# Requires curl and sed executables to be installed in the standard PATH.
#
# References
# https://pushover.net/api
#
# Arguments
# subject: notification subject
# pathname: pathname containing the notification message (OPTIONAL)
#
# Globals
# ZED_PUSHOVER_TOKEN
# ZED_PUSHOVER_USER
#
# Return
# 0: notification sent
# 1: notification failed
# 2: not configured
#
zed_notify_pushover()
{
local subject="$1"
local pathname="${2:-"/dev/null"}"
local msg_body
local msg_out
local msg_err
local url="https://api.pushover.net/1/messages.json"
[ -n "${ZED_PUSHOVER_TOKEN}" ] && [ -n "${ZED_PUSHOVER_USER}" ] || return 2
if [ ! -r "${pathname}" ]; then
zed_log_err "pushover cannot read \"${pathname}\""
return 1
fi
zed_check_cmd "curl" "sed" || return 1
# Read the message body in.
#
msg_body="$(cat "${pathname}")"
if [ -z "${msg_body}" ]
then
msg_body=$subject
subject=""
fi
# Send the POST request and check for errors.
#
msg_out="$( \
curl \
--form-string "token=${ZED_PUSHOVER_TOKEN}" \
--form-string "user=${ZED_PUSHOVER_USER}" \
--form-string "message=${msg_body}" \
--form-string "title=${subject}" \
"${url}" \
2>/dev/null \
)"; rv=$?
if [ "${rv}" -ne 0 ]; then
zed_log_err "curl exit=${rv}"
return 1
fi
msg_err="$(echo "${msg_out}" \
| sed -n -e 's/.*"errors" *:.*\[\(.*\)\].*/\1/p')"
if [ -n "${msg_err}" ]; then
zed_log_err "pushover \"${msg_err}"\"
return 1
fi
return 0
}
# zed_notify_ntfy (subject, pathname)
#
# Send a notification via Ntfy.sh <https://ntfy.sh/>.
# The ntfy topic (ZED_NTFY_TOPIC) identifies the topic that the notification
# will be sent to Ntfy.sh server. The ntfy url (ZED_NTFY_URL) defines the
# self-hosted or provided hosted ntfy service location. The ntfy access token
# <https://docs.ntfy.sh/publish/#access-tokens> (ZED_NTFY_ACCESS_TOKEN) reprsents an
# access token that could be used if a topic is read/write protected. If a
# topic can be written to publicaly, a ZED_NTFY_ACCESS_TOKEN is not required.
#
# Requires curl and sed executables to be installed in the standard PATH.
#
# References
# https://docs.ntfy.sh
#
# Arguments
# subject: notification subject
# pathname: pathname containing the notification message (OPTIONAL)
#
# Globals
# ZED_NTFY_TOPIC
# ZED_NTFY_ACCESS_TOKEN (OPTIONAL)
# ZED_NTFY_URL
#
# Return
# 0: notification sent
# 1: notification failed
# 2: not configured
#
zed_notify_ntfy()
{
local subject="$1"
local pathname="${2:-"/dev/null"}"
local msg_body
local msg_out
local msg_err
[ -n "${ZED_NTFY_TOPIC}" ] || return 2
local url="${ZED_NTFY_URL:-"https://ntfy.sh"}/${ZED_NTFY_TOPIC}"
if [ ! -r "${pathname}" ]; then
zed_log_err "ntfy cannot read \"${pathname}\""
return 1
fi
zed_check_cmd "curl" "sed" || return 1
# Read the message body in.
#
msg_body="$(cat "${pathname}")"
if [ -z "${msg_body}" ]
then
msg_body=$subject
subject=""
fi
# Send the POST request and check for errors.
#
if [ -n "${ZED_NTFY_ACCESS_TOKEN}" ]; then
msg_out="$( \
curl \
-u ":${ZED_NTFY_ACCESS_TOKEN}" \
-H "Title: ${subject}" \
-d "${msg_body}" \
-H "Priority: high" \
"${url}" \
2>/dev/null \
)"; rv=$?
else
msg_out="$( \
curl \
-H "Title: ${subject}" \
-d "${msg_body}" \
-H "Priority: high" \
"${url}" \
2>/dev/null \
)"; rv=$?
fi
if [ "${rv}" -ne 0 ]; then
zed_log_err "curl exit=${rv}"
return 1
fi
msg_err="$(echo "${msg_out}" \
| sed -n -e 's/.*"errors" *:.*\[\(.*\)\].*/\1/p')"
if [ -n "${msg_err}" ]; then
zed_log_err "ntfy \"${msg_err}"\"
return 1
fi
return 0
}
# zed_rate_limit (tag, [interval])
#
# Check whether an event of a given type [tag] has already occurred within the
@@ -433,10 +699,8 @@ zed_guid_to_pool()
return
fi
guid=$(printf "%llu" "$1")
if [ -n "$guid" ] ; then
$ZPOOL get -H -ovalue,name guid | awk '$1=='"$guid"' {print $2}'
fi
guid="$(printf "%u" "$1")"
$ZPOOL get -H -ovalue,name guid | awk '$1 == '"$guid"' {print $2; exit}'
}
# zed_exit_if_ignoring_this_event
+65 -8
View File
@@ -1,8 +1,7 @@
##
# zed.rc
#
# This file should be owned by root and permissioned 0600.
# zed.rc ZEDLET configuration.
##
# shellcheck disable=SC2034
##
# Absolute path to the debug output file.
@@ -13,9 +12,9 @@
# Email address of the zpool administrator for receipt of notifications;
# multiple addresses can be specified if they are delimited by whitespace.
# Email will only be sent if ZED_EMAIL_ADDR is defined.
# Disabled by default; uncomment to enable.
# Enabled by default; comment to disable.
#
#ZED_EMAIL_ADDR="root"
ZED_EMAIL_ADDR="root"
##
# Name or path of executable responsible for sending notifications via email;
@@ -30,6 +29,7 @@
# The string @SUBJECT@ will be replaced with the notification subject;
# this should be protected with quotes to prevent word-splitting.
# Email will only be sent if ZED_EMAIL_ADDR is defined.
# If @SUBJECT@ was omited here, a "Subject: ..." header will be added to notification
#
#ZED_EMAIL_OPTS="-s '@SUBJECT@' @ADDRESS@"
@@ -74,6 +74,31 @@
#
#ZED_PUSHBULLET_CHANNEL_TAG=""
##
# Slack Webhook URL.
# This allows posting to the given channel and includes an access token.
# <https://api.slack.com/incoming-webhooks>
# Disabled by default; uncomment to enable.
#
#ZED_SLACK_WEBHOOK_URL=""
##
# Pushover token.
# This defines the application from which the notification will be sent.
# <https://pushover.net/api#registration>
# Disabled by default; uncomment to enable.
# ZED_PUSHOVER_USER, below, must also be configured.
#
#ZED_PUSHOVER_TOKEN=""
##
# Pushover user key.
# This defines which user or group will receive Pushover notifications.
# <https://pushover.net/api#identifiers>
# Disabled by default; uncomment to enable.
# ZED_PUSHOVER_TOKEN, above, must also be configured.
#ZED_PUSHOVER_USER=""
##
# Default directory for zed state files.
#
@@ -81,8 +106,8 @@
##
# Turn on/off enclosure LEDs when drives get DEGRADED/FAULTED. This works for
# device mapper and multipath devices as well. Your enclosure must be
# supported by the Linux SES driver for this to work.
# device mapper and multipath devices as well. This works with JBOD enclosures
# and NVMe PCI drives (assuming they're supported by Linux in sysfs).
#
ZED_USE_ENCLOSURE_LEDS=1
@@ -110,5 +135,37 @@ ZED_USE_ENCLOSURE_LEDS=1
# Otherwise, if ZED_SYSLOG_SUBCLASS_EXCLUDE is set, the
# matching subclasses are excluded from logging.
#ZED_SYSLOG_SUBCLASS_INCLUDE="checksum|scrub_*|vdev.*"
#ZED_SYSLOG_SUBCLASS_EXCLUDE="statechange|config_*|history_event"
ZED_SYSLOG_SUBCLASS_EXCLUDE="history_event"
##
# Use GUIDs instead of names when logging pool and vdevs
# Disabled by default, 1 to enable and 0 to disable.
#ZED_SYSLOG_DISPLAY_GUIDS=1
##
# Power off the drive's slot in the enclosure if it becomes FAULTED. This can
# help silence misbehaving drives. This assumes your drive enclosure fully
# supports slot power control via sysfs.
#ZED_POWER_OFF_ENCLOSURE_SLOT_ON_FAULT=1
##
# Ntfy topic
# This defines which topic will receive the ntfy notification.
# <https://docs.ntfy.sh/publish/>
# Disabled by default; uncomment to enable.
#ZED_NTFY_TOPIC=""
##
# Ntfy access token (optional for public topics)
# This defines an access token which can be used
# to allow you to authenticate when sending to topics
# <https://docs.ntfy.sh/publish/#access-tokens>
# Disabled by default; uncomment to enable.
#ZED_NTFY_ACCESS_TOKEN=""
##
# Ntfy Service URL
# This defines which service the ntfy call will be directed toward
# <https://docs.ntfy.sh/install/>
# https://ntfy.sh by default; uncomment to enable an alternative service url.
#ZED_NTFY_URL="https://ntfy.sh"
+3 -18
View File
@@ -1,9 +1,9 @@
/*
* This file is part of the ZFS Event Daemon (ZED)
* for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
* This file is part of the ZFS Event Daemon (ZED).
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the ZoL git commit log for authoritative copyright attribution.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -15,11 +15,6 @@
#ifndef ZED_H
#define ZED_H
/*
* Absolute path for the default zed configuration file.
*/
#define ZED_CONF_FILE SYSCONFDIR "/zfs/zed.conf"
/*
* Absolute path for the default zed pid file.
*/
@@ -35,16 +30,6 @@
*/
#define ZED_ZEDLET_DIR SYSCONFDIR "/zfs/zed.d"
/*
* Reserved for future use.
*/
#define ZED_MAX_EVENTS 0
/*
* Reserved for future use.
*/
#define ZED_MIN_EVENTS 0
/*
* String prefix for ZED variables passed via environment variables.
*/
+116 -126
View File
@@ -1,9 +1,9 @@
/*
* This file is part of the ZFS Event Daemon (ZED)
* for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
* This file is part of the ZFS Event Daemon (ZED).
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the ZoL git commit log for authoritative copyright attribution.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -22,6 +22,7 @@
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/uio.h>
#include <unistd.h>
@@ -32,43 +33,27 @@
#include "zed_strings.h"
/*
* Return a new configuration with default values.
* Initialise the configuration with default values.
*/
struct zed_conf *
zed_conf_create(void)
void
zed_conf_init(struct zed_conf *zcp)
{
struct zed_conf *zcp;
memset(zcp, 0, sizeof (*zcp));
zcp = calloc(1, sizeof (*zcp));
if (!zcp)
goto nomem;
/* zcp->zfs_hdl opened in zed_event_init() */
/* zcp->zedlets created in zed_conf_scan_dir() */
zcp->syslog_facility = LOG_DAEMON;
zcp->min_events = ZED_MIN_EVENTS;
zcp->max_events = ZED_MAX_EVENTS;
zcp->pid_fd = -1;
zcp->zedlets = NULL; /* created via zed_conf_scan_dir() */
zcp->state_fd = -1; /* opened via zed_conf_open_state() */
zcp->zfs_hdl = NULL; /* opened via zed_event_init() */
zcp->zevent_fd = -1; /* opened via zed_event_init() */
zcp->pid_fd = -1; /* opened in zed_conf_write_pid() */
zcp->state_fd = -1; /* opened in zed_conf_open_state() */
zcp->zevent_fd = -1; /* opened in zed_event_init() */
if (!(zcp->conf_file = strdup(ZED_CONF_FILE)))
goto nomem;
zcp->max_jobs = 16;
zcp->max_zevent_buf_len = 1 << 20;
if (!(zcp->pid_file = strdup(ZED_PID_FILE)))
goto nomem;
if (!(zcp->zedlet_dir = strdup(ZED_ZEDLET_DIR)))
goto nomem;
if (!(zcp->state_file = strdup(ZED_STATE_FILE)))
goto nomem;
return (zcp);
nomem:
zed_log_die("Failed to create conf: %s", strerror(errno));
return (NULL);
if (!(zcp->pid_file = strdup(ZED_PID_FILE)) ||
!(zcp->zedlet_dir = strdup(ZED_ZEDLET_DIR)) ||
!(zcp->state_file = strdup(ZED_STATE_FILE)))
zed_log_die("Failed to create conf: %s", strerror(errno));
}
/*
@@ -79,9 +64,6 @@ nomem:
void
zed_conf_destroy(struct zed_conf *zcp)
{
if (!zcp)
return;
if (zcp->state_fd >= 0) {
if (close(zcp->state_fd) < 0)
zed_log_msg(LOG_WARNING,
@@ -102,10 +84,6 @@ zed_conf_destroy(struct zed_conf *zcp)
zcp->pid_file, strerror(errno));
zcp->pid_fd = -1;
}
if (zcp->conf_file) {
free(zcp->conf_file);
zcp->conf_file = NULL;
}
if (zcp->pid_file) {
free(zcp->pid_file);
zcp->pid_file = NULL;
@@ -122,7 +100,6 @@ zed_conf_destroy(struct zed_conf *zcp)
zed_strings_destroy(zcp->zedlets);
zcp->zedlets = NULL;
}
free(zcp);
}
/*
@@ -132,44 +109,54 @@ zed_conf_destroy(struct zed_conf *zcp)
* otherwise, output to stderr and exit with a failure status.
*/
static void
_zed_conf_display_help(const char *prog, int got_err)
_zed_conf_display_help(const char *prog, boolean_t got_err)
{
struct opt { const char *o, *d, *v; };
FILE *fp = got_err ? stderr : stdout;
int w1 = 4; /* width of leading whitespace */
int w2 = 8; /* width of L-justified option field */
struct opt *oo;
struct opt iopts[] = {
{ .o = "-h", .d = "Display help" },
{ .o = "-L", .d = "Display license information" },
{ .o = "-V", .d = "Display version information" },
{},
};
struct opt nopts[] = {
{ .o = "-v", .d = "Be verbose" },
{ .o = "-f", .d = "Force daemon to run" },
{ .o = "-F", .d = "Run daemon in the foreground" },
{ .o = "-I",
.d = "Idle daemon until kernel module is (re)loaded" },
{ .o = "-M", .d = "Lock all pages in memory" },
{ .o = "-P", .d = "$PATH for ZED to use (only used by ZTS)" },
{ .o = "-Z", .d = "Zero state file" },
{},
};
struct opt vopts[] = {
{ .o = "-d DIR", .d = "Read enabled ZEDLETs from DIR.",
.v = ZED_ZEDLET_DIR },
{ .o = "-p FILE", .d = "Write daemon's PID to FILE.",
.v = ZED_PID_FILE },
{ .o = "-s FILE", .d = "Write daemon's state to FILE.",
.v = ZED_STATE_FILE },
{ .o = "-j JOBS", .d = "Start at most JOBS at once.",
.v = "16" },
{ .o = "-b LEN", .d = "Cap kernel event buffer at LEN entries.",
.v = "1048576" },
{},
};
fprintf(fp, "Usage: %s [OPTION]...\n", (prog ? prog : "zed"));
fprintf(fp, "\n");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-h",
"Display help.");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-L",
"Display license information.");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-V",
"Display version information.");
for (oo = iopts; oo->o; ++oo)
fprintf(fp, " %*s %s\n", -8, oo->o, oo->d);
fprintf(fp, "\n");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-v",
"Be verbose.");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-f",
"Force daemon to run.");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-F",
"Run daemon in the foreground.");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-M",
"Lock all pages in memory.");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-P",
"$PATH for ZED to use (only used by ZTS).");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-Z",
"Zero state file.");
for (oo = nopts; oo->o; ++oo)
fprintf(fp, " %*s %s\n", -8, oo->o, oo->d);
fprintf(fp, "\n");
#if 0
fprintf(fp, "%*c%*s %s [%s]\n", w1, 0x20, -w2, "-c FILE",
"Read configuration from FILE.", ZED_CONF_FILE);
#endif
fprintf(fp, "%*c%*s %s [%s]\n", w1, 0x20, -w2, "-d DIR",
"Read enabled ZEDLETs from DIR.", ZED_ZEDLET_DIR);
fprintf(fp, "%*c%*s %s [%s]\n", w1, 0x20, -w2, "-p FILE",
"Write daemon's PID to FILE.", ZED_PID_FILE);
fprintf(fp, "%*c%*s %s [%s]\n", w1, 0x20, -w2, "-s FILE",
"Write daemon's state to FILE.", ZED_STATE_FILE);
for (oo = vopts; oo->o; ++oo)
fprintf(fp, " %*s %s [%s]\n", -8, oo->o, oo->d, oo->v);
fprintf(fp, "\n");
exit(got_err ? EXIT_FAILURE : EXIT_SUCCESS);
@@ -181,20 +168,14 @@ _zed_conf_display_help(const char *prog, int got_err)
static void
_zed_conf_display_license(void)
{
const char **pp;
const char *text[] = {
"The ZFS Event Daemon (ZED) is distributed under the terms of the",
" Common Development and Distribution License (CDDL-1.0)",
" <http://opensource.org/licenses/CDDL-1.0>.",
"",
printf(
"The ZFS Event Daemon (ZED) is distributed under the terms of the\n"
" Common Development and Distribution License (CDDL-1.0)\n"
" <http://opensource.org/licenses/CDDL-1.0>.\n"
"\n"
"Developed at Lawrence Livermore National Laboratory"
" (LLNL-CODE-403049).",
"",
NULL
};
for (pp = text; *pp; pp++)
printf("%s\n", *pp);
" (LLNL-CODE-403049).\n"
"\n");
exit(EXIT_SUCCESS);
}
@@ -229,16 +210,19 @@ _zed_conf_parse_path(char **resultp, const char *path)
if (path[0] == '/') {
*resultp = strdup(path);
} else if (!getcwd(buf, sizeof (buf))) {
zed_log_die("Failed to get current working dir: %s",
strerror(errno));
} else if (strlcat(buf, "/", sizeof (buf)) >= sizeof (buf)) {
zed_log_die("Failed to copy path: %s", strerror(ENAMETOOLONG));
} else if (strlcat(buf, path, sizeof (buf)) >= sizeof (buf)) {
zed_log_die("Failed to copy path: %s", strerror(ENAMETOOLONG));
} else {
if (!getcwd(buf, sizeof (buf)))
zed_log_die("Failed to get current working dir: %s",
strerror(errno));
if (strlcat(buf, "/", sizeof (buf)) >= sizeof (buf) ||
strlcat(buf, path, sizeof (buf)) >= sizeof (buf))
zed_log_die("Failed to copy path: %s",
strerror(ENAMETOOLONG));
*resultp = strdup(buf);
}
if (!*resultp)
zed_log_die("Failed to copy path: %s", strerror(ENOMEM));
}
@@ -249,8 +233,9 @@ _zed_conf_parse_path(char **resultp, const char *path)
void
zed_conf_parse_opts(struct zed_conf *zcp, int argc, char **argv)
{
const char * const opts = ":hLVc:d:p:P:s:vfFMZ";
const char * const opts = ":hLVd:p:P:s:vfFMZIj:b:";
int opt;
unsigned long raw;
if (!zcp || !argv || !argv[0])
zed_log_die("Failed to parse options: Internal error");
@@ -260,7 +245,7 @@ zed_conf_parse_opts(struct zed_conf *zcp, int argc, char **argv)
while ((opt = getopt(argc, argv, opts)) != -1) {
switch (opt) {
case 'h':
_zed_conf_display_help(argv[0], EXIT_SUCCESS);
_zed_conf_display_help(argv[0], B_FALSE);
break;
case 'L':
_zed_conf_display_license();
@@ -268,12 +253,12 @@ zed_conf_parse_opts(struct zed_conf *zcp, int argc, char **argv)
case 'V':
_zed_conf_display_version();
break;
case 'c':
_zed_conf_parse_path(&zcp->conf_file, optarg);
break;
case 'd':
_zed_conf_parse_path(&zcp->zedlet_dir, optarg);
break;
case 'I':
zcp->do_idle = 1;
break;
case 'p':
_zed_conf_parse_path(&zcp->pid_file, optarg);
break;
@@ -298,31 +283,41 @@ zed_conf_parse_opts(struct zed_conf *zcp, int argc, char **argv)
case 'Z':
zcp->do_zero = 1;
break;
case 'j':
errno = 0;
raw = strtoul(optarg, NULL, 0);
if (errno == ERANGE || raw > INT16_MAX) {
zed_log_die("%lu is too many jobs", raw);
} if (raw == 0) {
zed_log_die("0 jobs makes no sense");
} else {
zcp->max_jobs = raw;
}
break;
case 'b':
errno = 0;
raw = strtoul(optarg, NULL, 0);
if (errno == ERANGE || raw > INT32_MAX) {
zed_log_die("%lu is too large", raw);
} if (raw == 0) {
zcp->max_zevent_buf_len = INT32_MAX;
} else {
zcp->max_zevent_buf_len = raw;
}
break;
case '?':
default:
if (optopt == '?')
_zed_conf_display_help(argv[0], EXIT_SUCCESS);
_zed_conf_display_help(argv[0], B_FALSE);
fprintf(stderr, "%s: %s '-%c'\n\n", argv[0],
"Invalid option", optopt);
_zed_conf_display_help(argv[0], EXIT_FAILURE);
fprintf(stderr, "%s: Invalid option '-%c'\n\n",
argv[0], optopt);
_zed_conf_display_help(argv[0], B_TRUE);
break;
}
}
}
/*
* Parse the configuration file into the configuration [zcp].
*
* FIXME: Not yet implemented.
*/
void
zed_conf_parse_file(struct zed_conf *zcp)
{
if (!zcp)
zed_log_die("Failed to parse config: %s", strerror(EINVAL));
}
/*
* Scan the [zcp] zedlet_dir for files to exec based on the event class.
* Files must be executable by user, but not writable by group or other.
@@ -330,8 +325,6 @@ zed_conf_parse_file(struct zed_conf *zcp)
*
* Return 0 on success with an updated set of zedlets,
* or -1 on error with errno set.
*
* FIXME: Check if zedlet_dir and all parent dirs are secure.
*/
int
zed_conf_scan_dir(struct zed_conf *zcp)
@@ -447,8 +440,6 @@ zed_conf_scan_dir(struct zed_conf *zcp)
int
zed_conf_write_pid(struct zed_conf *zcp)
{
const mode_t dirmode = S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH;
const mode_t filemode = S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH;
char buf[PATH_MAX];
int n;
char *p;
@@ -476,7 +467,7 @@ zed_conf_write_pid(struct zed_conf *zcp)
if (p)
*p = '\0';
if ((mkdirp(buf, dirmode) < 0) && (errno != EEXIST)) {
if ((mkdirp(buf, 0755) < 0) && (errno != EEXIST)) {
zed_log_msg(LOG_ERR, "Failed to create directory \"%s\": %s",
buf, strerror(errno));
goto err;
@@ -486,7 +477,7 @@ zed_conf_write_pid(struct zed_conf *zcp)
*/
mask = umask(0);
umask(mask | 022);
zcp->pid_fd = open(zcp->pid_file, (O_RDWR | O_CREAT), filemode);
zcp->pid_fd = open(zcp->pid_file, O_RDWR | O_CREAT | O_CLOEXEC, 0644);
umask(mask);
if (zcp->pid_fd < 0) {
zed_log_msg(LOG_ERR, "Failed to open PID file \"%s\": %s",
@@ -523,7 +514,7 @@ zed_conf_write_pid(struct zed_conf *zcp)
errno = ERANGE;
zed_log_msg(LOG_ERR, "Failed to write PID file \"%s\": %s",
zcp->pid_file, strerror(errno));
} else if (zed_file_write_n(zcp->pid_fd, buf, n) != n) {
} else if (write(zcp->pid_fd, buf, n) != n) {
zed_log_msg(LOG_ERR, "Failed to write PID file \"%s\": %s",
zcp->pid_file, strerror(errno));
} else if (fdatasync(zcp->pid_fd) < 0) {
@@ -551,7 +542,6 @@ int
zed_conf_open_state(struct zed_conf *zcp)
{
char dirbuf[PATH_MAX];
mode_t dirmode = S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH;
int n;
char *p;
int rv;
@@ -573,7 +563,7 @@ zed_conf_open_state(struct zed_conf *zcp)
if (p)
*p = '\0';
if ((mkdirp(dirbuf, dirmode) < 0) && (errno != EEXIST)) {
if ((mkdirp(dirbuf, 0755) < 0) && (errno != EEXIST)) {
zed_log_msg(LOG_WARNING,
"Failed to create directory \"%s\": %s",
dirbuf, strerror(errno));
@@ -591,7 +581,7 @@ zed_conf_open_state(struct zed_conf *zcp)
(void) unlink(zcp->state_file);
zcp->state_fd = open(zcp->state_file,
(O_RDWR | O_CREAT), (S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH));
O_RDWR | O_CREAT | O_CLOEXEC, 0644);
if (zcp->state_fd < 0) {
zed_log_msg(LOG_WARNING, "Failed to open state file \"%s\": %s",
zcp->state_file, strerror(errno));
@@ -667,7 +657,7 @@ zed_conf_read_state(struct zed_conf *zcp, uint64_t *eidp, int64_t etime[])
} else if (n != len) {
errno = EIO;
zed_log_msg(LOG_WARNING,
"Failed to read state file \"%s\": Read %d of %d bytes",
"Failed to read state file \"%s\": Read %zd of %zd bytes",
zcp->state_file, n, len);
return (-1);
}
@@ -716,7 +706,7 @@ zed_conf_write_state(struct zed_conf *zcp, uint64_t eid, int64_t etime[])
if (n != len) {
errno = EIO;
zed_log_msg(LOG_WARNING,
"Failed to write state file \"%s\": Wrote %d of %d bytes",
"Failed to write state file \"%s\": Wrote %zd of %zd bytes",
zcp->state_file, n, len);
return (-1);
}
+20 -22
View File
@@ -1,9 +1,9 @@
/*
* This file is part of the ZFS Event Daemon (ZED)
* for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
* This file is part of the ZFS Event Daemon (ZED).
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the ZoL git commit log for authoritative copyright attribution.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -20,42 +20,40 @@
#include "zed_strings.h"
struct zed_conf {
unsigned do_force:1; /* true if force enabled */
unsigned do_foreground:1; /* true if run in foreground */
unsigned do_memlock:1; /* true if locking memory */
unsigned do_verbose:1; /* true if verbosity enabled */
unsigned do_zero:1; /* true if zeroing state */
int syslog_facility; /* syslog facility value */
int min_events; /* RESERVED FOR FUTURE USE */
int max_events; /* RESERVED FOR FUTURE USE */
char *conf_file; /* abs path to config file */
char *pid_file; /* abs path to pid file */
int pid_fd; /* fd to pid file for lock */
char *zedlet_dir; /* abs path to zedlet dir */
zed_strings_t *zedlets; /* names of enabled zedlets */
char *state_file; /* abs path to state file */
int state_fd; /* fd to state file */
libzfs_handle_t *zfs_hdl; /* handle to libzfs */
int zevent_fd; /* fd for access to zevents */
zed_strings_t *zedlets; /* names of enabled zedlets */
char *path; /* custom $PATH for zedlets to use */
int pid_fd; /* fd to pid file for lock */
int state_fd; /* fd to state file */
int zevent_fd; /* fd for access to zevents */
int16_t max_jobs; /* max zedlets to run at one time */
int32_t max_zevent_buf_len; /* max size of kernel event list */
boolean_t do_force:1; /* true if force enabled */
boolean_t do_foreground:1; /* true if run in foreground */
boolean_t do_memlock:1; /* true if locking memory */
boolean_t do_verbose:1; /* true if verbosity enabled */
boolean_t do_zero:1; /* true if zeroing state */
boolean_t do_idle:1; /* true if idle enabled */
};
struct zed_conf *zed_conf_create(void);
void zed_conf_init(struct zed_conf *zcp);
void zed_conf_destroy(struct zed_conf *zcp);
void zed_conf_parse_opts(struct zed_conf *zcp, int argc, char **argv);
void zed_conf_parse_file(struct zed_conf *zcp);
int zed_conf_scan_dir(struct zed_conf *zcp);
int zed_conf_write_pid(struct zed_conf *zcp);
int zed_conf_open_state(struct zed_conf *zcp);
int zed_conf_read_state(struct zed_conf *zcp, uint64_t *eidp, int64_t etime[]);
int zed_conf_write_state(struct zed_conf *zcp, uint64_t eid, int64_t etime[]);
#endif /* !ZED_CONF_H */
+68 -15
View File
@@ -49,7 +49,7 @@ struct udev_monitor *g_mon;
#define DEV_BYID_PATH "/dev/disk/by-id/"
/* 64MB is minimum usable disk for ZFS */
#define MINIMUM_SECTORS 131072
#define MINIMUM_SECTORS 131072ULL
/*
@@ -60,7 +60,7 @@ struct udev_monitor *g_mon;
static void
zed_udev_event(const char *class, const char *subclass, nvlist_t *nvl)
{
char *strval;
const char *strval;
uint64_t numval;
zed_log_msg(LOG_INFO, "zed_disk_event:");
@@ -72,10 +72,14 @@ zed_udev_event(const char *class, const char *subclass, nvlist_t *nvl)
zed_log_msg(LOG_INFO, "\t%s: %s", DEV_PATH, strval);
if (nvlist_lookup_string(nvl, DEV_IDENTIFIER, &strval) == 0)
zed_log_msg(LOG_INFO, "\t%s: %s", DEV_IDENTIFIER, strval);
if (nvlist_lookup_boolean(nvl, DEV_IS_PART) == B_TRUE)
zed_log_msg(LOG_INFO, "\t%s: B_TRUE", DEV_IS_PART);
if (nvlist_lookup_string(nvl, DEV_PHYS_PATH, &strval) == 0)
zed_log_msg(LOG_INFO, "\t%s: %s", DEV_PHYS_PATH, strval);
if (nvlist_lookup_uint64(nvl, DEV_SIZE, &numval) == 0)
zed_log_msg(LOG_INFO, "\t%s: %llu", DEV_SIZE, numval);
if (nvlist_lookup_uint64(nvl, DEV_PARENT_SIZE, &numval) == 0)
zed_log_msg(LOG_INFO, "\t%s: %llu", DEV_PARENT_SIZE, numval);
if (nvlist_lookup_uint64(nvl, ZFS_EV_POOL_GUID, &numval) == 0)
zed_log_msg(LOG_INFO, "\t%s: %llu", ZFS_EV_POOL_GUID, numval);
if (nvlist_lookup_uint64(nvl, ZFS_EV_VDEV_GUID, &numval) == 0)
@@ -128,6 +132,20 @@ dev_event_nvlist(struct udev_device *dev)
numval *= strtoull(value, NULL, 10);
(void) nvlist_add_uint64(nvl, DEV_SIZE, numval);
/*
* If the device has a parent, then get the parent block
* device's size as well. For example, /dev/sda1's parent
* is /dev/sda.
*/
struct udev_device *parent_dev = udev_device_get_parent(dev);
if ((value = udev_device_get_sysattr_value(parent_dev, "size"))
!= NULL) {
uint64_t numval = DEV_BSIZE;
numval *= strtoull(value, NULL, 10);
(void) nvlist_add_uint64(nvl, DEV_PARENT_SIZE, numval);
}
}
/*
@@ -160,14 +178,15 @@ static void *
zed_udev_monitor(void *arg)
{
struct udev_monitor *mon = arg;
char *tmp, *tmp2;
const char *tmp;
char *tmp2;
zed_log_msg(LOG_INFO, "Waiting for new udev disk events...");
while (1) {
struct udev_device *dev;
const char *action, *type, *part, *sectors;
const char *bus, *uuid;
const char *bus, *uuid, *devpath;
const char *class, *subclass;
nvlist_t *nvl;
boolean_t is_zfs = B_FALSE;
@@ -206,6 +225,12 @@ zed_udev_monitor(void *arg)
* if this is a disk and it is partitioned, then the
* zfs label will reside in a DEVTYPE=partition and
* we can skip passing this event
*
* Special case: Blank disks are sometimes reported with
* an erroneous 'atari' partition, and should not be
* excluded from being used as an autoreplace disk:
*
* https://github.com/openzfs/zfs/issues/13497
*/
type = udev_device_get_property_value(dev, "DEVTYPE");
part = udev_device_get_property_value(dev,
@@ -213,9 +238,23 @@ zed_udev_monitor(void *arg)
if (type != NULL && type[0] != '\0' &&
strcmp(type, "disk") == 0 &&
part != NULL && part[0] != '\0') {
/* skip and wait for partition event */
udev_device_unref(dev);
continue;
const char *devname =
udev_device_get_property_value(dev, "DEVNAME");
if (strcmp(part, "atari") == 0) {
zed_log_msg(LOG_INFO,
"%s: %s is reporting an atari partition, "
"but we're going to assume it's a false "
"positive and still use it (issue #13497)",
__func__, devname);
} else {
zed_log_msg(LOG_INFO,
"%s: skip %s since it has a %s partition "
"already", __func__, devname, part);
/* skip and wait for partition event */
udev_device_unref(dev);
continue;
}
}
/*
@@ -227,6 +266,11 @@ zed_udev_monitor(void *arg)
sectors = udev_device_get_sysattr_value(dev, "size");
if (sectors != NULL &&
strtoull(sectors, NULL, 10) < MINIMUM_SECTORS) {
zed_log_msg(LOG_INFO,
"%s: %s sectors %s < %llu (minimum)",
__func__,
udev_device_get_property_value(dev, "DEVNAME"),
sectors, MINIMUM_SECTORS);
udev_device_unref(dev);
continue;
}
@@ -236,10 +280,19 @@ zed_udev_monitor(void *arg)
* device id string is required in the message schema
* for matching with vdevs. Preflight here for expected
* udev information.
*
* Special case:
* NVMe devices don't have ID_BUS set (at least on RHEL 7-8),
* but they are valid for autoreplace. Add a special case for
* them by searching for "/nvme/" in the udev DEVPATH:
*
* DEVPATH=/devices/pci0000:00/0000:00:1e.0/nvme/nvme2/nvme2n1
*/
bus = udev_device_get_property_value(dev, "ID_BUS");
uuid = udev_device_get_property_value(dev, "DM_UUID");
if (!is_zfs && (bus == NULL && uuid == NULL)) {
devpath = udev_device_get_devpath(dev);
if (!is_zfs && (bus == NULL && uuid == NULL &&
strstr(devpath, "/nvme/") == NULL)) {
zed_log_msg(LOG_INFO, "zed_udev_monitor: %s no devid "
"source", udev_device_get_devnode(dev));
udev_device_unref(dev);
@@ -284,7 +337,7 @@ zed_udev_monitor(void *arg)
if (strcmp(class, EC_DEV_STATUS) == 0 &&
udev_device_get_property_value(dev, "DM_UUID") &&
udev_device_get_property_value(dev, "MPATH_SBIN_PATH")) {
tmp = (char *)udev_device_get_devnode(dev);
tmp = udev_device_get_devnode(dev);
tmp2 = zfs_get_underlying_path(tmp);
if (tmp && tmp2 && (strcmp(tmp, tmp2) != 0)) {
/*
@@ -301,8 +354,7 @@ zed_udev_monitor(void *arg)
class = EC_DEV_ADD;
subclass = ESC_DISK;
} else {
tmp = (char *)
udev_device_get_property_value(dev,
tmp = udev_device_get_property_value(dev,
"DM_NR_VALID_PATHS");
/* treat as a multipath remove */
if (tmp != NULL && strcmp(tmp, "0") == 0) {
@@ -350,7 +402,7 @@ zed_udev_monitor(void *arg)
}
int
zed_disk_event_init()
zed_disk_event_init(void)
{
int fd, fflags;
@@ -379,13 +431,14 @@ zed_disk_event_init()
return (-1);
}
pthread_setname_np(g_mon_tid, "udev monitor");
zed_log_msg(LOG_INFO, "zed_disk_event_init");
return (0);
}
void
zed_disk_event_fini()
zed_disk_event_fini(void)
{
/* cancel monitor thread at recvmsg() */
(void) pthread_cancel(g_mon_tid);
@@ -403,13 +456,13 @@ zed_disk_event_fini()
#include "zed_disk_event.h"
int
zed_disk_event_init()
zed_disk_event_init(void)
{
return (0);
}
void
zed_disk_event_fini()
zed_disk_event_fini(void)
{
}
+128 -47
View File
@@ -1,9 +1,9 @@
/*
* This file is part of the ZFS Event Daemon (ZED)
* for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
* This file is part of the ZFS Event Daemon (ZED).
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the ZoL git commit log for authoritative copyright attribution.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -15,7 +15,7 @@
#include <ctype.h>
#include <errno.h>
#include <fcntl.h>
#include <libzfs.h> /* FIXME: Replace with libzfs_core. */
#include <libzfs_core.h>
#include <paths.h>
#include <stdarg.h>
#include <stdio.h>
@@ -28,37 +28,55 @@
#include "zed.h"
#include "zed_conf.h"
#include "zed_disk_event.h"
#include "zed_event.h"
#include "zed_exec.h"
#include "zed_file.h"
#include "zed_log.h"
#include "zed_strings.h"
#include "agents/zfs_agents.h"
#include <libzutil.h>
#define MAXBUF 4096
static int max_zevent_buf_len = 1 << 20;
/*
* Open the libzfs interface.
*/
void
int
zed_event_init(struct zed_conf *zcp)
{
if (!zcp)
zed_log_die("Failed zed_event_init: %s", strerror(EINVAL));
zcp->zfs_hdl = libzfs_init();
if (!zcp->zfs_hdl)
if (!zcp->zfs_hdl) {
if (zcp->do_idle)
return (-1);
zed_log_die("Failed to initialize libzfs");
}
zcp->zevent_fd = open(ZFS_DEV, O_RDWR);
if (zcp->zevent_fd < 0)
zcp->zevent_fd = open(ZFS_DEV, O_RDWR | O_CLOEXEC);
if (zcp->zevent_fd < 0) {
if (zcp->do_idle)
return (-1);
zed_log_die("Failed to open \"%s\": %s",
ZFS_DEV, strerror(errno));
}
zfs_agent_init(zcp->zfs_hdl);
if (zed_disk_event_init() != 0)
if (zed_disk_event_init() != 0) {
if (zcp->do_idle)
return (-1);
zed_log_die("Failed to initialize disk events");
}
if (zcp->max_zevent_buf_len != 0)
max_zevent_buf_len = zcp->max_zevent_buf_len;
return (0);
}
/*
@@ -84,6 +102,57 @@ zed_event_fini(struct zed_conf *zcp)
libzfs_fini(zcp->zfs_hdl);
zcp->zfs_hdl = NULL;
}
zed_exec_fini();
}
static void
_bump_event_queue_length(void)
{
int zzlm = -1, wr;
char qlen_buf[12] = {0}; /* parameter is int => max "-2147483647\n" */
long int qlen, orig_qlen;
zzlm = open("/sys/module/zfs/parameters/zfs_zevent_len_max", O_RDWR);
if (zzlm < 0)
goto done;
if (read(zzlm, qlen_buf, sizeof (qlen_buf)) < 0)
goto done;
qlen_buf[sizeof (qlen_buf) - 1] = '\0';
errno = 0;
orig_qlen = qlen = strtol(qlen_buf, NULL, 10);
if (errno == ERANGE)
goto done;
if (qlen <= 0)
qlen = 512; /* default zfs_zevent_len_max value */
else
qlen *= 2;
/*
* Don't consume all of kernel memory with event logs if something
* goes wrong.
*/
if (qlen > max_zevent_buf_len)
qlen = max_zevent_buf_len;
if (qlen == orig_qlen)
goto done;
wr = snprintf(qlen_buf, sizeof (qlen_buf), "%ld", qlen);
if (wr >= sizeof (qlen_buf)) {
wr = sizeof (qlen_buf) - 1;
zed_log_msg(LOG_WARNING, "Truncation in %s()", __func__);
}
if (pwrite(zzlm, qlen_buf, wr + 1, 0) < 0)
goto done;
zed_log_msg(LOG_WARNING, "Bumping queue length to %ld", qlen);
done:
if (zzlm > -1)
(void) close(zzlm);
}
/*
@@ -124,10 +193,7 @@ zed_event_seek(struct zed_conf *zcp, uint64_t saved_eid, int64_t saved_etime[])
if (n_dropped > 0) {
zed_log_msg(LOG_WARNING, "Missed %d events", n_dropped);
/*
* FIXME: Increase max size of event nvlist in
* /sys/module/zfs/parameters/zfs_zevent_len_max ?
*/
_bump_event_queue_length();
}
if (nvlist_lookup_uint64(nvl, "eid", &eid) != 0) {
zed_log_msg(LOG_WARNING, "Failed to lookup zevent eid");
@@ -199,7 +265,7 @@ _zed_event_value_is_hex(const char *name)
*
* All environment variables in [zsp] should be added through this function.
*/
static int
static __attribute__((format(printf, 5, 6))) int
_zed_event_add_var(uint64_t eid, zed_strings_t *zsp,
const char *prefix, const char *name, const char *fmt, ...)
{
@@ -547,7 +613,7 @@ _zed_event_add_string_array(uint64_t eid, zed_strings_t *zsp,
char buf[MAXBUF];
int buflen = sizeof (buf);
const char *name;
char **strp;
const char **strp;
uint_t nelem;
uint_t i;
char *p;
@@ -574,8 +640,6 @@ _zed_event_add_string_array(uint64_t eid, zed_strings_t *zsp,
* Convert the nvpair [nvp] to a string which is added to the environment
* of the child process.
* Return 0 on success, -1 on error.
*
* FIXME: Refactor with cmd/zpool/zpool_main.c:zpool_do_events_nvprint()?
*/
static void
_zed_event_add_nvpair(uint64_t eid, zed_strings_t *zsp, nvpair_t *nvp)
@@ -589,7 +653,7 @@ _zed_event_add_nvpair(uint64_t eid, zed_strings_t *zsp, nvpair_t *nvp)
uint16_t i16;
uint32_t i32;
uint64_t i64;
char *str;
const char *str;
assert(zsp != NULL);
assert(nvp != NULL);
@@ -674,23 +738,11 @@ _zed_event_add_nvpair(uint64_t eid, zed_strings_t *zsp, nvpair_t *nvp)
_zed_event_add_var(eid, zsp, prefix, name,
"%llu", (u_longlong_t)i64);
break;
case DATA_TYPE_NVLIST:
_zed_event_add_var(eid, zsp, prefix, name,
"%s", "_NOT_IMPLEMENTED_"); /* FIXME */
break;
case DATA_TYPE_STRING:
(void) nvpair_value_string(nvp, &str);
_zed_event_add_var(eid, zsp, prefix, name,
"%s", (str ? str : "<NULL>"));
break;
case DATA_TYPE_BOOLEAN_ARRAY:
_zed_event_add_var(eid, zsp, prefix, name,
"%s", "_NOT_IMPLEMENTED_"); /* FIXME */
break;
case DATA_TYPE_BYTE_ARRAY:
_zed_event_add_var(eid, zsp, prefix, name,
"%s", "_NOT_IMPLEMENTED_"); /* FIXME */
break;
case DATA_TYPE_INT8_ARRAY:
_zed_event_add_int8_array(eid, zsp, prefix, nvp);
break;
@@ -718,9 +770,11 @@ _zed_event_add_nvpair(uint64_t eid, zed_strings_t *zsp, nvpair_t *nvp)
case DATA_TYPE_STRING_ARRAY:
_zed_event_add_string_array(eid, zsp, prefix, nvp);
break;
case DATA_TYPE_NVLIST:
case DATA_TYPE_BOOLEAN_ARRAY:
case DATA_TYPE_BYTE_ARRAY:
case DATA_TYPE_NVLIST_ARRAY:
_zed_event_add_var(eid, zsp, prefix, name,
"%s", "_NOT_IMPLEMENTED_"); /* FIXME */
_zed_event_add_var(eid, zsp, prefix, name, "_NOT_IMPLEMENTED_");
break;
default:
errno = EINVAL;
@@ -846,21 +900,21 @@ _zed_event_get_subclass(const char *class)
static void
_zed_event_add_time_strings(uint64_t eid, zed_strings_t *zsp, int64_t etime[])
{
struct tm *stp;
struct tm stp;
char buf[32];
assert(zsp != NULL);
assert(etime != NULL);
_zed_event_add_var(eid, zsp, ZEVENT_VAR_PREFIX, "TIME_SECS",
"%lld", (long long int) etime[0]);
"%" PRId64, etime[0]);
_zed_event_add_var(eid, zsp, ZEVENT_VAR_PREFIX, "TIME_NSECS",
"%lld", (long long int) etime[1]);
"%" PRId64, etime[1]);
if (!(stp = localtime((const time_t *) &etime[0]))) {
if (!localtime_r((const time_t *) &etime[0], &stp)) {
zed_log_msg(LOG_WARNING, "Failed to add %s%s for eid=%llu: %s",
ZEVENT_VAR_PREFIX, "TIME_STRING", eid, "localtime error");
} else if (!strftime(buf, sizeof (buf), "%Y-%m-%d %H:%M:%S%z", stp)) {
} else if (!strftime(buf, sizeof (buf), "%Y-%m-%d %H:%M:%S%z", &stp)) {
zed_log_msg(LOG_WARNING, "Failed to add %s%s for eid=%llu: %s",
ZEVENT_VAR_PREFIX, "TIME_STRING", eid, "strftime error");
} else {
@@ -869,10 +923,29 @@ _zed_event_add_time_strings(uint64_t eid, zed_strings_t *zsp, int64_t etime[])
}
}
static void
_zed_event_update_enc_sysfs_path(nvlist_t *nvl)
{
const char *vdev_path;
if (nvlist_lookup_string(nvl, FM_EREPORT_PAYLOAD_ZFS_VDEV_PATH,
&vdev_path) != 0) {
return; /* some other kind of event, ignore it */
}
if (vdev_path == NULL) {
return;
}
update_vdev_config_dev_sysfs_path(nvl, vdev_path,
FM_EREPORT_PAYLOAD_ZFS_VDEV_ENC_SYSFS_PATH);
}
/*
* Service the next zevent, blocking until one is available.
*/
void
int
zed_event_service(struct zed_conf *zcp)
{
nvlist_t *nvl;
@@ -882,7 +955,7 @@ zed_event_service(struct zed_conf *zcp)
uint64_t eid;
int64_t *etime;
uint_t nelem;
char *class;
const char *class;
const char *subclass;
int rv;
@@ -890,20 +963,17 @@ zed_event_service(struct zed_conf *zcp)
errno = EINVAL;
zed_log_msg(LOG_ERR, "Failed to service zevent: %s",
strerror(errno));
return;
return (EINVAL);
}
rv = zpool_events_next(zcp->zfs_hdl, &nvl, &n_dropped, ZEVENT_NONE,
zcp->zevent_fd);
if ((rv != 0) || !nvl)
return;
return (errno);
if (n_dropped > 0) {
zed_log_msg(LOG_WARNING, "Missed %d events", n_dropped);
/*
* FIXME: Increase max size of event nvlist in
* /sys/module/zfs/parameters/zfs_zevent_len_max ?
*/
_bump_event_queue_length();
}
if (nvlist_lookup_uint64(nvl, "eid", &eid) != 0) {
zed_log_msg(LOG_WARNING, "Failed to lookup zevent eid");
@@ -919,6 +989,17 @@ zed_event_service(struct zed_conf *zcp)
zed_log_msg(LOG_WARNING,
"Failed to lookup zevent class (eid=%llu)", eid);
} else {
/*
* Special case: If we can dynamically detect an enclosure sysfs
* path, then use that value rather than the one stored in the
* vd->vdev_enc_sysfs_path. There have been rare cases where
* vd->vdev_enc_sysfs_path becomes outdated. However, there
* will be other times when we can not dynamically detect the
* sysfs path (like if a disk disappears) and have to rely on
* the old value for things like turning on the fault LED.
*/
_zed_event_update_enc_sysfs_path(nvl);
/* let internal modules see this event first */
zfs_agent_post_event(class, NULL, nvl);
@@ -941,12 +1022,12 @@ zed_event_service(struct zed_conf *zcp)
_zed_event_add_time_strings(eid, zsp, etime);
zed_exec_process(eid, class, subclass,
zcp->zedlet_dir, zcp->zedlets, zsp, zcp->zevent_fd);
zed_exec_process(eid, class, subclass, zcp, zsp);
zed_conf_write_state(zcp, eid, etime);
zed_strings_destroy(zsp);
}
nvlist_free(nvl);
return (0);
}
+5 -5
View File
@@ -1,9 +1,9 @@
/*
* This file is part of the ZFS Event Daemon (ZED)
* for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
* This file is part of the ZFS Event Daemon (ZED).
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the ZoL git commit log for authoritative copyright attribution.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -17,13 +17,13 @@
#include <stdint.h>
void zed_event_init(struct zed_conf *zcp);
int zed_event_init(struct zed_conf *zcp);
void zed_event_fini(struct zed_conf *zcp);
int zed_event_seek(struct zed_conf *zcp, uint64_t saved_eid,
int64_t saved_etime[]);
void zed_event_service(struct zed_conf *zcp);
int zed_event_service(struct zed_conf *zcp);
#endif /* !ZED_EVENT_H */
+205 -59
View File
@@ -1,9 +1,9 @@
/*
* This file is part of the ZFS Event Daemon (ZED)
* for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
* This file is part of the ZFS Event Daemon (ZED).
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the ZoL git commit log for authoritative copyright attribution.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -18,16 +18,55 @@
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <stddef.h>
#include <sys/avl.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>
#include "zed_file.h"
#include <pthread.h>
#include <signal.h>
#include "zed_exec.h"
#include "zed_log.h"
#include "zed_strings.h"
#define ZEVENT_FILENO 3
struct launched_process_node {
avl_node_t node;
pid_t pid;
uint64_t eid;
char *name;
};
static int
_launched_process_node_compare(const void *x1, const void *x2)
{
pid_t p1;
pid_t p2;
assert(x1 != NULL);
assert(x2 != NULL);
p1 = ((const struct launched_process_node *) x1)->pid;
p2 = ((const struct launched_process_node *) x2)->pid;
if (p1 < p2)
return (-1);
else if (p1 == p2)
return (0);
else
return (1);
}
static pthread_t _reap_children_tid = (pthread_t)-1;
static volatile boolean_t _reap_children_stop;
static avl_tree_t _launched_processes;
static pthread_mutex_t _launched_processes_lock = PTHREAD_MUTEX_INITIALIZER;
static int16_t _launched_processes_limit;
/*
* Create an environment string array for passing to execve() using the
* NAME=VALUE strings in container [zsp].
@@ -78,20 +117,26 @@ _zed_exec_create_env(zed_strings_t *zsp)
*/
static void
_zed_exec_fork_child(uint64_t eid, const char *dir, const char *prog,
char *env[], int zfd)
char *env[], int zfd, boolean_t in_foreground)
{
char path[PATH_MAX];
int n;
pid_t pid;
int fd;
pid_t wpid;
int status;
struct launched_process_node *node;
sigset_t mask;
struct timespec launch_timeout =
{ .tv_sec = 0, .tv_nsec = 200 * 1000 * 1000, };
assert(dir != NULL);
assert(prog != NULL);
assert(env != NULL);
assert(zfd >= 0);
while (__atomic_load_n(&_launched_processes_limit,
__ATOMIC_SEQ_CST) <= 0)
(void) nanosleep(&launch_timeout, NULL);
n = snprintf(path, sizeof (path), "%s/%s", dir, prog);
if ((n < 0) || (n >= sizeof (path))) {
zed_log_msg(LOG_WARNING,
@@ -99,100 +144,186 @@ _zed_exec_fork_child(uint64_t eid, const char *dir, const char *prog,
prog, eid, strerror(ENAMETOOLONG));
return;
}
(void) pthread_mutex_lock(&_launched_processes_lock);
pid = fork();
if (pid < 0) {
(void) pthread_mutex_unlock(&_launched_processes_lock);
zed_log_msg(LOG_WARNING,
"Failed to fork \"%s\" for eid=%llu: %s",
prog, eid, strerror(errno));
return;
} else if (pid == 0) {
(void) sigemptyset(&mask);
(void) sigprocmask(SIG_SETMASK, &mask, NULL);
(void) umask(022);
if ((fd = open("/dev/null", O_RDWR)) != -1) {
if (in_foreground && /* we're already devnulled if daemonised */
(fd = open("/dev/null", O_RDWR | O_CLOEXEC)) != -1) {
(void) dup2(fd, STDIN_FILENO);
(void) dup2(fd, STDOUT_FILENO);
(void) dup2(fd, STDERR_FILENO);
}
(void) dup2(zfd, ZEVENT_FILENO);
zed_file_close_from(ZEVENT_FILENO + 1);
execle(path, prog, NULL, env);
_exit(127);
}
/* parent process */
node = calloc(1, sizeof (*node));
if (node) {
node->pid = pid;
node->eid = eid;
node->name = strdup(prog);
if (node->name == NULL) {
perror("strdup");
exit(EXIT_FAILURE);
}
avl_add(&_launched_processes, node);
}
(void) pthread_mutex_unlock(&_launched_processes_lock);
__atomic_sub_fetch(&_launched_processes_limit, 1, __ATOMIC_SEQ_CST);
zed_log_msg(LOG_INFO, "Invoking \"%s\" eid=%llu pid=%d",
prog, eid, pid);
}
/* FIXME: Timeout rogue child processes with sigalarm? */
static void
_nop(int sig)
{
(void) sig;
}
/*
* Wait for child process using WNOHANG to limit
* the time spent waiting to 10 seconds (10,000ms).
*/
for (n = 0; n < 1000; n++) {
wpid = waitpid(pid, &status, WNOHANG);
if (wpid == (pid_t)-1) {
if (errno == EINTR)
continue;
zed_log_msg(LOG_WARNING,
"Failed to wait for \"%s\" eid=%llu pid=%d",
prog, eid, pid);
break;
} else if (wpid == 0) {
struct timespec t;
static void *
_reap_children(void *arg)
{
(void) arg;
struct launched_process_node node, *pnode;
pid_t pid;
int status;
struct rusage usage;
struct sigaction sa = {};
/* child still running */
t.tv_sec = 0;
t.tv_nsec = 10000000; /* 10ms */
(void) nanosleep(&t, NULL);
continue;
}
(void) sigfillset(&sa.sa_mask);
(void) sigdelset(&sa.sa_mask, SIGCHLD);
(void) pthread_sigmask(SIG_SETMASK, &sa.sa_mask, NULL);
if (WIFEXITED(status)) {
zed_log_msg(LOG_INFO,
"Finished \"%s\" eid=%llu pid=%d exit=%d",
prog, eid, pid, WEXITSTATUS(status));
} else if (WIFSIGNALED(status)) {
zed_log_msg(LOG_INFO,
"Finished \"%s\" eid=%llu pid=%d sig=%d/%s",
prog, eid, pid, WTERMSIG(status),
strsignal(WTERMSIG(status)));
(void) sigemptyset(&sa.sa_mask);
sa.sa_handler = _nop;
sa.sa_flags = SA_NOCLDSTOP;
(void) sigaction(SIGCHLD, &sa, NULL);
for (_reap_children_stop = B_FALSE; !_reap_children_stop; ) {
(void) pthread_mutex_lock(&_launched_processes_lock);
pid = wait4(0, &status, WNOHANG, &usage);
if (pid == 0 || pid == (pid_t)-1) {
(void) pthread_mutex_unlock(&_launched_processes_lock);
if (pid == 0 || errno == ECHILD)
pause();
else if (errno != EINTR)
zed_log_msg(LOG_WARNING,
"Failed to wait for children: %s",
strerror(errno));
} else {
zed_log_msg(LOG_INFO,
"Finished \"%s\" eid=%llu pid=%d status=0x%X",
prog, eid, (unsigned int) status);
memset(&node, 0, sizeof (node));
node.pid = pid;
pnode = avl_find(&_launched_processes, &node, NULL);
if (pnode) {
memcpy(&node, pnode, sizeof (node));
avl_remove(&_launched_processes, pnode);
free(pnode);
}
(void) pthread_mutex_unlock(&_launched_processes_lock);
__atomic_add_fetch(&_launched_processes_limit, 1,
__ATOMIC_SEQ_CST);
usage.ru_utime.tv_sec += usage.ru_stime.tv_sec;
usage.ru_utime.tv_usec += usage.ru_stime.tv_usec;
usage.ru_utime.tv_sec +=
usage.ru_utime.tv_usec / (1000 * 1000);
usage.ru_utime.tv_usec %= 1000 * 1000;
if (WIFEXITED(status)) {
zed_log_msg(LOG_INFO,
"Finished \"%s\" eid=%llu pid=%d "
"time=%llu.%06us exit=%d",
node.name, node.eid, pid,
(unsigned long long) usage.ru_utime.tv_sec,
(unsigned int) usage.ru_utime.tv_usec,
WEXITSTATUS(status));
} else if (WIFSIGNALED(status)) {
zed_log_msg(LOG_INFO,
"Finished \"%s\" eid=%llu pid=%d "
"time=%llu.%06us sig=%d/%s",
node.name, node.eid, pid,
(unsigned long long) usage.ru_utime.tv_sec,
(unsigned int) usage.ru_utime.tv_usec,
WTERMSIG(status),
strsignal(WTERMSIG(status)));
} else {
zed_log_msg(LOG_INFO,
"Finished \"%s\" eid=%llu pid=%d "
"time=%llu.%06us status=0x%X",
node.name, node.eid, pid,
(unsigned long long) usage.ru_utime.tv_sec,
(unsigned int) usage.ru_utime.tv_usec,
(unsigned int) status);
}
free(node.name);
}
break;
}
/*
* kill child process after 10 seconds
*/
if (wpid == 0) {
zed_log_msg(LOG_WARNING, "Killing hung \"%s\" pid=%d",
prog, pid);
(void) kill(pid, SIGKILL);
return (NULL);
}
void
zed_exec_fini(void)
{
struct launched_process_node *node;
void *ck = NULL;
if (_reap_children_tid == (pthread_t)-1)
return;
_reap_children_stop = B_TRUE;
(void) pthread_kill(_reap_children_tid, SIGCHLD);
(void) pthread_join(_reap_children_tid, NULL);
while ((node = avl_destroy_nodes(&_launched_processes, &ck)) != NULL) {
free(node->name);
free(node);
}
avl_destroy(&_launched_processes);
(void) pthread_mutex_destroy(&_launched_processes_lock);
(void) pthread_mutex_init(&_launched_processes_lock, NULL);
_reap_children_tid = (pthread_t)-1;
}
/*
* Process the event [eid] by synchronously invoking all zedlets with a
* matching class prefix.
*
* Each executable in [zedlets] from the directory [dir] is matched against
* the event's [class], [subclass], and the "all" class (which matches
* all events). Every zedlet with a matching class prefix is invoked.
* Each executable in [zcp->zedlets] from the directory [zcp->zedlet_dir]
* is matched against the event's [class], [subclass], and the "all" class
* (which matches all events).
* Every zedlet with a matching class prefix is invoked.
* The NAME=VALUE strings in [envs] will be passed to the zedlet as
* environment variables.
*
* The file descriptor [zfd] is the zevent_fd used to track the
* The file descriptor [zcp->zevent_fd] is the zevent_fd used to track the
* current cursor location within the zevent nvlist.
*
* Return 0 on success, -1 on error.
*/
int
zed_exec_process(uint64_t eid, const char *class, const char *subclass,
const char *dir, zed_strings_t *zedlets, zed_strings_t *envs, int zfd)
struct zed_conf *zcp, zed_strings_t *envs)
{
const char *class_strings[4];
const char *allclass = "all";
@@ -201,9 +332,22 @@ zed_exec_process(uint64_t eid, const char *class, const char *subclass,
char **e;
int n;
if (!dir || !zedlets || !envs || zfd < 0)
if (!zcp->zedlet_dir || !zcp->zedlets || !envs || zcp->zevent_fd < 0)
return (-1);
if (_reap_children_tid == (pthread_t)-1) {
_launched_processes_limit = zcp->max_jobs;
if (pthread_create(&_reap_children_tid, NULL,
_reap_children, NULL) != 0)
return (-1);
pthread_setname_np(_reap_children_tid, "reap ZEDLETs");
avl_create(&_launched_processes, _launched_process_node_compare,
sizeof (struct launched_process_node),
offsetof(struct launched_process_node, node));
}
csp = class_strings;
if (class)
@@ -219,11 +363,13 @@ zed_exec_process(uint64_t eid, const char *class, const char *subclass,
e = _zed_exec_create_env(envs);
for (z = zed_strings_first(zedlets); z; z = zed_strings_next(zedlets)) {
for (z = zed_strings_first(zcp->zedlets); z;
z = zed_strings_next(zcp->zedlets)) {
for (csp = class_strings; *csp; csp++) {
n = strlen(*csp);
if ((strncmp(z, *csp, n) == 0) && !isalpha(z[n]))
_zed_exec_fork_child(eid, dir, z, e, zfd);
_zed_exec_fork_child(eid, zcp->zedlet_dir,
z, e, zcp->zevent_fd, zcp->do_foreground);
}
}
free(e);
+8 -5
View File
@@ -1,9 +1,9 @@
/*
* This file is part of the ZFS Event Daemon (ZED)
* for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
* This file is part of the ZFS Event Daemon (ZED).
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the ZoL git commit log for authoritative copyright attribution.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -16,9 +16,12 @@
#define ZED_EXEC_H
#include <stdint.h>
#include "zed_strings.h"
#include "zed_conf.h"
void zed_exec_fini(void);
int zed_exec_process(uint64_t eid, const char *class, const char *subclass,
const char *dir, zed_strings_t *zedlets, zed_strings_t *envs,
int zevent_fd);
struct zed_conf *zcp, zed_strings_t *envs);
#endif /* !ZED_EXEC_H */
+24 -99
View File
@@ -1,9 +1,9 @@
/*
* This file is part of the ZFS Event Daemon (ZED)
* for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
* This file is part of the ZFS Event Daemon (ZED).
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the ZoL git commit log for authoritative copyright attribution.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -12,72 +12,17 @@
* You may not use this file except in compliance with the license.
*/
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include "zed_file.h"
#include "zed_log.h"
/*
* Read up to [n] bytes from [fd] into [buf].
* Return the number of bytes read, 0 on EOF, or -1 on error.
*/
ssize_t
zed_file_read_n(int fd, void *buf, size_t n)
{
unsigned char *p;
size_t n_left;
ssize_t n_read;
p = buf;
n_left = n;
while (n_left > 0) {
if ((n_read = read(fd, p, n_left)) < 0) {
if (errno == EINTR)
continue;
else
return (-1);
} else if (n_read == 0) {
break;
}
n_left -= n_read;
p += n_read;
}
return (n - n_left);
}
/*
* Write [n] bytes from [buf] out to [fd].
* Return the number of bytes written, or -1 on error.
*/
ssize_t
zed_file_write_n(int fd, void *buf, size_t n)
{
const unsigned char *p;
size_t n_left;
ssize_t n_written;
p = buf;
n_left = n;
while (n_left > 0) {
if ((n_written = write(fd, p, n_left)) < 0) {
if (errno == EINTR)
continue;
else
return (-1);
}
n_left -= n_written;
p += n_written;
}
return (n);
}
/*
* Set an exclusive advisory lock on the open file descriptor [fd].
* Return 0 on success, 1 if a conflicting lock is held by another process,
@@ -159,6 +104,13 @@ zed_file_is_locked(int fd)
return (lock.l_pid);
}
#if __APPLE__
#define PROC_SELF_FD "/dev/fd"
#else /* Linux-compatible layout */
#define PROC_SELF_FD "/proc/self/fd"
#endif
/*
* Close all open file descriptors greater than or equal to [lowfd].
* Any errors encountered while closing file descriptors are ignored.
@@ -166,51 +118,24 @@ zed_file_is_locked(int fd)
void
zed_file_close_from(int lowfd)
{
const int maxfd_def = 256;
int errno_bak;
struct rlimit rl;
int maxfd;
int errno_bak = errno;
int maxfd = 0;
int fd;
DIR *fddir;
struct dirent *fdent;
errno_bak = errno;
if (getrlimit(RLIMIT_NOFILE, &rl) < 0) {
maxfd = maxfd_def;
} else if (rl.rlim_max == RLIM_INFINITY) {
maxfd = maxfd_def;
if ((fddir = opendir(PROC_SELF_FD)) != NULL) {
while ((fdent = readdir(fddir)) != NULL) {
fd = atoi(fdent->d_name);
if (fd > maxfd && fd != dirfd(fddir))
maxfd = fd;
}
(void) closedir(fddir);
} else {
maxfd = rl.rlim_max;
maxfd = sysconf(_SC_OPEN_MAX);
}
for (fd = lowfd; fd < maxfd; fd++)
(void) close(fd);
errno = errno_bak;
}
/*
* Set the CLOEXEC flag on file descriptor [fd] so it will be automatically
* closed upon successful execution of one of the exec functions.
* Return 0 on success, or -1 on error.
*
* FIXME: No longer needed?
*/
int
zed_file_close_on_exec(int fd)
{
int flags;
if (fd < 0) {
errno = EBADF;
return (-1);
}
flags = fcntl(fd, F_GETFD);
if (flags == -1)
return (-1);
flags |= FD_CLOEXEC;
if (fcntl(fd, F_SETFD, flags) == -1)
return (-1);
return (0);
}
+3 -9
View File
@@ -1,9 +1,9 @@
/*
* This file is part of the ZFS Event Daemon (ZED)
* for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
* This file is part of the ZFS Event Daemon (ZED).
*
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the ZoL git commit log for authoritative copyright attribution.
* Refer to the OpenZFS git commit log for authoritative copyright attribution.
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -18,10 +18,6 @@
#include <sys/types.h>
#include <unistd.h>
ssize_t zed_file_read_n(int fd, void *buf, size_t n);
ssize_t zed_file_write_n(int fd, void *buf, size_t n);
int zed_file_lock(int fd);
int zed_file_unlock(int fd);
@@ -30,6 +26,4 @@ pid_t zed_file_is_locked(int fd);
void zed_file_close_from(int fd);
int zed_file_close_on_exec(int fd);
#endif /* !ZED_FILE_H */

Some files were not shown because too many files have changed in this diff Show More