In case we have I/O and try to remove an L2ARC device a deadlock might
occur. arc_read()->zio_read()->zfs_blkptr_verify() waits for SCL_VDEV
to be dropped while holding the hash_lock. However, spa_l2cache_load()
holds SCL_ALL and waits for the hash_lock in l2arc_evict().
Fix this by moving zfs_blkptr_verify() to the top top arc_read() before
the hash_lock is taken. Verify the block pointer and return a checksum
error if damaged rather than halting the system, by using
BLK_VERIFY_LOG instead of BLK_VERIFY_HALT.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes#12054
In zfs_znode_alloc we always hash inodes. If the
znode is unlinked, we do not need to hash it. This
fixes the problem where zfs_suspend_fs is doing zrele
(iput) in an async fashion, and zfs_resume_fs unlinked
drain processing will try to hash an inode that could
still be hashed, resulting in a panic.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alan Somers <asomers@gmail.com>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes#9741Closes#11223Closes#11648Closes#12210
VFS_QUOTACTL(9) has been updated to allow each filesystem to indicate
whether it has changed the busy state of the mount. The filesystem
may still assume that its .vfs_quotactl entrypoint is always called
with the mount busied, but only needs to unbusy the mount (and clear
*mp_busy) if it does something that actually requires the mount to be
unbusied. It no longer needs to blindly copy-paste the UFS protocol
for calling vfs_unbusy(9) for the Q_QUOTAOFF and Q_QUOTAON commands.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Jason Harmening <jason.harmening@gmail.com>
Closes#12052
In order for cppcheck to perform a proper analysis it needs to be
aware of how the sources are compiled (source files, include
paths/files, extra defines, etc). All the needed information is
available from the Makefiles and can be leveraged with a generic
cppcheck Makefile target. So let's add one.
Additional minor changes:
* Removing the cppcheck-suppressions.txt file. With cppcheck 2.3
and these changes it appears to no longer be needed. Some inline
suppressions were also removed since they appear not to be
needed. We can add them back if it turns out they're needed
for older versions of cppcheck.
* Added the ax_count_cpus m4 macro to detect at configure time how
many processors are available in order to run multiple cppcheck
jobs. This value is also now used as a replacement for nproc
when executing the kernel interface checks.
* "PHONY =" line moved in to the Rules.am file which is included
at the top of all Makefile.am's. This is just convenient becase
it allows us to use the += syntax to add phony targets.
* One upside of this integration worth mentioning is it now allows
`make cppcheck` to be run in any directory to check that subtree.
* For the moment, cppcheck is not run against the FreeBSD specific
kernel sources. The cppcheck-FreeBSD target will need to be
implemented and testing on FreeBSD to support this.
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11508
If there is no scsi_debug module, then this test
must be skipped, in this case cleanup routine should
be prepared for absent pool.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes#11534
If TX_WRITE is create on a file, and the file is later deleted and a new
directory is created on the same object id, it is possible that when
zil_commit happens, zfs_get_data will be called on the new directory.
This may result in panic as it tries to do range lock.
This patch fixes this issue by record the generation number during
zfs_log_write, so zfs_get_data can check if the object is valid.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes#10593Closes#11682
Move zfs_get_data() in to platform-independent code. The only
platform-specific aspect of it is the way we release an inode
(Linux) / vnode_t (FreeBSD). I am not aware of a platform that
could be supported by ZFS that couldn't implement zfs_rele_async
itself. It's sibling zvol_get_data already is platform-independent.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes#10979
The zfs_holey() and zfs_access() functions can be made common
to both FreeBSD and Linux.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes#11125
When a child process is killed waitpid() must be called on the
pid the reap the zombie process.
Update BUGS section to reflect reality by replacing "zedlets
aren't time limited with "zedlets can be interrupted".
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#11769Closes#11798
Other (all?) Linux filesystems seem to return -EPERM instead of -EACCESS
when trying to set FS_APPEND_FL or FS_IMMUTABLE_FL without the
CAP_LINUX_IMMUTABLE capability. This was detected by generic/545 test
in the fstest suite.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Luis Henriques <henrix@camandro.org>
Closes#11791
zhold() wraps igrab() on Linux, and igrab() may fail when the inode
is in the process of being deleted. This means zhold() must only be
called when a reference exists and therefore it cannot be deleted.
This is the case for all existing consumers so add a VERIFY and a
comment explaining this requirement.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Adam Moss <c@yotes.com>
Closes#11704
For small objects the kernel's slab implementation is very fast and
space efficient. However, as the allocation size increases to
require multiple pages performance suffers. The SPL kmem cache
allocator was designed to better handle these large allocation
sizes. Therefore, on Linux the kmem_cache_* compatibility wrappers
prefer to use the kernel's slab allocator for small objects and
the custom SPL kmem cache allocator for larger objects.
This logic was effectively disabled for all architectures using
a non-4K page size which caused all kmem caches to only use the
SPL implementation. Functionally this is fine, but the SPL code
which calculates the target number of objects per-slab does not
take in to account that __vmalloc() always returns page-aligned
memory. This can result in a massive amount of wasted space when
allocating tiny objects on a platform using large pages (64k).
To resolve this issue we set the spl_kmem_cache_slab_limit cutoff
to 16K for all architectures.
This particular change does not attempt to update the logic used
to calculate the optimal number of pages per slab. This remains
an issue which should be addressed in a future change.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#12152Closes#11429Closes#11574Closes#12150
If TX_WRITE is create on a file, and the file is later deleted and a new
directory is created on the same object id, it is possible that when
zil_commit happens, zfs_get_data will be called on the new directory.
This may result in panic as it tries to do range lock.
This patch fixes this issue by record the generation number during
zfs_log_write, so zfs_get_data can check if the object is valid.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes#10593Closes#11682
The additional iter advance is incorrect, as copy_from_iter() has
already done the right thing. This will result in the following
warning being printed to the console as of the 5.12 kernel.
Attempted to advance past end of bvec iter
This change should have been included with #11378 when a
similar change was made on the read side.
Suggested-by: @siebenmann
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Issue #11378Closes#12041Closes#12155
(cherry picked from commit 3f81aba766)
Signed-off-by: Jonathon Fernyhough <jonathon@m2x.dev>
Linux kernel commit 0f00b82e5413571ed225ddbccad6882d7ea60bc7 removes the
revalidate_disk() handler from struct block_device_operations. This
caused a regression, and this commit eliminates the call to it and the
assignment in the block_device_operations static handler assignment
code, when configure identifies that the kernel doesn't support that
API handler.
Reviewed-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes#11967Closes#11977
(cherry picked from commit 48c7b0e444)
Signed-off-by: Jonathon Fernyhough <jonathon@m2x.dev>
Co-authored-by: Coleman Kane <ckane@colemankane.org>
Just like #12087, the set_acl signature changed with all the bolted-on
*userns parameters, which disabled set_acl usage, and caused #12076.
Turn zpl_set_acl into zpl_set_acl and zpl_set_acl_impl, and add a
new configure test for the new version.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes#12076Closes#12093
Linux changed the tmpfile() signature again in torvalds/linux@6521f89,
which in turn broke our HAVE_TMPFILE detection in configure.
Update that macro to include the new case, and change the signature of
zpl_tmpfile as appropriate.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes: #12060Closes: #12087
The BIO_MAX_PAGES macro is being retired in favor of a bio_max_segs()
function that implements the typical MIN(x,y) logic used throughout the
kernel for bounding the allocation, and also the new implementation is
intended to be signed-safe (which the former was not).
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes#11765
(cherry picked from commit ffd6978ef5)
Signed-off-by: Jonathon Fernyhough <jonathon@m2x.dev>
In Linux 5.12, the filesystem API was modified to support ipmapped
mounts by adding a "struct user_namespace *" parameter to a number
functions and VFS handlers. This change adds the needed autoconf
macros to detect the new interfaces and updates the code appropriately.
This change does not add support for idmapped mounts, instead it
preserves the existing behavior by passing the initial user namespace
where needed. A subsequent commit will be required to add support
for idmapped mounted.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes#11712
(cherry picked from commit e2a8296131)
Signed-off-by: Jonathon Fernyhough <jonathon@m2x.dev>
zp->z_lock is used in shared code for protecting projid and scantime.
We don't exercise these paths much if at all on FreeBSD, so have been
lucky enough not to have issues with the uninitialized locks so far.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes#12003
If zdb is not built with DEBUG mode, the ASSERT macros will be
eliminated.
This will leave vim defined, but not used (gcc warning) and
checkpoint spacemap validation loop will do nothing.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes#11932
Both the zpool_initialize_import_export and checkpoint_discard_busy
test cases a known to occasionally fail. Add them to the list of
known possible failures and reference the appropriate issue on the
tracker.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11949
Receiving datasets while blanket inheriting properties like zfs
receive -x mountpoint can generally be desirable, e.g. to avoid
unexpected mounts on backup hosts.
Currently this will fail to receive zvols due to the mountpoint
property being applicable to filesystems only. This limitation
currently requires operators to special-case their minds and tools
for zvols.
This change gets rid of this limitation for inherit (-x) by
Spiting up the dataset type handling: Warnings for inheriting (-x),
errors for overriding (-o).
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes#11416Closes#11840Closes#11864
Introduce a specific valid function for avx512f+avx512bw (instead
of checking only for avx512f).
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Signed-off-by: Romain Dolbeau <romain@dolbeau.org>
Closes#11937Closes#11938
The special_small_blocks section directed readers to zpool(8) for
documentation on special allocation classes, while they are actually
documented in zpoolconcepts(8).
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Daniel Stevenson <daniel@dstev.net>
Closes#11918
This deduplicates 2 sets of caches which use the same allocation size.
Memory savings fluctuate a lot, one sample result is FreeBSD running
"make buildworld" saving ~180MB RAM in reduced page count associated
with zio caches.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes#11877
Fix NULL pointer dereference when reporting
checksum error for gang block in zio_done.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes#11872Closes#11896
The awk command used by the checkbashisms target incorrectly
adds the escape character before the ! and # characters. This
results in the following warnings because these characters do not
need to be escaped.
awk: cmd. line:1: warning: regexp escape sequence
`\!' is not a known regexp operator
awk: cmd. line:1: warning: regexp escape sequence
`\#' is not a known regexp operator
Remove the unneeded escape character before ! and #.
Valid escape sequences are:
https://www.gnu.org/software/gawk/manual/html_node/Escape-Sequences.html
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11902
Do not (incorrectly, right instead left) pad health string itself,
it will be taken care of when printing property value below.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Yuri Pankov <yuripv@FreeBSD.org>
Closes#11899
zfs_crypto_load_key() only works on encryption roots,
and zfs mount -la would fail if it encounters a datasets that
is sorted before their encroots.
To trigger:
truncate -s 40G /tmp/test
dd if=/dev/urandom of=/tmp/k bs=128 count=1 status=none
zpool create -O encryption=on -O keylocation=file:///tmp/k \
-O keyformat=passphrase test /tmp/test
zfs create -o mountpoint=/a test/a
zfs create -o mountpoint=/b test/b
zfs umount test
zfs unload-key test
zfs mount -la
The final mount errored out with:
Key load error: Keys must be loaded for
encryption root of 'test/a' (test).
Key load error: Keys must be loaded for
encryption root of 'test/b' (test).
And only /test was mounted
This technically breaks the libzfs API, but the previous behavior was
decidedly a bug.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#11870Closes#11875
It's been observed in the CI that the required 25% of obsolete bytes
in the mapping can be to high a threshold for this test resulting in
condensing never being triggered and a test failure. To prevent these
failures make the existing zfs_condense_indirect_obsolete_pct tuning
available so the obsolete percentage can be reduced from 25% to 5%
during this test.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11869
These were fd 3, 4, and 5 by the time zfs change-key hit
execute_key_fob()
glibc appends "e" to setmntent() mode, but musl's just returns fopen()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#11866
This changes the password prompt for new encryption roots from
Enter passphrase:
Re-enter passphrase:
to
Enter new passphrase:
Re-enter new passphrase:
which makes more sense and is more consistent with "new passphrase"
now always meaning "come up with something" and plain "passphrase"
"remember that thing"
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#11866
Kill the removal operation on every platform, not just Linux.
The test has been fixed and is now stable on FreeBSD.
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11856
As described in #11854, zhack is occasionally segfaulting on FreeBSD.
Debugging this is proving to be tricky. To avoid false positives in
the CI add entries for the tests that use zhack in zts-report to
accept that they may occasionally fail on FreeBSD.
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Issue #11854Closes#11855
Just as delay zevents can flood the zevent pipe when a vdev becomes
unresponsive, so do the deadman zevents.
Ratelimit deadman zevents according to the same tunable as for delay
zevents.
Enable deadman tests on FreeBSD and add a test for deadman event
ratelimiting.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11786
get_clones_string currently returns an empty string for filesystem
snapshots which have no clones. This breaks parsable `zfs get` output as
only three columns are output, instead of 4.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Fiddaman <github@m.fiddaman.uk>
Co-authored-by: matt <matt@fiddaman.net>
Closes#11837
The exact limitations on what features are supported when booting
vary considerably depending on the environment. In order to minimize
confusion avoid categorical statements which assume GRUB2 is being
used. The supported GRUB2 features are covered earlier in this man
page for easy reference.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#11842
zpool list, which is the only user, would mistakenly try to parse the
empty string as the interval in this case:
$ zpool list "a"
cannot open 'a': no such pool
$ zpool list ""
interval cannot be zero
usage: <usage string follows>
which is now symmetric with zpool get:
$ zpool list ""
cannot open '': name must begin with a letter
Avoid breaking the "interval cannot be zero" string.
There simply isn't a need for this, and it's user-facing.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#11841Closes#11843
The pool_checkpoint tests may incorrectly fail because several of
them invoke zdb for an imported pool. In this scenario it's not
unexpected for zdb to fail if the pool is modified. To resolve
this these zdb checks are now done after the pool has been exported.
Additionally, the default cleanup functions assumed the pool would
be imported when they were run. If this was not the case they're
exit early and fail to cleanup all of the test state causing
subsequent tests to fail. Add a check to only destroy the pool
when it is imported.
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11832
We have exclusive access to our zfsdev state object in this section
until it is invalidated by setting zs_minor to -1, so we can destroy
the state without taking a lock if we do the invalidation last, after
a member to ensure correct ordering.
While here, strengthen the assertions that zs_minor is valid when we
enter.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes#11751
Nothing bad happens if a prefix of your pool name matches a disk name.
This is a bit of a silly restriction at this point.
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes#11781Closes#11813
The lower bound for this scaling to too low and the upper bound is too
high. Use a fixed default length of 512 instead, which is a reasonable
value on any system.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11822
ratelimit_dropped isn't protected by a lock and is expected to
be updated atomically.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11822
Recently we've been running out of free space in the ubuntu 20.04
environment resulting in test failures. This appears to be caused
by a change in the default available free space and not because of
any change in OpenZFS. Try and avoid this failure by applying a
suggested workaround which removes some unnecessary files.
https://github.com/actions/virtual-environments/issues/2840
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11826
Commit 235a85657 introduced a regression in evaluation of POSIX modes
that require group DENY entries in the internal ZFS ACL. An example
of such a POSX mode is 007. When write_implies_delete_child is set,
then ACE_WRITE_DATA is added to `wanted_dirperms` in prior to calling
zfs_zaccess_common(). This occurs is zfs_zaccess_delete().
Unfortunately, when zfs_zaccess_aces_check hits this particular DENY
ACE, zfs_groupmember() is checked to determine whether access should be
denied, and since zfs_groupmember() always returns B_TRUE on Linux and
so this check is failed, resulting ultimately in EPERM being returned.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Andrew Walker <awalker@ixsystems.com>
Closes#11760
The FreeBSD boot loader relies on the bootfs property and is capable
of booting from removed (indirect) vdevs.
Reviewed-by Eric van Gyzen
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes#11763
Don't handle (incorrectly) kmem_zalloc() failure. With KM_SLEEP,
will never return NULL.
Free the data allocated for non-virtual kstats when deleting the object.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11767
The man page was missing these two permissions.
Add the missing permissions to the man page.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jeremy Faulkner <gldisater@gldis.ca>
Closes#11727
Create a new section of tests to run with acltype=off.
For now the only test we have is for the DOS mode READONLY attribute on
FreeBSD.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11734
You can't use user_run to eval ksh functions defined in libtest unless
you include libtest in the user shell.
Fix xattr_003_neg by:
* include libtest in the user shell
* *then* run get_xattr
* assert this fails
* use variables for filenames so they don't change in the user's shell
* don't log the contents of /etc/passwd
* cleanup all byproducts
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11185
The current user_run often does not work as expected. Commands are run
in a different shell, with a different environment, and all output is
discarded.
Simplify user_run to retain the current environment, eliminate eval,
and feed the command string into ksh. Enhance the logging for
user_run so we can see out and err.
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11185
Add parsing of the rewind options.
When I was upstreaming the change [1], I omitted the part where we
detect that the pool should be rewind. When the FreeBSD repo has
synced with the OpenZFS, this part of the code was removed.
[1] FreeBSD repo: 277f38abffc6a8160b5044128b5b2c620fbb970c
[2] OpenZFS repo: f2c027bd6a
External-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254152
Originally reviewed by: tsoome, allanjude
Originally reviewed by: kevans (ok from high-level overview)
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org>
Closes#11730
Resolve some oddities in zfsdev_close() which could result in a
panic and were not present in the equivalent function for Linux.
- Remove unused definition ZFS_MIN_MINOR
- FreeBSD: Simplify zfsdev state destruction
- Assert zs_minor is valid in zfsdev_close
- Make locking around zfsdev state match Linux
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11720
This deserializes otherwise non-contending operations.
The previous scheme of using 17 locks hashed by curthread runs into
conflicts very quickly. Check the pull request for sample results.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes#11153
This will allow platforms to implement it as they see fit, in particular
in a different manner than rrm locks.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes#11153
Importing a pool using the cachefile is ideal to reduce the time
required to import a pool. However, if the devices associated with
a pool in the cachefile have changed, then the import would fail.
This can easily be corrected by doing a normal import which would
then read the pool configuration from the labels.
The goal of this change is make importing using a cachefile more
resilient and auto-correcting. This is accomplished by having
the cachefile import logic automatically fallback to reading the
labels of the devices similar to a normal import. The main difference
between the fallback logic and a normal import is that the cachefile
import logic will only look at the device directories that were
originally used when the cachefile was populated. Additionally,
the fallback logic will always import by guid to ensure that only
the pools in the cachefile would be imported.
External-issue: DLPX-71980
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes#11716
A few deadman tunables ended up in the wrong sysctl node.
Move them to vfs.zfs.deadman.*
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11715
Add branch hints and constify the intermediate evaluations of
left/right params in VERIFY3*().
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Adam Moss <c@yotes.com>
Closes#11708
Some .h files that were added were missed in this Makefile. Since
they are .h files, their being missing only resulted in them
disappeared from the dist archive.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes#11705
Our checkstyle doesn't work well on Ubuntu 20.04,
temporary pin it to 18.04.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes#11713
events_002 exercises the ZED, ensuring that it neither misses events,
nor reporting events twice.
On slow test hardware, some of the timeouts are insufficient to allow
the ZED to properly settle. Conversely, on fast hardware these same
timeouts are too long, unnecessarily slowing the test run.
Instead of using a fixed timeout, wait for the expected final event
before returning. Additionally, wait with a timeout for unexpected
events to avoid missing them if they show up late.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes#11703
zil_replaying(zil, tx) has the side-effect of informing the ZIL that an
entry has been replayed in the (still open) tx. The ZIL uses that
information to record the replay progress in the ZIL header when that
tx's txg syncs.
ZPL log entries are not idempotent and logically dependent and thus
calling zil_replaying() is necessary for correctness.
For ZVOLs the question of correctness is more nuanced: ZVOL logs only
TX_WRITE and TX_TRUNCATE, both of which are idempotent. Logical
dependencies between two records exist only if the write or discard
request had sync semantics or if the ranges affected by the records
overlap.
Thus, at a first glance, it would be correct to restart replay from
the beginning if we crash before replay completes. But this does not
address the following scenario:
Assume one log record per LWB.
The chain on disk is
HDR -> 1:W(1, "A") -> 2:W(1, "B") -> 3:W(2, "X") -> 4:W(3, "Z")
where N:W(O, C) represents log entry number N which is a TX_WRITE of C
to offset A.
We replay 1, 2 and 3 in one txg, sync that txg, then crash.
Bit flips corrupt 2, 3, and 4.
We come up again and restart replay from the beginning because
we did not call zil_replaying() during replay.
We replay 1 again, then interpret 2's invalid checksum as the end
of the ZIL chain and call replay done.
The replayed zvol content is "AX".
If we had called zil_replaying() the HDR would have pointed to 3
and our resumed replay would not have replayed anything because
3 was corrupted, resulting in zvol content "BX".
If 3 logically depends on 2 then the replay corrupted the ZVOL_OBJ's
contents.
This patch adds the zil_replaying() calls to the replay functions.
Since the callbacks in the replay function need the zilog_t* pointer
so that they can call zil_replaying() we open the ZIL while
replaying in zvol_create_minor(). We also verify that replay has
been done when on-demand-opening the ZIL on the first modifying
bio.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes#11667
* Restore original kern.corefile value after the test.
* Don't leave behind a frozen pool.
* Clean up leftover vdev files.
* Make zpool_002_pos and zpool_003_pos consistent in their handling of
core files while here.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11694
When a device which is actively trimming or initializing becomes
FAULTED, and therefore no longer writable, cancel the active
TRIM or initialization. When the device is merely taken offline
with `zpool offline` then stop the operation but do not cancel it.
When the device is brought back online the operation will be
resumed if possible.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Vipin Kumar Verma <vipin.verma@hpe.com>
Signed-off-by: Srikanth N S <srikanth.nagasubbaraoseetharaman@hpe.com>
Closes#11588
Several of the TRIM tests were based of the initialize tests and
then adapted for TRIM. The zpool_trim_start_and_cancel_pos.ksh
test was intended to be one such test but it was overlooked and
actually never adapted. Update it accordingly.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11649
Calling vdev_free() only requires the we acquire the spa config
SCL_STATE_ALL locks, not the SCL_ALL locks. In particular, we need
need to avoid taking the SCL_CONFIG lock (included in SCL_ALL) as a
writer since this can lead to a deadlock. The txg_sync_thread() may
block in spa_txg_history_init_io() when taking the SCL_CONFIG lock
as a reading when it detects there's a pending writer.
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11585
Docs for send and receive do not explain behavior when sending a
compressed stream then receiving on a host that overrides compression
with -o compress=value.
The data from the send stream is written as it was from the send is
the compressed form but the compression algorithm set on the receiver
is the overridden version which causes some confusion as to what
algorithm was actually used.
Updated man docs to clarify behavior
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed By: Allan Jude <allanjude@freebsd.org>
Signed-off-by: manfromafar <manfromafar@outlook.com>
Closes#11690
ZFS_READONLY represents the "DOS R/O" attribute.
When that flag is set, we should behave as if write access
were not granted by anything in the ACL. In particular:
We _must_ allow writes after opening the file r/w, then
setting the DOS R/O attribute, and writing some more.
(Similar to how you can write after fchmod(fd, 0444).)
Restore these semantics which were lost on FreeBSD when refactoring
zfs_write. To my knowledge Linux does not actually expose this flag,
but we'll need it to eventually so I've added the supporting checks.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11693
For some reason cppcheck 1.90 is generating an invalidSyntax warning
when the BF64_SET macro is used in the zstream source. The same
warning is not reported by cppcheck 2.3, nor is their any evident
problem with the expanded macro. This appears to be an issue with
this version of cppcheck. This commit annotates the source to suppress
the warning.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11700
When populating a ZIL destination buffer ensure it is always
zeroed before its contents are constructed.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tom Caputi <caputit1@tcnj.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11687
Bring the output of the removal status in line with the other
"fields" that zpool status outputs, and thus allows an parser to
easier detect this as continuation of the 'remove:' output.
Before:
remove: Removal of vdev 0 copied 282G in 0h9m, completed on [...]
776K memory used for removed device mappings
Now:
remove: Removal of vdev 0 copied 282G in 0h9m, completed on [...]
776K memory used for removed device mappings
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Closes#11674
The function has three similar pieces of code: for read-behind pages,
requested pages and read-ahead pages. All three pieces had an
assert to ensure that the page is not mapped. Later the assert was
relaxed to require that the page is not mapped for writing. But that
was done in two places out of three. This change fixes the third piece,
read-ahead.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
Closes#11654
The bio_*_acct functions became GPL exports, which causes the
kernel modules to refuse to compile. This replaces code with
alternate function calls to the disk_*_io_acct interfaces, which
are not GPL exports. This change was added in kernel commit
99dfc43ecbf67f12a06512918aaba61d55863efc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes#11639
The struct bio member bi_disk was moved underneath a new member named
bi_bdev. So all attempts to reference bio->bi_disk need to now become
bio->bi_bdev->bd_disk.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes#11639
On Linux increase the maximum allowed size of the src nvlist which
can be passed to the /dev/zfs ioctl. Originally, this was set
to a maximum of KMALLOC_MAX_SIZE (4M) because it was kmalloc'd.
Since that time it's been converted to a vmalloc so that's no
longer a hard limit, and it's desirable for `zfs send/recv` to
allow larger nvlists so more snapshots can be sent at once.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#6572Closes#11638
The behavior of a NULL fromsnap was inadvertently changed for a doall
send when the send/recv logic in libzfs was updated. Restore the
previous behavior by correcting send_iterate_snap() to include all
the snapshots in the nvlist for this case.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Cedric Maunoury <cedric.maunoury@gmail.com>
Closes#11608
This prevents a panic after a SLOG add/removal on the root pool followed
by a zpool scrub.
When a SLOG is removed, a hole takes its place - the vdev_ops for a hole
is vdev_hole_ops, which defines the handler functions of vdev_op_hold
and vdev_op_rele as NULL.
This bug has been reported in illumos and FreeBSD, a different trigger
in the FreeBSD report though.
Credit for this patch goes to Patrick Mooney <pmooney@pfmooney.com>
Obtained from: illumos-gate commit: c65bd18728f34725
External-issue: https://www.illumos.org/issues/12981
External-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252396
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Wing <rob.fx907@gmail.com>
Closes#11623
Without this patch I get the error
Setting global variables is only supported on little-endian systems
when using `zdb -o` on my amd64 machine.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes#11602
First, the crypto request completion handler contains a bug in that it
fails to reset fs_done correctly after the request is completed. This
is only a problem for asynchronous drivers. Second, some hardware
drivers have input constraints which ZFS does not satisfy. For
instance, ccp(4) apparently requires the AAD length for AES-GCM to be a
multiple of the cipher block size, and with qat(4) the AES-GCM AAD
length may not be longer than 240 bytes. FreeBSD's generic crypto
framework doesn't have a mechanism to automatically fall back to a
software implementation if a hardware driver cannot process a request,
and ZFS does not tolerate such errors.
The plan is to implement such a fallback mechanism, but with FreeBSD
13.0 approaching we should simply disable the use hardware drivers for
now.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes#11612
That happens because of an off-by-one mistake.
share_mount_one_cb() calls report_mount_progress(current=sm_done) after
having incremented sm_done by one. Then report_mount_progress()
increments the parameter again. It appears that that logic became
obsolete after commit a10d50f999, parallel zfs mount.
On FreeBSD I observe that zfs mount -a -v prints, for example,
(null): (201/248)
That happens because set_progress_header() is never called.
With this change the output becomes correct:
Mounting ZFS filesystems: (209/248)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andriy Gapon <avg@FreeBSD.org>
Closes#11607
There are two issues that don't allow ZFS to be compiled using uClibc.
`backtrace()`, and `program_invocation_short_name` as a `const`.
This patch adds uClibc to the conditionals in the same way there are
already for Glibc for `backtrace()`; and removes the external param
`program_invocation_short_name` because its only used here for the
whole project.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: José Luis Salvador Rufo <salvador.joseluis@gmail.com>
Closes#11600
Increase the Linux-Maximum version in the META file to 5.11.
All of the required compatibility patches have been merged.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11586
3d40b65 refactored zfs_vnops.c, which shared much code verbatim between
Linux and BSD. After a successful write, the suid/sgid bits are reset,
and the mode to be written is stored in newmode. On Linux, this was
propagated to both the in-memory inode and znode, which is then updated
with sa_update.
3d40b65 accidentally removed the initialization of newmode, which
happened to occur on the same line as the inode update (which has been
moved out of the function).
The uninitialized newmode can be saved to disk, leading to a crash on
stat() of that file, in addition to a merely incorrect file mode.
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes#11474Closes#11576
When all pools are exported ZFS will generate an empty cache file.
This will cause the import service to fail, which is sub-optimal,
since this means that dracut fails, and it necessary to run
`zpool import -a` to boot, delete the file, and regenerate+reinstall
the initrd.
This resolves the issue by treating an zero-length cache files the
same as a missing cache file. This aligns the behavior with that
of the `zpool` command itself.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#11568
zfs-load-key.sh is called by the dracut-pre-mount.service unit which has
no explicit 'After' dependency on zfs-import.target. That way it can be
that the pool has not yet been imported and the zfs-load-key.sh finishes
without ever seeing the relevant pool.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Lorenz Hüdepohl <dev@stellardeath.org>
Closes#11500
Clarify how to include snapshots in the `zpool list` output by
referencing the full name of the `listsnapshots` pool property,
and the `zpool list -t snapshot` option.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11562Closes#11565
If we do not write any buffers to the cache device and the evict hand
has not advanced do not update the cache device header.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes#11522Closes#11537
There is a race condition in zfs_zrele_async when we are checking if
we would be the one to evict an inode. This can lead to a txg sync
deadlock.
Instead of calling into iput directly, we attempt to perform the atomic
decrement ourselves, unless that would set the i_count value to zero.
In that case, we dispatch a call to iput to run later, to prevent a
deadlock from occurring.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes#11527Closes#11530
It was observed that vdev_id exists silently when
the $CONFIG file is missing.
This patch adds error message in case vdev_id is
called without default $CONFIG or '-c'. This makes
end user observe the exit message more easily.
Before Patch:
~~~~~~~~~~~~~
$ ./cmd/vdev_id/vdev_id
$
After Patch:
~~~~~~~~~~~~
$ ./cmd/vdev_id/vdev_id
Error: Config file "/etc/zfs/vdev_id.conf" not found
$
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Closes#11498
Fix two minor errors reported by cppcheck:
In module/zfs/abd.c (abd_get_offset_impl), add non-NULL
assertion to prevent NULL dereference warning.
In module/zfs/arc.c (l2arc_write_buffers), change 'try'
variable to 'pass' to avoid C++ reserved word.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Colm Buckley <colm@tuatha.org>
Closes#11507
Follow up for commit 624222a, value asserted <= SPA_OLD_MAXBLOCKSIZE
instead of SPA_MAXBLOCKSIZE as it should be after the previous change.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes#11501
zgenhostid(8) is used to modify or create /etc/hostid. This
administrative tool is currently installed to bindir. System utilities
are typically placed in sbin.
Modify the installation directory for zgenhostid. Additionally, track
this change in its use in dracut and the rpm installation.
Authored-by: наб <nabijaczleweli@nabijaczleweli.xyz>
Authored-by: Antonio Russo <aerusso@aerusso.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes#11485
Prior to util-linux 2.36.2, if a file or directory in the
current working directory was named 'dataset' then mount(8)
would prepend the current working directory to the dataset.
Eventually, we should be able to drop this workaround.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sterling Jensen <sterlingjensen@users.noreply.github.com>
Closes#11295Closes#11462
As described in #11445, the kernel interface kernel_{read,write} no
longer act on special devices. In the ZTS, zfs send and receive are
tested by piping to these devices, leading to spurious failures (for
positive tests) and may mask errors (for negative tests).
Until a more permanent mechanism to address this deficiency is
developed, clean up the output from the ZTS by avoiding directly piping
to or from /dev/null and /dev/zero.
For /dev/zero input, simply use a pipe: `cat </dev/zero |` .
However, for /dev/null output, the shell semantics for pipe failures
means that zfs send error codes will be masked by the successful
`| cat >/dev/null` command execution. In that case, use a temporary
file under $TEST_BASE_DIR for output in favor.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes#11478
The zfs_rollback_001 test modifies files in a temporary, test dataset
repeatedly. Before each iteration, any preexisting dataset is removed,
after unmounted with umount -f, if necessary.
Add a short delay after the forced unmount, avoiding a race that can
prevent zfs destroy from succeeding, leading to a test failure.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes#11451
If the system is very low on memory (specifically,
`arc_free_memory() < arc_sys_free/2`, i.e. less than 1/16th of RAM
free), `arc_evict_state_impl()` will defer wakups. In this case, the
arc_evict_waiter_t's remain on the list, even though `arc_evict_count`
has been incremented past their `aew_count`.
The problem is that `arc_wait_for_eviction()` assumes that if there are
waiters on the list, the count they are waiting for has not yet been
reached. However, the deferred wakeups may violate this, causing
`ASSERT(last->aew_count > arc_evict_count)` to fail.
This commit resolves the issue by having new waiters use the greater of
`arc_evict_count` and the last `aew_count`.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes#11285Closes#11397
zfsdev_close sets zs_minor to -1 to avoid duplicate calls to
destroy. This doesn't mix well with the current u_int used.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes#11437
As part of commit 1c2358c1 the custom uio_prefaultpages() code
was removed in favor of using the generic kernel provided
iov_iter_fault_in_readable() interface. Unfortunately, it
turns out that up until the Linux 4.7 kernel the function would
only ever fault in the first iovec of the iov_iter. The result
being uiomove_iov() may hang waiting for the page.
This commit effectively restores the custom uio_prefaultpages()
pages code for Linux 4.9 and earlier kernels which contain the
troublesome version of iov_iter_fault_in_readable().
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11463Closes#11484
`zpool create -n` fails to list cache and spare vdevs.
`zpool add -n` fails to list spare devices.
`zpool split -n` fails to list `special` and `dedup` labels.
`zpool add -n` and `zpool split -n` shouldn't list hole devices.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes#11122Closes#11167
Several m4 macros have been retired in autoconf 2.70. Update the
the build system to use the new macros provided to replace them.
* Replaced AC_HELP_STRING with AS_HELP_STRING.
* Replaced AC_TRY_COMPILE with AC_COMPILE_IFELSE/AC_LANG_PROGRAM.
* Replaced AC_CANONICAL_SYSTEM with AC_CANONICAL_TARGET
* Replaced AC_PROG_LIBTOOL with LT_INIT
* $CPP is not defined in ZFS_AC_KERNEL and really shouldn't be
directly used like this. Replace it with an $AWK command
to extract the kernel source version.
Reviewed-by: Eli Schwartz <eschwartz@archlinux.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #11413Closes#11419
if pool root is not mounted, then zpool umount in next test will leave
dataset mountpoint directory around and next zfs mount -a will fail
with error: cannot mount '/testpool': directory is not empty
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes#11417
Virtuozzo 7 kernels starting 3.10.0-1127.18.2.vz7.163.46
have the following configuration:
* no HAVE_VFS_RW_ITERATE
* HAVE_VFS_DIRECT_IO_ITER_RW_OFFSET
=> let's add implementation of zpl_direct_IO() via
zpl_aio_{read,write}() in this case.
https://bugs.openvz.org/browse/OVZ-7243
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Closes#11410Closes#11411
In `zpool_find_config()`, the `pools` nvlist is leaked. Part of it (a
sub-nvlist) is returned in `*configp`, but the callers also leak that.
Additionally, in `zdb.c:main()`, the `searchdirs` is leaked.
The leaks were detected by ASAN (`configure --enable-asan`).
This commit resolves the leaks.
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes#11396
Build error on illumos with gcc 10 did reveal:
In function 'dmu_objset_refresh_ownership':
../../common/fs/zfs/dmu_objset.c:857:25: error: implicit conversion
from 'boolean_t' to 'ds_hold_flags_t' {aka 'enum ds_hold_flags'}
[-Werror=enum-conversion]
857 | dsl_dataset_disown(ds, decrypt, tag);
| ^~~~~~~
cc1: all warnings being treated as errors
libzfs_input_check.c: In function 'zfs_ioc_input_tests':
libzfs_input_check.c:754:28: error: implicit conversion from
'enum dmu_objset_type' to 'enum lzc_dataset_type'
[-Werror=enum-conversion]
754 | err = lzc_create(dataset, DMU_OST_ZFS, NULL, NULL, 0);
| ^~~~~~~~~~~
cc1: all warnings being treated as errors
The same issue is present in openzfs, and also the same issue about
ds_hold_flags_t, which currently defines exactly one valid value.
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes#11406
As of 5.11 the blk_register_region() and blk_unregister_region()
functions have been retired. This isn't a problem since add_disk()
has implicitly allocated minor numbers for a very long time.
Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11387Closes#11390
Both revalidate_disk_size() and revalidate_disk() have been removed.
Functionally this isn't a problem because we only relied on these
functions to call zvol_revalidate_disk() for us and to perform any
additional handling which might be needed for that kernel version.
When neither are available we know there's no additional handling
needed and we can directly call zvol_revalidate_disk().
Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11387Closes#11390
The bd_contains member was removed from the block_device structure.
Callers needing to determine if a vdev is a whole block device should
use the new bdev_whole() wrapper. For older kernels we provide our
own bdev_whole() wrapper which relies on bd_contains for compatibility.
Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11387Closes#11390
The generic IO accounting functions have been removed in favor of the
bio_start_io_acct() and bio_end_io_acct() functions which provide a
better interface. These new functions were introduced in the 5.8
kernels but it wasn't until the 5.11 kernel that the previous generic
IO accounting interfaces were removed.
This commit updates the blk_generic_*_io_acct() wrappers to provide
and interface similar to the updated kernel interface. It's slightly
different because for older kernels we need to pass the request queue
as well as the bio.
Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11387Closes#11390
The lookup_bdev() function has been updated to require a dev_t
be passed as the second argument. This is actually pretty nice
since the major number stored in the dev_t was the only part we
were interested in. This allows to us avoid handling the bdev
entirely. The vdev_lookup_bdev() wrapper was updated to emulate
the behavior of the new lookup_bdev() for all supported kernels.
Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11387Closes#11390
Update the ZFS_LINUX_TEST_PROGRAM macro to always set the module
license. As of the 5.11 kernel not setting a license has been
converted from a warning to an error.
Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11387Closes#11390
Increase the Linux-Maximum version in the META file to 5.10.
All of the required compatibility patches have been merged.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11391
Before this patch, dracut wouldn't find zfs.ko for inclusion in
initramfs. This was caused by the packages installing in to
/lib/modules instead of /usr/lib/modules. Correcting this allows
dracut to do the right thing, even without
# /etc/dracut.conf
add_drivers+=" zfs "
Notably, rpm/redhat/zfs-kmod.spec.in does not contain the definition of
the `prefix` macro that this commit removes in the generic kmod spec.
And https://rpmfusion.org/Packaging/KernelModules/Kmods2 does not
mention `prefix` at all.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes#11381
After porting the fix for https://github.com/openzfs/zfs/issues/5295
over to illumos, we started hitting an assertion failure when running
the testsuite:
assertion failed: rc->rc_count == number, file: .../refcount.c
and the unexpected hold has this stack:
dsl_dataset_long_hold+0x59 dmu_objset_upgrade+0x73
dmu_objset_id_quota_upgrade+0x15 dmu_objset_own+0x14f
The simplest reproducer for this in illumos is
zpool create -f -O version=1 testpool c3t0d0; zpool destroy testpool
which is run as part of the zpool_create_tempname test, but I can't get
this to trigger on FreeBSD. This appears to be because of the call to
txg_wait_synced() in dmu_objset_upgrade_stop() (which was missing in
illumos), slows down dmu_objset_disown() enough to avoid the condition.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andy Fiddaman <andy@omnios.org>
Closes#11368
The CentOS stream 4.18.0-257 kernel appears to have backported
the Linux 5.9 change to make_request_fn and the associated API.
To maintain weak modules compatibility the original symbol was
retained and the new interface blk_alloc_queue_rh() was added.
Unfortunately, blk_alloc_queue() was replaced in the blkdev.h
header by blk_alloc_queue_bh() so there doesn't seem to be a way
to build new kmods against the old interfces. Even though they
appear to still be available for weak module binding.
To accommodate this a configure check is added for the new _rh()
variant of the function and used if available. If compatibility
code gets added to the kernel for the original blk_alloc_queue()
interface this should be fine. OpenZFS will simply continue to
prefer the new interface and only fallback to blk_alloc_queue()
when blk_alloc_queue_rh() isn't available.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11374
Commit 59b68723 added a configure check for 5.10, which removed
revalidate_disk(), and conditionally replaced it's usage with a call to
the new revalidate_disk_size() function. However, the old function also
invoked the device's registered callback, in our case
zvol_revalidate_disk(). This commit adds a call to zvol_revalidate_disk()
in zvol_update_volsize() to make sure the code path stays the same.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Michael D Labriola <michael.d.labriola@gmail.com>
Closes#11358
Commit 1c2358c12 restructured this code and introduced a warning
about the variable maybe not being initialized. This cannot happen
with the updated code but we should initialize the variable anyway
to silence the warning.
zpl_file.c: In function ‘zpl_iter_write’:
zpl_file.c:324:9: warning: ‘count’ may be used uninitialized
in this function [-Wmaybe-uninitialized]
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11373
There's no need to call iov_iter_advance() in zpl_iter_read().
This was preserved from the previous code where it wasn't needed
but also didn't cause any problems. Now that the iter functions
also handle pipes that's no longer the case. When fully reading a
pipe buffer iov_iter_advance() may results in the pipe buf release
function being called which will not be registered resulting in
a NULL dereference.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11375Closes#11378
As of the 5.10 kernel the generic splice compatibility code has been
removed. All filesystems are now responsible for registering a
->splice_read and ->splice_write callback to support this operation.
The good news is the VFS provided generic_file_splice_read() and
iter_file_splice_write() callbacks can be used provided the ->iter_read
and ->iter_write callback support pipes. However, this is currently
not the case and only iovecs and bvecs (not pipes) are ever attached
to the uio structure.
This commit changes that by allowing full iov_iter structures to be
attached to uios. Ever since the 4.9 kernel the iov_iter structure
has supported iovecs, kvecs, bvevs, and pipes so it's desirable to
pass the entire thing when possible. In conjunction with this the
uio helper functions (i.e uiomove(), uiocopy(), etc) have been
updated to understand the new UIO_ITER type.
Note that using the kernel provided uio_iter interfaces allowed the
existing Linux specific uio handling code to be simplified. When
there's no longer a need to support kernel's older than 4.9, then
it will be possible to remove the iovec and bvec members from the
uio structure and always use a uio_iter. Until then we need to
maintain all of the existing types for older kernels.
Some additional refactoring and cleanup was included in this change:
- Added checks to configure to detect available iov_iter interfaces.
Some are available all the way back to the 3.10 kernel and are used
when available. In particular, uio_prefaultpages() now always uses
iov_iter_fault_in_readable() which is available for all supported
kernels.
- The unused UIO_USERISPACE type has been removed. It is no longer
needed now that the uio_seg enum is platform specific.
- Moved zfs_uio.c from the zcommon.ko module to the Linux specific
platform code for the zfs.ko module. This gets it out of libzfs
where it was never needed and keeps this Linux specific code out
of the common sources.
- Removed unnecessary O_APPEND handling from zfs_iter_write(), this
is redundant and O_APPEND is already handled in zfs_write();
NOTE: Cleanly applying this kernel compatibility change required
applying the following commits. This makes the change larger than
it absolutely needs to be, but the resulting code matches what's in
the branch branch. This is both more tested and makes it easier to
apply any future backports in this area.
7cf4cd824 Remove incorrect assertion
783be694f Reduce confusion in zfs_write
af5626ac2 Return EFAULT at the end of zfs_write() when set
cc1f85be8 Simplify offset and length limit in zfs_write
9585538d0 Const some unchanging variables in zfs_write
86e74dc16 Remove redundant oid parameter to update_pages
b3d723fb0 Factor uid, gid, and projid out of loop in zfs_write
3d40b6554 Share zfs_fsync, zfs_read, zfs_write, et al between Linux and FreeBSD
Reviewed-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11351
Commit 85703f6 added a new ASSERT to zfs_write() as part of the
cleanup which isn't correct in the case where multiple processes
are concurrently extending a file. The `zp->z_size` is updated
atomically while holding a range lock on only a portion of the
file. Therefore, it's possible for the file size to increase
after a same check is performed earlier in the loop causing this
ASSERT to fail. The code itself handles this case correctly so
only the invalid ASSERT needs to be removed.
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11235
Is this block when abuf != NULL ever reached? Yes, it is.
Add asserts and comments to prove that when we get here, we have a full
block write at an aligned offset extending past EOF.
Simplify by removing the check that tx_bytes == max_blksz, since we can
assert that it is always true.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11191
FreeBSD's VFS expects EFAULT from zfs_write() if we didn't complete
the full write so it can retry the operation. Add some missing
SET_ERRORs in zfs_write().
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11193
The zfs_fsync, zfs_read, and zfs_write function are almost identical
between Linux and FreeBSD. With a little refactoring they can be
moved to the common code which is what is done by this commit.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes#11078
Consider the test to be a success as long as the initializing pattern
is found at least once per metaslab. This indicates that at least
part of the free space was initialized. Ideally we'd check that the
pattern was written to all free space but that's much trickier so this
check is a reasonable compromise.
Using a here-string to feed the loop in this test causes an empty
string to still trigger the loop so we miss the `spacemaps=0` case.
Pipe into the loop instead.
While here, we can use `zpool wait -t initialize $TESTPOOL` to wait for
the pool to initialize.
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11365
The space in special devices is not included in spa_dspace (or
dsl_pool_adjustedsize(), or the zfs `available` property). Therefore
there is always at least as much free space in the normal class, as
there is allocated in the special class(es). And therefore, there is
always enough free space to remove a special device.
However, the checks for free space when removing special devices did not
take this into account. This commit corrects that.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes#11329
Fedora does not guarantee a stable kABI, so weak modules should be dis-
abled. See the dkms man page for a more detailed explanation of the weak
module feature.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gregory Bartholomew <gregory.lee.bartholomew@gmail.com>
Closes#9891Closes#11128Closes#11242Closes#11335
Avoid a bug with gcc's -Wreturn-local-addr warning with some
obfuscation. In buggy versions of gcc, if a return value is an
expression that involves the address of a local variable, and even if
that address is legally converted to a non-pointer type, a warning may
be emitted and the value of the address may be replaced with zero.
Howerver, buggy versions don't emit the warning or replace the value
when simply returning a local variable of non-pointer type.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90737
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Libby <rlibby@FreeBSD.org>
Closes#11337
Building the spa module for i386 caused gcc to emit
-Wint-to-pointer-cast "cast to pointer from integer of different size"
because spa.spa_did was uint64_t but pthread_join (via thread_join in
spa_deactivate) takes a pointer (32-bit on i386). Define spa_did to be
pointer-size instead. For now spa_did is in fact never non-zero and the
thread_join could instead be ifdef'd out, but changing the size of
spa_did may be more useful for the future.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Libby <rlibby@FreeBSD.org>
Closes#11336
When removing and subsequently reattaching a vdev, CKSUM errors may
occur as vdev_indirect_read_all() reads from all children of a mirror
in case of a resilver.
Fix this by checking whether a child is missing the data and setting a
flag (ic_error) which is then checked in vdev_indirect_repair() and
suppresses incrementing the checksum counter.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes#11277
Some tunables shown by arc_summary3 have string values that may exceed
the normal line length, leaving a negative offset between the name and
value fields. The negative space is of course not valid and Python
rightly barfs up an exception traceback.
Handle an overflowing value field width by ignoring the line length
and separating the name from the value by a single space instead.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11270
There is a tunable to select the fletcher 4 checksum implementation on
Linux but it was not present in FreeBSD.
Implement the sysctl handler for FreeBSD and use ZFS_MODULE_PARAM_CALL
to provide the tunable on both platforms.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11270
In the redaction list traversal code, there is a bug in the binary search
logic when looking for the resume point. Maxbufid can be decremented to -1,
causing us to read the last possible block of the object instead of the one we
wanted. This can cause incorrect resume behavior, or possibly even a hang in
some cases. In addition, when examining non-last blocks, we can treat the
block as being the same size as the last block, causing us to miss entries in
the redaction list when determining where to resume. Finally, we were ignoring
the case where the resume point was found in the buffer being searched, and
resuming from minbufid. All these issues have been corrected, and the code has
been significantly simplified to make future issues less likely.
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes#11297
vfs.zfs.arc_no_grow_shift has an invalid type (15) and this causes
py-sysctl to format it as a bytearray when it should be an integer.
"U" is not a valid format, it should be "I" and the type should match
the variable type, int. We can return EINVAL if the value is set below
zero.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11318
py-sysctl now includes the CTLTYPE_NODE type nodes in the list returned
by sysctl.filter() on FreeBSD head. It also provides descriptions now.
Eliminate the subprocess call to get descriptions, and filter out the
nodes so we only deal with values.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11318
Resolve an uninitialized variable warning when compiling.
In function ‘zfs_domount’:
warning: ‘root_inode’ may be used uninitialized in this
function [-Wmaybe-uninitialized]
sb->s_root = d_make_root(root_inode);
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11306
There has been a panic affecting some system configurations where the
thread FPU context is disturbed during the fletcher 4 benchmarks,
leading to a panic at boot.
module_init() registers zcommon_init to run in the last subsystem
(SI_SUB_LAST). Running it as soon as interrupts have been configured
(SI_SUB_INT_CONFIG_HOOKS) makes sure we have finished the benchmarks
before we start doing other things.
While it's not clear *how* the FPU context was being disturbed, this
does seem to avoid it.
Add a module_init_early() macro to run zcommon_init() at this earlier
point on FreeBSD. On Linux this is defined as module_init().
Authored by: Konstantin Belousov <kib@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11302
Tracking down an error message with the errno value can be difficult,
using strerror makes the error message clearer.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Érico Rolim <erico.erc@gmail.com>
Closes#11303
The fnvlist_lookup_boolean_value() function should not be used
to check the force argument since it's optional. It may not be
provided or may have been created with the wrong flags.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11281Closes#11284
Run zfs-tests with sanity.run for brief results. Timeouts
are rare, so minimize false positives by increasing the
default from 60 to 180 seconds.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes#11304
During module load time all of the available fetcher4 and raidz
implementations are benchmarked for a fixed amount of time to
determine the fastest available. Manual testing has shown that this
time can be significantly reduced with negligible effect on the final
results.
This commit changes the benchmark time to 1ms which can reduce the
module load time by over a second on x86_64. On an x86_64 system
with sse3, ssse3, and avx2 instructions the benchmark times are:
Fletcher4 603ms -> 15ms
RAIDZ 1,322ms -> 64ms
Reviewed-by: Matthew Macy <mmacy@freebsd.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11282
When running in the CI the zpool_import_012_pos test case occasionally
takes longer than the maximum 600 seconds. When this happens the test
case is considered to have failed but always completes a few minutes
latter. Since the logs suggest nothing has actually failed this commit
increases timeout and removes the exception.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11286
Occasionally an out of memory error is hit by this test case
when mounting the filesystems. Try and reduce the likelihood
of this occurring by reducing the thread count from 100 to 50.
It also has the advantage of slightly speeding up the test.
cannot mount 'testpool/testfs3/79': Cannot allocate memory
filesystem successfully created, but not mounted
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11283
This run file contains a subset of functional tests which exercise
as much functionality as possible while still executing relatively
quickly. The included tests should take no more than a few seconds
each to run at most. This provides a convenient way to sanity test a
change before committing to a full test run which takes several hours.
$ ./scripts/zfs-tests.sh -r sanity
...
Results Summary
PASS 813
Running Time: 00:14:42
Percent passed: 100.0%
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11271
It was found that setting min_active tunables for non-interactive I/Os
makes them stuck. It is caused by zfs_vdev_nia_delay, that can never
be reached if we never issue any I/Os due to min_active set to zero.
Fix this by issuing at least one non-interactive I/O at a time when
there are no interactive I/Os. When there are interactive I/Os, zero
min_active allows to completely block any non-interactive I/O. It may
min_active starvation in some scenarios, but who we are to deny foot
shooting?
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes#11261
Investigating influence of scrub (especially sequential) on random read
latency I've noticed that on some HDDs single 4KB read may take up to 4
seconds! Deeper investigation shown that many HDDs heavily prioritize
sequential reads even when those are submitted with queue depth of 1.
This patch addresses the latency from two sides:
- by using _min_active queue depths for non-interactive requests while
the interactive request(s) are active and few requests after;
- by throttling it further if no interactive requests has completed
while configured amount of non-interactive did.
While there, I've also modified vdev_queue_class_to_issue() to give
more chances to schedule at least _min_active requests to the lowest
priorities. It should reduce starvation if several non-interactive
processes are running same time with some interactive and I think should
make possible setting of zfs_vdev_max_active to as low as 1.
I've benchmarked this change with 4KB random reads from ZVOL with 16KB
block size on newly written non-fragmented pool. On fragmented pool I
also saw improvements, but not so dramatic. Below are log2 histograms
of the random read latency in milliseconds for different devices:
4 2x mirror vdevs of SATA HDD WDC WD20EFRX-68EUZN0 before:
0, 0, 2, 1, 12, 21, 19, 18, 10, 15, 17, 21
after:
0, 0, 0, 24, 101, 195, 419, 250, 47, 4, 0, 0
, that means maximum latency reduction from 2s to 500ms.
4 2x mirror vdevs of SATA HDD WDC WD80EFZX-68UW8N0 before:
0, 0, 2, 31, 38, 28, 18, 12, 17, 20, 24, 10, 3
after:
0, 0, 55, 247, 455, 470, 412, 181, 36, 0, 0, 0, 0
, i.e. from 4s to 250ms.
1 SAS HDD SEAGATE ST14000NM0048 before:
0, 0, 29, 70, 107, 45, 27, 1, 0, 0, 1, 4, 19
after:
1, 29, 681, 1261, 676, 1633, 67, 1, 0, 0, 0, 0, 0
, i.e. from 4s to 125ms.
1 SAS SSD SEAGATE XS3840TE70014 before (microseconds):
0, 0, 0, 0, 0, 0, 0, 0, 70, 18343, 82548, 618
after:
0, 0, 0, 0, 0, 0, 0, 0, 283, 92351, 34844, 90
I've also measured scrub time during the test and on idle pools. On
idle fragmented pool I've measured scrub getting few percent faster
due to use of QD3 instead of QD2 before. On idle non-fragmented pool
I've measured no difference. On busy non-fragmented pool I've measured
scrub time increase about 1.5-1.7x, while IOPS increase reached 5-9x.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes#11166
Busybox's mktemp requires at least six X's in the template, causing
the current sed --in-place check to fail because the file does not
exist. This change adds additional X's to mktemp templates that do
not already have at least six X's in them.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Quentin Zdanis <zdanisq@gmail.com>
Closes#11269
When ZFS_COLOR is set, zpool status shows row headings in bold,
except for the "remove:" heading. This is a quick fix that makes
it print in bold too.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrew Sun <me@andrewsun.com>
Closes#11255
Check for the history_event type instead.
The zfs-list-cacher.sh script currently respects the event types
excluded from syslog(!) in ZED_SYSLOG_SUBCLASS_EXCLUDE.
This makes little sense in this single-purpose script and
silently breaks when history_events are excluded from syslog,
which is the default since 13d65987a9.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes#11164Closes#11347
Extend the change made in ae12b02 to verify the zfs kernel
modules are loaded to the rest of the OpenZFS services. If
the modules aren't loaded the neither the share, volume, or
and zed services can be started.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11243
Despite that dracut has a hard dependency on bash,
its modules doesn't, dracut only has a hard dependency on bash for
module-setup (on a fully usable machine). Inside initramfs, dracut
allows users choose from a list of handful other shells, e.g. bash,
busybox, dash, mkfsh.
In fact, my local machine's initramfs is being built with dash,
and it's functional for a very long time.
Before 64025fa3a (Silence 'make checkbashisms', 2020-08-20), we also
allows our users to have that right, too.
Let's fix the problem 'make checkbashisms' reported and allows our users
to have that right, again.
For 'plymouth' case, let's simply run the command inside the if instead
of checking for the existence of command before running it, because the
status is also failture if plymouth is unavailable.
While we're at it, let's remove an unnecessary fork for grep in
zfs-generator.sh.in and its following complicated 'if elif fi' with
a simple 'case ... esac'.
To support this change, also exclude 90zfs from "make checkbashisms"
because the current CI infrastructure ships an old version of
"checkbashisms", which complains about "command -v", while the current
latest "checkbashisms" thinks it's fine. In the near future, we can
revert that change to "Makefile.am" when CI infrastructure is updated.
Reviewed-by: Gabriel A. Devenyi <gdevenyi@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Closes#11244
Under certain conditions commit a3a4b8def appears to result in a
hang, or poor performance, when importing a pool. Until the root
cause can be identified it has been reverted from the release branch.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #11245
Investigating influence of scrub (especially sequential) on random read
latency I've noticed that on some HDDs single 4KB read may take up to 4
seconds! Deeper investigation shown that many HDDs heavily prioritize
sequential reads even when those are submitted with queue depth of 1.
This patch addresses the latency from two sides:
- by using _min_active queue depths for non-interactive requests while
the interactive request(s) are active and few requests after;
- by throttling it further if no interactive requests has completed
while configured amount of non-interactive did.
While there, I've also modified vdev_queue_class_to_issue() to give
more chances to schedule at least _min_active requests to the lowest
priorities. It should reduce starvation if several non-interactive
processes are running same time with some interactive and I think should
make possible setting of zfs_vdev_max_active to as low as 1.
I've benchmarked this change with 4KB random reads from ZVOL with 16KB
block size on newly written non-fragmented pool. On fragmented pool I
also saw improvements, but not so dramatic. Below are log2 histograms
of the random read latency in milliseconds for different devices:
4 2x mirror vdevs of SATA HDD WDC WD20EFRX-68EUZN0 before:
0, 0, 2, 1, 12, 21, 19, 18, 10, 15, 17, 21
after:
0, 0, 0, 24, 101, 195, 419, 250, 47, 4, 0, 0
, that means maximum latency reduction from 2s to 500ms.
4 2x mirror vdevs of SATA HDD WDC WD80EFZX-68UW8N0 before:
0, 0, 2, 31, 38, 28, 18, 12, 17, 20, 24, 10, 3
after:
0, 0, 55, 247, 455, 470, 412, 181, 36, 0, 0, 0, 0
, i.e. from 4s to 250ms.
1 SAS HDD SEAGATE ST14000NM0048 before:
0, 0, 29, 70, 107, 45, 27, 1, 0, 0, 1, 4, 19
after:
1, 29, 681, 1261, 676, 1633, 67, 1, 0, 0, 0, 0, 0
, i.e. from 4s to 125ms.
1 SAS SSD SEAGATE XS3840TE70014 before (microseconds):
0, 0, 0, 0, 0, 0, 0, 0, 70, 18343, 82548, 618
after:
0, 0, 0, 0, 0, 0, 0, 0, 283, 92351, 34844, 90
I've also measured scrub time during the test and on idle pools. On
idle fragmented pool I've measured scrub getting few percent faster
due to use of QD3 instead of QD2 before. On idle non-fragmented pool
I've measured no difference. On busy non-fragmented pool I've measured
scrub time increase about 1.5-1.7x, while IOPS increase reached 5-9x.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes#11166
Name of dataset for user home directory may vary from the expected
$homes_prefix/$username, if different naming scheme is being used.
We can use property mountpoint to specify the dataset for $username
as long as its value is identical to passwd's pw_dir.
For example:
NAME PROPERTY VALUE
rpool/home/myuser_123456 mountpoint /home/myuser
Reviewed-by: Felix Dörre <felix@dogcraft.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Crag Wang <crag0715@gmail.com>
Closes#11165
In order for package managers such as dnf to upgrade cleanly after
the package SONAME bump the obsolete package names must be known.
Update the new packages to correctly obsolete the old ones.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11230Closes#11233
We do not build libnvpair.pc. Moreover, it is automatically pulled in
by libzfs.pc, so no additional specific dependency is required.
Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes#11227
The ABI should be included when generating the `make dist` tarball
since it's required by the `make checkabi` target.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11225
Commit a1d477c2 accidentally disabled DTL updates for the zil_claim()
case described at the end of vdev_stat_update() by unconditionally
disabling all DTL updates when loading. This was done to avoid
a deadlock on the vd_dtl_lock when loading the DTLs from disk.
vdev_dtl_contains <--- Takes vd->vd_dtl_lock
vdev_mirror_child_missing
vdev_mirror_io_start
zio_vdev_io_start
__zio_execute
arc_read
dbuf_issue_final_prefetch
dbuf_prefetch_impl
dbuf_prefetch
dmu_prefetch
space_map_iterate
space_map_load_length
space_map_load
vdev_dtl_load <--- Takes vd->vd_dtl_lock
vdev_load
spa_ld_load_vdev_metadata
spa_tryimport
The missing DTL updates can be restored by moving the space_map_load()
call outside the vd_dtl_lock. A private range tree is populated by
reading the space map and then merged in to the DTL_MISSING tree
under the lock.
Furthermore, the SPA_LOAD_NONE check in vdev_dtl_contains() leads to an
additional problem. Any resilvering which occurs before SPA_LOAD_NONE
is set will incorrectly determine that there's nothing to repair. This
can result in full redundancy not being restored for some blocks.
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11218
RPM and DEB packages are named after the SONAME version of the library
they contain. After bumping this version, the packaging should be
renamed.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes#11219
For the OpenZFS 2.0 release branch extend the CI checkstyle
workflow to perform the library ABI checks.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11215
Add a snapshot of the OpenZFS 2.0 ABI using libabigail-1.7-2.
The included ABI passes `make checkabi` for CentOS 7, Fedora 33,
Debian 10, and Ubuntu 20.04. This covers a fairly wide range
of glibc, gcc, and libabigail versions plus other changes which
are platform specific.
Reviewed-by: Antonio Russo <aerusso@aerusso.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11144
Provide two make targets: checkabi and storeabi.
storeabi uses libabigail to generate a reference copy of the ABI for the
public libraries.
checkabi compares such a reference to the compiled version, failing if
they are not compatible. No ABI is generated for libzpool.so, it is
only used by ztest and zdb and not external consumers.
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes#11144
- Don't leave fstrans set when passed a snapshot
- Don't remove minor if volmode already matches new value
- (FreeBSD) Wait for GEOM ops to complete before trying
remove (at create time GEOM will be "tasting" in parallel)
- (FreeBSD) Don't leak zvol_state_lock on open if zv == NULL
- (FreeBSD) Don't try to unlock zv->zv_state lock if zv == NULL
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes#11199
zpool_expand_proplist() now ignores pl_fixed if its new literal
argument is true. The rest is a consequence of needing to pass
that down.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiao?=~Dska <nabijaczleweli@nabijaczleweli.xyz>
Closes#11202
For encrypted receives, where user accounting is initially disabled on
creation, both 'zfs userspace' and 'zfs groupspace' fails with
EOPNOTSUPP: this is because dmu_objset_id_quota_upgrade_cb() forgets to
set OBJSET_FLAG_USERACCOUNTING_COMPLETE on the objset flags after a
successful dmu_objset_space_upgrade().
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes#9501Closes#9596
In case of cache device removal it is possible that at the end of
l2arc_evict() we have l2ad_hand = l2ad_evict. This can lead to the
following panic in case of a debug build:
VERIFY3(dev->l2ad_hand < dev->l2ad_evict) failed (321920512 < 321920512)
Call Trace:
dump_stack+0x66/0x90
spl_panic+0xef/0x117 [spl]
l2arc_remove_vdev+0x11d/0x290 [zfs]
spa_load_l2cache+0x275/0x5b0 [zfs]
spa_vdev_remove+0x4a5/0x6e0 [zfs]
zfs_ioc_vdev_remove+0x59/0xa0 [zfs]
zfsdev_ioctl_common+0x5b3/0x630 [zfs]
zfsdev_ioctl+0x53/0xe0 [zfs]
do_vfs_ioctl+0x42e/0x6b0
ksys_ioctl+0x5e/0x90
do_syscall_64+0x5b/0x1a0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
In case of cache device removal it also possible that l2ad_hand +
distance > l2ad_end since we do not iterate l2arc_evict() and l2ad_hand
is not reset. This has no functional consequence however as the cache
device is about to be removed.
Fix this by omitting the ASSERT in case of device removal.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes#11205
On systems with musl libc, hostid(1) always prints "00000000", which
will cause improper behavior when the 90zfs module is configured in a
dracut initramfs. Work around this by copying the host /etc/hostid if
the file exists, and otherwise only write /etc/hostid if hostid(1)
returns something meaningful. This avoids zgenhostid creating a random
/etc/hostid for the initramfs, which could lead to errors when trying to
import the pool if spl_hostid isn't defined in the kernel command line.
Furthermore, tag the /etc/hostid file as hostonly, since it is system
specific and shouldn't be taken into account when trying to use an
initramfs generated in one system to boot into a different system.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Co-authored-by: Andrew J. Hesford <ajh@sideband.org>
Signed-off-by: Érico Rolim <erico.erc@gmail.com>
Closes#11174Closes#11189
A common usage pattern for zgenhostid, including in the ZFS dracut
module, is running it as:
zgenhostid $(hostid)
However, zgenhostid only accepted hostid arguments greater than 0, which
meant that, when the output of hostid(1) was "00000000", zgenhostid
would error out, even though 0 is a possible return value for the
gethostid(3) function used by hostid(1):
- On current musl libc, gethostid(3) is a stub that always returns 0.
- On glibc, gethostid(3) will return 0 if /etc/hostid exists but is
smaller than 4 bytes.
In these cases, it makes more sense for zgenhostid to treat a value of 0
as other parts of the zfs codebase do, meaning that a hostid value
couldn't be determined; therefore, it should attempt to generate a
random value to write into /etc/hostid.
The manpage and usage output have been updated to reflect this.
Whitespace has also been fixed in the usage output.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Reviewed-by: Andrew J. Hesford <ajh@sideband.org>
Signed-off-by: Érico Rolim <erico.erc@gmail.com>
Closes#11174Closes#11189
The ZFS_ENTER/ZFS_EXIT/ZFS_VERFY_ZP macros should not be used
in the Linux zpl_*.c source files. They return a positive error
value which is correct for the common code, but not for the Linux
specific kernel code which expects a negative return value. The
ZPL_ENTER/ZPL_EXIT/ZPL_VERFY_ZP macros should be used instead.
Furthermore, the ZPL_EXIT macro has been updated to not call the
zfs_exit_fs() function. This prevents a possible deadlock which
can occur when a snapshot is automatically unmounted because the
zpl_show_devname() must never wait on in progress automatic
snapshot unmounts.
Reviewed-by: Adam Moss <c@yotes.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11169Closes#11201
The output of ZFS channel programs is logged on-disk in the zpool
history, and printed by `zpool history -i`. Channel programs can use
10MB of memory by default, and up to 100MB by using the `zfs program -m`
flag. Therefore their output can be up to some fraction of 100MB.
In addition to being somewhat wasteful of the limited space reserved for
the pool history (which for large pools is 1GB), in extreme cases this
can result in a failure of `ASSERT(length <= DMU_MAX_ACCESS);` in
`dmu_buf_hold_array_by_dnode()`.
This commit limits the output size that will be logged to 1MB. Larger
outputs will not be logged, instead a entry will be logged indicating
the size of the omitted output.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes#11194
ZFS channel programs (invoked by `zfs program`) are executed in a LUA
sandbox with a limit on the amount of memory they can consume. The
limit is 10MB by default, and can be raised to 100MB with the `-m` flag.
If the memory limit is exceeded, the LUA program exits and the command
fails with a message like `Channel program execution failed: Memory
limit exhausted.`
The LUA sandbox allocates memory with `vmem_alloc(KM_NOSLEEP)`, which
will fail if the requested memory is not immediately available. In this
case, the program fails with the same message, `Memory limit exhausted`.
However, in this case the specified memory limit has not been reached,
and the memory may only be temporarily unavailable.
This commit changes the LUA memory allocator `zcp_lua_alloc()` to use
`vmem_alloc(KM_SLEEP)`, so that we won't spuriously fail when memory is
temporarily low. Instead, we rely on the system to be able to free up
memory (e.g. by evicting from the ARC), and we assume that even at the
highest memory limit of 100MB, the channel program will not truly
exhaust the system's memory.
External-issue: DLPX-71924
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes#11190
The custom zpl_show_devname() helper should translate spaces in
to the octal escape sequence \040. The getmntent(2) function
is aware of this convention and properly translates the escape
character back to a space when reading the fsname.
Without this change the `zfs mount` and `zfs unmount` commands
incorrectly detect when a dataset with a name containing spaces
is mounted.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11182Closes#11187
The microzap hash can sometimes be zero for single digit snapnames.
The zap cursor can then have a serialized value of two (for . and ..),
and skip the first entry in the avl tree for the .zfs/snapshot directory
listing, and therefore does not return all snapshots.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Cedric Berger <cedric@precidata.com>
Signed-off-by: Tony Perkins <tperkins@datto.com>
Closes#11039
It is a leftover from illumos always set to NULL and introducing a
spurious difference between zio_buf and zio_data_buf.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes#11188
This looks like it was once from the illumnos compat code.
FreeBSD doesn't have cmn_err as a compiler format attribute, so
it definitely errors out.
It doesn't show up on LLVM because it doesn't trigger at all.
Add in the format flags but keep them behind #if 0 for now;
there are too many format issues that trigger when one does
format checking in the shared code.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: adrian chadd <adrian@freebsd.org>
Closes#11068Closes#11069
This shows up when compiling freebsd-head on amd64 using gcc-6.4.
The lib32 compat build ends up tripping over this assumption.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: adrian chadd <adrian@freebsd.org>
Closes#11068Closes#11069
Remove reference to EFI(?), explain that the new space
is beyond the GPT for whole-disk vdevs, and add section noting how it
behaves with partition vdevs in terms of how the user is most likely to
encounter it ‒ the previous phrasing was confusing
and seemed to indicate that "zpool online -e" will be able to claim
GPT[whatever, ZFS, free space, whatever]
into
GPT[whatever, ZFS, whatever]
but that's not the case, as it'll only be able to do so after manually
resizing the ZFS partition to include the free space beforehand, i.e.:
GPT[whatever, ZFS, free space, whatever]
GPT[whatever, [ZFS + free space], potentially left-overs, whatever]
# zpool online -e
GPT[whatever, ZFS, whatever]
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#11158
The copy_exec() function expects that the full path of the target
file is passed rather than just the directory, and will take care
of creating the underlying directories if they don't exist.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Closes#11162
We can consolidate the unlocking procedure into one place by starting
with drop_suspend set to B_FALSE and moving the open count check up.
While here, a little code cleanup. Match the out labels between
zvol_geom_open and zvol_cdev_open, and add a missing period in some
comments.
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11175
zvol_first_open can fail with EINTR if spa_namespace_lock is not held
and cannot be taken without waiting.
Apply the same logic that was done for zvol_geom_open to take
spa_namespace_lock if not already held on first open in zvol_cdev_open.
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11175
After initial arc_c was reduced to arc_c_min it became possible that
on datasets with primarycache=metadata or none dirty data make up most
of ARC capacity and easily more than configured 50% of initial arc_c,
that causes forced txg commits by arc_tempreserve_space() and periodic
very long write delays.
This patch makes arc_tempreserve_space() to use arc_c only after ARC
warmed up once and arc_c really means something, but use arc_c_max
before that.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes#11178
Fix a couple of places where the wrong tag is passed
to dnode_{hold, rele}
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes#11184
Add a new test case which corrupts all level 1 block in a file.
Then verifies that corruption is detected and repaired.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11141
The second part of list_file_blocks transforms the object description
output by zdb -ddddd $ds $objnum into a stream of lines of the form
"level path offset length" for the indirect blocks in the given file.
The current code only works for the first copy of L0 blocks. L1 and
L2 indirect blocks have more than one copy on disk.
Add one more -d to the zdb command so we get all block copies and
rewrite the transformation to match more than L0 and output all DVAs.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11141
The first part of list_file_blocks transforms the pool configuration
output by zdb -C $pool into shell code to set up a shell variable,
VDEV_MAP, that maps from vdev id to the underlying vdev path. This
variable is a simple indexed array. However, the vdev id in a DVA is
only the id of the top level vdev.
When the pool is mirrored, the top level vdev is a mirror and its
children are the mirrored devices. So, what we need is to map from
the top level vdev id to a list of the underlying vdev paths.
ist_file_blocks does not need to work for raidz vdevs, so we can
disregard that case.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11141
Check if the ZVOL has been written before calling zil_async_to_sync.
The ZIL will be opened on the first write, not earlier.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org>
Closes#11152
spa_config_load() passes NULL into resid when doing zfs_file_read().
This would trip over when vfs.zfs.autoimport_disable=0.
Sponsored by: The FreeBSD Foundation
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Ka Ho Ng <khng@freebsdfoundation.org>
Closes#11149
Bump library SOVERSION under Linux to match FreeBSD's.
Additionally, this bump properly accounts for the ABI changes relative
to ZoL 0.8.5 for the Linux build.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Issue #11144
SET_ERROR is useful to trace errors, so use it where the errors occur
rather than factored out to the end of a function.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11146
The expected variance for this test case was originally set at 10%
based on local testing. Additional testing via the CI has show it
can be as large as 11%. Increase the expected maximum to 12% to
prevent this test from incorrectly failing.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11148
The events_001_pos.ksh test case can fail because it's possible,
and correct, for the config_sync event to be posted after the last
"expected" event. To accommodate this the run_and_verify() function
has been updated to wait for all non-history events, not just the
last event. This does not increase the run time of the test as
long as all the events do get generated.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11147
A new function was added named revalidate_disk_size() and the old
revalidate_disk() appears to have been deprecated. As the only ZFS
code that calls this function is zvol_update_volsize, swapping the
old function call out for the new one should be all that is required.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes#11085
Kernel 5.10 removed check_disk_change() in favor of callers using
the faster bdev_check_media_change() instead, and explicitly forcing
bdev revalidation when they desire that behavior. To preserve prior
behavior, I have wrapped this into a zfs_check_media_change() macro
that calls an inline function for the new API that mimics the old
behavior when check_disk_change() doesn't exist, and just calls
check_disk_change() if it exists.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes#11085
Kernel commit 2b0d3d3e4fcfb brought in some changes to the struct
percpu_ref structure that moves most of its fields into a member
struct named "data" of type struct percpu_ref_data. This includes
the "count" member which is updated by vdev_blkg_tryget(), so update
this function to chase the API change, and detect it via configure.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes#11085
In Linux 5.10 the linux/frame.h header was renamed linux/objtool.h.
Add a configure check to detect and use the correctly named header.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11085
Avoid checking the whole array of objects each time by removing the self
organized memory reaping. this can be managed by the global memory reap
callback which is called every 60 seconds. this will reduce the use if
locking operations significant.
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Closes#11126
zvol private data is supposed to be nulled by zvol_clear_private before
zvol_free is called as an indicator that the zvol is going away.
Implement zvol_clear_private for volmode=dev.
Assert that zvol_clear_private has been called before zvol_free.
Check that zvol_clear_private has not been called when updating
volsize. If it has, fail with ENXIO.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11117
We fall back to a default volmode and continue when looking up a zvol's
volmode property fails. After this we should set the error to 0 to
ensure we take the success paths in the out section.
While here, make sure we only log that the zvol was created on success.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11117
zvol_geom_close gets a count of the number of close operations to do.
Make sure we're always using this count to check if this will be the
last close operation performed on the zvol.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11117
Using more specific assert variants gives better messages on failure.
No functional change.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11117
Note that this only tracks sizes as requested by the caller.
Actual allocated space will almost always be bigger (e.g., rounded up to
the next power of 2 or page size). Additionally the allocated buffer may
be holding other areas hostage. Nonetheless, this is a starting point
for tracking memory usage in zstd.
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes#11129
While evaluating other assembler implementations it turns out that
the precomputed hash subkey tables vary in size, from 8*16 bytes
(avx2/avx512) up to 48*16 bytes (avx512-vaes), depending on the
implementation.
To be able to handle the size differences later, allocate
`gcm_Htable` dynamically rather then having a fixed size array, and
adapt consumers.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes#11102
AT_BENEATH was merged to stable/12, where kern_unlinkat takes a
non-const path. DECONST the path passed to kern_unlinkat in the
case where AT_BENEATH is defined.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11139
L2ARC devices of several terabytes filled with 4KB blocks may take 15
minutes to rebuild. Due to the way L2ARC log reading is implemented
it is quite likely that for all that time rebuild thread will never
sleep. At least on FreeBSD kernel threads have absolute priority and
can not be preempted by threads with lower priorities. If some thread
is also bound to that specific CPU it may not get any CPU time for all
the 15 minutes.
Reviewed-by: Cedric Berger <cedric@precidata.com>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes#11116
ZFS always waits for the write completion before flushing the cache.
That is why it does not require explicit ordering fences around it,
which are pretty difficult to implement for NVMe, since one has no
internal concept of strict request ordering.
This was already removed from FreeBSD once, but got resurrected
by mistake during OpenZFS merge.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes#11130
Previously, xattr_004_pos would create files with xattrs on both
tmpfs and ext2, and then copy them to zfs to verify that their
xattrs were preserved. However tmpfs doesn't support xattrs.
This was never noticed until Fedora 33. In Fedora 32 and older,
/tmp was on the root partition (like ext4), whereas on Fedora 33
/tmp is actually tmpfs. That caused this test to fail on Fedora 33.
This fix updates the test to only create the file on ext2, not tmpfs.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#11133
The port removed provisions for zfs_znode_move but the cleanup missed
this bit. To quote the original:
[snip]
list_insert_tail(&zfsvfs->z_all_znodes, zp);
membar_producer();
/*
* Everything else must be valid before assigning z_zfsvfs makes the
* znode eligible for zfs_znode_move().
*/
zp->z_zfsvfs = zfsvfs;
[/snip]
In the current code it is immediately followed by unlock which issues
the same fence, thus plays no role in correctness.
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes#11115
The Linux kernel MODULE_LICENSE macro only recognizes a handful of
license strings and "MIT" is not one of the them. Update the macro
to use "Dual MIT/GPL" which is recognized and what the kernel expects
MIT licensed modules to use.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11112Closes#11113
These kstats are often expensive to compute so we want to avoid them
unless specifically requested.
The following kstats are affected by this change:
kstat.zfs.${pool}.multihost
kstat.zfs.${pool}.misc.state
kstat.zfs.${pool}.txgs
kstat.zfs.misc.fletcher_4_bench
kstat.zfs.misc.vdev_raidz_bench
kstat.zfs.misc.dbufs
kstat.zfs.misc.dbgmsg
In FreeBSD 13, sysctl(8) has been updated to still list the
names/description/type of skipped sysctls so they are still
discoverable.
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11099
Use proper names (i.e. zfs-allow and zpool-add) in NAME subsections
of zfs/zpool subcommands instead of current "pretty-printed" ones as
makewhatis utilities (or some implementations of it, namely the one
from mandoc suite used in FreeBSD) look not only at the document title
but also in NAME subsection, adding zfs(8)/zpool(8) to search results
which is not correct. (Common sense and other utilities splitting
subcommands in multiple man pages, e.g. git, do the same.)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: xtouqh <xtouqh@hotmail.com>
Closes#11086
This was removed in a reorganization of directories preparing for the
merge of FreeBSD support, 006e9a4088 by mmacy. While llvm is perfectly
happy with the nonexistent -I directory, the gcc6 and gcc9 we can elect
to use as cross-toolchains both trip over it.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Kyle Evans <kevans@FreeBSD.org>
Closes#11077
ZED will log zevents summaries to the syslog, however the log entries
tend to drop event details that can be useful for diagnosis. This is
especially true for ereport events, like io, checksum, and delay.
Update the all-syslog.sh script to log additional event information.
Add an optional config option, ZED_SYSLOG_DISPLAY_GUIDS, to zed.rc
for choosing GUIDs over names for pool and vdev.
Change the default ZED_SYSLOG_SUBCLASS_EXCLUDE to exclude history_event
events. These events tend to be frequent, convey no meaningful info,
and are already logged in the zpool history.
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes#10967
The removal of a vdev in the normal class would fail if there was a
special or deup vdev that had a different ashift than the vdevs in
the normal class.
Moved the initialization of spa_min_ashift / spa_max_ashift from
vdev_open so that it occurs after the vdev allocation bias was
initialized (i.e. after vdev_load).
Caveat -- In order to remove a special/dedup vdev it must have the
same ashift as the normal pool vdevs. This could perhaps be lifted
in the future (i.e. for the case where there is ample space in any
surviving special class vdevs)
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes#9363Closes#9364Closes#11053
This is a follow up fix for commit 0fdd6106bb. The VERIFY is
only true when we haven't hit an error code path. See added
test case for a reproducer.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes#11048
After a side-effectful call like add or remove, references to range
segs stored in btrees can no longer be used safely. We move the
remove call to just before the reinsertion call so that the seg
remains valid for as long as we need it.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes#11044Closes#11056
The 32-bit counter eventually wraps to 0 which is a sentinel for invalid
id.
Make it 64-bit on LP64 platforms and 0-check otherwise.
Note: Linux counterpart uses id stored per queue instead of a global.
I did not check going that way is feasible with the goal being the
minimal fix doing the job.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes#11059
The acltype property is currently hidden on FreeBSD and does not
reflect the NFSv4 style ZFS ACLs used on the platform. This makes it
difficult to observe that a pool imported from FreeBSD on Linux has a
different type of ACL that is being ignored, and vice versa.
Add an nfsv4 acltype and expose the property on FreeBSD.
Make the default acltype nfsv4 on FreeBSD.
Setting acltype to an unhanded style is treated the same as setting
it to off. The ACLs will not be removed, but they will be ignored.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#10520
In FreeBSD, there are three compile environments that are supported:
user land, the kernel and the bootloader / standalone. Adjust the
headers to compile in the standalone environment. Limit kernel-only
items from view when _STANDALONE is defined.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Warner Losh <imp@FreeBSD.org>
Closes#10998
The zstd code assumes that if you are on aarch64, you have NEON
instructions. This is not necessarily true. In a boot loader, where
you might not have the VFP properly initialized, these instructions
may not be available. It's also an error to include arm_neon.h when
the NEON insturctions aren't enabled. Change the guards for using the
NEON instructions from __aarch64__ to __ARM_NEON which is the standard
symbol for knowing if they are available.
__ARM_NEON is the proper symbol, defined in ARM C Language Extensions
Release 2.1 (https://developer.arm.com/documentation/ihi0053/d/). Some
sources suggest __ARM_NEON__, but that's the obsolete spelling from
prior versions of the standard.
Updated based on zstd pull request https://github.com/facebook/zstd/pull/2356
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Warner Losh <imp@bsdimp.com>
Closes#11055
Missing struct initialization in a config test results in the
interface being incorrectly detected.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Signed-off-by: Mathieu Velten <matmaul@gmail.com>
Closes#10713Closes#11049
This increases the Linux kernel version to 5.9 from 5.8
as most compatibility fixes should already be included.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes#11050
It is a common mistake to have failed to autoload the module due to
permission issues when running a ZFS command as a user. "Operation
not permitted" is an unhelpfully vague error message.
Use a thread-local message buffer to format a nicer error message.
We can infer that loading the kernel module failed if the module is
not loaded. This can be extended with heuristics for other errors
in the future.
While looking at this stuff, remove an unused thread-local message
buffer found in libspl and remove some inaccurate verbiage from the
comment on libzfs_load_module.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11033
FreeBSD had this value tunable before the switch to the new OpenZFS.
The tunable name has changed, breaking legacy compat.
Restore legacy compat for this tunable, properly expose the tunable
with the new name on all platforms, and document it in
zfs-module-parameters(5).
While here, clean up the documentation for zfetch_max_distance a bit.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11038
The value of zp is used without having been initialized under some
conditions. Initialize the pointer to NULL.
Add a regression test case using chown in acl/posix. However, this is
not enough because the setup sets xattr=sa, which means zfs_setattr_dir
will not be called. Create a second group of acl tests in acl/posix-sa
duplicating the acl/posix tests with symlinks, and remove xattr=sa from
the original acl/posix tests. This provides more coverage for the
default xattr=on code.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#10043Closes#11025
This change updates the documentation to refer to the project
as OpenZFS instead ZFS on Linux. Web links have been updated
to refer to https://github.com/openzfs/zfs. The extraneous
zfsonlinux.org web links in the ZED and SPL sources have been
dropped.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11007
A missing semicolon between kmoddir variable declaration and the
uninstall for loop caused modules_uninstall-Linux to fail with:
Syntax error: "do" unexpected
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Jacob Adams <jacob@tookmund.com>
Closes#11032
When running libzpool with the Undefined Behavior Sanitizer (ubsan)
enabled, a zpool create causes a run-time error:
module/zfs/vdev_label.c:600:14: runtime error: shift exponent 64 is
too large for 64-bit type 'long long unsigned int'`
in vdev_config_generate()
Fix is to convert vdev_removal_max_span to its base-2 logarithm, using
highbit64(), and then compare the "shifts".
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Chuck Tuffli <ctuffli@gmail.com>
Closes#9744Closes#11024
fixup of 196bee4
On gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), the code removed
caused `-Wmaybe-uninitialized` errors.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes#11021
With procfs_list kstats implemented for FreeBSD, dbufs are now exposed
as kstat.zfs.misc.dbufs.
On FreeBSD, dbufstats can use the sysctl instead of procfs when no
input file has been given.
Enable the dbufstats tests on FreeBSD.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes#11008
Code cleanup. Sort includes, remove duplicates, and drop
some extra blank lines in kmod_core.c.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes#11000
We were missing an include for kernel FPU functions, breaking the build
on FreeBSD 12.1-RELEASE. This was apparently being pulled in from
elsewhere on stable/12 and head.
Sorted the other includes in these files while here.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11005
When resuming an interrupted ZFS send stream that creates a new dataset
with the same name as an existing dataset, if the existing dataset is
accessed after the failed receive, then after the subsequent successful
receive it will return EIO. This happens because nothing mounts the new
dataset, leaving the old, no longer valid dataset still mounted.
This commit fixes zfs receive to always unmount and remount the
destination, regardless of whether the stream is a new stream or a
resumed stream.
Sponsored by: Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alan Somers <asomers@gmail.com>
External-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=249579Closes#10995Closes#10999
In C, const indicates to the reader that mutation will not occur.
It can also serve as a hint about ownership.
Add const in a few places where it makes sense.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes#10997
This causes "zfs send -vt ..." to fail with:
cannot resume send: Unknown error 1030
It turns out that some of the name/value pairs in the verification
list for zfs_ioc_send_space(), zfs_keys_send_space, had the wrong
name, so the ioctl got kicked out in zfs_check_input_nvpairs().
Update the names accordingly.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: John Poduska <jpoduska@datto.com>
Closes#10978
The kernel seq_read() helper function expects ->next() to update
the passed position even there are no more entries. Failure to
do so results in the following warning being logged.
seq_file: buggy .next function procfs_list_seq_next [spl]
did not update position index
Functionally there is no issue with the way procfs_list_seq_next()
is implemented and the warning is harmless. However, we want to
silence this some what scary incorrect warning. This commit
updates the Linux procfs code to advance the position even for
the last entry.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#10984Closes#10996
The request number is out of bounds of the platform table.
Subtract the starting offset to get the correct subscript.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#10994
`dbuf_stats_hash_table_data` can take much longer than it needs to
by repeatedly bzeroing its buffer when in fact the buffer only needs
to be NULL terminated.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes#10993
In non regular use cases allocated memory might stay persistent in memory
pool. This small patch checks every minute if there are old objects which
can be released from memory pool.
Right now with regular use, the pool is checked for old objects on each
allocation attempt from this pool. so basically polling by its use. Now
consider what happens if someone writes a lot of files and stops use of
the volume or even unmounts it. So the code will no longer check if
objects can be released from the pool. Already allocated objects will
still stay in pool cache. this is no big issue for common use. But
someone discovered this issue while doing tests. personally i know this
behavior and I'm aware of it. Its no big issue. just a enhancement
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Closes#10938Closes#10969
When an invalid incremental send is requested where the "to" ds is
before the "from" ds, make sure to drop the reference to the pool
and the dataset before returning the error.
Add an assert on FreeBSD to make sure we don't hold any locks after
returning from an ioctl.
Add some test coverage.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#10919
Add community compatibility patches for Intel QAT
Due to incompatibility with higher kernel versions.
Also includes basic instructions.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes#10961Closes#10962
The Linux kernel MODULE_LICENSE macro only recognizes a handful of
license strings and "BSD" is not one of the them. Update the macro
to use "Dual BSD/GPL" which is recognized and what the kernel expects
BSD licensed module to use.
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#10982Closes#10992
This check was accidentally broken when the kABI checks were updated
to run in parallel, commit 608f874. The check must be for the
config_debug_lock_alloc_license name to determine if the symbol
is license compatible.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#10991
The m4 objtool configure check can incorrectly fail because of a
missing header in the test. This appears to be the result of a
recent kernel change and was observed on the Fedora 5.8.11-200
kernel.
In file included from /home/fedora/zfs/build/objtool/objtool.c:75:
./arch/x86/include/asm/frame.h:100:57: error: 'struct pt_regs'
declared inside parameter list will not be visible outside
of this definition or declaration [-Werror]
The consequence of this is that the "stack_frame_non_standard"
check is never run and HAVE_STACK_FRAME_NON_STANDARD is set
incorrectly which results in a build failure. This change adds
the appropriate header to the "objtool" check so it now behaves
as intended.
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#10990
The error returned by `zpool remove` when the encryption keys aren't
loaded isn't very helpful. Furthermore, the man pages make no
mention that the keys need to be loaded. This change doesn't resolve
the error message but it does update the man page to mention this
requirement.
Authored-by: grodik <pat@litke.dev>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#10939Closes#10948
This change documents the currently used branching structure.
It has been cut down to not include any controversial changes.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Closes#10976
Change zfs userspace subcommand to use zfs_path_to_zhandle() so that
the provided dataset can be a path (/usr) or a dataset (rpool/usr).
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes#8915
Prefetching of dnodes in dbuf_read() can cause significant mutex
contention for some workloads and isn't very helpful. This is
because we already get 32 dnodes for each block read, and when
iterating over a directory we prefetch the dnodes in the directory.
Disable this prefetching to prevent the lock contention.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Submitted-by: Adam Moss <c@yotes.com>
Submitted-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Adam Moss <c@yotes.com>
Closes#10877Closes#10953
With PREEMPTION=y and BLK_CGROUP=y preempt_schedule_notrace() is being
used on arm64 which is a GPL-only function and hence the build of the
DKMS kernel module fails.
Fix that by redefining preempt_schedule_notrace() to preempt_schedule()
which should be safe as long as tracing is not used.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Juerg Haefliger <juergh@canonical.com>
Closes#8545Closes#9948Closes#10416Closes#10973
Address some unused value and control flow issues flagged by Coverity.
Unreachable code is pruned and unused values are avoided.
Some scattered sections are reordered for coherence.
We can assume kmem_alloc(n, KM_SLEEP) doesn't fail, so there is no need
to check if it returned NULL. The allocated memory doesn't need to be
zeroed, other than the last iovec (the MAC).
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#10884
If the /etc/exports.d directory does not exist, then we should only
create it when we're performing an action which already requires root
privileges.
This commit moves the directory creation to the enable/disable code
path which ensures that we have the appropriate privileges.
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes#10785Closes#10934
lr_write_t records that are WR_COPIED have the record data directly
appended to them (see lr_write_t type definition).
The data is copied from the debuf using dmu_read_by_dnode.
This function was called, only for WR_COPIED records, as part of a
short-circuiting if-statement's if-expression.
I found this side-effectful call to dmu_read_by_dnode pretty
hard to spot.
This patch improves readability by moving the call to its own line.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes#10956
The procfs_list interface is required by several kstats. Implement
this functionality for FreeBSD to provide access to these kstats.
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes#10890
Resolves an issue with `zfs send` streams from 0.8.4 which
prevents them from being received by versions < 0.7.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes#10911Closes#10916
In commit cd32b4f5b7 ("Fix a deadlock in the FreeBSD getpages VOP") I
introduced a bug while porting the patch originally committed to
FreeBSD: the rangelock pointer may be NULL if the try operation failed,
so we must avoid calling zfs_rangelock_unlock() in that case.
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reported-by: Steve Wills <swills@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes#10519Closes#10960
Use the same reduced buffer size for lauxlib that is used on Linux.
Fixes panic on HEAD in lua gsub test designed to exhaust stack space.
With this we can remove the special case to reserve more stack space
on FreeBSD.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kyle Evans <kevans@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#10959
Without this, the sysctl system calls will acquire a global lock before
invoking the handler. This is noticeable in some situations when
running top(1). The global lock is mostly vestigal but continues to see
some use and so contention is still a problem; until the default sense
of the MPSAFE flag changes, we have to annotate each and every handler.
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes#10836
== Motivation and Context
The new vdev ashift optimization prevents the removal of devices when
a zfs configuration is comprised of disks which have different logical
and physical block sizes. This is caused because we set 'spa_min_ashift'
in vdev_open and then later call 'vdev_ashift_optimize'. This would
result in an inconsistency between spa's ashift calculations and that
of the top-level vdev.
In addition, the optimization logical ignores the overridden ashift
value that would be provided by '-o ashift=<val>'.
== Description
This change reworks the vdev ashift optimization so that it's only
set the first time the device is configured. It still allows the
physical and logical ahsift values to be set every time the device
is opened but those values are only consulted on first open.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Cedric Berger <cedric@precidata.com>
Signed-off-by: George Wilson <gwilson@delphix.com>
External-Issue: DLPX-71831
Closes#10932
When expanding a device zfs needs to rescan the partition table to
get the correct size. This can only happen when we're in the kernel
and requires the device to be closed. As part of the rescan, udev is
notified and the device links are removed and recreated. This leave a
window where the vdev code may try to reopen the device before udev
has recreated the link. If that happens, then the pool may end up in
a suspended state.
To correct this, we leverage the BLKPG_RESIZE_PARTITION ioctl which
allows the partition information to be modified even while it's in use.
This ioctl also does not remove the device link associated with the zfs
data partition so it eliminates the race condition that can occur in
the kernel.
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes#10897
When a device removal is in progress, there are 2 locations for the data
that's already been moved: the original location, on the device that's
being removed; and the new location, which is pointed to by the indirect
mapping. When doing leak detection, zdb needs to know about both
locations. To determine what's already been copied, we load the
spacemaps of the removing vdev, omit the blocks that are yet to be
copied, and then use the vdev's remap op to find the new location.
The problem is with an optimization to the spacemap-loading code in zdb.
When processing the log spacemaps, we ignore entries that are not
relevant because they are past the point that's been copied. However,
entries which span the point that's been copied (i.e. they are partly
relevant and partly irrelevant) are processed normally. This can lead
to an illegal spacemap operation, for example if offsets up to 100KB
have been copied, and the spacemap log has the following entries:
ALLOC 50KB-150KB (partly relevant)
FREE 50KB-100KB (entirely relevant)
FREE 100KB-150KB (entirely irrlevant - ignored)
ALLOC 50KB-150KB (partly relevant)
Because the entirely irrelevant entry was ignored, its space remains in
the spacemap. When the last entry is processed, we attempt to add it to
the spacemap, but it partially overlaps with the 100-150KB entry that
was left over.
This problem was discovered by ztest/zloop.
One solution would be to also ignore the irrelevant parts of
partially-irrelevant entries (i.e. when processing the ALLOC 50-150, to
only add 50-100 to the spacemap). However, this commit implements a
simpler solution, which is to remove this optimization entirely. I.e.
to process the entire spacemap log, without regard for the point that's
been copied. After reconstructing the entire allocatable range tree,
there's already code to remove the parts that have not yet been copied.
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-71820
Closes#10920
In zpl_mount_impl, there is:
dmu_objset_hold ; returns with pool & ds held
dsl_pool_rele
sget
dsl_dataset_rele
As spelled out in the "DSL Pool Configuration Lock" in dsl_pool.c,
this requires a long hold.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: John Poduska <jpoduska@datto.com>
Closes#10936
A small bug did slip into initial libzfsbootenv; while storing nvlist
in nvlist, we should make sure the bootenv is using VB_NVLIST format.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes#10937
Prefer acltype=off|posix, retaining the old names as aliases.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#10918
It was discovered that dracut scripts and zgenhostid
always generate little-endian /etc/hostid.
This commit provides simple endianess-aware binary
and updates the scripts to use it.
New features include:
-f flag to force overwrite.
-o flag to write to different file (for dracut)
accepting both 0x01234567 and 01234567 values as input
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Closes#10887Closes#10925
nvlist does allow us to support different data types and systems.
To encapsulate user data to/from nvlist, the libzfsbootenv library is
provided.
Reviewed-by: Arvind Sankar <nivedita@alum.mit.edu>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes#10774
Use ZFS_ENTER and ZFS_EXIT to protect datasets while their mount
devname is being retrieved.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#10892Closes#10927
The zfs-initramfs package has never worked as no RPM-based distribution
uses initramfs-tools, which is listed as a dependency of zfs-initramfs.
This would not ordinarily be a problem, as it is only enabled when
/usr/share/initramfs-tools is present, which should not normally be the
case on RPM-based distributions. However, other packages may install
unused files there even if initramfs-tools is not used, so remove this
auto-detection for the rpm-utils target.
This does not fully remove the logic for the zfs-initramfs package. This
splits it out into a separate rpm-utils-initramfs target so that the
Debian builds can still use it.
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Harald van Dijk <harald@gigawatt.nl>
Closes#10898
libzutil depends on libnvpair, but this dependency is undeclared in the
build system. Therefore it isn't possible to make a new command that
depends on libzutil, but does not (directly) depend on libnvpair.
This commit makes this dependency explicit.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reivewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes#10915
The lock is taken all the time and as a regular read-write lock
avoidably serves as a mount point-wide contention point.
This forward ports FreeBSD revision r357322.
To quote aforementioned commit:
Sample result doing an incremental -j 40 build:
before: 173.30s user 458.97s system 2595% cpu 24.358 total
after: 168.58s user 254.92s system 2211% cpu 19.147 total
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes#10896
This solves issues occurring with a different decimal operator and
keeps the command line interface consistent for all locales .
E.g. `zfs set quota=0.5T`
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Felix Neumärker <xdch47@posteo.de>
Closes#10878
A great deal of time may go by between when mmp_init() is called and
the MMP thread starts, particularly if there are bad devices, because
there is I/O checking configs etc. If this time is too long,
(gethrtime() - mmp_last_write) > mmp_fail_ns
at the time the MMP thread starts. If MMP is configured to suspend
the pool, the pool will be suspended immediately.
This can be seen in issue #10838
The value of mmp_last_write doesn't matter before the mmp thread
starts. To give the MMP thread time to issue and land MMP writes,
initialize mmp_last_write when the MMP thread starts.
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes#10873
We only need the kernel interfaces in crypto, not the device node in
cryptodev.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#10901
In certain workloads it may be beneficial to reduce wear of L2ARC
devices by not caching MRU metadata and data into L2ARC. This commit
introduces a new tunable l2arc_mfuonly for this purpose.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes#10710
When hz > 1000, msec / (1000 / hz) results in division by zero.
I found somewhere in FreeBSD using howmany(msec * hz, 1000) to convert
ms to ticks, avoiding the potential for a zero in the divisor.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes#10894
The pbkdf2iters property is an iteration counter
and should be displayed as plain number rather
than in binary unit.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Fabio Buso <buso.fabio@gmail.com>
Closes#10871
On musl libc, zfs failed to compile due to the missing <fcntl.h>
include, which is required for `open()` per POSIX.
This commit add the missing <fcntl.h> include.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Hiếu Lê <leorize+oss@disroot.org>
Closes#10880
On FreeBSD, if priorities divided by four (RQ_PPQ) are equal then
a difference between them is insignificant. In other words,
incrementing pri by only one as on Linux is insufficient.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes#10872
Several of the listed library dependencies are not relevant on FreeBSD.
Have ./configure save libraries that are found via pkg-config as
${LIB}_PC and use the configured automake variables instead of hard
coded names so we only get what was actually needed.
While here, update the URL to point at the OpenZFS Github repo.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#10869
Commit d4a72f2 which introduced multi-phase scrubs and resilvers
continued the work presented by Nexenta at the 2016 ZFS developer
summit. Update the source to reflect their contribution.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Duplicate io and checksum ereport events can misrepresent that
things are worse than they seem. Ideally the zpool events and the
corresponding vdev stat error counts in a zpool status should be
for unique errors -- not the same error being counted over and over.
This can be demonstrated in a simple example. With a single bad
block in a datafile and just 5 reads of the file we end up with a
degraded vdev, even though there is only one unique error in the pool.
The proposed solution to the above issue, is to eliminate duplicates
when posting events and when updating vdev error stats. We now save
recent error events of interest when posting events so that we can
easily check for duplicates when posting an error.
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes#10861
If a `zfs_space_check_t` other than `ZFS_SPACE_CHECK_NONE` is used with
`dsl_sync_task_nowait()`, the sync task may fail due to ENOSPC.
However, there is no way to notice or communicate this failure, so it's
extremely difficult to use this functionality correctly, and in fact
almost all callers use `ZFS_SPACE_CHECK_NONE`.
This commit removes the `zfs_space_check_t` argument from
`dsl_sync_task_nowait()`, and always uses `ZFS_SPACE_CHECK_NONE`.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes#10855
When created, a zthr is given a name to identify it by. This name is
lost when a cancelled zthr is resumed.
Retain the name of a zthr so it can be used when resuming.
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#10881
There are a number of places where cv_?_sig is used simply for
accounting purposes but the surrounding code has no ability to
cope with actually receiving a signal. On FreeBSD it is possible
to send signals to individual kernel threads so this could
enable undesirable behavior.
This patch adds routines on Linux that will do the same idle
accounting as _sig without making the task interruptible. On
FreeBSD cv_*_idle are all aliases for cv_*
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes#10843
Added comments in following files
with links to Illumos manual pages:
./module/avl/avl.c
./module/nvpair/nvpair.c
./module/os/linux/spl/spl-kstat.c
./module/os/freebsd/spl/spl_kstat.c
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Spencer Kinny <spencerkinny1995@gmail.com>
Closes#5113Closes#10859
Those macros are also defined by the compiler-provided float.h which
will be included later on (at least in the FreeBSD buildworld case) and
triggers these -Werror warnings. Including <float.h> first and only
defining the macros when DBL_DIG/FLT_DIG is missing fixes this problem.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alex Richardson <Alexander.Richardson@cl.cam.ac.uk>
Closes#10864
Use ZFS_MODULE_PARAM for cross-platform tunables in spa_stats.c, and
add update tunables.cfg in tests for the newly supported ones.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#10858
Moving spa_stats added the additional burden of supporting
KSTAT_TYPE_IO.
spa_state_addr will always return a valid value regardless of
the value of 'n'. On FreeBSD this will cause an infinite loop
as it relies on the raw ops addr routine to indicate that there
is no more data.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes#10860
Allow to rename file systems without remounting if it is possible.
It is possible for file systems with 'mountpoint' property set to
'legacy' or 'none' - we don't have to change mount directory for them.
Currently such file systems are unmounted on rename and not even
mounted back.
This introduces layering violation, as we need to update
'f_mntfromname' field in statfs structure related to mountpoint (for
the dataset we are renaming and all its children).
In my opinion it is worth it, as it allow to update FreeBSD in even
cleaner way - in ZFS-only configuration root file system is ZFS file
system with 'mountpoint' property set to 'legacy'. If root dataset is
named system/rootfs, we can snapshot it (system/rootfs@upgrade), clone
it (system/oldrootfs), update FreeBSD and if it doesn't boot we can
boot back from system/oldrootfs and rename it back to system/rootfs
while it is mounted as /. Before it was not possible, because
unmounting / was not possible.
Authored by: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported by: Matt Macy <mmacy@freebsd.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#10839
FreeBSD has the concept of jails, a precursor to Solaris's zones, which
can be mapped to the required zones interface with relative ease. The
previous ZFS implementation in FreeBSD did so, and we should continue
to provide an appropriate implementation in OpenZFS as well.
Move lib/libspl/zone.c into platform code and adopt the correct
implementation for FreeBSD.
While here, prune unused code.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes#10851
FreeBSD's previous ZFS implemented INGLOBALZONE(thread) as
(!jailed((thread)->td_ucred)) and passed curthread to INGLOBALZONE.
We pass curproc instead of curthread, so we can achieve the same effect
with (!jailed((proc)->p_ucred)). The implementation is trivial enough
to fit on a single line in a define. We don't really need a whole
separate function for something that's already macros all the way down.
Eliminate in_globalzone.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes#10851
The previous ZFS implementation on FreeBSD had ifdefs to use jailed()
instead of crgetzoneid() in dsl_dir.c, however we can simply provide an
appropriate definition of crgetzoneid for the same effect.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes#10851
Initially it was considered simplest to stub out all
of the functions on FreeBSD. Now that FreeBSD supports
KSTAT_TYPE_RAW at least some of the functionality should
be made available.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes#10842
In zvol_geom_open on first open we need to guarantee
that the namespace lock is held to avoid spurious
failures in zvol_first_open.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes#10841
Commit dcdc12e added compatibility code to treat NR_SLAB_RECLAIMABLE_B
as if it were the same as NR_SLAB_RECLAIMABLE. However, the new value
is in bytes while the old value was in pages which means they are not
interchangeable.
The only place the reclaimable slab size is used is as a component of
the calculation done by arc_free_memory(). This function returns the
amount of memory the ARC considers to be free or reclaimable at little
cost. Rather than switch to a new interface to get this value it has
been removed it from the calculation. It is normally a minor component
compared to the number of inactive or free pages, and removing it
aligns the behavior with the FreeBSD version of arc_free_memory().
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#10834
zfs-load-key-DATASET.service was gaining an
After=systemd-journald.socket due to its stdout/stderr going to the
journal (which is the default). systemd-journald.socket has an After
(via RequiresMountsFor=/run/systemd/journal) on -.mount. If the root
filesystem is encrypted, -.mount gets an After
zfs-load-key-DATASET.service.
By setting stdout and stderr to null on the key load services, we avoid
this loop.
Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: InsanePrawn <insane.prawny@gmail.com>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes#10356Closes#10388
When generating units with zfs-mount-generator, if the pool is already
imported, zfs-import.target is not needed. This avoids a dependency
loop on root-on-ZFS systems:
systemd-random-seed.service After (via RequiresMountsFor)
var-lib.mount After
zfs-import.target After
zfs-import-{cache,scan}.service After
cryptsetup.service After
systemd-random-seed.service
Reviewed-by: Antonio Russo <antonio.e.russo@gmail.com>
Reviewed-by: InsanePrawn <insane.prawny@gmail.com>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes#10388
This will allow an override of auto-detection of distribution, which
is based on checking presence of /etc/*-release files.
Build systems makes a lot of file location assumptions based on
detected distribution.
Some distributions (like gentoo) may prefer explicitly
setting --with-vendor=gentoo to avoid auto-detection.
Since auto-detection checks all files in order, current script may
misdetect even on gentoo system if /etc/redhat-release file is present
Default behavior is unchanged and default is --with-vendor=check
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Closes#10835
If kernel is compiled with -march=znver1 or -march=znver2 zstd module
compilation will fail due to SSE register return with SSE disabled.
What's interesting, is that -march=skylake also implies -mbmi which
defines __BMI__ but compilation succeeds. It is probably due to
different BMI implementations on AMD and INTEL processors and the
way compiler uses instructions.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Closes#10758Closes#10829
There are a ton of zfs-* and zpool-* man pages. This adds them to
the SEE ALSO section so that people can more quickly look through
what all the options are, now that the pages have been split.
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: John-Mark Gurney <jmg@funkthat.com>
Closes#10589
Because dnode_sync_free_range() must drop dn_mtx during its processing,
using it as a callback to range_tree_vacate() is not safe. No other
operations (besides destroy) are allowed once range_tree_vacate() has
begun, and dropping dn_mtx would leave a window open for another thread
to observe that invalid (and unsafe) state via dnode_block_freed().
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Patrick Mooney <pmooney@oxide.computer>
Closes#10708Closes#10823
The zfs/sa.c source file accidentally includes sys/dnode.h twice.
Remove the second occurrence.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#10816Closes#10819
The root cause of the issue is that we only occasionally do as the
comments in the code suggest and actually ignore the %recv dataset when
it comes to filesystem limit tracking. Specifically, the only time we
ignore it is when initializing the filesystem and snapshot limit values;
when creating a new %recv dataset or deleting one, we always update
the bookkeeping. This causes a problem if you init the fs count on a
filesystem that already has a %recv dataset, since the bookmarking
will be decremented but not incremented. This is resolved in this
patch by simply always tracking the %recv dataset as a child.
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes#10791
The #pragma ident is a historical relic and not needed any more, this
pragma is actually unknown for common compilers and is only causing
trouble.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes#10810
Since L2ARC buffers are not evicted on memory pressure, too large
amount of headers on system with irrationally large L2ARC can render
it slow or even unusable. This change limits L2ARC writes and
rebuild if unevictable L2ARC-only headers reach dangerous level.
While there, call arc_adapt() on L2ARC rebuild, so that it could
properly grow arc_c, reflecting potentially significant ARC size
increase and avoiding slow growth with hopeless eviction attempts
later when "overflow" is detected.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reported-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes#10765
2020-08-27 16:06:28 -07:00
716 changed files with 27607 additions and 6867 deletions
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.