The presence of indirect vdevs was confusing get_redundancy(), which
considered a pool with e.g. only mirror top-level vdevs and at least
one indirect vdev (due to the removal of a previous vdev) as already
having a broken redundancy, which is not the case. This lead to the
possibility of compromising the redundancy of a pool by adding
mismatched vdevs without requiring the use of `-f`, and with no
visible notice or warning.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Stéphane Lesimple <speed47_github@speed47.net>
Closes#13705Closes#13711
Make dd_snap_cmtime property persistent across mount and unmount
operations by storing in ZAP and restore the value from ZAP on hold
into dd_snap_cmtime instead of updating it.
Expose dd_snap_cmtime as 'snapshots_changed' property that provides a
mechanism to quickly determine whether snapshot list for dataset has
changed without having to mount a dataset or iterate the snapshot list.
It specifies the time at which a snapshot for a dataset was last
created or deleted. This allows us to be more efficient how often we
query snapshots.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes#13635
This type of recv is used to heal corrupted data when a replica
of the data already exists (in the form of a send file for example).
With the provided send stream, corrective receive will read from
disk blocks described by the WRITE records. When any of the reads
come back with ECKSUM we use the data from the corresponding WRITE
record to rewrite the corrupted block.
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Alek Pinchuk <apinchuk@axcient.com>
Closes#9372
Not all Linux distribution kernels enable io_uring support by
default. Update the run time check to verify that the booted
kernel was built with CONFIG_IO_URING=y.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Co-authored-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#13648Closes#13685
The mountpoint may still be busy when the `zfs unmount -a` command
is run causing an unexpected failure. Retry the unmount a couple
of times since it should not remain busy for long.
19:10:50.29 NOTE: Reading state from .../inheritance/state021.cfg
19:10:50.32 cannot unmount '/TESTPOOL': pool or dataset is busy
19:10:50.32 ERROR: zfs unmount -a exited 1
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#13686
When scrubbing a raidz/draid pool, which contains a replacing or
sparing mirror with multiple online children, only one child will
be read. This is not normally a serious concern because the DTL
records are used to determine where a good copy of the data is.
As long as the data can be read from one child the mirror vdev
will use it to repair gaps in any of its children. Furthermore,
even if the data which was read is corrupt the raidz code will
detect this and issue its own repair I/O to correct the damage
in the mirror vdev.
However, in the scenario where the DTL is wrong due to silent
data corruption (say due to overwriting one child) and the scrub
happens to read from a child with good data, then the other damaged
mirror child will not be detected nor repaired.
While this is possible for both raidz and draid vdevs, it's most
pronounced when using draid. This is because by default the zed
will sequentially rebuild a draid pool to a distributed spare,
and the distributed spare half of the mirror is always preferred
since it delivers better performance. This means the damaged
half of the mirror will go undetected even after scrubbing.
For system administrations this behavior is non-intuitive and in
a worst case scenario could result in the only good copy of the
data being unknowingly detached from the mirror.
This change resolves the issue by reading all replacing/sparing
mirror children when scrubbing. When the BP isn't available for
verification, then compare the data buffers from each child. They
must all be identical, if not there's silent damage and an error
is returned to prompt the top-level vdev to issue a repair I/O to
rewrite the data on all of the mirror children. Since we can't
tell which child was wrong a checksum error is logged against the
replacing or sparing mirror vdev.
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#13555
This allows ZFS datasets to be delegated to a user/mount namespace
Within that namespace, only the delegated datasets are visible
Works very similarly to Zones/Jailes on other ZFS OSes
As a user:
```
$ unshare -Um
$ zfs list
no datasets available
$ echo $$
1234
```
As root:
```
# zfs list
NAME ZONED MOUNTPOINT
containers off /containers
containers/host off /containers/host
containers/host/child off /containers/host/child
containers/host/child/gchild off /containers/host/child/gchild
containers/unpriv on /unpriv
containers/unpriv/child on /unpriv/child
containers/unpriv/child/gchild on /unpriv/child/gchild
# zfs zone /proc/1234/ns/user containers/unpriv
```
Back to the user namespace:
```
$ zfs list
NAME USED AVAIL REFER MOUNTPOINT
containers 129M 47.8G 24K /containers
containers/unpriv 128M 47.8G 24K /unpriv
containers/unpriv/child 128M 47.8G 128M /unpriv/child
```
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Will Andrews <will.andrews@klarasystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>
Co-authored-by: Allan Jude <allan@klarasystems.com>
Co-authored-by: Mateusz Piotrowski <mateusz.piotrowski@klarasystems.com>
Sponsored-by: Buddy <https://buddy.works>
Closes#12263
Add support for the kernel's block multiqueue (blk-mq) interface in
the zvol block driver. blk-mq creates multiple request queues on
different CPUs rather than having a single request queue. This can
improve zvol performance with multithreaded reads/writes.
This implementation uses the blk-mq interfaces on 4.13 or newer
kernels. Building against older kernels will fall back to the
older BIO interfaces.
Note that you must set the `zvol_use_blk_mq` module param to
enable the blk-mq API. It is disabled by default.
In addition, this commit lets the zvol blk-mq layer process whole
`struct request` IOs at a time, rather than breaking them down
into their individual BIOs. This reduces dbuf lock contention
and overhead versus the legacy zvol submit_bio() codepath.
sequential dd to one zvol, 8k volblocksize, no O_DIRECT:
legacy submit_bio() 292MB/s write 453MB/s read
this commit 453MB/s write 885MB/s read
It also introduces a new `zvol_blk_mq_chunks_per_thread` module
parameter. This parameter represents how many volblocksize'd chunks
to process per each zvol thread. It can be used to tune your zvols
for better read vs write performance (higher values favor write,
lower favor read).
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#13148
Issue #12483
This commit adds BLAKE3 checksums to OpenZFS, it has similar
performance to Edon-R, but without the caveats around the latter.
Homepage of BLAKE3: https://github.com/BLAKE3-team/BLAKE3
Wikipedia: https://en.wikipedia.org/wiki/BLAKE_(hash_function)#BLAKE3
Short description of Wikipedia:
BLAKE3 is a cryptographic hash function based on Bao and BLAKE2,
created by Jack O'Connor, Jean-Philippe Aumasson, Samuel Neves, and
Zooko Wilcox-O'Hearn. It was announced on January 9, 2020, at Real
World Crypto. BLAKE3 is a single algorithm with many desirable
features (parallelism, XOF, KDF, PRF and MAC), in contrast to BLAKE
and BLAKE2, which are algorithm families with multiple variants.
BLAKE3 has a binary tree structure, so it supports a practically
unlimited degree of parallelism (both SIMD and multithreading) given
enough input. The official Rust and C implementations are
dual-licensed as public domain (CC0) and the Apache License.
Along with adding the BLAKE3 hash into the OpenZFS infrastructure a
new benchmarking file called chksum_bench was introduced. When read
it reports the speed of the available checksum functions.
On Linux: cat /proc/spl/kstat/zfs/chksum_bench
On FreeBSD: sysctl kstat.zfs.misc.chksum_bench
This is an example output of an i3-1005G1 test system with Debian 11:
implementation 1k 4k 16k 64k 256k 1m 4m
edonr-generic 1196 1602 1761 1749 1762 1759 1751
skein-generic 546 591 608 615 619 612 616
sha256-generic 240 300 316 314 304 285 276
sha512-generic 353 441 467 476 472 467 426
blake3-generic 308 313 313 313 312 313 312
blake3-sse2 402 1289 1423 1446 1432 1458 1413
blake3-sse41 427 1470 1625 1704 1679 1607 1629
blake3-avx2 428 1920 3095 3343 3356 3318 3204
blake3-avx512 473 2687 4905 5836 5844 5643 5374
Output on Debian 5.10.0-10-amd64 system: (Ryzen 7 5800X)
implementation 1k 4k 16k 64k 256k 1m 4m
edonr-generic 1840 2458 2665 2719 2711 2723 2693
skein-generic 870 966 996 992 1003 1005 1009
sha256-generic 415 442 453 455 457 457 457
sha512-generic 608 690 711 718 719 720 721
blake3-generic 301 313 311 309 309 310 310
blake3-sse2 343 1865 2124 2188 2180 2181 2186
blake3-sse41 364 2091 2396 2509 2463 2482 2488
blake3-avx2 365 2590 4399 4971 4915 4802 4764
Output on Debian 5.10.0-9-powerpc64le system: (POWER 9)
implementation 1k 4k 16k 64k 256k 1m 4m
edonr-generic 1213 1703 1889 1918 1957 1902 1907
skein-generic 434 492 520 522 511 525 525
sha256-generic 167 183 187 188 188 187 188
sha512-generic 186 216 222 221 225 224 224
blake3-generic 153 152 154 153 151 153 153
blake3-sse2 391 1170 1366 1406 1428 1426 1414
blake3-sse41 352 1049 1212 1174 1262 1258 1259
Output on Debian 5.10.0-11-arm64 system: (Pi400)
implementation 1k 4k 16k 64k 256k 1m 4m
edonr-generic 487 603 629 639 643 641 641
skein-generic 271 299 303 308 309 309 307
sha256-generic 117 127 128 130 130 129 130
sha512-generic 145 165 170 172 173 174 175
blake3-generic 81 29 71 89 89 89 89
blake3-sse2 112 323 368 379 380 371 374
blake3-sse41 101 315 357 368 369 364 360
Structurally, the new code is mainly split into these parts:
- 1x cross platform generic c variant: blake3_generic.c
- 4x assembly for X86-64 (SSE2, SSE4.1, AVX2, AVX512)
- 2x assembly for ARMv8 (NEON converted from SSE2)
- 2x assembly for PPC64-LE (POWER8 converted from SSE2)
- one file for switching between the implementations
Note the PPC64 assembly requires the VSX instruction set and the
kfpu_begin() / kfpu_end() calls on PowerPC were updated accordingly.
Reviewed-by: Felix Dörre <felix@dogcraft.de>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Co-authored-by: Rich Ercolani <rincebrain@gmail.com>
Closes#10058Closes#12918
The EXTRA_DIST variable is ignored when used in the FALSE conditional
of a Makefile.am. This results in the `make dist` target omitting
these files from the generated tarball unless CONFIG_USER is defined.
This issue can be avoided by switching to use the dist_noinst_DATA
variable which is handled as expected by autoconf.
This change also adds support for --with-config=dist as an alias
for --with-config=srpm and updates the GitHub workflows to use it.
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#13459Closes#13505
Commit 63b18e4 fixed an issue in zpl_aio_write() to make sure that
kiocb->ki_pos was updated correctly when opening a file with O_APPEND.
Adding a test to verify O_APPEND functionality with lseek can make
sure that all other distros/kernel versions also have the correct
behavior.
Also moved the threadappends_001_pos test into this append test
directory in functional ZTS directory. This way the two append tests
are together for organization purposes.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes#13424
We drop /multiple/ seconds off the generation, a dozen off a clean
rebuild, 185 files, and trivialise the distribution,
which can now be trivially generated via the provided snippets
Dist diff:
-zfs-2.1.99/tests/zfs-tests/tests/functional/pam/utilities.kshlib
+zfs-2.1.99/tests/zfs-tests/tests/functional/pam/utilities.kshlib.in
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#13316
Only down to tests/zfs-tests/tests, but pull out C programs into the
main Makefile ‒ this means we get correct dependency tracking for all
programs (and parallelise across them)
dist diff:
-zfs-2.1.99/tests/zfs-tests/tests/stress/
-zfs-2.1.99/tests/zfs-tests/tests/stress/Makefile.am
-zfs-2.1.99/tests/zfs-tests/tests/stress/Makefile.in
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#13316
No installation diff, dist lost
-zfs-2.1.99/cmd/fsck_zfs/fsck.zfs
which was distributed erroneously, since it's generated
Also clean gitrev on clean
Also add -e 'any possible bashisms' to default checkbashisms flags,
and fully parallelise it and shellcheck, and it works out-of-tree, too
Also align the Release in the dist META file correctly
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#13316
Page writebacks with WB_SYNC_NONE can take several seconds to complete
since they wait for the transaction group to close before being
committed. This is usually not a problem since the caller does not
need to wait. However, if we're simultaneously doing a writeback
with WB_SYNC_ALL (e.g via msync), the latter can block for several
seconds (up to zfs_txg_timeout) due to the active WB_SYNC_NONE
writeback since it needs to wait for the transaction to complete
and the PG_writeback bit to be cleared.
This commit deals with 2 cases:
- No page writeback is active. A WB_SYNC_ALL page writeback starts
and even completes. But when it's about to check if the PG_writeback
bit has been cleared, another writeback with WB_SYNC_NONE starts.
The sync page writeback ends up waiting for the non-sync page
writeback to complete.
- A page writeback with WB_SYNC_NONE is already active when a
WB_SYNC_ALL writeback starts. The WB_SYNC_ALL writeback ends up
waiting for the WB_SYNC_NONE writeback.
The fix works by carefully keeping track of active sync/non-sync
writebacks and committing when beneficial.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shaan Nobee <sniper111@gmail.com>
Closes#12662Closes#12790
Increase the default allowed maximum recordsize from 1M to 16M.
As described in the zfs(4) man page, there are significant costs
which need to be considered before using very large blocks.
However, there are scenarios where they make good sense and
it should no longer be necessary to artificially restrict their
use behind a module option.
Note that for 32-bit platforms we continue to leave this
restriction in place due to the limited virtual address space
available (256-512MB). On these systems only a handful
of blocks could be cached at any one time severely impacting
performance and potentially stability.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes#12830Closes#13302
Currently, determining which datasets are affected by corruption is
a manual process.
The primary difficulty in reporting the list of affected snapshots is
that since the error was initially found, the snapshot where the error
originally occurred in, may have been deleted. To solve this issue, we
add the ID of the head dataset of the original snapshot which the error
was detected in, to the stored error report. Then any time a filesystem
is deleted, the errors associated with it are deleted as well. Any time
a clone promote occurs, we modify reports associated with the original
head to refer to the new head. The stored error reports are identified
by this head ID, the birth time of the block which the error occurred
in, as well as some information about the error itself are also stored.
Once this information is stored, we can find the set of datasets
affected by an error by walking back the list of snapshots in the given
head until we find one with the appropriate birth txg, and then traverse
through the snapshots of the clone family, terminating a branch if the
block was replaced in a given snapshot. Then we report this information
back to libzfs, and to the zpool status command, where it is displayed
as follows:
pool: test
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
scan: scrub repaired 0B in 00:00:00 with 800 errors on Fri Dec 3
08:27:57 2021
config:
NAME STATE READ WRITE CKSUM
test ONLINE 0 0 0
sdb ONLINE 0 0 1.58K
errors: Permanent errors have been detected in the following files:
test@1:/test.0.0
/test/test.0.0
/test/1clone/test.0.0
A new feature flag is introduced to mark the presence of this change, as
well as promotion and backwards compatibility logic. This is an updated
version of #9175. Rebase required fixing the tests, updating the ABI of
libzfs, updating the man pages, fixing bugs, fixing the error returns,
and updating the old on-disk error logs to the new format when
activating the feature.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Co-authored-by: TulsiJain <tulsi.jain@delphix.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes#9175Closes#12812
It turns out, no, in fact, ZERO_RANGE and PUNCH_HOLE do
have differing semantics in some ways - in particular,
one requires KEEP_SIZE, and the other does not.
Also added a zero-range test to catch this, corrected a flaw
that made the punch-hole test succeed vacuously, and a typo
in file_write.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes#13329Closes#13338
Originally it was thought it would be useful to split up the kmods
by functionality. This would allow external consumers to only load
what was needed. However, in practice we've never had a case where
this functionality would be needed, and conversely managing multiple
kmods can be awkward. Therefore, this change merges all but the
spl.ko kmod in to a single zfs.ko kmod.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#13274
Found with -Wunused-but-set-variable on Clang trunk
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#13304
The auto_spare_multiple.ksh test may incorrectly fail for a similar
reason as the auto_spare_shared.ksh test. Add it to known list of
exceptions which should be retried to prevent failures in the CI.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#13318
The redundancy_draid_spare1.ksh and redundancy_draid_spare3.ksh test
cases are a little to strict for the sequential resilver case. While
unlikely it is possible that a handful of correctable checksum errors
will be reported resulting in a test failure. Update the zts-report.py
script to allow this the test case to be retried if requested.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#13318
What remains is a bunch of anonymous untraceable /tmp/tmp.XXXXXXXXXX
files and bak.root.receive.staff1.3835 from an error branch, testdir.1,
testdir.3, and testroot454470 (with children) in testroot
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#13259
As found by
git -C tests/ grep ^function | grep -vFe '.lua:' -e '.zcp:' | while IFS=":$IFS" read -r _ _ fn _; do [ $(git -C tests/ grep -wF $fn | head -2 | wc -l) -eq 1 ] && echo $fn; done
after all rounds this comes out to, sorted:
check_slog_state
chgusr_exec
cksum_files
cleanup_pools
compare_modes
count_ACE
dataset_set_defaultproperties
ds_is_snapshot
get_ACE
get_group
get_min
get_mode
get_owner
get_rand_checksum
get_rand_checksum_any
get_rand_large_recsize
get_rand_recsize
get_user_group
getitem
indirect_vdev_mapping_size
is_dilos
log_noresult
log_notinuse
log_other
log_timed_out
log_uninitiated
log_warning
num_jobs_by_cpu
plus_sign_check_l
plus_sign_check_v
record_cksum
rwx_node
seconds_mmp_waits_for_activity
set_cur_usr
setup_mirrors
setup_raidzs
showshares_smb
zfs_zones_setup
This, of course, doesn't catch recursive ones, or ones that log with
their own function name as a prefix, but
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#13259
This is a valid configuration and both (a) skips the tests if it's
unbuilt/not installed and (b) makes it work even if installed outside
the system directory (like in /u/l/l/s instead of /l/s)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#13259
Otherwise, they leak past the tests and contaminate the running system,
breaking coredumps entirely
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Co-authored-by: Yannick Le Pennec <yannick.lepennec@live.fr>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#13259
Original error:
23:47:40.59 SUCCESS: eval zfs receive -dFv testpool2 < /mnt/testroot/backdir-rsend/pool-final-p
23:47:40.61 1,23d0
23:47:40.61 < type filesystem -
23:47:40.61 < origin POOL@psnap -
23:47:40.61 < volblocksize - -
23:47:40.61 < acltype nfsv4 inherited from POOL
23:47:40.61 < dnodesize legacy inherited from POOL
23:47:40.61 < atime off local
23:47:40.61 < canmount off local
23:47:40.61 < checksum off local
23:47:40.61 < compression off local
23:47:40.61 < copies 3 local
23:47:40.61 < devices off local
23:47:40.61 < exec off local
23:47:40.61 < quota none default
23:47:40.61 < readonly on local
23:47:40.61 < recordsize 128K local
23:47:40.61 < reservation none default
23:47:40.61 < setuid off local
23:47:40.61 < snapdir hidden local
23:47:40.61 < version 5 -
23:47:40.61 < volsize - -
23:47:40.61 < xattr off local
23:47:40.61 < mountpoint /PREFIX inherited from POOL
23:47:40.61 < jailed on local
23:47:40.62 cannot open 'testpool2/pclone': dataset does not exist
23:47:40.62 ERROR: cmp_ds_prop testpool/pclone testpool2/pclone exited 1
So: (a) actually send all the datasets in -p mode and
(b) drop origin for clones sent with -p:
00:38:05.46 SUCCESS: eval zfs receive -dFv testpool2 < /mnt/testroot/backdir-rsend/pool-final-p
00:38:05.48 2c2
00:38:05.48 < origin POOL@psnap
00:38:05.48 ---
00:38:05.48 > origin POOL
00:38:05.49 ERROR: cmp_ds_prop testpool/pclone testpool2/pclone nosource exited 1
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#13250Closes#13259
This fixes rsend_012_pos:
20:28:50.50 SUCCESS: eval zfs receive -d -F testpool2 < /mnt/testroot/backdir-rsend/pool-final-R
20:28:50.53 4,6c4,6
20:28:50.53 < acltype off local
20:28:50.53 < dnodesize 4k local
20:28:50.53 < atime off local
20:28:50.53 ---
20:28:50.53 > acltype off received
20:28:50.53 > dnodesize 4k received
20:28:50.53 > atime off received
20:28:50.53 8,13c8,13
20:28:50.53 < checksum sha256 local
20:28:50.53 < compression off local
20:28:50.53 < copies 2 local
20:28:50.53 < devices on local
20:28:50.53 < exec on local
20:28:50.53 < quota 1G local
20:28:50.53 ---
20:28:50.53 > checksum sha256 received
20:28:50.53 > compression off received
20:28:50.53 > copies 2 received
20:28:50.53 > devices on received
20:28:50.53 > exec on received
20:28:50.53 > quota 1G received
20:28:50.53 15c15
20:28:50.53 < recordsize 128K local
20:28:50.53 ---
20:28:50.53 > recordsize 128K received
20:28:50.53 17,18c17,18
20:28:50.53 < setuid off local
20:28:50.53 < snapdir visible local
20:28:50.53 ---
20:28:50.53 > setuid off received
20:28:50.53 > snapdir visible received
20:28:50.53 ERROR: cmp_ds_prop testpool testpool2 exited 1
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#13250Closes#13259
This also fixes line welding in test error output
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#13259
This confers an >10x speedup on t/z-t/cmd builds (12s -> 1.1s),
gets rid of 23 redundant identical automake specs and gitignores,
and groups the binaries with their common headers
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#13259
Also: actually accept all the flags in write_d_a
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#13259
The users spew udevadm ENOENTs on FreeBSD
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#13259
This only worked by accident on FreeBSD, where sum doesn't take flags
POSIX guarantees us cksum (Ethernet CRC) ‒ use that instead
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#13259
This thoroughly destroys logapi and races to the log files horribly
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#13259
Previously, they'd all be skipped on FreeBSD where share is showmount
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#13259
Parts of the Linux kernel build system struggle with _Noreturn. This
results in the following warnings when building on RHEL 8.5, and likely
other environments. Switch to using the __attribute__((noreturn)).
warning: objtool: dbuf_free_range()+0x2b8:
return with modified stack frame
warning: objtool: dbuf_free_range()+0x0:
stack state mismatch: cfa1=7+40 cfa2=7+8
...
WARNING: EXPORT symbol "arc_buf_size" [zfs.ko] version generation
failed, symbol will not be versioned.
WARNING: EXPORT symbol "spa_open" [zfs.ko] version generation
failed, symbol will not be versioned.
...
Additionally, __thread_exit() has been renamed spl_thread_exit() and
made a static inline function. This was needed because the kernel
will generate a warning for symbols which are __attribute__((noreturn))
and then exported with EXPORT_SYMBOL.
While we could continue to use _Noreturn in user space I've also
switched it to __attribute__((noreturn)) purely for consistency
throughout the code base.
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#13238
Add a -K option to the test suite to log each test name to /dev/kmsg
(on Linux), so if there's a kernel warning we'll be able to match
it up to a particular test.
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #13227
Add support for a -exclude/-X option to `zfs send` to allow dataset
hierarchies to be excluded.
Snapshots can be excluded using a channel program; however,
this can result in failures with 'zfs send -R'; this option allows
them to be excluded. Fortunately, this required a change only to
cmd/zfs/zfs_main.c, using the already-existing callback argument
to zfs_send() that is currently unused.
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Christian Schwarz <christian.schwarz@nutanix.com>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Sean Eric Fagan <kithrup@mac.com>
Signed-off-by: Sean Eric Fagan <kithrup@mac.com>
Closes#13158
bcopy() has a confusing argument order and is actually a move, not a
copy; they're all deprecated since POSIX.1-2001 and removed in -2008,
and we shim them out to mem*() on Linux anyway
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#12996
The send_partial_dataset test verifies that partial send streams
can be resumed. This test may occasionally fail with a "token is
corrupt" error if the `mess_send_file` truncates a send stream
below the size of the DRR_BEGIN record. Update this function to
set a minimum size to ensure there is at least an intact DDR_BEGIN
record which allows for the receiving dataset to be created.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#13177
Add physical device size/capacity only for physical devices in
'zpool list -v' instead of displaying "-" in the SIZE column.
This would make it easier to see the individual device capacity and
to determine which spares are large enough to replace which devices.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Dipak Ghosh <dipak.ghosh@hpe.com>
Signed-off-by: Akash B <akash-b@hpe.com>
Closes#12561Closes#13106
Instead of writing to "devnull" and rming it later, just
> /dev/null to not have to cleanup later.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Co-authored-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#13133
With the zfs_destroy ZTS test case the setup script needed to call
default_setup_noexit so compression could be turned off. Also, added
log_must to setting compression off in the reservation setup script for
turning off compression.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes#13173
The regular default_raidz_setup function in the ZFS test suite called
log_pass after creating the zpool. However, with compression now being
on by default 56fa4aa, there is no way to turn compression off in the
setup.ksh scripts when creating a raidz VDEV. The addition of the
function default_raidz_setup_noexit allows for a raidz VDEV to be
created, additional zfs property settings to be applied and for the
setup.ksh script itself to call log_pass.
With the addition of default_raidz_setup_noexit some stray log_pass
calls were removed from any setup.ksh scripts that call
default_raidz_setup.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes#13173
When unlinking multiple files from a pool at 100% capacity, it was
possible for ENOSPC to be returned after the first unlink. e.g.
rm -f /mnt/fs/test1.0.0 /mnt/fs/test1.1.0 /mnt/fs/test1.2.0
rm: cannot remove '/mnt/fs/test1.1.0': No space left on device
rm: cannot remove '/mnt/fs/test1.2.0': No space left on device
After waiting for the pending deferred frees from the first unlink to
be processed the remaining files can then be unlinked. This is caused
by the quota limit in dsl_dir_tempreserve_impl() being temporarily
decreased to the allocatable pool capacity less any deferred free
space.
This is resolved using the existing mechanism of returning ERESTART
when over quota as long as we know enough space will shortly be
available after processing the pending deferred frees.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#13172
ZFS allows to update and retrieve additional file level attributes for
FreeBSD. This commit allows additional file level attributes to be
updated and retrieved for Linux. These include the flags stored in the
upper half of z_pflags only.
Two new IOCTLs have been added for this purpose. ZFS_IOC_GETDOSFLAGS
can be used to retrieve the attributes, while ZFS_IOC_SETDOSFLAGS can
be used to update the attributes.
Attributes that are allowed to be updated include ZFS_IMMUTABLE,
ZFS_APPENDONLY, ZFS_NOUNLINK, ZFS_ARCHIVE, ZFS_NODUMP, ZFS_SYSTEM,
ZFS_HIDDEN, ZFS_READONLY, ZFS_REPARSE, ZFS_OFFLINE and ZFS_SPARSE.
Flags can be or'd together while calling ZFS_IOC_SETDOSFLAGS.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes#13118
A function that returns with no value is a different thing from a
function that doesn't return at all. Those are two orthogonal
concepts, commonly confused.
pthread_create(3) expects a pointer to a start routine that has a
very precise prototype:
void *(*start_routine)(void *);
However, other thread functions, such as kernel ones, expect:
void (*start_routine)(void *);
Providing a different one is incorrect, and has only been working
because the ABIs happen to produce a compatible function.
We should use '_Noreturn void', since it's the natural type, and
then provide a '_Noreturn void *' wrapper for pthread functions.
For consistency, replace most cases of __NORETURN or
__attribute__((noreturn)) by _Noreturn. _Noreturn is understood
by -std=gnu89, so it should be safe to use everywhere.
Ref: https://github.com/openzfs/zfs/pull/13110#discussion_r808450136
Ref: https://software.codidact.com/posts/285972
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Closes#13120
A simple change, but so many tests break with it,
and those are the majority of this.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes#13078
Related to commit 90b77a036. Retry the `zpool export` if the pool
is "busy" indicating there is a process accessing the mount point.
This can happen after an import, allowing it to be retried will
avoid spurious test failures.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#13169
While "diff -r" is the most straightforward way of comparing directory
trees for differences, it has two major issues:
* File metadata is not compared, which means that subtle bugs may be
missed even if a test is written that exercises the buggy behaviour.
* diff(1) doesn't know how to compare special files -- it assumes they
are always different, which means that a test using diff(1) on
special files will always fail (resulting in such tests not being
added).
rsync can be used in a very similar manner to diff (with the -ni flags),
but has the additional benefit of being able to detect and resolve many
more differences between directory trees. In addition, rsync has a
standard set of features and flags while diffs feature set depends on
whether you're using GNU or BSD binutils.
Note that for several of the test cases we expect that file timestamps
will not match. For example, the ctime for a file creation or modify
event is stored in the intent log but not the mtime. Thus when replaying
the log the correct ctime is set but the current mtime is used. This is
the expected behavior, so to prevent these tests from failing, there's a
replay_directory_diff function which ignores those kinds of changes.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Closes#12588
As previously noted in #12272 the receive-o-x_props_override.ksh test
reliably fails on FreeBSD. Since we don't expect this test to pass
move the exception from the "maybe" to "known" section. This way we
don't retry the FAILED test when it is not expected to pass.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#13167
On FreeBSD pools are not allowed to be created using vdevs which are
backed by ZFS volumes. This configuration is not recommended for any
supported platform, nevertheless the largest_pool_001_pos.ksh test
case makes use of it as a convenience. This causes the test case to
fail reliably on FreeBSD. The layout is still tolerated on Linux
so only perform this test on Linux.
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#13166
- Kmemleak `clear` is invoked right before every test case run.
- Kmemleak `scan` is requested right after each test case is finished.
- Kmemleak instrumentation is not used for
setup/cleanup/pretest/posttest/failsafe stages to shorten the test
case execution time.
- Kmemleak periodic scan is disabled (`scan=0`) before the test suite
run to avoid interfering with the on-demand scan results.
- There are unavoidable potential false positives coming from kernel
areas other than OpenZFS module.
- The ZTS with kmemleak enabled duration is increased by ~50%.
Example run
```
Running Time: 07:12:13
Percent passed: 98.3%
unreferenced object 0xffff9da82aea5410 (size 80):
comm "kworker/u32:10", pid 942206, jiffies 4296749716 (age 2615.516s)
hex dump (first 32 bytes):
00 30 30 00 00 00 00 00 ff 8f 30 00 00 00 00 00 .00.......0.....
51 e6 77 05 a8 9d ff ff 00 00 00 00 00 00 00 00 Q.w.............
backtrace:
[<000000005cf1fea2>] alloc_extent_state+0x1d/0xb0 [btrfs]
[<0000000083f78ae5>] set_extent_bit+0x2ff/0x670 [btrfs]
[<00000000de29249e>] lock_extent_bits+0x6b/0xa0 [btrfs]
[<00000000b241f424>] lock_and_cleanup_extent_if_need+0xaf/0x1c0
[btrfs]
[<0000000093ca72b5>] btrfs_buffered_write+0x297/0x7d0 [btrfs]
[<000000002c2938c8>] btrfs_file_write_iter+0x127/0x390 [btrfs]
[<00000000b888f720>] do_iter_readv_writev+0x152/0x1b0
[<00000000320f0bcc>] do_iter_write+0x7c/0x1c0
[<000000000b5a8fe0>] lo_write_bvec+0x62/0x150 [loop]
[<000000009aa03c73>] loop_process_work+0x250/0xbd0 [loop]
[<00000000c7487d8a>] process_one_work+0x1f1/0x390
[<000000000b236831>] worker_thread+0x53/0x3e0
[<0000000023cb3e57>] kthread+0x127/0x150
[<000000002d48676a>] ret_from_fork+0x22/0x30
```
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes#13084
As such, there are no specific synchronous semantics defined for
the xattrs. But for xattr=on, it does log to ZIL and zil_commit() is
done, if sync=always is set on dataset. This provides sync semantics
for xattr=on with sync=always set on dataset.
For the xattr=sa implementation, it doesn't log to ZIL, so, even with
sync=always, xattrs are not guaranteed to be synced before xattr call
returns to caller. So, xattr can be lost if system crash happens, before
txg carrying xattr transaction is synced.
This change adds xattr=sa logging to ZIL on xattr create/remove/update
and xattrs are synced to ZIL (zil_commit() done) for sync=always.
This makes xattr=sa behavior similar to xattr=on.
Implementation notes:
The actual logging is fairly straight-forward and does not warrant
additional explanation.
However, it has been 14 years since we last added new TX types
to the ZIL [1], hence this is the first time we do it after the
introduction of zpool features. Therefore, here is an overview of the
feature activation and deactivation workflow:
1. The feature must be enabled. Otherwise, we don't log the new
record type. This ensures compatibility with older software.
2. The feature is activated per-dataset, since the ZIL is per-dataset.
3. If the feature is enabled and dataset is not for zvol, any append to
the ZIL chain will activate the feature for the dataset. Likewise
for starting a new ZIL chain.
4. A dataset that doesn't have a ZIL chain has the feature deactivated.
We ensure (3) by activating on the first zil_commit() after the feature
was enabled. Since activating the features requires waiting for txg
sync, the first zil_commit() after enabling the feature will be slower
than usual. The downside is that this is really a conservative
approximation: even if we never append a 'TX_SETSAXATTR' to the ZIL
chain, we pay the penalty for feature activation. The upside is that the
user is in control of when we pay the penalty, i.e., upon enabling the
feature.
We ensure (4) by hooking into zil_sync(), where ZIL destroy actually
happens.
One more piece on feature activation, since it's spread across
multiple functions:
zil_commit()
zil_process_commit_list()
if lwb == NULL // first zil_commit since zil_open
zil_create()
if no log block pointer in ZIL header:
if feature enabled and not active:
// CASE 1
enable, COALESCE txg wait with dmu_tx that allocated the
log block
else // log block was allocated earlier than this zil_open
if feature enabled and not active:
// CASE 2
enable, EXPLICIT txg wait
else // already have an in-DRAM LWB
if feature enabled and not active:
// this happens when we enable the feature after zil_create
// CASE 3
enable, EXPLICIT txg wait
[1] da6c28aaf6
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Christian Schwarz <christian.schwarz@nutanix.com>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jitendra Patidar <jitendra.patidar@nutanix.com>
Closes#8768Closes#9078
As explained by the disclaimer in the test case,
"This test can fail since nothing guarantees that old
MOS blocks aren't overwritten."
This behavior is expected and correct, but results in a
flaky test case which is problematic for the CI. The best
we can do to resolve this is to retry the sub-test which
failed when the MOS blocks have clearly been overwritten.
When testing failures were rare enough that a single retry
should normally be sufficient. However, we allow up to
five for good measure.
Reviewed by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#13119
When attaching a vdev to a mirror wait for the resilver to complete
before invoking `zdb` to inspect the pool. This ensures the pool is
essentially idle which allows `zdb` to open the imported pool reliably.
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#13112Closes#6935
Raw sending from pool1/encrypted with ashift=9 to pool2/encrypted with
ashift=12 results to failure when mounting pool2/encrypted (Input/Output
error). Notably, the opposite, raw sending from a greater ashift to a
lower one does not fail.
This happens because zio_compress_write() falsely checks only
ZIO_FLAG_RAW_COMPRESS and not ZIO_FLAG_RAW_ENCRYPT which is also set in
encrypted raw send streams. In this case it rounds up the psize and if
not equal to the zio->io_size it modifies the block by zeroing out
the extra bytes. Because this happens in a SA attr. registration object
(type=46), the decryption fails upon mounting the filesystem, and zpool
status falsely reports an error.
Fix this by checking both ZIO_FLAG_RAW_COMPRESS and ZIO_FLAG_RAW_ENCRYPT
before deciding whether to zero-pad a block.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes#13067Closes#13074
ZFS on Linux originally implemented xattr namespaces in a way that is
incompatible with other operating systems. On illumos, xattrs do not
have namespaces. Every xattr name is visible. FreeBSD has two
universally defined namespaces: EXTATTR_NAMESPACE_USER and
EXTATTR_NAMESPACE_SYSTEM. The system namespace is used for protected
FreeBSD-specific attributes such as MAC labels and pnfs state. These
attributes have the namespace string "freebsd:system:" prefixed to the
name in the encoding scheme used by ZFS. The user namespace is used
for general purpose user attributes and obeys normal access control
mechanisms. These attributes have no namespace string prefixed, so
xattrs written on illumos are accessible in the user namespace on
FreeBSD, and xattrs written to the user namespace on FreeBSD are
accessible by the same name on illumos.
Linux has several xattr namespaces. On Linux, ZFS encodes the
namespace in the xattr name for every namespace, including the user
namespace. As a consequence, an xattr in the user namespace with the
name "foo" is stored by ZFS with the name "user.foo" and therefore
appears on FreeBSD and illumos to have the name "user.foo" rather than
"foo". Conversely, none of the xattrs written on FreeBSD or illumos
are accessible on Linux unless the name happens to be prefixed with one
of the Linux xattr namespaces, in which case the namespace is stripped
from the name. This makes xattrs entirely incompatible between Linux
and other platforms.
We want to make the encoding of user namespace xattrs compatible across
platforms. A critical requirement of this compatibility is for xattrs
from existing pools from FreeBSD and illumos to be accessible by the
same names in the user namespace on Linux. It is also necessary that
existing pools with xattrs written by Linux retain access to those
xattrs by the same names on Linux. Making user namespace xattrs from
Linux accessible by the correct names on other platforms is important.
The handling of other namespaces is not required to be consistent.
Add a fallback mechanism for listing and getting xattrs to treat xattrs
as being in the user namespace if they do not match a known prefix.
Do not allow setting or getting xattrs with a name that is prefixed
with one of the namespace names used by ZFS on supported platforms.
Allow choosing between legacy illumos and FreeBSD compatibility and
legacy Linux compatibility with a new tunable. This facilitates
replication and migration of pools between hosts with different
compatibility needs.
The tunable controls whether or not to prefix the namespace to the
name. If the xattr is already present with the alternate prefix,
remove it so only the new version persists. By default the platform's
existing convention is used.
Reviewed-by: Christian Schwarz <christian.schwarz@nutanix.com>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11919
Unfortunately macOS has obj-C keyword "fallthrough" in the OS headers.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Closes#13097
Related to commit 90b77a036. Retry the `zpool export` if the pool is
"busy" indicating there is a process accessing the mount point. This
can happen after an import and allowing it to be retried will avoid
spurious test failures.
Reviewed by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#13092
The dRAID section of the zpool_expand_001_pos test would reliably fail
because the calculated expansion size assumed the dRAID top-level vdev
was created with a distributed spare. Create the vdev as expected to
resolve the test failure.
This test case flaw was accidentally caused by changing the default
number of dRAID distributed spares from one to zero while dRAID was
being developed.
Additionally, remove zpool_expand_005_pos from the list of possible
faulty tests. It appears to be passing consistently in my testing.
Reviewed by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#13091
Use large numbers for datasets with numeric names to avoid name
and id collisions. Sporadic test failures were observed when the
test would create $TESTPOOL/100 with an objset ID of 100.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes#13087
Changing volmode may need to remove minors, which could be open, so
call udev_wait() before we "zfs set volmode=<value>". This ensures
no udev process has the zvol open (i.e. blkid) and the kernel
zvol_remove_minor_impl() function won't skip removing the in use
device.
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#13075
dmu_recv_begin_check() unconditionally sets the DS_HOLD_FLAG_DECRYPT
flag before calling dsl_dataset_hold_flags(). If the key on the
receiving side isn't loaded or the send stream contains embedded
blocks, the receive check fails for a stream which is perfectly
valid and could be received without any problem. This seems like
a remnant of the initial design, where unencrypted datasets below
encrypted ones weren't allowed.
Add a condition to set `DS_HOLD_FLAG_DECRYPT` only for encrypted
datasets, modify an existing test to detect this regression and add
a test for raw replication streams.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Co-authored-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes#13033Closes#13076
Linux 5.16 moved XSTATE_XSAVE and XSTATE_XRESTORE out of our reach,
so add our own XSAVE{,OPT,S} code and use it for Linux 5.16.
Please note that this differs from previous behavior in that it
won't handle exceptions created by XSAVE an XRSTOR. This is sensible
for three reasons.
- Exceptions during XSAVE and XRSTOR can only occur if the feature
is not supported or enabled or the memory operand isn't aligned
on a 64 byte boundary. If this happens something else went
terribly wrong, and it may be better to stop execution.
- Previously we just printed a warning and didn't handle the fault,
this is arguable for the above reason.
- All other *SAVE instruction also don't handle exceptions, so this
at least aligns behavior.
Finally add a test to catch such a regression in the future.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes#13042Closes#13059
The on-disk cost of creating a snapshot or bookmark is sufficiently low
that it is difficult to make it reliably fail even when the pool is
"full". In order to avoid false positives remove these two checks from
the test case.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#13060
POSIX requires that set-uid and set-gid bits to be removed when an
unprivileged user writes to a file and ZFS does that during normal
operation.
The problem arrises when the write is stored in the ZIL and replayed.
During replay we have no access to original credentials of the process
doing the write, so zfs_write() will be performed with the root
credentials. When root is doing the write set-uid and set-gid bits
are not removed from the file.
To correct that, log a separate TX_SETATTR entry that removed those bits
on first write to such file.
Idea from: Christian Schwarz
Add test for ZIL replay of setuid/setgid clearing.
Improve various edge cases when clearing setid bits:
- The setid bits can be readded during a single write, so make sure to check
for them on every chunk write.
- Log TX_SETATTR record at most once per transaction group (if the setid bits
are keep coming back).
- Move zfs_log_setattr() outside of zp->z_acl_lock.
Reviewed-by: Dan McDonald <danmcd@joyent.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Christian Schwarz <me@cschwarz.com>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes#13027
`configure` now accepts `--enable-asan` and `--enable-ubsan` switches
which results in passing `-fsanitize=address`
and `-fsanitize=undefined`, respectively, to the compiler. Those
flags are enabled in GitHub workflows for ZTS and zloop. Errors
reported by both instrumentations are corrected, except for:
- Memory leak reporting is (temporarily) suppressed. The cost of
fixing them is relatively high compared to the gains.
- Checksum computing functions in `module/zcommon/zfs_fletcher*`
have UBSan errors suppressed. It is completely impractical
to enforce 64-byte payload alignment there due to performance
impact.
- There's no ASan heap poisoning in `module/zstd/lib/zstd.c`. A custom
memory allocator is used there rendering that measure
unfeasible.
- Memory leaks detection has to be suppressed for `cmd/zvol_id`.
`zvol_id` is run by udev with the help of `ptrace(2)`. Tracing is
incompatible with memory leaks detection.
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes#12928
This commit adds enumerated names to disambiguate between the
different vdevs. Previously only 'zpool status' showed enumerated
vdev names, now 'zpool list -v' and 'zpool iostat -v' also shows
the enumerated vdev names.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Dipak Ghosh <dipak.ghosh@hpe.com>
Signed-off-by: Akash B <akash-b@hpe.com>
Closes#12510Closes#13031
In files created/modified before 4254acb there may be a corruption of
xattrs which is not reported during scrub and normal send/receive. It
manifests only as an error when raw sending/receiving. This happens
because currently only the raw receive path checks for discrepancies
between the dnode bonus length and the spill pointer flag.
In case we encounter a dnode whose bonus length is greater than the
predicted one, we should report an error. Modify in this regard
dnode_sync() with an assertion at the end, dump_dnode() to error out,
dsl_scan_recurse() to report errors during a scrub, and zstream to
report a warning when dumping. Also added a test to verify spill blocks
are sent correctly in a raw send.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes#12720Closes#13014
CLOCK_MONOTONIC_RAW is only a thing on Linux and macOS. I'm not
actually sure why the previous hardcoding of a constant didn't
error out, but when we removed it, it sure does now.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes#12995
Raw receiving a snapshot back to the originating dataset is currently
impossible because of user accounting being present in the originating
dataset.
One solution would be resetting user accounting when raw receiving on
the receiving dataset. However, to recalculate it we would have to dirty
all dnodes, which may not be preferable on big datasets.
Instead, we rely on the os_phys flag
OBJSET_FLAG_USERACCOUNTING_COMPLETE to indicate that user accounting is
incomplete when raw receiving. Thus, on the next mount of the receiving
dataset the local mac protecting user accounting is zeroed out.
The flag is then cleared when user accounting of the raw received
snapshot is calculated.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes#12981Closes#10523Closes#11221Closes#11294Closes#12594
Issue #11300
zdb -d <pool>/<objset ID> does not work when
other command line arguments are included i.e.
zdb -U <cachefile> -d <pool>/<objset ID>
This change fixes the command line parsing
to handle this situation. Also fix issue
where zdb -r <dataset> <file> does not handle
the root <dataset> of the pool. Introduce -N
option to force <objset ID> to be interpreted
as a numeric objsetID.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes#12845Closes#12944
Deprecation of Python versions below 3.6 gives opportunity to unify the
build and install requirements for OpenZFS packages. The minimal
supported Python version is 3.6 as this is the most recent Python
package CentOS/RHEL 7 users can get.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes#12925
- Replaces use of manual `zpool sync`
- Don't use `log_must sync_pool` as `sync_pool` uses it internally
- Replace many (but not all) uses of `sync` with `sync_pool`
This makes the tests more consistent, and makes searching easier.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes#12894
Verify that all empty sectors are zero filled before using them to
calculate parity. Failure to do so can result in incorrect parity
columns being generated and written to disk if the contents of an
empty sector are non-zero. This was possible because the checksum
only protects the data portions of the buffer, not the empty sector
padding.
This issue has been addressed by updating raidz_parity_verify() to
check that all dRAID empty sectors are zero filled. Any sectors
which are non-zero will be fixed, repair IO issued, and a checksum
error logged. They can then be safely used to verify the parity.
This specific type of damage is unlikely to occur since it requires
a disk to have silently returned bad data, for an empty sector, while
performing a scrub. However, if a pool were to have been damaged
in this way, scrubbing the pool with this change applied will repair
both the empty sector and parity columns as long as the data checksum
is valid. Checksum errors will be reported in the `zpool status`
output for any repairs which are made.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#12857
This is a follow up commit for e03a41a60 which aimed to resolve
this same test failure. The core "problem" here is that it takes
very little space to perform a clone/snapshot/bookmark, which
means if we want these commands to reliably fail the pool must
truely have exhausted all free space.
This commit increases the number of fill iterations to try and
consume every block which we can. This still can't guarantee
the clone/snapshot/bookmark will fail, but it significantly
improves the odds. The exception was kept since it's still
not a sure thing.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#12903
Under Linux when rolling back a mounted filesystem negative dentries
may not be dropped from the cache. This can result in an ENOENT
being incorrectly returned on first access. Issuing a `df` before
the unmount results in the negative dentries being invalidated and
side steps the issue.
This is solely a workaround for the test case on Linux and not
correct behavior. The core issue of invalidating negative dentries
needs to be handled with a kernel side change. This is being
tracked as issue #6143.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#12898
Issue #6143
The rerefreserv_raidz test was failing on Linux because the sync being
issued doesn't guarantee a pool sync. Switch to using the sync_pool
function and remove the ZTS exception for Linux.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#12897
The build on musl needs linux/fs.h for SEEK_DATA and friends,
and sys/sysmacros.h for P2ROUNDUP. Add the needed headers.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Closes#12891
With some minor tweaks several of rsend tests can be sped up
considerably without significantly reducing test coverage.
* send-c_verify_ratio: ~120s -> ~60s
* send_realloc_*_files: ~330s -> ~65s
For the send_realloc* tests this also has the advantage of removing
(most of) the linux/freebsd conditional logic. Note that for this
test more passes, and thus more incremental send/recvs, are preferable
to a larger number of files.
Total run time of the rsend test group was reduced from roughly 20 to
11 minutes in an environment similar to what's used by the CI.
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#12876
The rsend_007_pos test reliably fails on Linux in the cleanup
function. This is caused by an unmount error when attempting to
recursively destroy the newly received datasets. Invoking `df`
prior to the `zfs destroy` interestingly avoids the unmont error.
Why this should matter is unclear and should be investigated.
However, this minor tweak may allow us to remove the ZTS rsend
exceptions. The subsequent rsend_010_pos and rsend_011_pos
failures were a result of this initial failure. The other
"maybe" failures I was unable to reproduce and have not been
recently observed in the master branch.
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#5665Closes#6086Closes#6087Closes#6446Closes#12876
Provide access to file generation number on Linux.
Add test coverage.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes#12856
The alloc_class_* tests may fail on Linux with an EBUSY error if
`zfs destroy` is run before the `dd` process has had a chance to
terminate. Wait on the pid after the `kill -9` to make sure.
When testing I didn't observe any failures for the alloc_class
tests. Remove them from the exceptions list, the CI was used to
verify the tests pass on all platforms.
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#12873
Unfortunately, #11445 means while we fail gracefully now, we still
fail, unless people want to implement a complex workaround just to
support /dev/null.
So let's just use the cheap workaround in a test for now.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes#12872
The zpool_reopen_[1-5] tests are failing Fedora 35 with:
zpool_reopen_001_pos.ksh[64]: log_must[67]: log_pos[270]:
wait_for_resilver_end[98]: wait_for_action: line 71: func: is read only
Renaming 'func' -> 'funct' fixes the issue.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#12871
This patch adds detail section on adding and running
test-case. It also changes markdown number list to
more readeable headers
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Closes#12737
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes#12728
The import_rewind_device_replaced.ksh test was never entirely reliable
because it depends on MOS data not being overwritten. The MOS data is
not protected by the snapshot so occasional failures were always
expected. However, this test is now failing reliably on all platforms
indicating something has changed in the code since the test was marked
"maybe". Convert the test to a "known" failure until the root cause
is identified and resolved.
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#12821
Due to a possible lock inversion the zvol open call path on Linux
needs to be able to retry in the case where the spa_namespace_lock
cannot be acquired.
For Linux 5.12 an older kernel this was accomplished by returning
-ERESTARTSYS from zvol_open() to request that blkdev_get() drop
the bdev->bd_mutex lock, reaquire it, then call the open callback
again. However, as of the 5.13 kernel this behavior was removed.
Therefore, for 5.12 and older kernels we preserved the existing
retry logic, but for 5.13 and newer kernels we retry internally in
zvol_open(). This should always succeed except in the case where
a pool's vdev are layed on zvols, in which case it may fail. To
handle this case vdev_disk_open() has been updated to retry when
opening a device when -ERESTARTSYS is returned.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #12301Closes#12759
With the addition of functionality to rerun failing tests, some
tests that fail only sometimes still fail often enough to degrade
the reliability of the sanity runs. Remove them from the runfile
until they reliably pass.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes#12814
This was a project proposed as part of the Quality theme for the
hackthon for the 2021 OpenZFS Developer Summit. The idea is to improve
the usability of the automated tests that get run when a PR is created
by having failing tests automatically rerun in order to make flaky
tests less impactful.
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes#12740
The pam_zfs_key pam module does not enforce a minimum password
length while changing the user password and thus the users home
dataset passphrase. To not end up with a dateset `zfs load-key`
can't load the key for, `zfs load-key` should not enforce a minimum
passphrase length. This adds a test for that.
Reviewed-by: Felix Dörre <felix@dogcraft.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes#12765Closes#12651Closes#12656
Remove the generated pam service config file
`/etc/pam.d/pam_zfs_key_test` on test cleanup, since the tests
shouldn't alter system state.
While here, move the pam service config file name into a variable.
Reviewed-by: Felix Dörre <felix@dogcraft.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes#12765
The code is integrated, builds fine, runs fine, there's not really
any reason not to.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes#12735
The l2cache device could be added twice because vdev_inuse() does not
check spa_l2cache for added devices. Make l2cache vdevs inuse checking
logic more closer to spare vdevs.
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes#9153Closes#12689
In case if all label checksums will be invalid on any vdev, the pool
will become unimportable. The zhack with newly added cli options could
be used to restore label checksums and make pool importable again.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes#2510Closes#12686
When ZFS is on root, /tmp is a ZFS. This causes zfs_list_004_neg to
fail since `zfs list` on /tmp passes when the test expects it not to.
The fix is to exclude paths that belong to ZFS.
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Palash Gandhi <pbg4930@rit.edu>
Closes#12744
This test case may fail on 5.13 and newer Linux kernels if the
/dev/zvol/ device is not created by udev.
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #12301
Closes #12738
In case if all label checksums will be invalid on any vdev, the pool
will become unimportable. From other side zdb with -l option will not
provide any useful information why it happened. Add notifications
about corrupted label checksums.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes#2509Closes#12685
It keeps failing, on changes which aren't related at all.
So until someone runs down why, I'd like it to stop being the
sole reason for CI failures.
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes#12733
When using lseek(2) to report data/holes memory mapped regions of
the file were ignored. This could result in incorrect results.
To handle this zfs_holey_common() was updated to asynchronously
writeback any dirty mmap(2) regions prior to reporting holes.
Additionally, while not strictly required, the dn_struct_rwlock is
now held over the dirty check to prevent the dnode structure from
changing. This ensures that a clean dnode can't be dirtied before
the data/hole is located. The range lock is now also taken to
ensure the call cannot race with zfs_write().
Furthermore, the code was refactored to provide a dnode_is_dirty()
helper function which checks the dnode for any dirty records to
determine its dirtiness.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #11900Closes#12724
When a parent dataset has normalization set to any value other than
"none", and a file system is created with the property "utf8only=off",
implicitly also set "normalization=none" instead of overriding the
desire for a non-UTF8 enforcing file system.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mike Swanson <mikeonthecomputer@gmail.com>
Closes#11892Closes#12038
The values of next properties: filesystem_limit, filesystem_count,
snapshot_limit, snapshot_count were returned to user as UINT64_MAX
integers in case if -p cli option is used, return 'none' value instead.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes#9306Closes#12690
It turns out, userland is much more happy with aliased property
names than the kernel is.
So let's normalize those to the expected names before we pass
them off.
Added a test case hacked up from the other recv -o/-x test that fails
on unpatched git and passes here.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes#12607Closes#12609
When cleaning up a test case standardize on using the convention:
datasetexists $ds && destroy_dataset $ds <flags>
By using 'destroy_dataset' instead of 'log_must zfs destroy' we ensure
that the destroy is retried in the event that a ZFS volume is busy.
This helps ensures ensure tests are fully cleaned up and prevents false
positive test failures on Linux.
Note that all of the tests which used 'zfs destroy' in cleanup have
been updated even if they don't use volumes. This was done to
clearly establish the expected convention.
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#12663
The useradd(8) command on my system won't accept login names with
uppercase letters in them, so adjust for that.
Reviewed-by: Felix Dörre <felix@dogcraft.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes#12665
The intention of the zfs_iter_mounted() is to traverse the dataset
and its descendants, not the snapshots. The current code can cause
a mounted snapshot to be included and thus zfs_open() on the snapshot
with ZFS_TYPE_FILESYSTEM would print confusing message such as "cannot
open 'rpool/fs@snap': snapshot delimiter '@' is not expected here".
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes#12447Closes#12448
Recognize when the host part of a sharenfs attribute is an ipv6
Literal and pass that through without modification.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Felix Dörre <felix@dogcraft.de>
Closes: #11171Closes#11939Closes: #1894
Add the following test failures to the exception list for FreeBSD
to ensure we notice new unexpected failures.
pool_checkpoint/checkpoint_big_rewind
pool_checkpoint/checkpoint_indirect
And the following for Linux.
zvol/zvol_misc/zvol_misc_snapdev
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #12621
Issue #12622
Issue #12623Closes#12624
In the CI environment it's possible for events to be slightly
delayed resulting in 4, instead of 5, events appearing in the
log file. This isn't a problem and should be considered a
success to avoid false positive test results.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#12625
zfs send -R -i snap1 pool/ds@snap1 is an invalid invocation of zfs send
because the incremental source and target snapshots are the same. We
have an error message for this condition, but we don't make it there
because of a failed assert while iterating through the dataset's
snapshots.
Check for NULL to avoid the assert so we can make it to the error
message.
Test this form of invalid send invocation in rsend tests. Fix the
rsend_016_neg test while here: log_neg itself doesn't fail the test,
and writing to /dev/null is not supported on all Linux kernels.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11121Closes#12533
For those not already familiar with the code base it can be a
challenge to understand how the libraries are laid out. This
has sometimes resulted in functionality being added in the
wrong place. To help avoid that in the future this commit
documents the high-level dependencies for easy reference in
lib/Makefile.am. It also simplifies a few things.
- Switched libzpool dependency on libzfs_core to libzutil.
This change makes it clear libzpool should never depend
on the ioctl() functionality provided by libzfs_core.
- Moved zfs_ioctl_fd() from libzutil to libzfs_core and
renamed it lzc_ioctl_fd(). Normal access to the kmods
should all be funneled through the libzfs_core library.
The sole exception is the pool_active() which was updated
to not use lzc_ioctl_fd() to remove the libzfs_core
dependency.
- Removed libzfs_core dependency on libzutil.
- Removed the lib/libzfs/os/freebsd/libzfs_ioctl_compat.c
source file which was all dead code.
- Removed libzfs_core dependency from mkbusy and ctime
test utilities. It was only needed for some trivial
wrapper functions and that code is easy to replicate
to shed the unneeded dependency.
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#12602
The zvol_misc tests, in particular zvol_misc_volmode, make use of a
common udev_wait function to wait for zvol devices in /dev to quiesce
on Linux. On other platforms this function currently only sleeps for
one second before returning. This is insufficient, and
zvol_misc_volmode has been flaky on FreeBSD as a result.
Replace udev_wait with block_device_wait, passing through the optional
device parameter where possible. Rearrange a few checks to strengthen
the verifications we are making and avoid unnecessarily sleeping. We
must keep udev_wait in a couple places to pass in Github CI workflows.
Remove zvol_misc_volmode from the maybe failing tests on FreeBSD in
zts-report.py.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#12583
* Add async runs for sequential_writes, random_readwrite_fixed and
random_writes
* Remove some larger block sizes that give similar results to others
* Remove nthreads == 4 from random_writes_zil test
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes#12576
As detailed in #12022 and #12008, it turns out the current zstd
implementation is quite nonportable, and results in various
configurations of ondisk header that only each platform can read.
So I've added a test which contains a dataset with a file written by
Linux/x86_64 and one written by FBSD/ppc64.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes#12030
As of the Linux 5.9 kernel a fallthrough macro has been added which
should be used to anotate all intentional fallthrough paths. Once
all of the kernel code paths have been updated to use fallthrough
the -Wimplicit-fallthrough option will because the default. To
avoid warnings in the OpenZFS code base when this happens apply
the fallthrough macro.
Additional reading: https://lwn.net/Articles/794944/
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#12441
This is a follow up patch for PR #12515 which addresses some
additional ZTS tests which are unreliable are should explicitly
wait for the required zvols to be available.
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: @Theo13111
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#12553
Issue #11854 has been resolved, so we can remove the exceptions for it.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#12527
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ka Ho Ng <khng@FreeBSD.org>
Sponsored-by: The FreeBSD Foundation
Closes#12458
The ZTS block_device_wait helper function should use -e when waiting
for a file to appear since it will be either a block special device
or a symlink. This didn't cause any failures but when a device path
was specified the function would wait longer than needed.
Additionally update the most flakey test cases to pass the file path
to block_device_wait to try and improve the test reliability. The
udev behavior on Fedora in particular can result in frequent false
positives.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#12515
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#12432
Many things has changed since previous default was set many years ago.
Nowadays 8KB does not allow adequate compression or even decent space
efficiency on many of pools due to 4KB disk physical block rounding,
especially on RAIDZ and DRAID. It effectively limits write throughput
to only 2-3GB/s (250-350K blocks/s) due to sync thread, allocation,
vdev queue and other block rate bottlenecks. It keeps L2ARC expensive
despite many optimizations and dedup just unrealistic.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes#12406
The redacted_send tests make use of a $tmpdir variable, except in
redacted_send/redacted_panic the variable is never defined.
Use $TEST_BASE_DIR instead.
Clean up the stream file after the test.
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#12455
The path is not optional on FreeBSD.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#12453
- Bail out early if we're running the perf tests and forget to
specify disks.
- Allow perf tests to run with any number of disks.
- Remove weekly vs. nightly settings
- Move variables with common values to perf.shlib
- Use zinject to clear the ARC over export/import
- Fix dbuf cache size calculation
When the meaning of `dbuf_cache_max_bytes` changed, the performance
test that covers the dbuf cache started to fail. The test would try to
write files for the test using the max possible size of the cache,
inevitably filling the pool and failing. This change uses
`dbuf_cache_shift` to correctly calculate the dbuf cache size.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: John Kennedy <john.kennedy@delphix.com>
Closes#12408
In l2arc_add_vdev() first decide whether the device is eligible for
L2ARC rebuild or whole device trim and then add it to the list of cache
devices. Otherwise l2arc_feed_thread() might already start writing on
the device invalidating previous content as l2ad_hand = l2ad_start.
However l2arc_rebuild_vdev() needs the device present in the cache
device list to figure out its l2arc_dev_t. Fix this by moving most of
l2arc_rebuild_vdev() in a new function l2arc_rebuild_dev() which does
not need to search in the cache device list.
In contrast to l2arc_add_vdev() we do not have to worry about
l2arc_feed_thread() invalidating previous content when onlining a
cache device. The device parameters (l2ad*) are not cleared when
offlining the device and writing new buffers will not invalidate
all previous content. In worst case only buffers that have not had
their log block written to the device will be lost.
Retire persist_l2arc_00{4,5,8} tests since they cover code already
covered by the remaining ones. Test persist_l2arc_006 is renamed to
persist_l2arc_004 and persist_l2arc_007 is renamed to persist_l2arc_005.
Fix a typo in persist_l2arc_004, and remove an assertion that is not
always true from l2arc_arcstats_pos. Also update an assertion in
persist_l2arc_005 and explain why in a comment.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes#12365
These were mostly used to annotate do {} while(0)s
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #12201
This includes a simplification of mkbusy and format correctness in zhack
and ztest
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #12201
This enables ZED to auto-online vdevs that are not wholedisk managed by
ZFS.
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Unlike most other properties the 'compatibility' property is stored
in the pool config object and not the DMU_OT_POOL_PROPS object.
This had the advantage that the compatibility information is available
without needing to fully import the pool (it can be read with zdb).
However, this means we need to make sure to update both the copy of
the config in the MOS and the cache file. This wasn't being done.
This commit adds a call to spa_async_request() to ensure the copy of
the config in the cache file gets updated as well as the one stored
in the pool. This same change is made for the 'comment' property
which suffers from the same inconsistency.
Reviewed-by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Colm Buckley <colm@tuatha.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#12261Closes#12276
zstreamdump was replaced with "zstream dump"; let's stop using the
old name, compat symlink or no.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes#12277
The receive-o-x_props_override test case reliably fails on the
FreeBSD main builders (but not on Linux), until the root cause is
understood add this test to the FreeBSD exception list.
On Linux the alloc_class_012_pos test case may occasionally fail.
This is a known false positive which has also been added to the
Linux exception list until the test can be made entirely reliable.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#12272
There are at least two interpretations of basename(3),
in addition to both functions being allowed to /both/ return a static
buffer (unsuitable in multi-threaded environments) /and/ raze the input
(which encourages overallocations, at best)
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#12105
Commit 86b5f4c12 added a new zfs_clone_livelist_dedup.ksh test case
but didn't include it in the Makefile.am. This results in the test
not being included in the dist tarball so it's never run by the CI.
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes: #12224
Update the logic to handle the dedup-case of consecutive
FREEs in the livelist code. The logic still ensures that
all the FREE entries are matched up with a respective
ALLOC by keeping a refcount for each FREE blkptr that we
encounter and ensuring that this refcount gets to zero
by the time we are done processing the livelist.
zdb -y no longer panics when encountering double frees
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes#11480Closes#12177
On FreeBSD 14, these two tests started erroring out like the
objects they're attempting to examine don't exist.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes#12165
This checks every file it checked (and a few more),
but explicitly instead of "if it works it works" best-effort
(which wasn't that good anyway)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#10512Closes#12101
The change correctly handles BrokenPipeError and improves the
associated tests.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes#12037Closes#12036
verify_slog_support no longer applies to ZFS since slog support
is always available.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes#12092
According to POSIX.1, "vfork() has the same effect as fork(2),
except that the behavior is undefined if the process created by vfork()
either modifies any data other than a variable of type pid_t
used to store the return value from vfork(), [...],
or calls any other function before successfully calling _exit(2)
or one of the exec(3) family of functions."
These do all three, and work by pure chance
(or maybe they don't, but we blisfully don't know).
Either way: bad idea to call vfork() from C,
unless you're the standard library, and POSIX.1-2008 removes it entirely
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#12015
This change addresses two distinct scenarios which are possible
when performing a sequential resilver to a dRAID pool with vdevs
that contain silent unknown damage. Which in this circumstance
took the form of the devices being intentionally overwritten with
zeros. However, it could also result from a device returning incorrect
data while a sequential resilver was in progress.
Scenario 1) A sequential resilver is performed while all of the
dRAID vdevs are ONLINE and there is silent damage present on the
vdev being resilvered. In this case, nothing will be repaired
by vdev_raidz_io_done_reconstruct_known_missing() because
rc->rc_error isn't set on any of the raid columns. To address
this vdev_draid_io_start_read() has been updated to always mark
the resilvering column as ESTALE for sequential resilver IO.
Scenario 2) Multiple columns contain silent damage for the same
block and a sequential resilver is performed. In this case it's
impossible to generate the correct data from parity unless all of
the damaged columns are being sequentially resilvered (and thus
only good data is used to generate parity). This is as expected
and there's nothing which can be done about it. However, we need
to be careful not to make to situation worse. Since we can't
verify the data is actually good without a checksum, we must
only repair the devices which are being sequentially resilvered.
Otherwise, an incorrect repair to a device which previously
contained good data could effectively lock in the damage and
make reconstruction impossible. A check for this was added to
vdev_raidz_io_done_verified() along with a new test case.
Lastly, this change updates the redundancy_draid_spare1 and
redundancy_draid_spare3 test cases to be more representative
of normal dRAID replacement operation. Specifically, what we
care about is that the scrub run after a sequential resilver
does not find additional blocks which need repair. This would
indicate the sequential resilver failed to rebuild a section of
one of the devices. Note also the tests were switched to using
the verify_pool() function which still checks for checksum errors.
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#12061
The redundancy_draid.ksh and redundancy_raidz.ksh tests were updated
by commit 93c8e91fe to additionally verify self-healing. This
additional check increased the run time which can now occasionally
exceed the default maximum timeout in the CI environment. To prevent
this from causing failures increase the default timeout for the
redundancy test cases.
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#12043
Commit d1d4769 takes into account the encryption key version to
decide if the local_mac could be zeroed out. However, this could lead
to failure mounting encrypted datasets created with intermediate
versions of ZFS encryption available in master between major releases.
In order to prevent this situation revert d1d4769 pending a more
comprehensive fix which addresses the mount failure case.
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #11294
Issue #12025
Issue #12300Closes#12033
Add support for http and https to the keylocation properly to
allow encryption keys to be fetched from the specified URL.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #9543Closes#9947Closes#11956
The following seven tests been observed to occasionally fail during
CI testing. This commit adds them to the list of known somewhat
flaky test cases.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#12023
When dRAID performs a normal read operation only the data columns
in the raid map are read from disk. This is enough information to
calculate the checksum, verify it, and return the needed data to the
application. It's only in the event of a checksum failure that the
additional parity and any empty columns must be read since they are
required for parity reconstruction.
Reading these additional columns is handled by vdev_raidz_read_all()
which calls vdev_draid_map_alloc_empty() to expand the raid_map_t
and submit IOs for the missing columns. This all works correctly,
but it fails to account for any "short" columns. These are data
columns which are padded with a empty skip sector at the end.
Since that empty sector is not needed for a normal read it's not
read when columns is first read from disk. However, like the parity
and empty columns the skip sector is needed to perform reconstruction.
The fix is to mark any "short" columns as never being read by clearing
the rc_tried flag when expanding the raid_map_t. This will cause
the entire column to re-read from disk in the event of a checksum
failure allowing the self-healing functionality to repair the block.
Note that this only effects the self-healing feature because when
scrubbing a pool the parity, data, and empty columns are all read
initially to verify their contents. Furthermore, only blocks which
contain "short" columns would be effected, and only when the memory
backing the skip sector wasn't already zeroed out.
This change extends the existing redundancy_raidz.ksh test case to
verify self-healing (as well as resilver and scrub). Then applies
the same test case to dRAID with a slightly modified version of
the test script called redundancy_draid.ksh. The unused variable
combrec was also removed from both test cases.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#12010
Afterward, git grep ZoL matches:
* README.md: * [ZoL Site](https://zfsonlinux.org)
- Correct
* etc/default/zfs.in:# ZoL userland configuration.
- Changing this would induce a needless upgrade-check,
if the user has modified the configuration;
this can be updated the next time the defaults change
* module/zfs/dmu_send.c: * ZoL < 0.7 does not handle [...]
- Before 0.7 is ZoL, so fair enough
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #11956
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11970
Both the zpool_initialize_import_export and checkpoint_discard_busy
test cases a known to occasionally fail. Add them to the list of
known possible failures and reference the appropriate issue on the
tracker.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11949
Receiving datasets while blanket inheriting properties like zfs
receive -x mountpoint can generally be desirable, e.g. to avoid
unexpected mounts on backup hosts.
Currently this will fail to receive zvols due to the mountpoint
property being applicable to filesystems only. This limitation
currently requires operators to special-case their minds and tools
for zvols.
This change gets rid of this limitation for inherit (-x) by
Spiting up the dataset type handling: Warnings for inheriting (-x),
errors for overriding (-o).
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes#11416Closes#11840Closes#11864
- Add additional logging to provide more information about why the
test failed. This including logging more of the individual commands
and the contents and differences of the record files on failure.
- Updated get_vdevs() to properly exclude all top-level vdevs
including raidz3 and draid[1-3].
- Replaced gnudd with dd. This is the only remaining place in the
test suite gnudd is used and it shouldn't be needed.
- The refill_test_env function expects the pool as the first argument
but never sets the pool variable.
- Only fill the test pools to 50% of capacity instead of 75% to help
speed up the tests.
- Fix replace_missing_devs() calculation, MINDEVSIZE should be
MINVDEVSIZE.
- Fix damage_devs() so it overwrites almost all of the device so
we're guaranteed to damage filesystem blocks.
- redundancy_stripe.ksh should not use log_mustnot to check if the
pool is healthy since the return value may be misinterpreted.
Just perform a normal conditional check and log the failure.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11906
It's been observed in the CI that the required 25% of obsolete bytes
in the mapping can be to high a threshold for this test resulting in
condensing never being triggered and a test failure. To prevent these
failures make the existing zfs_condense_indirect_obsolete_pct tuning
available so the obsolete percentage can be reduced from 25% to 5%
during this test.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11869
The fault/auto_spare_shared, l2arc/persist_l2arc_007_pos, and
alloc_class/alloc_class_013_pos test cases are not entirely reliable
and may occasionally fail resulting in a false positive in the CI.
Add these tests to known list of possible failures until they can
be made 100% reliable.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11890
A tentative implementation and discussion was done in #5285.
According to it a send --skip-missing|-s flag has been added.
In a replication stream, when there are snapshots missing in
the hierarchy, if -s is provided print a warning and ignore
dataset (and its children) instead of throwing an error
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pablo Correa Gómez <ablocorrea@hotmail.com>
Closes#11710
Kill the removal operation on every platform, not just Linux.
The test has been fixed and is now stable on FreeBSD.
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11856
As described in #11854, zhack is occasionally segfaulting on FreeBSD.
Debugging this is proving to be tricky. To avoid false positives in
the CI add entries for the tests that use zhack in zts-report to
accept that they may occasionally fail on FreeBSD.
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Issue #11854Closes#11855
Just as delay zevents can flood the zevent pipe when a vdev becomes
unresponsive, so do the deadman zevents.
Ratelimit deadman zevents according to the same tunable as for delay
zevents.
Enable deadman tests on FreeBSD and add a test for deadman event
ratelimiting.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11786
get_clones_string currently returns an empty string for filesystem
snapshots which have no clones. This breaks parsable `zfs get` output as
only three columns are output, instead of 4.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Fiddaman <github@m.fiddaman.uk>
Co-authored-by: matt <matt@fiddaman.net>
Closes#11837
The pool_checkpoint tests may incorrectly fail because several of
them invoke zdb for an imported pool. In this scenario it's not
unexpected for zdb to fail if the pool is modified. To resolve
this these zdb checks are now done after the pool has been exported.
Additionally, the default cleanup functions assumed the pool would
be imported when they were run. If this was not the case they're
exit early and fail to cleanup all of the test state causing
subsequent tests to fail. Add a check to only destroy the pool
when it is imported.
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11832
Correct an assortment of typos throughout the code base.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes#11774
200ms time-out is relatively long, but if we already hit the cap,
then we'll likely be able to spawn multiple new jobs when we wake up
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#11807
Add inheritance/inherit_001_pos to the maybe fails on FreeBSD list.
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11830
Nothing bad happens if a prefix of your pool name matches a disk name.
This is a bit of a silly restriction at this point.
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes#11781Closes#11813
Commit 235a85657 introduced a regression in evaluation of POSIX modes
that require group DENY entries in the internal ZFS ACL. An example
of such a POSX mode is 007. When write_implies_delete_child is set,
then ACE_WRITE_DATA is added to `wanted_dirperms` in prior to calling
zfs_zaccess_common(). This occurs is zfs_zaccess_delete().
Unfortunately, when zfs_zaccess_aces_check hits this particular DENY
ACE, zfs_groupmember() is checked to determine whether access should be
denied, and since zfs_groupmember() always returns B_TRUE on Linux and
so this check is failed, resulting ultimately in EPERM being returned.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Andrew Walker <awalker@ixsystems.com>
Closes#11760
This change adds a new test that covers a bug fix in the binary search
in the redacted send resume logic that causes a kernel panic.
The bug was fixed in https://github.com/openzfs/zfs/pull/11297.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Palash Gandhi <palash.gandhi@delphix.com>
Closes#11764
Create a new section of tests to run with acltype=off.
For now the only test we have is for the DOS mode READONLY attribute on
FreeBSD.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11734
You can't use user_run to eval ksh functions defined in libtest unless
you include libtest in the user shell.
Fix xattr_003_neg by:
* include libtest in the user shell
* *then* run get_xattr
* assert this fails
* use variables for filenames so they don't change in the user's shell
* don't log the contents of /etc/passwd
* cleanup all byproducts
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11185
The current user_run often does not work as expected. Commands are run
in a different shell, with a different environment, and all output is
discarded.
Simplify user_run to retain the current environment, eliminate eval,
and feed the command string into ksh. Enhance the logging for
user_run so we can see out and err.
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11185
Importing a pool using the cachefile is ideal to reduce the time
required to import a pool. However, if the devices associated with
a pool in the cachefile have changed, then the import would fail.
This can easily be corrected by doing a normal import which would
then read the pool configuration from the labels.
The goal of this change is make importing using a cachefile more
resilient and auto-correcting. This is accomplished by having
the cachefile import logic automatically fallback to reading the
labels of the devices similar to a normal import. The main difference
between the fallback logic and a normal import is that the cachefile
import logic will only look at the device directories that were
originally used when the cachefile was populated. Additionally,
the fallback logic will always import by guid to ensure that only
the pools in the cachefile would be imported.
External-issue: DLPX-71980
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes#11716
A few deadman tunables ended up in the wrong sysctl node.
Move them to vfs.zfs.deadman.*
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11715
events_002 exercises the ZED, ensuring that it neither misses events,
nor reporting events twice.
On slow test hardware, some of the timeouts are insufficient to allow
the ZED to properly settle. Conversely, on fast hardware these same
timeouts are too long, unnecessarily slowing the test run.
Instead of using a fixed timeout, wait for the expected final event
before returning. Additionally, wait with a timeout for unexpected
events to avoid missing them if they show up late.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes#11703
* Restore original kern.corefile value after the test.
* Don't leave behind a frozen pool.
* Clean up leftover vdev files.
* Make zpool_002_pos and zpool_003_pos consistent in their handling of
core files while here.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11694
When a device which is actively trimming or initializing becomes
FAULTED, and therefore no longer writable, cancel the active
TRIM or initialization. When the device is merely taken offline
with `zpool offline` then stop the operation but do not cancel it.
When the device is brought back online the operation will be
resumed if possible.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Vipin Kumar Verma <vipin.verma@hpe.com>
Signed-off-by: Srikanth N S <srikanth.nagasubbaraoseetharaman@hpe.com>
Closes#11588
Several of the TRIM tests were based of the initialize tests and
then adapted for TRIM. The zpool_trim_start_and_cancel_pos.ksh
test was intended to be one such test but it was overlooked and
actually never adapted. Update it accordingly.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11649
The behavior of a NULL fromsnap was inadvertently changed for a doall
send when the send/recv logic in libzfs was updated. Restore the
previous behavior by correcting send_iterate_snap() to include all
the snapshots in the nvlist for this case.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Cedric Maunoury <cedric.maunoury@gmail.com>
Closes#11608
Fix regression seen in issue #11545 where checksum errors
where not being counted or showing up in a zpool event.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes#11609
Property to allow sets of features to be specified; for compatibility
with specific versions / releases / external systems. Influences
the behavior of 'zpool upgrade' and 'zpool create'. Initial man
page changes and test cases included.
Brief synopsis:
zpool create -o compatibility=off|legacy|file[,file...] pool vdev...
compatibility = off : disable compatibility mode (enable all features)
compatibility = legacy : request that no features be enabled
compatibility = file[,file...] : read features from specified files.
Only features present in *all* files will be enabled on the
resulting pool. Filenames may be absolute, or relative to
/etc/zfs/compatibility.d or /usr/share/zfs/compatibility.d (/etc
checked first).
Only affects zpool create, zpool upgrade and zpool status.
ABI changes in libzfs:
* New function "zpool_load_compat" to load and parse compat sets.
* Add "zpool_compat_status_t" typedef for compatibility parse status.
* Add ZPOOL_PROP_COMPATIBILITY to the pool properties enum
* Add ZPOOL_STATUS_COMPATIBILITY_ERR to the pool status enum
An initial set of base compatibility sets are included in
cmd/zpool/compatibility.d, and the Makefile for cmd/zpool is
modified to install these in $pkgdatadir/compatibility.d and to
create symbolic links to a reasonable set of aliases.
Reviewed-by: ericloewe
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Colm Buckley <colm@tuatha.org>
Closes#11468
There are two issues that don't allow ZFS to be compiled using uClibc.
`backtrace()`, and `program_invocation_short_name` as a `const`.
This patch adds uClibc to the conditionals in the same way there are
already for Glibc for `backtrace()`; and removes the external param
`program_invocation_short_name` because its only used here for the
whole project.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: José Luis Salvador Rufo <salvador.joseluis@gmail.com>
Closes#11600
All tests need to be included in the Makefiles.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes#11541
While you can use zdb -R poolname vdev:offset:[<lsize>/]<psize>[:flags]
to extract individual DVAs from a vdev, it would be handy for be able
copy an entire file out of the pool.
Given a file or object number, add support to copy the contents to a
file. Useful for debugging and recovery.
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes#11027
If there is no scsi_debug module, then this test
must be skipped, in this case cleanup routine should
be prepared for absent pool.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes#11534
When scrubbing, (non-sequential) resilvering, or correcting a checksum
error using RAIDZ parity, ZFS should heal any incorrect RAIDZ parity by
overwriting it. For example, if P disks are silently corrupted (P being
the number of failures tolerated; e.g. RAIDZ2 has P=2), `zpool scrub`
should detect and heal all the bad state on these disks, including
parity. This way if there is a subsequent failure we are fully
protected.
With RAIDZ2 or RAIDZ3, a block can have silent damage to a parity
sector, and also damage (silent or known) to a data sector. In this
case the parity should be healed but it is not.
The problem can be noticed by scrubbing the pool twice. Assuming there
was no damage concurrent with the scrubs, the first scrub should fix all
silent damage, and the second scrub should be "clean" (`zpool status`
should not report checksum errors on any disks). If the bug is
encountered, then the second scrub will repair the silently-damaged
parity that the first scrub failed to repair, and these checksum errors
will be reported after the second scrub. Since the first scrub repaired
all the damaged data, the bug can not be encountered during the second
scrub, so subsequent scrubs (more than two) are not necessary.
The root cause of the problem is some code that was inadvertently added
to `raidz_parity_verify()` by the DRAID changes. The incorrect code
causes the parity healing to be aborted if there is damaged data
(`rc_error != 0`) or the data disk is not present (`!rc_tried`). These
checks are not necessary, because we only call `raidz_parity_verify()`
if we have the correct data (which may have been reconstructed using
parity, and which was verified by the checksum).
This commit fixes the problem by removing the incorrect checks in
`raidz_parity_verify()`.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes#11489Closes#11510
- refactor cleanup routines into common kshlib zpool_export_cleanup func
- don't require physical disks to test, just use files
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Will Andrews <will@firepipe.net>
Closes#11518
Instead of just failing, indicate the expected and actual value and
source as a NOTE. Tests using this failed in an earlier version of
the changeset and this information helped find the cause.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Will Andrews <will@firepipe.net>
Closes#11517
There is an identical definition in zfs_set_common.kshlib already.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Will Andrews <will@firepipe.net>
Closes#11516
This avoids globbing together multiple lines in the log, if you happen
to specify LOGAPI_DEBUG because you want to see it.
Signed-off-by: Will Andrews <will@firepipe.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11515
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11503
Provide a basic test coverage for io_uring I/O.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes#11497
In FreeBSD the struct uio was just a typedef to uio_t. In order to
extend this struct, outside of the definition for the struct uio, the
struct uio has been embedded inside of a uio_t struct.
Also renamed all the uio_* interfaces to be zfs_uio_* to make it clear
this is a ZFS interface.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes#11438
Prior to util-linux 2.36.2, if a file or directory in the
current working directory was named 'dataset' then mount(8)
would prepend the current working directory to the dataset.
Eventually, we should be able to drop this workaround.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sterling Jensen <sterlingjensen@users.noreply.github.com>
Closes#11295Closes#11462
As described in #11445, the kernel interface kernel_{read,write} no
longer act on special devices. In the ZTS, zfs send and receive are
tested by piping to these devices, leading to spurious failures (for
positive tests) and may mask errors (for negative tests).
Until a more permanent mechanism to address this deficiency is
developed, clean up the output from the ZTS by avoiding directly piping
to or from /dev/null and /dev/zero.
For /dev/zero input, simply use a pipe: `cat </dev/zero |` .
However, for /dev/null output, the shell semantics for pipe failures
means that zfs send error codes will be masked by the successful
`| cat >/dev/null` command execution. In that case, use a temporary
file under $TEST_BASE_DIR for output in favor.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes#11478
The zfs_rollback_001 test modifies files in a temporary, test dataset
repeatedly. Before each iteration, any preexisting dataset is removed,
after unmounted with umount -f, if necessary.
Add a short delay after the forced unmount, avoiding a race that can
prevent zfs destroy from succeeding, leading to a test failure.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes#11451
if pool root is not mounted, then zpool umount in next test will leave
dataset mountpoint directory around and next zfs mount -a will fail
with error: cannot mount '/testpool': directory is not empty
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes#11417
Build error on illumos with gcc 10 did reveal:
In function 'dmu_objset_refresh_ownership':
../../common/fs/zfs/dmu_objset.c:857:25: error: implicit conversion
from 'boolean_t' to 'ds_hold_flags_t' {aka 'enum ds_hold_flags'}
[-Werror=enum-conversion]
857 | dsl_dataset_disown(ds, decrypt, tag);
| ^~~~~~~
cc1: all warnings being treated as errors
libzfs_input_check.c: In function 'zfs_ioc_input_tests':
libzfs_input_check.c:754:28: error: implicit conversion from
'enum dmu_objset_type' to 'enum lzc_dataset_type'
[-Werror=enum-conversion]
754 | err = lzc_create(dataset, DMU_OST_ZFS, NULL, NULL, 0);
| ^~~~~~~~~~~
cc1: all warnings being treated as errors
The same issue is present in openzfs, and also the same issue about
ds_hold_flags_t, which currently defines exactly one valid value.
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes#11406
Consider the test to be a success as long as the initializing pattern
is found at least once per metaslab. This indicates that at least
part of the free space was initialized. Ideally we'd check that the
pattern was written to all free space but that's much trickier so this
check is a reasonable compromise.
Using a here-string to feed the loop in this test causes an empty
string to still trigger the loop so we miss the `spacemaps=0` case.
Pipe into the loop instead.
While here, we can use `zpool wait -t initialize $TESTPOOL` to wait for
the pool to initialize.
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11365
The space in special devices is not included in spa_dspace (or
dsl_pool_adjustedsize(), or the zfs `available` property). Therefore
there is always at least as much free space in the normal class, as
there is allocated in the special class(es). And therefore, there is
always enough free space to remove a special device.
However, the checks for free space when removing special devices did not
take this into account. This commit corrects that.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes#11329
Use the correct return type for getopt otherwise clang complains
about tautological-constant-out-of-range-compare.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sterling Jensen <sterlingjensen@users.noreply.github.com>
Closes#11359
When removing and subsequently reattaching a vdev, CKSUM errors may
occur as vdev_indirect_read_all() reads from all children of a mirror
in case of a resilver.
Fix this by checking whether a child is missing the data and setting a
flag (ic_error) which is then checked in vdev_indirect_repair() and
suppresses incrementing the checksum counter.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes#11277
Follow up fix for 0cb40fa3. Remove unused variables, don't source
unused libs and add missed cleanup.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes#11311
Canonicalization, the source of the trouble, was disabled in 9000a9f.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sterling Jensen <sterlingjensen@users.noreply.github.com>
Closes#11295
Run zfs-tests with sanity.run for brief results. Timeouts
are rare, so minimize false positives by increasing the
default from 60 to 180 seconds.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes#11304
Otherwise trim may finish before progress checks.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes#11296
When running in the CI the zpool_import_012_pos test case occasionally
takes longer than the maximum 600 seconds. When this happens the test
case is considered to have failed but always completes a few minutes
latter. Since the logs suggest nothing has actually failed this commit
increases timeout and removes the exception.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11286
Occasionally an out of memory error is hit by this test case
when mounting the filesystems. Try and reduce the likelihood
of this occurring by reducing the thread count from 100 to 50.
It also has the advantage of slightly speeding up the test.
cannot mount 'testpool/testfs3/79': Cannot allocate memory
filesystem successfully created, but not mounted
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11283
When sending raw encrypted datasets the user space accounting is present
when it's not expected to be. This leads to the subsequent mount failure
due a checksum error when verifying the local mac.
Fix this by clearing the OBJSET_FLAG_USERACCOUNTING_COMPLETE and reset
the local mac. This allows the user accounting to be correctly updated
on first mount using the normal upgrade process.
Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-By: Tom Caputi <caputit1@tcnj.edu>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes#10523Closes#11221
`zpool create -n` fails to list cache and spare vdevs.
`zpool add -n` fails to list spare devices.
`zpool split -n` fails to list `special` and `dedup` labels.
`zpool add -n` and `zpool split -n` shouldn't list hole devices.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes#11122Closes#11167
Add -u option to 'zfs create' that prevents file system from being
automatically mounted. This is similar to the 'zfs receive -u'.
Authored by: pjd <pjd@FreeBSD.org>
FreeBSD-commit: freebsd/freebsd@35c58230e2
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Ported-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11254
This run file contains a subset of functional tests which exercise
as much functionality as possible while still executing relatively
quickly. The included tests should take no more than a few seconds
each to run at most. This provides a convenient way to sanity test a
change before committing to a full test run which takes several hours.
$ ./scripts/zfs-tests.sh -r sanity
...
Results Summary
PASS 813
Running Time: 00:14:42
Percent passed: 100.0%
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11271
For encrypted receives, where user accounting is initially disabled on
creation, both 'zfs userspace' and 'zfs groupspace' fails with
EOPNOTSUPP: this is because dmu_objset_id_quota_upgrade_cb() forgets to
set OBJSET_FLAG_USERACCOUNTING_COMPLETE on the objset flags after a
successful dmu_objset_space_upgrade().
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes#9501Closes#9596
This patch adds a new top-level vdev type called dRAID, which stands
for Distributed parity RAID. This pool configuration allows all dRAID
vdevs to participate when rebuilding to a distributed hot spare device.
This can substantially reduce the total time required to restore full
parity to pool with a failed device.
A dRAID pool can be created using the new top-level `draid` type.
Like `raidz`, the desired redundancy is specified after the type:
`draid[1,2,3]`. No additional information is required to create the
pool and reasonable default values will be chosen based on the number
of child vdevs in the dRAID vdev.
zpool create <pool> draid[1,2,3] <vdevs...>
Unlike raidz, additional optional dRAID configuration values can be
provided as part of the draid type as colon separated values. This
allows administrators to fully specify a layout for either performance
or capacity reasons. The supported options include:
zpool create <pool> \
draid[<parity>][:<data>d][:<children>c][:<spares>s] \
<vdevs...>
- draid[parity] - Parity level (default 1)
- draid[:<data>d] - Data devices per group (default 8)
- draid[:<children>c] - Expected number of child vdevs
- draid[:<spares>s] - Distributed hot spares (default 0)
Abbreviated example `zpool status` output for a 68 disk dRAID pool
with two distributed spares using special allocation classes.
```
pool: tank
state: ONLINE
config:
NAME STATE READ WRITE CKSUM
slag7 ONLINE 0 0 0
draid2:8d:68c:2s-0 ONLINE 0 0 0
L0 ONLINE 0 0 0
L1 ONLINE 0 0 0
...
U25 ONLINE 0 0 0
U26 ONLINE 0 0 0
spare-53 ONLINE 0 0 0
U27 ONLINE 0 0 0
draid2-0-0 ONLINE 0 0 0
U28 ONLINE 0 0 0
U29 ONLINE 0 0 0
...
U42 ONLINE 0 0 0
U43 ONLINE 0 0 0
special
mirror-1 ONLINE 0 0 0
L5 ONLINE 0 0 0
U5 ONLINE 0 0 0
mirror-2 ONLINE 0 0 0
L6 ONLINE 0 0 0
U6 ONLINE 0 0 0
spares
draid2-0-0 INUSE currently in use
draid2-0-1 AVAIL
```
When adding test coverage for the new dRAID vdev type the following
options were added to the ztest command. These options are leverages
by zloop.sh to test a wide range of dRAID configurations.
-K draid|raidz|random - kind of RAID to test
-D <value> - dRAID data drives per group
-S <value> - dRAID distributed hot spares
-R <value> - RAID parity (raidz or dRAID)
The zpool_create, zpool_import, redundancy, replacement and fault
test groups have all been updated provide test coverage for the
dRAID feature.
Co-authored-by: Isaac Huang <he.huang@intel.com>
Co-authored-by: Mark Maybee <mmaybee@cray.com>
Co-authored-by: Don Brady <don.brady@delphix.com>
Co-authored-by: Matthew Ahrens <mahrens@delphix.com>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mmaybee@cray.com>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#10102
The custom zpl_show_devname() helper should translate spaces in
to the octal escape sequence \040. The getmntent(2) function
is aware of this convention and properly translates the escape
character back to a space when reading the fsname.
Without this change the `zfs mount` and `zfs unmount` commands
incorrectly detect when a dataset with a name containing spaces
is mounted.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11182Closes#11187
The microzap hash can sometimes be zero for single digit snapnames.
The zap cursor can then have a serialized value of two (for . and ..),
and skip the first entry in the avl tree for the .zfs/snapshot directory
listing, and therefore does not return all snapshots.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Cedric Berger <cedric@precidata.com>
Signed-off-by: Tony Perkins <tperkins@datto.com>
Closes#11039
Convert dynamic allocation to static buffer, simplify parse_dataset
function return path. Add tests specific to the mount helper.
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sterling Jensen <sterlingjensen@users.noreply.github.com>
Closes#11098
Add a new test case which corrupts all level 1 block in a file.
Then verifies that corruption is detected and repaired.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11141
The second part of list_file_blocks transforms the object description
output by zdb -ddddd $ds $objnum into a stream of lines of the form
"level path offset length" for the indirect blocks in the given file.
The current code only works for the first copy of L0 blocks. L1 and
L2 indirect blocks have more than one copy on disk.
Add one more -d to the zdb command so we get all block copies and
rewrite the transformation to match more than L0 and output all DVAs.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11141
The first part of list_file_blocks transforms the pool configuration
output by zdb -C $pool into shell code to set up a shell variable,
VDEV_MAP, that maps from vdev id to the underlying vdev path. This
variable is a simple indexed array. However, the vdev id in a DVA is
only the id of the top level vdev.
When the pool is mirrored, the top level vdev is a mirror and its
children are the mirrored devices. So, what we need is to map from
the top level vdev id to a list of the underlying vdev paths.
ist_file_blocks does not need to work for raidz vdevs, so we can
disregard that case.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11141
The expected variance for this test case was originally set at 10%
based on local testing. Additional testing via the CI has show it
can be as large as 11%. Increase the expected maximum to 12% to
prevent this test from incorrectly failing.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11148
The events_001_pos.ksh test case can fail because it's possible,
and correct, for the config_sync event to be posted after the last
"expected" event. To accommodate this the run_and_verify() function
has been updated to wait for all non-history events, not just the
last event. This does not increase the run time of the test as
long as all the events do get generated.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11147
Refer to the correct section or alternative for FreeBSD and Linux.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11132
Previously, xattr_004_pos would create files with xattrs on both
tmpfs and ext2, and then copy them to zfs to verify that their
xattrs were preserved. However tmpfs doesn't support xattrs.
This was never noticed until Fedora 33. In Fedora 32 and older,
/tmp was on the root partition (like ext4), whereas on Fedora 33
/tmp is actually tmpfs. That caused this test to fail on Fedora 33.
This fix updates the test to only create the file on ext2, not tmpfs.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#11133
The current l2_misses accounting behavior treats all reads to pools
without a configured l2arc as an l2arc miss, IFF there is at least
one other pool on the system which does have an l2arc configured.
This makes it extremely hard to tune for an improved l2arc hit/miss
ratio because this ratio will be modulated by reads from pools which
do not (and should not) have l2arc devices; its upper limit will
depend on the ratio of reads from l2arc'd pools and non-l2arc'd pools.
This PR prevents ARC reads affecting l2arc stats (n.b. l2_misses is
the only relevant one) where the target spa doesn't have an l2arc.
Includes new test - l2arc_l2miss_pos.ksh
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Adam Moss <c@yotes.com>
Closes#10921
ZED will log zevents summaries to the syslog, however the log entries
tend to drop event details that can be useful for diagnosis. This is
especially true for ereport events, like io, checksum, and delay.
Update the all-syslog.sh script to log additional event information.
Add an optional config option, ZED_SYSLOG_DISPLAY_GUIDS, to zed.rc
for choosing GUIDs over names for pool and vdev.
Change the default ZED_SYSLOG_SUBCLASS_EXCLUDE to exclude history_event
events. These events tend to be frequent, convey no meaningful info,
and are already logged in the zpool history.
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes#10967
This is a follow up fix for commit 0fdd6106bb. The VERIFY is
only true when we haven't hit an error code path. See added
test case for a reproducer.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes#11048