Commit Graph

1308 Commits

Author SHA1 Message Date
Brian Behlendorf
75dda92dc3
ZTS: resilver_restart_001.ksh restore defaults
Update resilver_restart_001.ksh to restore the default
resilver_defer_percent when the test completes.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16618
2024-10-08 09:35:13 -07:00
JKDingwall
0b4dcbe5b4
Fix generation of kernel uevents for snapshot rename on linux
`zvol_rename_minors()` needs to be given the full path not just the
snapshot name.  Use code removed in a0bd735ad as a guide
to providing the necessary values.

Add ZTS check for /dev changes after snapshot rename.  After
renaming a snapshot with 'snapdev=visible' ensure that the /dev
entries are updated to reflect the rename.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: James Dingwall <james@dingwall.me.uk>
Closes #14223 
Closes #16600
2024-10-06 14:36:33 -07:00
Pavel Snajdr
0d77e738e6
Defer resilver only when progress is above a threshold
Restart a resilver from scratch, if the current one in progress is
below a new tunable, zfs_resilver_defer_percent (defaulting to 10%).

The original rationale for deferring additional resilvers, when there is
already one in progress, was to help achieving data redundancy sooner
for the data that gets scanned at the end of the resilver.

But in case the admin wants to attach multiple disks to a single vdev,
it wasn't immediately obvious the admin is supposed to run
`zpool resilver` afterwards to reset the deferred resilvers and start
a new one from scratch.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #15810
2024-10-04 10:41:17 -07:00
Rob Norris
224393a321
feature: large_microzap
In a4b21eadec we added the zap_micro_max_size tuneable to raise the size
at which "micro" (single-block) ZAPs are upgraded to "fat" (multi-block)
ZAPs. Before this, a microZAP was limited to 128KiB, which was the old
largest block size. The side effect of raising the max size past 128KiB
is that it be stored in a large block, requiring the large_blocks
feature.

Unfortunately, this means that a backup stream created without the
--large-block (-L) flag to zfs send would split the microZAP block into
smaller blocks and send those, as is normal behaviour for large blocks.
This would be received correctly, but since microZAPs are limited to the
first block in the object by definition, the entries in the later blocks
would be inaccessible. For directory ZAPs, this gives the appearance of
files being lost.

This commit adds a feature flag, large_microzap, that must be enabled
for microZAPs to grow beyond 128KiB, and which will be activated the
first time that occurs. This feature is later checked when generating
the stream and if active, the send operation will abort unless
--large-block has also been requested.

Changing the limit still requires zap_micro_max_size to be changed. The
state of this flag effectively sets the upper value for this tuneable,
that is, if the feature is disabled, the tuneable will be clamped to
128KiB.

A stream flag is also added to ensure that the receiver also activates
its own feature flag upon receiving the stream. This is not strictly
necessary to _use_ the received microZAP, since it doesn't care how
large its block is, but it is required to send the microZAP object on,
otherwise the original problem occurs again.

Because it's difficult to reliably distinguish a microZAP from a fatZAP
from outside the ZAP code, and because it seems unlikely that most
users are affected (a fairly niche tuneable combined with what should be
an uncommon use of send), and for the sake of expediency, this change
activates the feature the first time a microZAP grows to use a large
block, and is never deactivated after that. This can be improved in the
future.

This commit changes nothing for existing pools that already have large
microZAPs. The feature will not be retroactively applied, but will be
activated the next time a microZAP grows past the limit.

Don't use large_blocks feature for enable/disable tests.  The
large_microzap depends on large_blocks, so it gets enabled as a
dependency, breaking the test. Instead use feature "longname", which has
the exact same feature characteristics.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16593
2024-10-02 20:47:11 -07:00
Brian Behlendorf
d34d4f97a8
snapdir: add 'disabled' value to make .zfs inaccessible
In some environments, just making the .zfs control dir hidden from sight
might not be enough. In particular, the following scenarios might
warrant not allowing access at all:
- old snapshots with wrong permissions/ownership
- old snapshots with exploitable setuid/setgid binaries
- old snapshots with sensitive contents

Introducing a new 'disabled' value that not only hides the control dir,
but prevents access to its contents by returning ENOENT solves all of
the above.

The new property value takes advantage of 'iuv' semantics ("ignore
unknown value") to automatically fall back to the old default value when
a pool is accessed by an older version of ZFS that doesn't yet know
about 'disabled' semantics.

I think that technically the zfs_dirlook change is enough to prevent
access, but preventing lookups and dir entries in an already opened .zfs
handle might also be a good idea to prevent races when modifying the
property at runtime.

Add zfs_snapshot_no_setuid parameter to control whether automatically
mounted snapshots have the setuid mount option set or not.

this could be considered a partial fix for one of the scenarios
mentioned in desired.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Co-authored-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Closes #3963
Closes #16587
2024-10-02 09:12:02 -07:00
Sanjeev Bagewadi
20232ecfaa Support for longnames for files/directories (Linux part)
This patch adds the ability for zfs to support file/dir name up to 1023
bytes. This number is chosen so we can support up to 255 4-byte
characters. This new feature is represented by the new feature flag
feature@longname.

A new dataset property "longname" is also introduced to toggle longname
support for each dataset individually. This property can be disabled,
even if it contains longname files. In such case, new file cannot be
created with longname but existing longname files can still be looked
up.

Note that, to my knowledge native Linux filesystems don't support name
longer than 255 bytes. So there might be programs not able to work with
longname.

Note that NFS server may needs to use exportfs_get_name to reconnect
dentries, and the buffer being passed is limit to NAME_MAX+1 (256). So
NFS may not work when longname is enabled.

Note, FreeBSD vfs layer imposes a limit of 255 name lengh, so even
though we add code to support it here, it won't actually work.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #15921
2024-10-01 13:40:27 -07:00
Don Brady
141368a4b6
Restrict raidz faulted vdev count
Specifically, a child in a replacing vdev won't count when assessing
the dtl during a vdev_fault()

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #16569
2024-10-01 09:12:11 -07:00
Tino Reichardt
b1958b531b
ZTS: Replace MD5 and SHA256 wit XXH128
For data integrity checks as done in ZTS, the verification for
unintended data corruption with xxhash128 should be a lot faster
and perfectly usable.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16577
2024-09-28 09:24:05 -07:00
Brian Behlendorf
834b90fb81
ZTS: Fix zpool_import_hostid_changed_unclean_export
Update the test case to freeze the pool then export it to better
simulate a hard failure.  This is preferable to copying the vdev
while the pool's imported since with a copy we're not guaranteed
the on-disk state will be consistent.  That can in turn result
in a pool import failure and a spurious test failure.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16578
2024-09-28 09:18:21 -07:00
Brian Behlendorf
ac8837d4d6
ZTS: Update deadman_sync threshold
Lower the minimum number of expected deadman events from 4 to 3.  All
that is strictly required is a single event to consider the test a
pass.  However, since I've never seen a count of less than 3 reported
by the CI that should be sufficient.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16575
2024-09-27 09:14:13 -07:00
Brian Behlendorf
ab1b87e747
ZTS: Fix zpool_import_hostid_changed_cachefile_unclean_export
Update the test case to freeze the pool then export it to better
simulate a hard failure.  This is preferable to copying the vdev
while the pool's imported since with a copy we're not guaranteed
the on-disk state will be consistent.  That can in turn result
in a pool import failure and a spurious test failure.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16570
2024-09-26 09:58:11 -07:00
Brian Behlendorf
b2ca5105b7
ZTS: Fix zpool_reguid Makefile.am and tests
The zpool_reguid tests were not being included the dist tarball
resulting in them not running.  This is reported as a "failed
verification" warning by the CI.  Add the tests to the correct
Makefile.am.

Additionally, remove the usage of 'bc -e <expr>' from the tests.
This option is only supported by the FreeBSD version of bc.

Update the test case to reflect the 0 is not a valid GUID.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16559
2024-09-25 07:50:04 -07:00
George Melikov
e419a63bf4
xattr dataset prop: change defaults to sa
It's the main recommendation to set xattr=sa
even in man pages, so let's set it by default.

xattr=sa don't use feature flag, so in the worst
case we'll have non-readable xattrs by other
non-openzfs platforms.

Non-overridden default `xattr` prop of existing pools
will automatically use `sa` after this commit too.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #15147
2024-09-23 09:50:48 -07:00
Umer Saleem
aeb79f2b29 ZTS: Fix skipping over comment lines in zpool_create.shlib
In zpool_create.shlib, check_feature_set iterates over all features
mentioned in provided compatibility file to check if only those
features are enabled on the pool.

This commit fixes skipping over comment lines correctly. Otherwise,
the test case fails as comment lines are also treated as feature names
by check_feature_set function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15909
2024-09-21 10:29:20 -07:00
Tino Reichardt
4bf6a2ab87
ZTS: use openssl for md5digest and sha256digest
On larger files this should improve the speed.

Sample values of my system:

[mcmilk@xz]$ time dd if=/dev/zero bs=128k count=1k | sha256sum
254bcc3fc4f27172636df4bf32de9f107f620d559b20d760197e452b97453917  -
real    0m1,050s
user    0m0,985s
sys     0m0,153s

[mcmilk@xz]$ time dd if=/dev/zero bs=128k count=1k | openssl sha256 -r
254bcc3fc4f27172636df4bf32de9f107f620d559b20d760197e452b97453917 *stdin
real    0m0,254s
user    0m0,206s
sys     0m0,160s

I think cli_root/zdb/zdb_backup.ksh runs also an FreeBSD and I needed to
include the sysutils/coreutils package for the FreeBSD tests within the
QEMU patchset.

This could be reverted, when this pull request gets upstream

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16543
2024-09-19 15:53:57 -07:00
Don Brady
ddf5f34f06
Avoid fault diagnosis if multiple vdevs have errors
When multiple drives are throwing errors, it is likely not
a drive failing but rather a failure above the drives, like
a controller.  The active cases context of the drive's peers
is now considered when making a diagnosis.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #16531
2024-09-18 11:36:48 -07:00
Tino Reichardt
c4d1a19b33 ZTS: increase timeout of mmap_sync_001_pos
On load the test needs sometimes a bit more time then just one second.
Doubling the time will help on the QEMU based testings.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16537
2024-09-17 12:03:08 -07:00
Tino Reichardt
5cb3e2861e ZTS: fix raidz_expand_001_pos and raidz_expand_002_pos
Sometimes the pool may start an auto scrub.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16537
2024-09-17 12:02:58 -07:00
Tino Reichardt
4999f49513 ZTS: fix zpool_status_008_pos test on qemu vm's
The test needs some adjusting within the timings.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Co-authored-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #16537
2024-09-17 12:01:13 -07:00
Brian Atkinson
a10e552b99
Adding Direct IO Support
Adding O_DIRECT support to ZFS to bypass the ARC for writes/reads.

O_DIRECT support in ZFS will always ensure there is coherency between
buffered and O_DIRECT IO requests. This ensures that all IO requests,
whether buffered or direct, will see the same file contents at all
times. Just as in other FS's , O_DIRECT does not imply O_SYNC. While
data is written directly to VDEV disks, metadata will not be synced
until the associated  TXG is synced.
For both O_DIRECT read and write request the offset and request sizes,
at a minimum, must be PAGE_SIZE aligned. In the event they are not,
then EINVAL is returned unless the direct property is set to always (see
below).

For O_DIRECT writes:
The request also must be block aligned (recordsize) or the write
request will take the normal (buffered) write path. In the event that
request is block aligned and a cached copy of the buffer in the ARC,
then it will be discarded from the ARC forcing all further reads to
retrieve the data from disk.

For O_DIRECT reads:
The only alignment restrictions are PAGE_SIZE alignment. In the event
that the requested data is in buffered (in the ARC) it will just be
copied from the ARC into the user buffer.

For both O_DIRECT writes and reads the O_DIRECT flag will be ignored in
the event that file contents are mmap'ed. In this case, all requests
that are at least PAGE_SIZE aligned will just fall back to the buffered
paths. If the request however is not PAGE_SIZE aligned, EINVAL will
be returned as always regardless if the file's contents are mmap'ed.

Since O_DIRECT writes go through the normal ZIO pipeline, the
following operations are supported just as with normal buffered writes:
Checksum
Compression
Encryption
Erasure Coding
There is one caveat for the data integrity of O_DIRECT writes that is
distinct for each of the OS's supported by ZFS.
FreeBSD - FreeBSD is able to place user pages under write protection so
          any data in the user buffers and written directly down to the
	  VDEV disks is guaranteed to not change. There is no concern
	  with data integrity and O_DIRECT writes.
Linux - Linux is not able to place anonymous user pages under write
        protection. Because of this, if the user decides to manipulate
	the page contents while the write operation is occurring, data
	integrity can not be guaranteed. However, there is a module
	parameter `zfs_vdev_direct_write_verify` that controls the
	if a O_DIRECT writes that can occur to a top-level VDEV before
	a checksum verify is run before the contents of the I/O buffer
        are committed to disk. In the event of a checksum verification
	failure the write will return EIO. The number of O_DIRECT write
	checksum verification errors can be observed by doing
	`zpool status -d`, which will list all verification errors that
	have occurred on a top-level VDEV. Along with `zpool status`, a
	ZED event will be issues as `dio_verify` when a checksum
	verification error occurs.

ZVOLs and dedup is not currently supported with Direct I/O.

A new dataset property `direct` has been added with the following 3
allowable values:
disabled - Accepts O_DIRECT flag, but silently ignores it and treats
	   the request as a buffered IO request.
standard - Follows the alignment restrictions  outlined above for
	   write/read IO requests when the O_DIRECT flag is used.
always   - Treats every write/read IO request as though it passed
           O_DIRECT and will do O_DIRECT if the alignment restrictions
	   are met otherwise will redirect through the ARC. This
	   property will not allow a request to fail.

There is also a module parameter zfs_dio_enabled that can be used to
force all reads and writes through the ARC. By setting this module
parameter to 0, it mimics as if the  direct dataset property is set to
disabled.

Reviewed-by: Brian Behlendorf <behlendorf@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Co-authored-by: Mark Maybee <mark.maybee@delphix.com>
Co-authored-by: Matt Macy <mmacy@FreeBSD.org>
Co-authored-by: Brian Behlendorf <behlendorf@llnl.gov>
Closes #10018
2024-09-14 13:47:59 -07:00
Rob Norris
b3b7491615 build: rename FORCEDEBUG_CPPFLAGS to LIBZPOOL_CPPFLAGS
This is just a very small attempt to make it more obvious that these
flags aren't optional for libzpool-using programs, by not making it seem
like there's an option to say "well, I don't _want_ to force debugging".

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Issue #16476
Closes #16477
2024-08-27 12:53:27 -07:00
Mateusz Piotrowski
6be8bf5552
zpool: Provide GUID to zpool-reguid(8) with -g (#16239)
This commit extends the zpool-reguid(8) command with a -g flag, which
allows the user to specify the GUID to set.

This change also adds some general tests for zpool-reguid(8).

Sponsored-by: Wasabi Technology, Inc.
Sponsored-by: Klara, Inc.

Signed-off-by: Mateusz Piotrowski <0mp@FreeBSD.org>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-08-26 09:27:24 -07:00
Rob Norris
a1902f4950 ddt: block scan until log is flushed, and flush aggressively
The dedup log does not have a stable cursor, so its not possible to
persist our current scan location within it across pool reloads.
Beccause of this, when walking (scanning), we can't treat it like just
another source of dedup entries.

Instead, when a scan is wanted, we switch to an aggressive flushing
mode, pushing out entries older than the scan start txg as fast as we
can, before starting the scan proper.

Entries after the scan start txg will be handled via other methods; the
DDT ZAPs and logs will be written as normal, and blocks not seen yet
will be offered to the scan machinery as normal.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15895
2024-08-16 12:03:43 -07:00
Rob Norris
cd69ba3d49 ddt: dedup log
Adds a log/journal to dedup. At the end of txg, instead of writing the
entry directly to the ZAP, instead its adding to an in-memory tree and
appended to an on-disk object. The on-disk object is only read at
import, to reload the in-memory tree.

Lookups first go the the log tree before going to the ZAP, so
recently-used entries will remain close by in memory. This vastly
reduces overhead from dedup IO, as it will not have to do so many
read/update/write cycles on ZAP leaf nodes.

A flushing facility is added at end of txg, to push logged entries out
to the ZAP. There's actually two separate "logs" (in-memory tree and
on-disk object), one active (recieving updated entries) and one flushing
(writing out to disk). These are swapped (ie flushing begins) based on
memory used by the in-memory log trees and time since we last flushed
something.

The flushing facility monitors the amount of entries coming in and being
flushed out, and calibrates itself to try to flush enough each txg to
keep up with the ingest rate without competing too much with other IO.
Multiple tuneables are provided to control the flushing facility.

All the histograms and stats are update to accomodate the log as a
separate entry store. zdb gains knowledge of how to count them and dump
them. Documentation included!

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15895
2024-08-16 12:03:35 -07:00
Rob Norris
2b131d7345 ZTS: tests for dedup legacy/FDT tables
Very basic coverage to make sure things appear to work, have the right
format on disk, and pool upgrades and mixed table types work as
expected.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15892
2024-08-16 12:00:58 -07:00
Rob Norris
db2b1fdb79 ddt: add FDT feature and support for legacy and new on-disk formats
This is the supporting infrastructure for the upcoming dedup features.

Traditionally, dedup objects live directly in the MOS root. While their
details vary (checksum, type and class), they are all the same "kind" of
thing - a store of dedup entries.

The new features are more varied than that, and are better thought of as
a set of related stores for the overall state of a dedup table.

This adds a new feature flag, SPA_FEATURE_FAST_DEDUP. Enabling this will
cause new DDTs to be created as a ZAP in the MOS root, named
DDT-<checksum>. The is used as the root object for the normal type/class
store objects, but will also be a place for any storage required by new
features.

This commit adds two new fields to ddt_t, for version and flags. These
are intended to describe the structure and features of the overall dedup
table, and are stored as-is in the DDT root. In this commit, flags are
always zero, but the intent is that they can be used to hang optional
logic or state onto for new dedup features. Version is always 1.

For a "legacy" dedup table, where no DDT root directory exists, the
version will be 0.

ddt_configure() is expected to determine the version and flags features
currently in operation based on whether or not the fast_dedup feature is
enabled, and from what's available on disk. In this way, its possible to
support both old and new tables.

This also provides a migration path. A legacy setup can be upgraded to
FDT by creating the DDT root ZAP, moving the existing objects into it,
and setting version and flags appropriately. There's no support for that
here, but it would be straightforward to add later and allows the
possibility that newer features could be applied to existing dedup
tables.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Closes #15892
2024-08-16 11:58:59 -07:00
Jitendra Patidar
d2ccc21552
Fix projid accounting for xattr objects
zpool upgraded with 'feature@project_quota' needs re-layout of SA's
to fix the SA_ZPL_PROJID at SA_PROJID_OFFSET (128). Its necessary for
the correct accounting of object usage against its projid.
Old object (created before upgrade) when gets a projid assigned, its
SA gets re-layout via sa_add_projid(). If object has xattr dir, SA
of xattr dir also gets re-layout. But SA re-layout of xattr objects
inside a xattr dir is not done.

Fix zfs_setattr_dir() to re-layout SA's on xattr objects, when setting
projid on old xattr object (created before upgrade).

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jitendra Patidar <jitendra.patidar@nutanix.com>
Closes #16355
Closes #16356
2024-08-14 17:59:19 -07:00
Tony Hutter
d06de4f007
ZTS: Use /dev/urandom instead of /dev/random
Use /dev/urandom so we never have to wait on entropy.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16442
2024-08-14 12:27:07 -07:00
Mateusz Piotrowski
24e6585e76 libzfs.h: Set ZFS_MAXPROPLEN and ZPOOL_MAXPROPLEN to ZAP_MAXVALUELEN
So far, the values of ZFS_MAXPROPLEN and ZPOOL_MAXPROPLEN were equal to
MAXPATHLEN, which is 1024 on FreeBSD and 4096 on Linux. This wasn't
ideal. Some of the surprising outcomes of this implementation are:

1. When creating a pool user property with zpool-set(8), libzfs makes
   sure that the length of the property's value is less than
   ZFS_MAXPROPLEN. However, the ZFS kernel module does not do that.
   Instead, it checks the length against ZAP_MAXVALUELEN. As a result,
   it is possible to create a property the length of which is going to
   be larger than zpool(8) is ready to read.
2. A pool user property created on Linux is too big to be read on
   FreeBSD.

This change sets both ZFS_MAXPROPLEN and ZPOOL_MAXPROPLEN to
ZAP_MAXVALUELEN, which is 8192 at the moment.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mateusz Piotrowski <0mp@FreeBSD.org>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #16248
2024-08-08 15:23:40 -07:00
Mateusz Piotrowski
7ceb9ad630 tests: user_property_001_pos: Remove unnecessary evals
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mateusz Piotrowski <0mp@FreeBSD.org>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #16248
2024-08-08 15:23:15 -07:00
Mateusz Piotrowski
b38fccc646 tests: user_property: Clarify comments
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mateusz Piotrowski <0mp@FreeBSD.org>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #16248
2024-08-08 15:21:52 -07:00
Mark Johnston
0ccd4b9d01 ZTS: Add a test to verify that copy_file_range obeys RLIMIT_FSIZE
Signed-off-by: Mark Johnston <markj@FreeBSD.org>

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-08-08 10:44:13 -07:00
Tino Reichardt
bd949b10be
ZTS: small fix for SEEK_DATA/SEEK_HOLE tests (#16413)
Some libc's like uClibc lag the proper definition of SEEK_DATA
and SEEK_HOLE. Since we have only two files in ZTS which use
these definitons, let's define them by hand:

```
#ifndef SEEK_DATA
#define SEEK_DATA 3
#endif
#ifndef SEEK_HOLE
#define SEEK_HOLE 4
#endif
```

There should be no failures, because:
- FreeBSD has support for SEEK_DATA/SEEK_HOLE since FreeBSD 8
- Linux has it since Linux 3.1
- the libc will submit the parameters unchanged to the kernel

Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2024-08-07 09:52:37 -07:00
Allan Jude
cbcb522439
Fix the names of some FreeBSD sysctls in include/tunables.cfg (#16395)
Sponsored-by: Klara, Inc.

Signed-off-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
2024-08-06 16:36:55 -07:00
Tino Reichardt
90a02d7063
ZTS: fix io_uring test on RHEL 9 variants (#16411)
Simplify the test, by using the variable "$PLATFORM_ID" in favor
of "$REDHAT_SUPPORT_PRODUCT_VERSION".

Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
2024-08-06 16:30:11 -07:00
Tony Hutter
dab810014e ZTS: Add zfs/zpool JSON sanity tests
Run basic JSON validation tests on the new `zfs|zpool -j` output.

Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Umer Saleem <usaleem@ixsystems.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16217
2024-08-06 12:47:15 -07:00
Rob Norris
6c7d41a643 ZTS: remove skips for zvol_misc tests
Last commit should fix the underlying problem, so these should be
passing reliably again.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16364
2024-08-06 12:08:24 -07:00
Tino Reichardt
e9f51ebd94
ZTS: fix zfs_copies_006_pos test on Ubuntu 20.04 (#16409)
This test was failing before:
- FAIL cli_root/zfs_copies/zfs_copies_006_pos (expected PASS)

Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
2024-08-05 16:18:07 -07:00
Tino Reichardt
8d4ad5adc7
ZTS: fix history_007_pos test on Ubuntu 24.04 (#16410)
The timezone "US/Mountain" isn't supported on newer linux versions.
Using the correct timezone "America/Denver" like it's done in FreeBSD
will fix this. Older Linux distros should behave also okay with this.

Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
2024-08-05 16:17:23 -07:00
Jitendra Patidar
d60debbf59
Fix sa_add_projid to lookup and update SA_ZPL_DXATTR (avoid DXATTR loss) (#16288)
sa_add_projid() gets called via zfs_setattr() for setting project id
on old file/dir, which were created before upgrading to project quota
feature. This function does lookup for all possible SA and update them
all together along with project ID at needed fixed offset. But its
missing lookup and update of SA_ZPL_DXATTR, effectively it losses
SA_ZPL_DXATTR.

Closes #16287
Signed-off-by: Jitendra Patidar <jitendra.patidar@nutanix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
2024-07-31 18:41:49 -07:00
Allan Jude
62e7d3c89e
ddt: add support for prefetching tables into the ARC
This change adds a new `zpool prefetch -t ddt $pool` command which
causes a pool's DDT to be loaded into the ARC. The primary goal is to
remove the need to "warm" a pool's cache before deduplication stops
slowing write performance. It may also provide a way to reload portions
of a DDT if they have been flushed due to inactivity.

Sponsored-by: iXsystems, Inc.
Sponsored-by: Catalogics, Inc.
Sponsored-by: Klara, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Will Andrews <will.andrews@klarasystems.com>
Signed-off-by: Fred Weigel <fred.weigel@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Co-authored-by: Will Andrews <will.andrews@klarasystems.com>
Co-authored-by: Don Brady <don.brady@klarasystems.com>
Closes #15890
2024-07-26 09:16:18 -07:00
Allan Jude
c7ada64bb6
ddt: dedup table quota enforcement
This adds two new pool properties:
- dedup_table_size, the total size of all DDTs on the pool; and
- dedup_table_quota, the maximum possible size of all DDTs in the pool

When set, quota will be enforced by checking when a new entry is about
to be created. If the pool is over its dedup quota, the entry won't be
created, and the corresponding write will be converted to a regular
non-dedup write. Note that existing entries can be updated (ie their
refcounts changed), as that reuses the space rather than requiring more.

dedup_table_quota can be set to 'auto', which will set it based on the
size of the devices backing the "dedup" allocation device. This makes it
possible to limit the DDTs to the size of a dedup vdev only, such that
when the device fills, no new blocks are deduplicated.

Sponsored-by: iXsystems, Inc.
Sponsored-By: Klara Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Co-authored-by: Don Brady <don.brady@klarasystems.com>
Co-authored-by: Rob Wing <rob.wing@klarasystems.com>
Co-authored-by: Sean Eric Fagan <sean.fagan@klarasystems.com>
Closes #15889
2024-07-25 09:47:36 -07:00
Alexander Motin
82f281ad99
ZTS: Make do_vol_test() more deterministic (#16379)
- Explicitly disable compression since mkfile uses a zero buffer.
 - Explicitly sync file systems instead of waiting for timeout.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-24 09:33:30 -07:00
Don Brady
fb6d8cf229
Add some missing vdev properties (#16346)
Sponsored-by: Klara, Inc.
Sponsored-By: Wasabi Technology, Inc.

Signed-off-by: Don Brady <don.brady@klarasystems.com>
Co-authored-by: Don Brady <don.brady@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-23 16:34:09 -07:00
Tino Reichardt
37275fd109
ZTS: Change cp_stress to fit timings (#16369)
cp_stress is getting killed on the new QEMU-based github runners
we're developing. The problem is that the Linux based runners
should do 10 RUNS, where the FreeBSD based runners only have 3
RUNS to succeed.

This patch removes this different handling of Linux and FreeBSD.
The cp_stress test is running fine in around 2 minutes now.

Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-22 14:03:22 -07:00
Rob Norris
393b7ad695 zts: test single-disk pool resumes properly after disk pull
A single disk pool should suspend when its disk fails and hold the IO.
When the disk is returned, the pool should return and the IO be
reissued, leaving everything in good shape.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Don Brady <don.brady@klarasystems.com>
2024-07-17 10:03:32 -07:00
Rob Norris
cbd95a950a
ZTS: handle FreeBSD version numbers correctly (#16340)
FreeBSD patchlevel versions are optional and, if present, in a different
location in the version string.

Sponsored-by: https://despairlabs.com/sponsor/

Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2024-07-12 10:58:03 -07:00
Rob Norris
10de12e9ed icp: reorganise SHA2 digest mechanisms
sha2_mech_type_t serves double-duty, as the list of MAC providers and
also the algo type for direct callers to SHA2Init. Until we disentangle
that, reorganise it to make the separation more clear. While we're
there, remove the digest mechs we don't use.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16209
2024-05-31 15:13:23 -07:00
Tony Hutter
a301dc364c
ZTS: Fix redacted_send failures on FreeBSD
We're seeing failures for redacted_deleted and redacted_mount
on FreeBSD 13-15:

    09:58:34.74 diff: /dev/fd/3: No such file or directory
    09:58:34.74 ERROR: diff /dev/fd/3 /dev/fd/4 exited 2

The test was trying to diff the file listings between two directories to
see if they are the same.  The workaround is to do a string comparison
of the directory listings instead of using `diff`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16224
2024-05-31 15:11:00 -07:00
Brian Behlendorf
6b95031f56
zed: Add deadman-slot_off.sh zedlet
Optionally turn off disk's enclosure slot if an I/O is hung
triggering the deadman.

It's possible for outstanding I/O to a misbehaving SCSI disk to
neither promptly complete or return an error.  This can occur due
to retry and recovery actions taken by the SCSI layer, driver, or
disk.  When it occurs the pool will be unresponsive even though
there may be sufficient redundancy configured to proceeded without
this single disk.

When a hung I/O is detected by the kmods it will be posted as a
deadman event.  By default an I/O is considered to be hung after
5 minutes.  This value can be changed with the zfs_deadman_ziotime_ms
module parameter.  If ZED_POWER_OFF_ENCLOSURE_SLOT_ON_DEADMAN is set
the disk's enclosure slot will be powered off causing the outstanding
I/O to fail.  The ZED will then handle this like a normal disk failure.
By default ZED_POWER_OFF_ENCLOSURE_SLOT_ON_DEADMAN is not set.

As part of this change `zfs_deadman_events_per_second` is added
to control the ratelimitting of deadman events independantly of
delay events.  In practice, a single deadman event is sufficient
and more aren't particularly useful.

Alphabetize the zfs_deadman_* entries in zfs.4.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16226
2024-05-29 10:46:41 -07:00
Rob N
d0aa9dbccf
Use memset to zero stack allocations containing unions
C99 6.7.8.17 says that when an undesignated initialiser is used, only
the first element of a union is initialised. If the first element is not
the largest within the union, how the remaining space is initialised is
up to the compiler.

GCC extends the initialiser to the entire union, while Clang treats the
remainder as padding, and so initialises according to whatever
automatic/implicit initialisation rules are currently active.

When Linux is compiled with CONFIG_INIT_STACK_ALL_PATTERN,
-ftrivial-auto-var-init=pattern is added to the kernel CFLAGS. This flag
sets the policy for automatic/implicit initialisation of variables on
the stack.

Taken together, this means that when compiling under
CONFIG_INIT_STACK_ALL_PATTERN on Clang, the "zero" initialiser will only
zero the first element in a union, and the rest will be filled with a
pattern. This is significant for aes_ctx_t, which in
aes_encrypt_atomic() and aes_decrypt_atomic() is initialised to zero,
but then used as a gcm_ctx_t, which is the fifth element in the union,
and thus gets pattern initialisation. Later, it's assumed to be zero,
resulting in a hang.

As confusing and undiscoverable as it is, by the spec, we are at fault
when we initialise a structure containing a union with the zero
initializer. As such, this commit replaces these uses with an explicit
memset(0).

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16135
Closes #16206
2024-05-24 19:00:29 -07:00
Don Brady
975a13259b Add support for parallel pool exports
Changed spa_export_common() such that it no longer holds the
spa_namespace_lock for the entire duration and instead sets
spa_export_thread to indicate an import is in progress on the
spa.  This allows for an export to a diffent pool to proceed
in parallel while an export is still processing potentially
long operations like spa_unload_log_sm_flush_all().

Calls like spa_lookup() and spa_vdev_enter() that rely on
the spa_namespace_lock to serialize them against a concurrent
export, now wait for any in-progress export thread to complete
before proceeding.

The 'zpool import -a' sub-command also provides multi-threaded
support, using a thread pool to submit the exports in parallel.

Sponsored-By: Klara Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #16153
2024-05-14 08:57:41 -07:00
Don Brady
c3f2f1aa2d
vdev probe to slow disk can stall mmp write checker
Simplify vdev probes in the zio_vdev_io_done context to
avoid holding the spa config lock for a long duration.

Also allow zpool clear if no evidence of another host
is using the pool.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #15839
2024-04-29 14:35:53 -07:00
Allan Jude
5044c4e3ff
Fast Dedup: ZAP Shrinking
This allows ZAPs to shrink. When there are two empty sibling leafs,
one of them is collapsed and its storage space is reused.
This improved performance on directories that at one time contained
a large number of files, but many or all of those files have since
been deleted.

This also applies to all other types of ZAPs as well.

Sponsored-by: iXsystems, Inc.
Sponsored-by: Klara, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Alexander Stetsenko <alex.stetsenko@klarasystems.com>
Closes #15888
2024-04-24 14:51:21 -07:00
Seth Troisi
cdae59e153
ZTS: user_namespace_004.ksh avoid error in cleanup if unsupported
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Seth Troisi <sethtroisi@google.com>
Closes #16114
2024-04-22 10:47:44 -07:00
George Wilson
c183d164aa
Parallel pool import
This commit allow spa_load() to drop the spa_namespace_lock so
that imports can happen concurrently. Prior to dropping the
spa_namespace_lock, the import logic will set the spa_load_thread
value to track the thread which is doing the import.

Consumers of spa_lookup() retain the same behavior by blocking
when either a thread is holding the spa_namespace_lock or the
spa_load_thread value is set. This will ensure that critical
concurrent operations cannot take place while a pool is being
imported.

The zpool command is also enhanced to provide multi-threaded support
when invoking zpool import -a.

Lastly, zinject provides a mechanism to insert artificial delays
when importing a pool and new zfs tests are added to verify parallel
import functionality.

Contributions-by: Don Brady <don.brady@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes #16093
2024-04-22 09:42:38 -07:00
Rob Norris
26d49fec5f tests/quota: consistently clear quota property between tests
When run in isolation, quota_005_pos would fail in cleanup because it
would attempt restore the previous quota, which was 0, and so get an
error (because you can't set quota to '0', you have to use 'none').

It worked as part of the quota tag set because the previous tests did
not clean up their quota, so there was always a non-zero quota to return
to.

This adds a simple quota reset function, and has all quota tests run it
at cleanup. For the ones that weren't cleaning up, they now do, and for
quota_005_pos, which was trying to do the right thing, it now just
resets it.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16097
2024-04-19 16:06:56 -07:00
Rob Norris
f75574cbaa tests/quota_005_pos: use a long int for doubling the quota size
When run in isolation, quota_005_pos would see an empty ~300G dataset.
Doubling it's space overflows a int32, which meant it was trying to then
set the quota to a negative value, and would fail.

When run as part of the quota tests, the filesystem appears to have
stuff in it, and so a lower available space, which doesn't overflow, and
so succeeds.

The bare minimum fix seems to be to use a int64 for the available space,
so it can be comfortably doubled. Here it is.

(Also a typo fix and a tiny bit of cleanup).

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16097
2024-04-19 16:06:37 -07:00
Rob N
4725e543be
zinject: "no-op" error injection
When injected, this causes the matching IO to appear to succeed, but the
actual work is never submitted to the physical device. This can be used
to simulate a write-back cache servicing a write, but the backing device
has failed and the cache cannot complete the operation in the
background.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16085
2024-04-15 13:52:20 -07:00
Umer Saleem
a100a195fa
Add support for zfs mount -R <filesystem>
This commit adds support for mounting a dataset along with all of
it's children with '-R' flag for zfs mount. There can be scenarios
where we want to mount all datasets under one hierarchy instead of
mounting all datasets present on system with '-a' flag.

'-R' flag should work on all root and non-root datasets. Usage
information and man page has been updated for zfs mount. A test
for verifying the behavior for '-R' flag is also added.

Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16015
2024-04-11 15:10:24 -07:00
Rob Norris
1bf649cb0a vdev_disk: fix alignment check when buffer has non-zero starting offset
If a linear buffer spans multiple pages, and the first page has a
non-zero starting offset, the checker would not include the offset, and
so would think there was an alignment gap at the end of the first page,
rather than at the start.

That is, for a 16K buffer spread across five pages with an initial 512B
offset:

    [.XXXXXXX][XXXXXXXX][XXXXXXXX][XXXXXXXX][XXXXXXX.]

It would be interpreted as:

    [XXXXXXX.][XXXXXXXX]...

And be rejected as misaligned.

Since it's already a linear ABD, the "linearising" copy would just reuse
the buffer as-is, and the second check would failing, tripping the
VERIFY in vdev_disk_io_rw().

This commit fixes all this by including the offset in the check for
end-of-page alignment.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16076
2024-04-11 14:43:27 -07:00
Rob Norris
bc27c49404 tests: add test for vdev_disk page alignment check
This provides a test driver and a set of test vectors for the page
alignment check callback function vdev_disk_check_pages_cb().

Because there's no good facility for exposing this function to a
userspace test right now, for now I'm just duplicating the function and
adding commentary to remind people to keep them in sync.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16076
2024-04-11 14:42:46 -07:00
Rob Norris
756e10b0a1 tests: simple zinject disk fault arg check
Just making sure the valid values for disk faults are accepted.
Obviously we can do a lot more, but this will do to get us started.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15953
2024-04-03 16:06:43 -07:00
George Wilson
b1e46f869e
Add ashift validation when adding devices to a pool
Currently, zpool add allows users to add top-level vdevs that have
different ashifts but doing so prevents users from being able to
perform a top-level vdev removal. Often times consumers may not realize
that they have mismatched ashifts until the top-level removal fails.

This feature adds ashift validation to the zpool add command and will
fail the operation if the sector size of the specified vdev does not
match the existing pool. This behavior can be disabled by using the -f
flag. In addition, new flags have been added to provide fine-grained
control to disable specific checks. These flags
are:

--allow-in-use
--allow-ashift-mismatch
--allow-replicaton-mismatch

The force flag will disable all of these checks.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Mark Maybee <mmaybee@delphix.com>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes #15509
2024-03-29 13:15:56 -06:00
Robert Evans
e39e20b6dc
ZTS: fix flakiness in cp_files_002_pos
Fix RANDOM to not return zero.

Overwriting with `dd ... count=0` does not test anything.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #16029
2024-03-27 14:59:16 -07:00
Alexander Motin
0c8eb974ff BRT: Check pool clone stats in more tests
This should allow to catch some leaks, if those happen.

While there fix some cosmetic issues.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16007
2024-03-27 14:47:06 -07:00
Alexander Motin
b403427624 BRT: Fix tests to work on non-empty pools
It should not normally happen, but if it does, better to not fail
everything for no good reason, or it may be hard to debug.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16007
2024-03-27 14:46:48 -07:00
Tony Hutter
d6a3d3f12a
ZTS: Skip cross-fs bclone tests if FreeBSD < 14.0
Skip cross filesystem block cloning tests on FreeBSD if running
less than version 14.0.  Cross filesystem copy_file_range() was
added in FreeBSD 14.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15901
2024-02-16 08:59:56 -08:00
Kevin Greene
79c6dffa6b
Allowing PERFPOOL to be defined by zfs-test users
Reviewed-by: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kevin Greene <kevin.greene@delphix.com>
Closes #15868
2024-02-09 10:02:46 -08:00
Don Brady
cbe882298e
Add slow disk diagnosis to ZED
Slow disk response times can be indicative of a failing drive. ZFS
currently tracks slow I/Os (slower than zio_slow_io_ms) and generates
events (ereport.fs.zfs.delay).  However, no action is taken by ZED,
like is done for checksum or I/O errors.  This change adds slow disk
diagnosis to ZED which is opt-in using new VDEV properties:
  VDEV_PROP_SLOW_IO_N
  VDEV_PROP_SLOW_IO_T

If multiple VDEVs in a pool are undergoing slow I/Os, then it skips
the zpool_vdev_degrade().

Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Rob Wing <rob.wing@klarasystems.com>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #15469
2024-02-08 09:19:52 -08:00
Cameron Harr
0823388752
Add 'zpool status -e' flag to see unhealthy vdevs
When very large pools are present, it can be laborious to find
reasons for why a pool is degraded and/or where an unhealthy vdev
is. This option filters out vdevs that are ONLINE and with no errors
to make it easier to see where the issues are. Root and parents of
unhealthy vdevs will always be printed.

Testing:
ZFS errors and drive failures for multiple vdevs were simulated with
zinject.

Sample vdev listings with '-e' option
- All vdevs healthy
    NAME        STATE     READ WRITE CKSUM
    iron5       ONLINE       0     0     0

- ZFS errors
    NAME        STATE     READ WRITE CKSUM
    iron5       ONLINE       0     0     0
      raidz2-5  ONLINE       1     0     0
        L23     ONLINE       1     0     0
        L24     ONLINE       1     0     0
        L37     ONLINE       1     0     0

- Vdev faulted
    NAME        STATE     READ WRITE CKSUM
    iron5       DEGRADED     0     0     0
      raidz2-6  DEGRADED     0     0     0
        L67     FAULTED      0     0     0  too many errors

- Vdev faults and data errors
    NAME        STATE     READ WRITE CKSUM
    iron5       DEGRADED     0     0     0
      raidz2-1  DEGRADED     0     0     0
        L2      FAULTED      0     0     0  too many errors
      raidz2-5  ONLINE       1     0     0
        L23     ONLINE       1     0     0
        L24     ONLINE       1     0     0
        L37     ONLINE       1     0     0
      raidz2-6  DEGRADED     0     0     0
        L67     FAULTED      0     0     0  too many errors

- Vdev missing
    NAME        STATE     READ WRITE CKSUM
    iron5       DEGRADED     0     0     0
      raidz2-6  DEGRADED     0     0     0
        L67     UNAVAIL      3     1     0

- Slow devices when -s provided with -e
    NAME        STATE     READ WRITE CKSUM  SLOW
    iron5       DEGRADED     0     0     0     -
      raidz2-5  DEGRADED     0     0     0     -
        L10     FAULTED      0     0     0     0  external device fault
        L51     ONLINE       0     0     0    14

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Cameron Harr <harr1@llnl.gov>
Closes #15769
2024-02-07 09:12:12 -08:00
Brian Behlendorf
6dccdf501e
BRT: Fix FICLONE/FICLONERANGE shortened copy
On Linux the ioctl_ficlonerange() and ioctl_ficlone() system calls
are expected to either fully clone the specified range or return an
error.  The range may be for an entire file.  While internally ZFS
supports cloning partial ranges there's no way to return the length
cloned to the caller so we need to make this all or nothing.

As part of this change support for the REMAP_FILE_CAN_SHORTEN flag
has been added.  When REMAP_FILE_CAN_SHORTEN is set zfs_clone_range()
will return a shortened range when encountering pending dirty records.
When it's clear zfs_clone_range() will block and wait for the records
to be written out allowing the blocks to be cloned.

Furthermore, the file range lock is held over the region being cloned
to prevent it from being modified while cloning.  This doesn't quite
provide an atomic semantics since if an error is encountered only a
portion of the range may be cloned.  This will be converted to an
error if REMAP_FILE_CAN_SHORTEN was not provided and returned to the
caller.  However, the destination file range is left in an undefined
state.

A test case has been added which exercises this functionality by
verifying that `cp --reflink=never|auto|always` works correctly.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15728
Closes #15842
2024-02-05 16:44:45 -08:00
Brian Behlendorf
435b173fd9
ZTS: Apply zfs_bclone_enabled to bclone tests
If block cloning is disabled by default then enable it when running
the bclone tests.  Follow up to #15529.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15796
2024-01-22 16:14:08 -08:00
Tino Reichardt
a0b2a93c41
fix: variable type with zfs-tests/cmd/clonefile.c
Compiling on arm64 freebsd-13.2 and arm64 almalinux-8 brings currently
this error:

```
  CC       tests/zfs-tests/cmd/clonefile.o
tests/zfs-tests/cmd/clonefile.c:166:43: error: result of comparison of \
constant -1 with expression of type 'char' is always true \
[-Werror,-Wtautological-constant-out-of-range-compare]
        while ((c = getopt(argc, argv, "crfdq")) != -1) {
               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^  ~~
1 error generated.
gmake[2]: *** [Makefile:8675: tests/zfs-tests/cmd/clonefile.o] Error 1
```

Fix: use correct variable type `int`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #15783
2024-01-17 09:06:14 -08:00
Pawel Jakub Dawidek
f45dd90f34
Fix cloning into mmaped and cached file.
If the destination file is mmaped and the mmaped region was already
read, so it is cached, we need to update mmaped pages after successful
clone using update_pages().

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Pointed out by: Ka Ho Ng <khng@freebsd.org>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #15772
2024-01-17 08:51:07 -08:00
Rob N
f0bf7a247d
Linux 6.7 compat: zfs_setattr fix atime update
In db4fc559c I messed up and changed this bit of code to set the inode
atime to an uninitialised value, when actually it was just supposed to
loading the atime from the inode to be stored in the SA. This changes it
to what it should have been.

Ensure times change by the right amount Previously, we only checked 
if the times changed at all, which missed a bug where the atime was 
being set to an undefined value.

Now ensure the times change by two seconds (or thereabouts), ensuring
we catch cases where we set the time to something bonkers

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15762
Closes #15773
2024-01-16 14:01:17 -08:00
Umer Saleem
995734ed12
ZTS: Test for clone, mmap and write for block cloning
For block cloning, if we mmap the cloned file and write from the
map into the file, it triggers a panic in dbuf_redirty() on Linux.

The same scenario causes data corruption on FreeBSD. Both these
issues are fixed under PR#15656 and PR#15665.

It would be good to add a test for this scenario in ZTS. The test
program and issue was produced by @robn.

Reviewed-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15717
2024-01-16 13:15:10 -08:00
Stefan Lendl
66670ba9f0
fix(mount): do not truncate shares not zfs mount
When running zfs share -a resetting the exports.d/zfs.exports makes
sense the get a clean state.
Truncating was also called with zfs mount which would not populate the
file again.
Add test to verify shares persist after mount -a.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Stefan Lendl <s.lendl@proxmox.com>
Closes #15607 
Closes #15660
2024-01-12 12:05:11 -08:00
Brian Behlendorf
c4fa674367
Enable block_cloning tests on FreeBSD
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #15749
2024-01-12 11:57:13 -08:00
Pawel Jakub Dawidek
4cf4bc7334
Block cloning tests.
The test mostly focus on testing various corner cases.
The tests take a long time to run, so for the common.run runfile
we randomly select a hundred tests.
To run all the bclone tests, bclone.run runfile should be used.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes #15631
2023-12-26 12:01:53 -08:00
Umer Saleem
dbda45160f
Test LWB buffer overflow for block cloning
PR#15634 removes 128K into 2x68K LWB split optimization, since it
was found to cause LWB buffer overflow while trying to write 128KB
TX_CLONE_RANGE record with 1022 block pointers into 68KB buffer,
with multiple VDEVs ZIL.

This commit adds a test for this particular scenario by writing
maximum sizes TX_CLONE_RANE record with 1022 block pointers into
68KB buffer, with two SLOG devices.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15672
2023-12-15 14:18:27 -08:00
Brian Behlendorf
f4b97c1e00
ZTS: Update raidz_expand_005_pos.ksh
Align the raidz_expand_005_pos test with the raidz_expand_004_pos test
and only verify no errors were reported.  Allow scrub repair IO.

Reviewed-by: Don Brady <don.brady@klarasystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15663
2023-12-12 09:56:19 -08:00
Tony Hutter
b1748eaee0
ZTS: Add dirty dnode stress test
Add a test for the dirty dnode SEEK_HOLE/SEEK_DATA bug described in
https://github.com/openzfs/zfs/issues/15526

The bug was fixed in https://github.com/openzfs/zfs/pull/15571 and
was backported to 2.2.2 and 2.1.14.  This test case is just to
make sure it does not come back.

seekflood.c originally written by Rob Norris.

Reviewed-by: Graham Perrin <grahamperrin@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15608
2023-12-11 09:59:59 -08:00
Brian Behlendorf
9f6ad4dcb6
ZTS: Add zpool_import_status.ksh to Makefile.am
The zpool_import_status.ksh test case was not being run because
it was not included in the Makefile.am.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15655
2023-12-11 09:40:32 -08:00
Brian Behlendorf
8ad73bf449
ZTS: Disable io_uring test on CentOS 9
The io_uring test fails on CentOS 9 with the following fio error.
Disable the test for the benefit of the CI until this can be fully
investigated.  This basic test passes as expected on newer kernels.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15636
2023-12-08 17:31:31 -08:00
Ameer Hamza
2ebb9a4811 ZTS: Add test cases for block cloning replay
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15614
2023-12-08 16:39:14 -08:00
Ameer Hamza
5ff43969e7 ZTS: block_cloning: Use numeric sort for get_same_blocks
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15614
2023-12-08 16:38:45 -08:00
Don Brady
687e4d7f9c
Extend import_progress kstat with a notes field
Detail the import progress of log spacemaps as they can take a very
long time.  Also grab the spa_note() messages to, as they provide
insight into what is happening

Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Co-authored-by: Allan Jude <allan@klarasystems.com>
Closes #15539
2023-12-05 14:27:56 -08:00
oromenahar
c7b6119268
Allow block cloning across encrypted datasets
When two datasets share the same master encryption key, it is safe
to clone encrypted blocks. Currently only snapshots and clones
of a dataset share with it the same encryption key.

Added a test for:
- Clone from encrypted sibling to encrypted sibling with
  non encrypted parent
- Clone from encrypted parent to inherited encrypted child
- Clone from child to sibling with encrypted parent
- Clone from snapshot to the original datasets
- Clone from foreign snapshot to a foreign dataset
- Cloning from non-encrypted to encrypted datasets
- Cloning from encrypted to non-encrypted datasets

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Original-patch-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Signed-off-by: Kay Pedersen <mail@mkwg.de>
Closes #15544
2023-12-05 11:03:48 -08:00
VaibhavB
7d68900af3
ZTS: get_persistent_disk_name can return truncated names
Instead of using only the 3rd element return the entire string after 
the split to handle device names with dashes.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Vaibhav Bhanawat <vaibhav.bhanawat@delphix.com>
Closes #15567
2023-11-29 09:34:29 -08:00
Rob Norris
213d682967 zdb: show BRT statistics and dump its contents
Same idea as the dedup stats, but for block cloning.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15541
2023-11-27 13:35:07 -08:00
Tony Hutter
3551a32e5e
ZTS: Fix zfs_load-key failures on F39
The zfs_load-key tests were failing on F39 due to their use of the
deprecated ssl.wrap_socket function.  This commit updates the test to
instead use ssl.SSLContext() as described in
https://stackoverflow.com/a/65194957.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15534
Closes #15550
2023-11-27 13:24:37 -08:00
Tony Hutter
a94860a6de
ZTS: Fix 'could not unmount datasets' on Alma 9 (#15542)
Many tests are failing on AlmaLinux 9 because ZTS could not destroy the
pool in cleanup.  This was due to $PWD being set to '.' instead of the
expected full path.  This patch sets $PWD to the full path.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
2023-11-20 16:07:32 -08:00
Rich Ercolani
03e9caaec0
Add a tunable to disable BRT support.
Copy the disable parameter that FreeBSD implemented, and extend it to
work on Linux as well, until we're sure this is stable.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #15529
2023-11-16 11:35:22 -08:00
Alexander Motin
35da345160
L2ARC: Restrict write size to 1/4 of the device
PR #15457 exposed weird logic in L2ARC write sizing. If it appeared
bigger than device size, instead of liming write it reset all the
system-wide tunables to their default.  Aside of being excessive,
it did not actually help with the problem, still allowing infinite
loop to happen.

This patch removes the tunables reverting logic, but instead limits
L2ARC writes (or at least eviction/trim) to 1/4 of the capacity.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15519
2023-11-14 13:47:57 -08:00
Umer Saleem
40fccc423a ZTS: Test for all known zpool feature sets
zpool_create_features_007_pos only tested for compat-2020 feature
set. It would be useful to test for all known features sets. If
any additional feature is found enabled that is not present in
compatibility list or feature set, it should be caught and
reported earlier.

This commit also removes encryption from openzfsonosx-1.8.1
compatibility list. Encryption enables bookmark_v2, since it is
a dependency of encryption, but not listed in openzfsonoxx-1.8.1
compatibility list.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15505
2023-11-09 10:58:23 -08:00
Don Brady
5caeef02fa
RAID-Z expansion feature
This feature allows disks to be added one at a time to a RAID-Z group,
expanding its capacity incrementally.  This feature is especially useful
for small pools (typically with only one RAID-Z group), where there
isn't sufficient hardware to add capacity by adding a whole new RAID-Z
group (typically doubling the number of disks).

== Initiating expansion ==

A new device (disk) can be attached to an existing RAIDZ vdev, by
running `zpool attach POOL raidzP-N NEW_DEVICE`, e.g. `zpool attach tank
raidz2-0 sda`.  The new device will become part of the RAIDZ group.  A
"raidz expansion" will be initiated, and the new device will contribute
additional space to the RAIDZ group once the expansion completes.

The `feature@raidz_expansion` on-disk feature flag must be `enabled` to
initiate an expansion, and it remains `active` for the life of the pool.
In other words, pools with expanded RAIDZ vdevs can not be imported by
older releases of the ZFS software.

== During expansion ==

The expansion entails reading all allocated space from existing disks in
the RAIDZ group, and rewriting it to the new disks in the RAIDZ group
(including the newly added device).

The expansion progress can be monitored with `zpool status`.

Data redundancy is maintained during (and after) the expansion.  If a
disk fails while the expansion is in progress, the expansion pauses
until the health of the RAIDZ vdev is restored (e.g. by replacing the
failed disk and waiting for reconstruction to complete).

The pool remains accessible during expansion.  Following a reboot or
export/import, the expansion resumes where it left off.

== After expansion ==

When the expansion completes, the additional space is available for use,
and is reflected in the `available` zfs property (as seen in `zfs list`,
`df`, etc).

Expansion does not change the number of failures that can be tolerated
without data loss (e.g. a RAIDZ2 is still a RAIDZ2 even after
expansion).

A RAIDZ vdev can be expanded multiple times.

After the expansion completes, old blocks remain with their old
data-to-parity ratio (e.g. 5-wide RAIDZ2, has 3 data to 2 parity), but
distributed among the larger set of disks.  New blocks will be written
with the new data-to-parity ratio (e.g. a 5-wide RAIDZ2 which has been
expanded once to 6-wide, has 4 data to 2 parity).  However, the RAIDZ
vdev's "assumed parity ratio" does not change, so slightly less space
than is expected may be reported for newly-written blocks, according to
`zfs list`, `df`, `ls -s`, and similar tools.

Sponsored-by: The FreeBSD Foundation
Sponsored-by: iXsystems, Inc.
Sponsored-by: vStack
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Authored-by: Matthew Ahrens <mahrens@delphix.com>
Contributions-by: Fedor Uporov <fuporov.vstack@gmail.com>
Contributions-by: Stuart Maybee <stuart.maybee@comcast.net>
Contributions-by: Thorsten Behrens <tbehrens@outlook.com>
Contributions-by: Fmstrat <nospam@nowsci.com>
Contributions-by: Don Brady <dev.fs.zfs@gmail.com>
Signed-off-by: Don Brady <dev.fs.zfs@gmail.com>
Closes #15022
2023-11-08 10:19:41 -08:00
Tony Hutter
05c4710e89 Revert "zvol: Temporally disable blk-mq"
This reverts commit aefb6a2bd6.

aefb6a2bd temporally disabled blk-mq until we could fix a fix for

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15439
2023-10-24 14:41:25 -07:00
VaibhavB
de7b1ae30a
run-zts test procfs/pool_state failed with uncorrectable I/O failure
Once we trigger the zpool scrub, all zpool/zfs command gets stuck for 
180 seconds. Post 180 seconds zpool/zfs commands gets start executing 
however few more seconds(10s) it take to update the status. hence 
sleeping for 200 seconds so that we get the correct status.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: vaibhav.bhanawat <vaibhav.bhanawat@delphix.com>
Closes #15364
2023-10-20 11:57:39 -07:00
dennisfriedrichsen
0d6cec418e
Fix typo in tests/zfs-tests/tests/functional/cli_user/misc/misc.cfg
Reviewed-by: Rob N <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Dennis R. Friedrichsen <dennis.r.friedrichsen@gmail.com>
Closes #15417
2023-10-20 11:52:13 -07:00