Currently there is only a dedup ratio reported via pool properties.
If dedup is enabled only for some datasets, it is impossible to say
how much space the ratio actually covers. Fix this by introducing
dedupused/dedupsaved pool properties, similar to the block cloning
ones added earlier. Combined with the work to expose allocation class
stats, it should give user-space enough visibility to correlate
`zpool list` and `zfs list` space numbers.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Ryan Moeller <ryan.moeller@klarasystems.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18245
Add a vdev property to disable the vdev scheduler.
The intention behind this property is to improve IOPS
performance when using O_DIRECT.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: MigeljanImeri <ImeriMigel@gmail.com>
Closes #17358
While technically it's not a problem to clone between datasets with
different properties, it might create an expectation that the new
properties are applied during the data move, which does not happen.
For copies and checksum it may mean incorrect safety expectations.
For dedup, compression and special_small_blocks -- performance and
space usage. The new zfs_bclone_strict_properties tunable controls it.
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18180
Rewrite of cloned and snapshotted blocks can allocate additional
space, which may be undesired. In some cases it may make sense
to still rewrite snapshotted blocks, expecting the snapshots to
rotate with time, freeing space. In other cases rewrite of cloned
blocks may be acceptable, despite the persistent space usage increase.
For this reason add them as separate flags to `zfs rewrite`.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18179
Add l2arc_dwpd_limit, remove l2arc_write_boost, update related tunables.
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #18093
This implements support for having multiple datasets unlocked and
mounted when a session is opened.
Example: `homes=rpool/home,tank/users`
Extra unit tests have been added.
A man page has been added: `man 8 pam_zfs_key`. A few
references to the new man page have also been added in other documents.
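
As a rough illustration of the `homes=` example above, the sketch below splits the comma-separated value and derives one dataset per prefix for a user. The function name `user_datasets` is hypothetical, and the `<prefix>/<username>` layout is an assumption about how per-user datasets are named, not something stated in this commit message.

```python
def user_datasets(homes: str, username: str) -> list[str]:
    """Expand a comma-separated homes= value into per-user dataset names.

    Assumption: each user's dataset lives at <prefix>/<username>.
    """
    # Split on commas and ignore empty entries such as a trailing comma.
    prefixes = [p.strip() for p in homes.split(",") if p.strip()]
    return [f"{prefix}/{username}" for prefix in prefixes]


if __name__ == "__main__":
    for dataset in user_datasets("rpool/home,tank/users", "alice"):
        print(dataset)
```

With the example value from the commit message, each listed prefix yields one candidate dataset to unlock and mount for the session.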
Signed-off-by: Dennis Vestergaard Værum <github@varum.dk>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #18077
Add a read-only dataset property, snapshots_changed_nsecs, which
exposes the nanosecond resolution version of snapshots_changed.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Wolfgang Hoschek <wolfgang.hoschek@mac.com>
Closes #17998
Closes #18031
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18080
With higher throughput and lower latency of modern devices, ZFS can
happily live with pretty short (fractions of a second) TXGs. But
the two-decade-old multi-second minimal time limits can almost stop
payload writes by extending TXGs beyond dirty data limits of ARC
ability to amortize it.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18060
"zdb -r -O pool/dataset obj-id destination" will copy
the file with object-id obj-id to the named destination;
without -O it'll still be interpreted as a pathname.
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Sean Eric Fagan <sean.fagan@klarasystems.com>
Closes #16307
Previous code effectively enforced that all async free ZIOs were
_issued_ within the TXG timeout. But they could take forever to
complete, especially if the required metadata were not in ARC.
This patch introduces periodic waits every 2000 ZIOs, which should
give at least somewhat reasonable TXG timings even for single HDD
pools with empty ARC. It also makes them complete within half of the
TXG timeout, since we might still need time to sync DDT and BRT.
While there, change zfs_max_async_dedup_frees semantics to also
include clone and gang blocks, which are similar. Bump the default
value, set long ago, to be more forgiving to block cloning
(still not having logs and benefiting from large TXGs), now that
we have better working time limits. The limit now is a possible
amount of dirty data produced by BRT updates.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18043
We've observed a number of cases when pool import got stuck for many
minutes due to a large async destroy trying to load DDT or BRT from
an HDD pool. While proper destroy dosage is a separate problem,
let's give the import process a chance to complete before that at all.
It may not be enough if there is a lot of ZIL to replay, but that
is harder to cover, since those are in separate syscalls.
Code investigation showed that we already have this mechanism used
for scrub/resilver, so this patch converts SCAN_IMPORT_WAIT_TXGS
into a tunable and applies it to async destroys also.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18033
Before the parallel eviction implementation, zfs_arc_evict_batch_limit
caused loop exits after evicting 10 headers. The cost of it is not
big and well motivated. Now, though, a taskq task exit after the same
10 headers is much more expensive. To cover the context switch
overhead of taskq, introduce another level of batching, controlled
by the zfs_arc_evict_batches_limit tunable, used only for parallel
eviction.
My tests with 36 parallel reads with 4KB recordsize, which showed
1.4GB/s (~460K blocks/s) before with heavy arc_evict_lock contention,
now show 6.5GB/s (~1.6M blocks/s) without arc_evict_lock contention.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17970
Introduce a new vdev property `VDEV_PROP_SLOW_IO_REPORTING` that
allows users to disable notifications for slow devices.
This prevents ZED and/or ZFSD from degrading the pool due to slow
I/O.
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <oshogbo@FreeBSD.org>
Closes #17477
Implement BRT (Block Reference Table) prefetch functionality similar
to existing DDT prefetch. This allows preloading BRT metadata into
ARC to improve performance for block cloning operations and frees
of earlier cloned blocks.
Make -t parameter optional. When omitted, prefetch all supported
metadata types (both DDT and BRT now).
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17890
According to my observations, BRT ZAPs are typically compressible
3:1 for data and 2:1 for indirects. With ashift=12, typical these
days, it means that by increasing the block sizes to 8KB we may get
most of the possible compression, halving the on-disk and in-ARC BRT
footprint at the cost of some compression/decompression overhead,
but without real write inflation, only some dirty data increase.
An increase to 32KB, similar to DDT, could further increase compression
and storage efficiency, but at the cost of write inflation and
a much bigger dirty data increase, which we cannot properly control
now. So let's leave this for a time when the BRT log gets implemented.
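
The sizing argument above can be checked with a little arithmetic. The sketch below assumes that a compressed block is allocated in whole 4 KiB sectors (ashift=12) and ignores everything else about allocation (parity, gang blocks, embedded blocks); the helper name `allocated_size` is hypothetical.

```python
import math

SECTOR = 1 << 12  # ashift=12 means a 4 KiB minimum allocation unit


def allocated_size(logical_bytes: int, ratio: float, sector: int = SECTOR) -> int:
    """Size on disk after compression, rounded up to whole sectors."""
    compressed = logical_bytes / ratio
    return max(1, math.ceil(compressed / sector)) * sector


if __name__ == "__main__":
    # 3:1 is the ratio quoted above for BRT ZAP data blocks.
    for kib in (4, 8, 32):
        logical = kib * 1024
        on_disk = allocated_size(logical, 3.0)
        print(f"{kib} KiB logical -> {on_disk // 1024} KiB allocated")
```

Under these assumptions a 4 KiB block cannot shrink below one sector, an 8 KiB block fits in one sector (half its logical size), and a 32 KiB block needs three sectors, which is why 8KB already captures most of the benefit.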
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17916
Update description of zpool import --rewind-to-checkpoint in
man/man7/zpoolconcepts.7 to explain that rewinding automatically
discards a checkpoint.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Adi Gollamudi <adigollamudi@gmail.com>
Closes #12646
Closes #17918
Free issue threads might block waiting for synchronous DDT, BRT or
GANG header reads. So unlike other taskqs using ZTI_SCALE to scale
with number of CPUs, here we also need some amount of threads to
potentially saturate pool reads. I am not sure we always want the
96 threads we had before the ZTI_SCALE introduction at #11966 on small
systems, but let's make it at least 32.
While here, make free taskqs configurable, similar to read and
write ones.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17903
We've heard anecdotes that suggest some
confusion/surprise/disappointment that a changed recordsize is not
applied during rewrite. Until such time as we actually can do that, we
can at least explicitly mention it as something that doesn't work.
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17898
Add an introductory sentence explaining why the reader may want to use
this command, and establishing the requirement that the jail must be
running. Move other requirements from the description of the subcommands
to follow this for flow and structure. Move the caveat that this is for
FreeBSD down to a canonical CAVEATS section, and cross-reference Linux's
equivalent functionality. Also mention in that section that this utility
cannot be used to delegate the root directory of the jail.
Reported by: Jan Brankamp <crest@rlwinm.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Alexander Ziaee <ziaee@FreeBSD.org>
Closes #17883
Make a minor update to the 'zpool remove' man page to clarify that both
raidz and draid pools do not support removal, and change sector to
ashift, which is what we actually care about.
Update the big theory comment in vdev_removal.c to accurately reflect
which types of vdevs can be removed. Furthermore, I've added some
discussion for the casual reader to briefly explain the top-level
vdev removal restrictions. This has been a common area of confusion
and it's not intuitive where they come from without understanding
the implementation details.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17847
When counting blocks to generate block size histograms (`-bb`), accept a
`--class=` argument (as a comma-separated list of either "normal",
"special", "dedup" or "other") to only consider blocks that belong to
these metaslab classes.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
Closes #16999
When counting blocks to generate block size histograms (`-bb`), accept a
`--bin=` argument to force placing blocks into all three bins based on
*this* size.
E.g. with `--bin=lsize`, a block with lsize=512K, psize=128K, asize=256K
will be placed into the "512K" bin in all three output columns. This
way, by looking at the "512K" row the user will be able to determine
how well ZFS was able to compress blocks of this logical size.
Conversely, with `--bin=psize`, by looking at the "128K" row the user
will be able to determine how much overhead was incurred for storage
of blocks of this physical size.
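
The binning rule can be sketched as follows. This is only an illustration of the idea, not zdb's code: the block records, the power-of-two bin choice and the function names are assumptions made for the example, and the histogram here sums bytes per column.

```python
from collections import defaultdict

COLUMNS = ("lsize", "psize", "asize")


def pow2_bin(size: int) -> int:
    """Largest power of two not exceeding size; used as the row key."""
    return 1 << (size.bit_length() - 1)


def histogram(blocks, bin_by=None):
    """Sum bytes per row for the lsize, psize and asize columns.

    bin_by=None: each column is binned by its own size (default view).
    bin_by="lsize" (or "psize"/"asize"): that one size picks the row
    used for all three columns of the block.
    """
    rows = defaultdict(lambda: dict.fromkeys(COLUMNS, 0))
    for blk in blocks:
        for column in COLUMNS:
            key = blk[bin_by] if bin_by else blk[column]
            rows[pow2_bin(key)][column] += blk[column]
    return dict(rows)


if __name__ == "__main__":
    K = 1024
    block = {"lsize": 512 * K, "psize": 128 * K, "asize": 256 * K}
    print(histogram([block], bin_by="lsize"))
```

With `bin_by="lsize"` the single example block contributes to the 512K row in every column, so dividing the row's lsize total by its psize total gives the compression achieved for blocks of that logical size.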
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
Closes #16999
This adds a pause to the ZIO pipeline in the ready stage for
matching I/O (data, dnode, or raw bookmark).
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #17787
Update documentation to use the correct terminology.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: trick2011 <trick2011@users.noreply.github.com>
Closes #17734
Closes #17755
Traditionally, unused dentries would be cached in the dentry cache until
the associated entry is no longer on disk. The cached dentry continues
to hold an inode reference, causing the inode to be pinned (see previous
commit).
Here we implement the dentry op d_delete, which is roughly analogous to
the drop_inode superblock op, and add a zfs_delete_dentry tunable to
control its behaviour. By default it continues the traditional
behaviour, but when the tunable is enabled, we signal that an unused
dentry should be freed immediately, releasing its inode reference, and
so allowing that inode to be deleted if no longer in use.
Sponsored-by: Klara, Inc.
Sponsored-by: Fastmail Pty Ltd
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17746
Traditionally, unused inodes would be held on the superblock inode cache
until the associated on-disk file is removed or the kernel requests
reclaim. On filesystems with millions of rarely-used files, this can be
a lot of unusable memory.
Here we implement the superblock drop_inode method, and add a
zfs_delete_inode tunable to control its behaviour. By default it
continues the traditional behaviour, but when the tunable is enabled, we
signal that the inode should be deleted immediately when the last
reference is dropped, rather than cached. This releases the associated
data to the dbuf cache and ARC, allowing them to be reclaimed normally.
Sponsored-by: Klara, Inc.
Sponsored-by: Fastmail Pty Ltd
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17746
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17747
The current description is somewhat difficult to parse through, and in
some cases is a little unclear as to the behavior.
Split it into paragraphs based on the three distinct behaviors you
may get: prompt, file URL, HTTP(S) URL. The descriptions of the file
and HTTP(S) behavior seem fine, but prompt is a little vague; expand
on it and make it clear that the behavior is actively based on whether
the inquisitor of key-data is provided with a tty for stdin or not.
Also clarify *why* one shouldn't "place keys which should be kept secret
on the command line" and note that you *have* to supply the key via
stdin if it's a raw key, just to be sure.
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Kyle Evans <kevans@FreeBSD.org>
Closes #17742
Add a new `zfs allow` permission that ONLY allows sending replication
streams in raw (encrypted) mode, so encrypted data will not be
decrypted as part of the replication process.
Sponsored-by: Klara, Inc.
Sponsored-by: Karakun AG
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Co-authored-by: JT Pennington <jt.pennington@klarasystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #17543
Historically, ZED has blindly spawned off zedlets in parallel and never
worried about their completion order. This means that you can
potentially have zedlets for event number 2 starting before zedlets for
event number 1 have finished. Most of the time this is fine, and it
actually helps a lot when the system is getting spammed with hundreds
of events.
However, there are times when you want your zedlets to be executed
in sequence with the event ID. That is where synchronous zedlets
come in.
ZED will wait for all previously spawned zedlets to finish before
running a synchronous zedlet. Synchronous zedlets are guaranteed to be
the only zedlet running. No other zedlets may run in parallel with a
synchronous zedlet. Users should be careful to only use synchronous
zedlets when needed, since they decrease parallelism.
To make a zedlet synchronous, simply add a "-sync-" immediately
following the event name in the zedlet's file name:
EVENT_NAME-sync-ZEDLETNAME.sh
For example, if you wanted a synchronous statechange script:
statechange-sync-myzedlet.sh
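
The naming rule can be expressed as a small filename check. This is a sketch of the convention described above, not ZED's actual parser; it assumes event names contain no hyphen, and the function name `is_sync_zedlet` is hypothetical.

```python
import re

# Assumed shape: EVENT_NAME, then "-sync-" immediately after it, then the
# zedlet name. Event names are assumed to contain no "-" characters.
_SYNC_ZEDLET = re.compile(r"^(?P<event>[^-]+)-sync-(?P<name>.+)$")


def is_sync_zedlet(filename: str) -> bool:
    """Return True if the file name marks the zedlet as synchronous."""
    return _SYNC_ZEDLET.match(filename) is not None


if __name__ == "__main__":
    for name in ("statechange-sync-myzedlet.sh", "statechange-led.sh"):
        kind = "synchronous" if is_sync_zedlet(name) else "parallel"
        print(f"{name}: {kind}")
```

Note that "sync" appearing later in the name does not count under this reading; only a "-sync-" directly following the event name does.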
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17335
A single slow responding disk can affect the overall read
performance of a raidz group. When a raidz child disk is
determined to be a persistent slow outlier, then have it
sit out during reads for a period of time. The raidz group
can use parity to reconstruct the data that was skipped.
Each time a slow disk is placed into a sit out period, its
`vdev_stat.vs_slow_ios` count is incremented and a zevent
of class `ereport.fs.zfs.delay` is posted.
The length of the sit out period can be changed using the
`raid_read_sit_out_secs` module parameter. Setting it to
zero disables slow outlier detection.
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Contributions-by: Don Brady <don.brady@klarasystems.com>
Contributions-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17227
When attempting to debug performance problems on large systems, one of
the major factors that affect performance is free space
fragmentation. This heavily affects the allocation process, which is an
area of active development in ZFS. Unfortunately, fragmenting a large
pool for testing purposes is time consuming; it usually involves filling
the pool and then repeatedly overwriting data until the free space
becomes fragmented, which can take many hours. And even if the time is
available, artificial workloads rarely generate the same fragmentation
patterns as the natural workloads they're attempting to mimic.
This patch has two parts. First, in zdb, we add the ability to export
the full allocation map of the pool. It iterates over each vdev,
printing every allocated segment in the ms_allocatable range tree. This
can be done while the pool is online, though in that case the allocation
map may actually be from several different TXGs as new ones are loaded
on demand.
The second is a new subcommand for zhack, zhack metaslab leak (and its
supporting kernel changes). This is a zhack subcommand that imports a
pool and then modifies the range trees of the metaslabs, allowing the
sync process to write them out normally. It does not currently store
those allocations anywhere to make them reversible, and there is no
corresponding free subcommand (which would be extremely dangerous); this
is an irreversible process, only intended for performance testing. The
only way to reclaim the space afterwards is to destroy the pool or roll
back to a checkpoint.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #17576
They will become zarcsummary and zarcstat in 2.4.0.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #16357
Closes #17695
The sorting logic is all in cmd/zfs/zfs_iter.c. I borrowed
where I could from the comments in the source code, but please
note that the comment to zfs_sort() is a little imprecise, or at
least incomplete, because it doesn't give any indication of the
chronological sort that will be used by default for snapshots in
zfs_compare().
While adding this description, I took the liberty to copy-edit
the rest of the file lightly.
In those edits, I've removed "If specified, you can list
property information by the absolute pathname or the relative
pathname" because, in context, it seems more confusing than
helpful.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Shawn Bayern <sbayern@law.fsu.edu>
Closes #15713
Closes #15869
Back in 2014 the zfs_autoimport_disable module option was added to
control whether the kmods should load the pool configs from the cache
file on module load. The default value since that time has been for
the kernel to not process the cache file.
Detecting and importing pools during boot is now controlled outside
of the kmod on both Linux and FreeBSD. By all accounts this has been
working well and we can remove this dormant code on the kernel side.
The spa_config_load() function has been moved to userspace; it is
now only used by libzpool. Additionally, the spa_boot_init() hook
which was used by FreeBSD now looks to be unused and was removed.
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17618
When dumping indirect blocks, attempt to print corrupt block pointers
rather than abort the program. When corruption is detected zdb will
exit with an error code of 3.
Sponsored by: ConnectWise
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Alek Pinchuk <alek.pinchuk@connectwise.com>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #17166
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17592
Based on the previous commit, this implements the `zfs rewrite -P` flag,
making ZFS keep blocks' logical birth times while rewriting
files. It should exclude the rewritten blocks from incremental
sends, snapshot diffs, etc. Snapshot space usage will at the same time
reflect the additional space usage from newly allocated blocks.
Since this begins to use new "rewrite" flag in the block pointers,
this commit introduces a new read-compatible per-dataset feature
physical_rewrite. It must be enabled for the command to not fail;
it is activated on first use and deactivated on deletion of the
last affected dataset.
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17565
This feature enables tracking of when TXGs are committed to disk,
providing an estimated timestamp for each TXG.
With this information, it becomes possible to perform scrubs based
on specific date ranges, improving the granularity of data
management and recovery operations.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #16853
During original block cloning implementation a mistake was made,
making BRT ZAP entries an array of 8 1-byte entries instead of 1
entry of 8 bytes. This makes the pools non-endian-safe.
This commit introduces a new read-compatible pool feature
"com.truenas:block_cloning_endian", fixing the endianness issue
for new pools while maintaining compatibility with existing ones.
The feature is automatically activated when creating the first BRT
ZAP (ensuring we don't activate it on pools that already have BRT
entries in the old format). When active, BRT entries are stored
as single 8-byte values.
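
To see why the old layout is not endian-safe, the sketch below models the two layouts with Python's `struct` module. It assumes, as a simplification, that a value stored as eight 1-byte integers is written in the writer's native byte order and never swapped on read, while a single 8-byte integer can be converted by a reader that knows the writer's byte order. The function name `read_u64` is hypothetical.

```python
import struct

refcount = 5

# Old layout (modelled): a little-endian host dumps its 64-bit counter as
# eight separate bytes, so nothing marks them as one integer to swap.
raw_from_le_writer = struct.pack("<Q", refcount)

# A big-endian host reassembling those bytes in its own order gets a
# completely different number.
misread = struct.unpack(">Q", raw_from_le_writer)[0]


def read_u64(raw: bytes, writer_is_le: bool) -> int:
    """New layout (modelled): one 8-byte value decoded with the writer's
    byte order, so every reader recovers the same integer."""
    return struct.unpack("<Q" if writer_is_le else ">Q", raw)[0]


if __name__ == "__main__":
    print(misread == refcount)
    print(read_u64(raw_from_le_writer, writer_is_le=True) == refcount)
```

The first comparison fails and the second succeeds, which is the difference the new feature flag is meant to capture.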
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17572