Over time, many DMU functions have gained a flags argument to control
prefetch, caching, etc. A few functions, though, were left without it,
even though a closer look showed that many of them do not require
prefetch due to their access pattern. This patch adds the flags
argument to dmu_write(), dmu_buf_hold_array() and
dmu_buf_hold_array_by_bonus(), passing DMU_READ_NO_PREFETCH where
applicable.
I am also going to pass DMU_UNCACHEDIO to some of them later.
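As an illustration, a hedged before/after sketch of what the
dmu_write() prototype change might look like (the exact flag type and
argument order in the tree may differ):

```c
/* Illustrative only, not the authoritative prototypes. */

/* Before: callers had no way to opt out of prefetch. */
void dmu_write(objset_t *os, uint64_t object, uint64_t offset,
    uint64_t size, const void *buf, dmu_tx_t *tx);

/* After: each caller picks its own prefetch/caching policy, e.g.
 * dmu_write(os, obj, off, len, buf, tx, DMU_READ_NO_PREFETCH); */
void dmu_write(objset_t *os, uint64_t object, uint64_t offset,
    uint64_t size, const void *buf, dmu_tx_t *tx, dmu_flags_t flags);
```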
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17872
Fixes #10544
According to the manpage, zdb -A should ignore all assertions, but it
currently does not. This commit fixes that.
Signed-off-by: Shreshth Srivastava <shreshthsrivastava2@gmail.com>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17825
Before this change, the ashift property was applied only to leaf
vdevs. As a result, it worked only as a minimum value for parent
vdevs, since a bigger physical_ashift value reported by any child
could be used instead when deciding the parent's ashift, as if the
ashift property was never set.
This change explicitly passes ZPOOL_CONFIG_ASHIFT to all vdevs,
allowing it to be overridden for parents only if the passed value is
below logical_ashift and so unacceptable.
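For illustration, a minimal sketch of the intended precedence, using a
hypothetical helper rather than the in-tree code:

```c
#include <stdint.h>

/* Conceptual sketch: how a parent vdev's ashift could be decided once
 * ZPOOL_CONFIG_ASHIFT is passed down explicitly. */
static uint64_t
pick_parent_ashift(uint64_t prop_ashift, uint64_t logical_ashift,
    uint64_t physical_ashift)
{
	if (prop_ashift == 0)			/* property not set: */
		return (physical_ashift);	/* old behavior */
	if (prop_ashift < logical_ashift)	/* too small to be usable */
		return (logical_ashift);
	return (prop_ashift);			/* explicit value wins */
}
```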
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17826
Previously, a bin included all blocks _starting_ from given size
(e.g., a "4K" bin would include all blocks within the [4K; 8K) region).
This is counter-intuitive and does not match the typical use-case of the
block histogram (that is, to estimate disk usage considering how ZFS'
block allocation works). In other words, if I'm looking at the "4K" row,
I'm interested in records that _fit into_ a 4K block.
Adjust the binning strategy such that a bin includes all blocks _up
to_ a given size, so that, e.g., a "4K" bin includes all blocks within
the (2K; 4K] region.
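A runnable sketch of the two strategies, assuming power-of-two bins and
a hypothetical highbit() helper standing in for highbit64():

```c
#include <stdint.h>
#include <stdio.h>

/* Index of the highest set bit (stand-in for highbit64()). */
static int
highbit(uint64_t v)
{
	int b = -1;

	while (v != 0) {
		v >>= 1;
		b++;
	}
	return (b);
}

int
main(void)
{
	uint64_t size = 6 * 1024;	/* a 6K block */

	/* Old: bin by the power of two the size starts from,
	 * so [4K; 8K) all lands in the "4K" bin. */
	int old_bin = highbit(size);

	/* New: bin by the power of two the size fits into,
	 * so only (2K; 4K] lands in the "4K" bin; 6K moves to "8K". */
	int new_bin = highbit(size - 1) + 1;

	(void) printf("old bin: 2^%d, new bin: 2^%d\n", old_bin, new_bin);
	return (0);
}
```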
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
Closes #16999
When counting blocks to generate block size histograms (`-bb`), accept a
`--class=` argument (as a comma-separated list of either "normal",
"special", "dedup" or "other") to only consider blocks that belong to
these metaslab classes.
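A runnable sketch of parsing such a list; the bitmask and its mapping
onto metaslab classes are hypothetical:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical mask; the real code presumably resolves names onto the
 * pool's metaslab class pointers instead. */
enum { CLS_NORMAL = 1, CLS_SPECIAL = 2, CLS_DEDUP = 4, CLS_OTHER = 8 };

static int
parse_classes(char *arg)
{
	int mask = 0;
	char *save = NULL;

	for (char *tok = strtok_r(arg, ",", &save); tok != NULL;
	    tok = strtok_r(NULL, ",", &save)) {
		if (strcmp(tok, "normal") == 0)
			mask |= CLS_NORMAL;
		else if (strcmp(tok, "special") == 0)
			mask |= CLS_SPECIAL;
		else if (strcmp(tok, "dedup") == 0)
			mask |= CLS_DEDUP;
		else if (strcmp(tok, "other") == 0)
			mask |= CLS_OTHER;
		else
			return (-1);	/* unknown class name */
	}
	return (mask);
}

int
main(void)
{
	char arg[] = "normal,special";

	(void) printf("mask = %d\n", parse_classes(arg));
	return (0);
}
```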
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
Closes #16999
When counting blocks to generate block size histograms (`-bb`), accept a
`--bin=` argument to force placing blocks into all three bins based on
*this* size.
E.g. with `--bin=lsize`, a block with lsize=512K, psize=128K, asize=256K
will be placed into the "512K" bin in all three output columns. This
way, by looking at the "512K" row, the user will be able to determine
how well ZFS was able to compress blocks of this logical size.
Conversely, with `--bin=psize`, by looking at the "128K" row the user
will be able to determine how much overhead was incurred for storage
of blocks of this physical size.
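A conceptual sketch of the accounting, with hypothetical row/field
names: one row is selected by the chosen size, and all three columns of
that row are charged:

```c
#include <stdint.h>

/* Hypothetical histogram row: one per power-of-two bin. */
typedef struct {
	uint64_t count;
	uint64_t lsize;
	uint64_t psize;
	uint64_t asize;
} bin_row_t;

/* With --bin=lsize, 'bin' is derived from the block's lsize alone,
 * yet every column of that row is charged, so one row describes all
 * blocks of one logical size. */
static void
account_block(bin_row_t *rows, int bin,
    uint64_t lsize, uint64_t psize, uint64_t asize)
{
	rows[bin].count++;
	rows[bin].lsize += lsize;
	rows[bin].psize += psize;
	rows[bin].asize += asize;
}
```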
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
Closes #16999
We are adding more long-only options, so use an enum for all of them
to avoid manually numbering these constants.
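This is the usual getopt_long() idiom; a sketch with hypothetical
option names, starting the enum above any possible short-option
character:

```c
#include <getopt.h>

/* Values above 255 can never collide with a short option, and the
 * enum numbers the constants for us. */
enum {
	OPT_CLASS = 256,
	OPT_BIN,	/* 257, assigned automatically */
};

static const struct option long_opts[] = {
	{ "class",	required_argument,	NULL,	OPT_CLASS },
	{ "bin",	required_argument,	NULL,	OPT_BIN },
	{ 0 },
};
```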
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
Closes #16999
In "all pools" mode, pool_iter_refresh() will call zpool_iter(), which
will call zpool_refresh_stats() before calling add_pool(). If we already
have the pool, this is a different handle, so we just release it and
return. Back in pool_iter_refresh(), we then call zpool_refresh_stats()
again for our handle on the same pool.
Taken together, this means we're doing two ZFS_IOC_POOL_STATS calls into
the kernel for every pool in the system. This isn't wrong, but it does
double the pressure on global locks.
Instead, we add a new function zpool_refresh_stats_from_handle() that
simply copies the pool config and state from one handle to another, and
use it to update our handle before we release it in add_pool(), so we
only have one call per pool per interval.
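A hedged sketch of what the copy might look like; the handle fields
below are illustrative, not the actual libzfs internals:

```c
/* Conceptual sketch: reuse the config the kernel already returned for
 * 'src' instead of issuing another ZFS_IOC_POOL_STATS ioctl. */
static int
zpool_refresh_stats_from_handle(zpool_handle_t *dst, zpool_handle_t *src)
{
	nvlist_t *config;

	if (nvlist_dup(src->zpool_config, &config, 0) != 0)
		return (-1);
	nvlist_free(dst->zpool_old_config);
	dst->zpool_old_config = dst->zpool_config;
	dst->zpool_config = config;
	dst->zpool_state = src->zpool_state;
	return (0);
}
```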
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17807
zpool_iter() passes the callback a new instance of zpool_handle_t each
time, so the existing handle in the pool_list AVL never actually gets a
refresh. Internally, that means its zpool_config is never updated, and
the old config is never moved to zpool_old_config. As a result,
print_iostat() never sees any updated config, and so repeats the first
line forever.
This is the simplest workaround: just don't mark existing pools as
refreshed. pool_list_refresh() will see this and refresh them.
The downside is a second call to ZFS_IOC_POOL_STATS for existing pools,
because zpool_iter() just called it for the handle we threw away.
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17807
When skipping the boot row (with -y), the early jump back to the top of
the loop meant we weren't updating the "last_npools" count. That meant
the count never advanced past zero, so cb_iteration was always reset to
0, leaving us "stuck" on the boot line, printing the header and nothing
else forever.
Updating the pool counter on every loop sorts that out: it advances,
cb_iteration moves properly, and normal rows are printed.
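The shape of the fix, sketched with hypothetical names:

```c
for (;;) {
	npools = pool_list_count(list);
	if (npools != last_npools)
		cb.cb_iteration = 0;
	last_npools = npools;	/* now updated even on the skip path */

	if (skip_boot_row) {	/* -y: suppress the first line */
		skip_boot_row = B_FALSE;
		continue;
	}
	/* ... print the row, bump cb_iteration ... */
}
```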
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17807
If zfs_mount_and_share() fails, the error propagates to the zfs
create/clone commands despite the operation itself succeeding. If the
create/clone operation was successful, there's no point in making
zfs_mount_and_share() failures fatal.
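A hedged sketch of the resulting behavior in the command handler
(message text and surrounding code are illustrative):

```c
/* The dataset was created, so a mount/share failure is reported but
 * no longer becomes the command's exit status. */
if (zfs_mount_and_share(g_zfs, dataset, ZFS_TYPE_DATASET) != 0) {
	(void) fprintf(stderr, "filesystem created, "
	    "but not mounted or shared\n");
	/* deliberately not propagating the error */
}
return (0);
```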
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17799
This adds a pause to the ZIO pipeline in the ready stage for
matching I/O (data, dnode, or raw bookmark).
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #17787
When running zpool iostat in interval mode, it would not notice any new
pools created or imported, and would forget any destroyed or exported,
so would not notice if they came back. This leads to outputting "no
pools available" every interval until killed.
It looks like this was at least intended to work; the comment above
zpool_do_iostat() indicates that it is expected to "deal with pool
creation/destruction" and that pool_list_update() would detect new
pools. That call, however, was removed in 3e43edd2c5, though it's unclear
if that broke this behaviour and it wasn't noticed, or if it never
worked, or if something later broke it. That said, the lack of
pool_list_update() is only part of the reason it doesn't work properly.
The fundamental problem is that the various things involved in
refreshing or updating the list of pools would aggressively ignore,
remove, skip or fail on pools that stop existing, or that already exist.
Mostly this meant that once a pool is removed from the list, it will
never be seen again. Restoring pool_list_update() to the
zpool_do_iostat() loop only partially fixes this - it would find "new"
pools again, but only in the "all pools" (no args) mode, and because its
iterator callback add_pool() would abort the iterator if it already has
a pool listed, it would only add pools if there weren't any already.
So, this commit reworks the structure somewhat. pool_list_update()
becomes pool_list_refresh(), and will ensure the state of all pools in
the list is updated. In the "all pools" mode, it will also add new
pools and remove pools that disappear, but when a fixed list of pools is
used, the list doesn't change, only the state of the pools within it.
The rest of the commit is adjusting things for this much simpler
structure. Regardless of the mode in use, pool_list_refresh() will
always do the right thing, so the driver code can just get on with the
display.
Now that pools can appear and disappear, I've made it so the header (if
enabled) is re-printed when the list changes, so that it's easier to see
what's happening if the column widths change.
Since this is all rather complicated, I've included tests for the "all
pools" and "set of pools" modes.
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17786
This change adds support for ZFS_KEYFORMAT_RAW to zdb_derive_key in
zdb.c. The implementation reads the raw key from the file specified
by the -K option which is consistent with how raw keys are handled in
the other parts of ZFS, along with a check to ensure that the keyfile
doesn't have too many bytes.
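A runnable sketch of the size check, assuming a 32-byte raw wrapping
key (the helper and constant names are hypothetical):

```c
#include <stdio.h>
#include <string.h>

#define	RAW_KEY_LEN	32	/* assumed raw wrapping key length */

static int
read_raw_key(const char *path, unsigned char key[RAW_KEY_LEN])
{
	unsigned char buf[RAW_KEY_LEN + 1];
	FILE *f = fopen(path, "rb");
	size_t n;

	if (f == NULL)
		return (-1);
	/* Ask for one extra byte: receiving it means the keyfile
	 * has too many bytes. */
	n = fread(buf, 1, sizeof (buf), f);
	(void) fclose(f);
	if (n != RAW_KEY_LEN)
		return (-1);	/* too short or too long */
	(void) memcpy(key, buf, RAW_KEY_LEN);
	return (0);
}
```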
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Patrick Xia <patrickx@google.com>
Closes #17783
Provide an interface to retrieve the lowest and highest minimum
allocation size for the normal allocation class. This can be used
by external consumers of the DMU to estimate potential wasted
capacity when setting the recordsize for an object.
The new "min_alloc" and "max_alloc" keys are added to the pool
configuration and used by default_volblocksize() to warn when
an ineffecient block size is requested. For older kmods which
don't yet include the new keys fallback to the previous logic.
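A hedged sketch of the consumer-side fallback using libnvpair; the
exact nvlist key names are taken from the text above and may differ
from the tree:

```c
#include <libnvpair.h>

static void
get_alloc_limits(nvlist_t *config, uint64_t *minp, uint64_t *maxp)
{
	if (nvlist_lookup_uint64(config, "min_alloc", minp) != 0 ||
	    nvlist_lookup_uint64(config, "max_alloc", maxp) != 0) {
		/* Older kmod without the new keys: fall back to the
		 * previous logic (placeholder value here). */
		*minp = *maxp = 512;
	}
}
```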
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17758
Three cases were discovered where 'zpool add' would fail to
warn when adding vdevs to a pool with a mismatched replication
level. These are:
1. When a pool contains mixed file and disk vdevs.
2. When a pool contains an active dRAID distributed spare.
3. When a pool contains an active hot spare.
The lack of warnings is caused by get_replication() assessing
the current pool configuration as inconsistent and disabling
the mismatched replication check for the new pool configuration
after 'zpool add'. This change updates get_replication() to
be slightly more tolerant in the non-fatal case.
The zpool_add_010_pos.ksh test case was split into separate
tests: zpool_add_warn_create.ksh, zpool_add_warn_degraded.ksh,
and zpool_add_warn_removal.ksh. These tests were extended to
include coverage for dRAID pools and the three scenarios
described above.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17780
A spacemap entry might be too big to fit into the block pointer's
ashift. We hit an assertion trying to run `zdb -bvy` on a large pool.
But it seems the code does not really need the size there, since we
only need to search for a range of offsets, so setting it to zero
should just make the btree return the position just before the first
entry. I suspect the previous code could actually miss the first
entry due to this if its size was smaller.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17764
Update documentation to use the correct terminology.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: trick2011 <trick2011@users.noreply.github.com>
Closes #17734
Closes #17755
This is breaking the build on FreeBSD/i386. Originally committed
downstream as https://github.com/freebsd/freebsd-src/commit/2d76470b701
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Alan Somers <asomers@gmail.com>
Sponsored by: ConnectWise
Closes #17705
This commit fixes a likely regression introduced by 64db435 where the
checksum repair functionality (`-c` or default behavior) would perform
checks and access data associated with the newer undetach (`-u`)
functionality, resulting in a failure when an uberblock's TXG is not 0,
as required by `-u` but not `-c`.
Additionally, code is refactored for better separation of tasks.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: buzzingwires <buzzingwires@outlook.com>
Closes #17732
A new `zfs allow` permission that ONLY allows sending replication
streams in raw (encrypted) mode, so encrypted data will not be
decrypted as part of the replication process.
Sponsored-by: Klara, Inc.
Sponsored-by: Karakun AG
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Co-authored-by: JT Pennington <jt.pennington@klarasystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #17543
Historically, ZED has blindly spawned off zedlets in parallel and never
worried about their completion order. This means that you can
potentially have zedlets for event number 2 starting before zedlets for
event number 1 had finished. Most of the time this is fine, and it
actually helps a lot when the system is getting spammed with hundreds
of events.
However, there are times when you want your zedlets to be executed
in sequence with the event ID. That is where synchronous zedlets
come in.
ZED will wait for all previously spawned zedlets to finish before
running a synchronous zedlet. Synchronous zedlets are guaranteed to be
the only zedlet running. No other zedlets may run in parallel with a
synchronous zedlet. Users should be careful to only use synchronous
zedlets when needed, since they decrease parallelism.
To make a zedlet synchronous, simply add a "-sync-" immediately
following the event name in the zedlet's file name:
EVENT_NAME-sync-ZEDLETNAME.sh
For example, if you wanted a synchronous statechange script:
statechange-sync-myzedlet.sh
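A conceptual, runnable sketch of the ordering guarantee using plain
fork()/waitpid(); the real ZED tracks its zedlets more precisely:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static void
run_zedlet(const char *path, char *const argv[], int is_sync)
{
	if (is_sync) {
		/* Wait for every previously spawned zedlet... */
		while (waitpid(-1, NULL, 0) > 0)
			;
	}
	if (fork() == 0) {
		(void) execv(path, argv);
		_exit(127);	/* exec failed */
	}
	if (is_sync) {
		/* ...and let nothing run in parallel with this one. */
		while (waitpid(-1, NULL, 0) > 0)
			;
	}
}
```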
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17335
When attempting to debug performance problems on large systems, one of
the major factors that affect performance is free space
fragmentation. This heavily affects the allocation process, which is an
area of active development in ZFS. Unfortunately, fragmenting a large
pool for testing purposes is time consuming; it usually involves filling
the pool and then repeatedly overwriting data until the free space
becomes fragmented, which can take many hours. And even if the time is
available, artificial workloads rarely generate the same fragmentation
patterns as the natural workloads they're attempting to mimic.
This patch has two parts. First, in zdb, we add the ability to export
the full allocation map of the pool. It iterates over each vdev,
printing every allocated segment in the ms_allocatable range tree. This
can be done while the pool is online, though in that case the allocation
map may actually be from several different TXGs as new ones are loaded
on demand.
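A hedged sketch of the per-segment dump (the range-tree API naming has
shifted across releases, so treat these calls as illustrative):

```c
/* Emit one line per allocated segment of a metaslab. */
static void
dump_alloc_seg(void *arg, uint64_t start, uint64_t size)
{
	uint64_t vdev_id = *(uint64_t *)arg;

	(void) printf("%llu\t%llu\t%llu\n", (u_longlong_t)vdev_id,
	    (u_longlong_t)start, (u_longlong_t)size);
}

/* per metaslab: range_tree_walk(msp->ms_allocatable,
 *     dump_alloc_seg, &vdev_id); */
```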
The second part is a new subcommand for zhack, `zhack metaslab leak`
(and its supporting kernel changes). This subcommand imports a pool
and then modifies the range trees of the metaslabs, allowing the sync
process to write them out normally. It does not currently store those
allocations anywhere to make them reversible, and there is no
corresponding free subcommand (which would be extremely dangerous);
this is an irreversible process, only intended for performance
testing. The only way to reclaim the space afterwards is to destroy
the pool or roll back to a checkpoint.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #17576
If the target already exists, ln will fail. Force it to recreate the
symlinks.
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17702
They will become zarcsummary and zarcstat in 2.4.0.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #16357
Closes #17695
Add an openzfs-2.4 compatibility file for the next release.
While there is no compatibility difference between Linux and
FreeBSD for 2.4, symlinks for the -linux and -freebsd names are
created for any scripts expecting that convention.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: ofthesun9 <olivier@ofthesun.net>
Closes #17672
Closes #17673
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #17665
The concurrent execution of feature_sync() can lead to a panic due
to an unprotected update of the feature refcount. Resolve this by
using the spa->spa_feat_stats_lock to synchronize the update of the
refcount.
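The shape of the fix, sketched (the lock is named above; the
surrounding code is illustrative):

```c
mutex_enter(&spa->spa_feat_stats_lock);
/* ... update the cached refcount for this feature ... */
mutex_exit(&spa->spa_feat_stats_lock);
```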
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #17184
Closes #17632
When requested to dump metaslabs only for a specific vdev, apply the
filter also to log spacemaps to reduce the output. Unfortunately,
filtering by metaslab numbers is more difficult, so leave those.
While there, tune the output formatting.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17643
When dumping indirect blocks, attempt to print corrupt block pointers
rather than abort the program. When corruption is detected zdb will
exit with an error code of 3.
Sponsored by: ConnectWise
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Alek Pinchuk <alek.pinchuk@connectwise.com>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #17166
This converts the body of a ZED slack notification from
plain text to code block style to help with readability.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: René Wirnata <rene.wirnata@pandascience.net>
Closes #17610
If the ZIL runs into trouble, it calls txg_wait_synced(), which blocks
on suspend. We want it to not block on suspend, instead returning an
error. On the surface, this is simple: convert all txg_wait_synced()
calls to txg_wait_synced_flags(TXG_WAIT_SUSPEND), and then thread the
error return back to the zil_commit() caller.
Handling suspension means returning an error to all commit waiters. This
is relatively straightforward, as zil_commit_waiter_t already has
zcw_zio_error to hold the write IO error, which signals a fallback to
txg_wait_synced_flags(TXG_WAIT_SUSPEND), which will fail, and so the
waiter can now return an error from zil_commit().
However, commit waiters are normally signalled when their associated
write (LWB) completes. If the pool has suspended, those IOs may not
return for some time, or maybe not at all. We still want to signal those
waiters so they can return from zil_commit(). We have a list of those
in-flight LWBs on zl_lwb_list, so we can run through those, detach them
and signal them. The LWB itself is still in-flight, but no longer has
attached waiters, so when it returns there will be nothing to do.
(As an aside, ITXs can also supply completion callbacks, which are
called when they are destroyed. These are directly connected to LWBs
though, so they are passed the error code and destroyed there too.)
At this point, all ZIL waiters have been ejected, so we only have to
consider the internal state. We potentially still have ITXs that have
not been committed, LWBs still open, and LWBs in-flight. The on-disk ZIL
is in an unknown state; some writes may have been written but not
returned to us. We really can't rely on any of it; the best thing to do
is abandon it entirely and start over when the pool returns to service.
But, since we may have IO out that won't return until the pool resumes,
we need something for it to return to.
The simplest solution I could find, implemented here, is to "crash" the
ZIL: accept no new ITXs, make no further updates, and let it empty out
on its normal schedule, that is, as txgs complete and zil_sync() and
zil_clean() are called. We set a "restart txg" to three txgs in the
future (syncing + TXG_CONCURRENT_STATES), at which point all the
internal state will have been cleared out, and the ZIL can resume
operation (handled at the top of zil_clean()).
This commit adds zil_crash(), which handles all of the above:
- sets the restart txg
- captures and signals all waiters
- zeroes the header
zil_crash() is called when txg_wait_synced_flags(TXG_WAIT_SUSPEND)
returns because the pool suspended (ESHUTDOWN).
The rest of the commit is just threading the errors through, and related
housekeeping.
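A hedged outline of zil_crash()'s shape, derived from the description
above; field names such as zl_restart_txg are assumptions, and the
in-tree function carries more locking and bookkeeping:

```c
static void
zil_crash(zilog_t *zilog)
{
	/* Resume three txgs out: syncing + TXG_CONCURRENT_STATES. */
	zilog->zl_restart_txg =			/* assumed field name */
	    spa_syncing_txg(zilog->zl_spa) + TXG_CONCURRENT_STATES;

	/*
	 * Detach the commit waiters from every in-flight LWB on
	 * zl_lwb_list and signal them with an error, so blocked
	 * zil_commit() callers return now even if the LWB IO never
	 * comes back from the suspended pool.
	 */

	/* Zero the header: the old on-disk chain is abandoned. */
}
```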
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17398
ITX callbacks are used to signal that something can be cleaned up after
an itx is committed. Presently that's only used when syncing out mapped
pages (msync()) to mark dirty pages clean.
This extends the callback interface so it can be passed an error, and
take a different cleanup action if necessary.
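A sketch of the extended interface; the typedef and function names
here are assumptions, not the in-tree identifiers:

```c
typedef void (*itx_callback_t)(void *data, int err);

static void
msync_itx_done(void *data, int err)
{
	(void) data;
	if (err == 0) {
		/* commit reached disk: mark the mapped pages clean */
	} else {
		/* e.g. ZIL crashed: leave the pages dirty so nothing
		 * is lost when writeback later retries */
	}
}
```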
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17398
This changes zil_commit() to have an int return, and updates all callers
to check it. There are no corresponding internal changes yet; it will
always return 0.
Since zil_commit() is an indication that the caller _really_ wants the
associated data to be durably stored, I've annotated it with the
__warn_unused_result__ compiler attribute (via __must_check), to emit a
warning if it's ever used without doing something with the return code.
I hope this will mean we never misuse it in the future.
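A runnable illustration of the attribute in isolation
(zil_commit_demo() is a stand-in, not the real function):

```c
#include <stdio.h>

__attribute__((warn_unused_result))
static int
zil_commit_demo(void)
{
	return (0);
}

int
main(void)
{
	int err = zil_commit_demo();	/* OK: result is consumed */

	if (err != 0)
		(void) fprintf(stderr, "commit failed: %d\n", err);
	/* zil_commit_demo();  <- would draw a compiler warning */
	return (err);
}
```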
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17398
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #17591
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #17591
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #17591
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #17591
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #17591
Based on the previous commit, this implements the `zfs rewrite -P`
flag, making ZFS keep blocks' logical birth times while rewriting
files. It should exclude the rewritten blocks from incremental
sends, snapshot diffs, etc. At the same time, snapshot space usage
will reflect the additional space usage from newly allocated blocks.
Since this begins to use the new "rewrite" flag in the block
pointers, this commit introduces a new read-compatible per-dataset
feature, physical_rewrite. It must be enabled for the command not
to fail; it is activated on first use and deactivated on deletion
of the last affected dataset.
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17565
During regular block writes, ZFS sets both logical and physical
birth times equal to the current TXG. During dedup and block
cloning, the logical birth time is still set to the current TXG,
but the physical one may be copied from the original block that
was used. This represents the fact that logically the user data
has changed, but physically it is the same old block.
But block rewrite introduces a new situation, in which a block is
not changed logically, but is stored in a different place of the
pool. From the perspective of the ARC, scrub and some other parts
of the code this is a new block, but for example for user
applications or incremental replication it is not. A somewhat
similar thing happens during the remap phase of device removal,
but in that case the blocks' space is still accounted as allocated
at their logical birth times.
This patch introduces a new "rewrite" flag in the block pointer
structure, allowing us to differentiate a physical rewrite (when
the block is actually reallocated at the physical birth time) from
the device removal case (when the logical birth time is used).
The new functionality is not used at this point, and the only
expected change is that the error log is now kept in terms of
physical birth times, rather than logical, since if a block with
a logged error was somehow rewritten, then the previous error no
longer matters.
This change also introduces a new TRAVERSE_LOGICAL flag to the
traverse code, allowing zfs send, redact and diff to work in the
context of logical birth times, ignoring physical-only rewrites.
It changes nothing at this point due to the lack of such writes,
but they will come in a following patch.
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17565
This feature enables tracking of when TXGs are committed to disk,
providing an estimated timestamp for each TXG.
With this information, it becomes possible to perform scrubs based
on specific date ranges, improving the granularity of data
management and recovery operations.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #16853