This adds two new pool properties:
- dedup_table_size, the total size of all DDTs on the pool; and
- dedup_table_quota, the maximum possible size of all DDTs in the pool
When set, quota will be enforced by checking when a new entry is about
to be created. If the pool is over its dedup quota, the entry won't be
created, and the corresponding write will be converted to a regular
non-dedup write. Note that existing entries can be updated (ie their
refcounts changed), as that reuses the space rather than requiring more.
dedup_table_quota can be set to 'auto', which will set it based on the
size of the devices backing the "dedup" allocation device. This makes it
possible to limit the DDTs to the size of a dedup vdev only, such that
when the device fills, no new blocks are deduplicated.
Sponsored-by: iXsystems, Inc.
Sponsored-By: Klara Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Co-authored-by: Don Brady <don.brady@klarasystems.com>
Co-authored-by: Rob Wing <rob.wing@klarasystems.com>
Co-authored-by: Sean Eric Fagan <sean.fagan@klarasystems.com>
Closes#15889
- Explicitly disable compression since mkfile uses a zero buffer.
- Explicitly sync file systems instead of waiting for timeout.
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
This is a follow-on to 156a64161b
that ignores UBSAN errors in sa.c.
Thank you @thwalker3 for the fix.
Original-patch-by: @thwalker3
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#16278Closes#16330
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
There has been a bugzilla PR#147881 requesting this
for a long time (14 years!). It extends the syntax of
the ZFS shanenfs property (for FreeBSD only) to allow
multiple sets of options for different hosts/nets,
separated by ';'s.
Signed-off-by: Rick Macklem <rmacklem@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Sponsored-by: Klara, Inc.
Sponsored-By: Wasabi Technology, Inc.
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Co-authored-by: Don Brady <don.brady@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
For files smaller than recordsize, it's most likely that they don't have
L1 blocks. However, current calculation will always return at least 1 L1
block.
In this change, we check dnode level to figure out if it has L1 blocks
or not, and return 0 if it doesn't. This will reduce the chance of
unnecessary throttling when deleting a large number of small files.
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Co-authored-by: Chunwei Chen <david.chen@nutanix.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
cp_stress is getting killed on the new QEMU-based github runners
we're developing. The problem is that the Linux based runners
should do 10 RUNS, where the FreeBSD based runners only have 3
RUNS to succeed.
This patch removes this different handling of Linux and FreeBSD.
The cp_stress test is running fine in around 2 minutes now.
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
BRT refcounts are stored as eight uint8_ts rather than a single
uint64_t. This means that za_first_integer is only the first byte, so
max 256. This fixes it by doing a lookup for the whole value.
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Current output:
> receiving correctivefull stream of a into b
New output:
> receiving corrective full stream of a into b
Signed-off-by: glibg10b <56197853+glibg10b@users.noreply.github.com>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
The commit b192a2c (Remove avl_size field from struct avl_tree) uses a
def _KERNEL to decide to include avl_pad or not, but this _KERNEL is
defined in sys/sysmacros.h. If avl.h and sysmacros.h are not included
in the right order, it can cause a headache when working on a zfs
related kernel module.
Add sysmacros.h in avl_impl.h to fix. sysmacros.h is also removed
from spa.h as it's reduntant.
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Co-authored-by: Youzhong Yang <yyang@mathworks.com>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
These are used for DDT and BRT stores. There's limited information
available to produce meaningful output, but at least we can put
something on screen rather than crashing.
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
After c3f2f1aa2, vdev_fault_wanted is set on a vdev after a probe fails.
An end-of-txg async task is charged with actually faulting the vdev.
In a single-disk pool, the probe failure will degrade the last disk, and
then suspend the pool. However, vdev_fault_wanted is not cleared. After
the pool returns, the transaction finishes and the async task runs and
faults the vdev, which suspends the pool again.
The fix is simple: when reopening a vdev, clear the async fault flag. If
the vdev is still failed, the startup probe will quickly notice and
degrade/suspend it again. If not, all is well!
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Co-authored-by: Don Brady <don.brady@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Don Brady <don.brady@klarasystems.com>
A single disk pool should suspend when its disk fails and hold the IO.
When the disk is returned, the pool should return and the IO be
reissued, leaving everything in good shape.
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Don Brady <don.brady@klarasystems.com>
Changed zfs_k(un)map_atomic to zfs_k(un)map_local
Signed-off-by: Jason Lee <jasonlee@lanl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Update the META file to reflect compatibility with the 6.9
kernel.
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
This helper was introduced long ago, in 5.16. Since 6.10, bd_inode no
longer exists, but the helper has been updated, so detect it and use it
in all versions where it is available.
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Linux 6.10 change kmem_cache_alloc to be a macro, rather than a
function, such that the old #undef for it in spl-kmem-cache.c would
remove its definition completely, breaking the build.
This inverts the model used before. Rather than always defining the
kmem_cache_* macro, then undefining then inside spl-kmem-cache.c,
instead we make a special tag to indicate we're currently inside
spl-kmem-cache.c, and not defining those in macros in the first place,
so we can use the kernel-supplied kmem_cache_* functions to implement
spl_kmem_cache_*, as we expect.
For all other callers, we create the macros as normal and remove access
to the kernel's own conflicting names.
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Linux has started moving to a model where instead of applying block
queue limits through individual modification functions, a complete
limits structure is built up and applied atomically, either when the
block device or open, or some time afterwards. As of 6.10 this
transition appears only partly completed.
This commit matches that model within OpenZFS in a way that should work
for past and future kernels. We set up a queue limits structure with any
limits that have had their modification functions removed. For newer
kernels that can have limits applied at block device open
(HAVE_BLK_ALLOC_DISK_2ARG), we have a conversion function to turn the
OpenZFS queue limits structure into Linux's queue_limits structure,
which can then be passed in. For older kernels, we provide an
application function that just calls the old functions for each limit in
the structure.
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Artix Linux is systemd free distribution based on Arch Linux, with
openrc dinit runit s6 as init alternatives. This patch will make
init scripts installation work the way Gentoo Linux with openrc.
The scripts tweaking for other init will be left to packager.
Signed-off-by: Yongming Zhao <ming.zym@gmail.com>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
zfs_zstd_cache_reap_now is issued every second.
zstd_mempool_reap checks for both pool existence and buffer count, but
that's still 2 func calls which are trivially avoidable.
With clang it even avoids pushing the stack pointer (but still suffers
the mispredict due to a forward jump, not modified in case someone is
using zstd):
<+0>: cmpq $0x0,0x0(%rip) # <zfs_zstd_cache_reap_now+8>
<+8>: je 0x217de4 <zfs_zstd_cache_reap_now+36>
<+10>: push %rbp
<+11>: mov %rsp,%rbp
<+14>: mov 0x0(%rip),%rdi # <zfs_zstd_cache_reap_now+21>
<+21>: call 0x217df0 <zstd_mempool_reap>
<+26>: mov 0x0(%rip),%rdi # <zfs_zstd_cache_reap_now+33>
<+33>: pop %rbp
<+34>: jmp 0x217df0 <zstd_mempool_reap>
<+36>: ret
Preferably the call would not be made to begin with if zstd is not used,
but this retains all the logic confined to zstd code.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
In the commit of the head_errlog feature we introduced a bug in
dsl_dataset_promote_sync(): we may dereference origin_head and hds, both
dereferencing ddpa after calling promote_sync() on ddpa.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@nutanix.com>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes#16272Closes#16273
On fedora 40, on the 6.9.4 kernel (in updates-testing), assign_str
expands to a "do {<stuff> } while(0)" loop. Without this semicolon,
the while(0) is unterminated, causing a cascade of useless errors.
With this semicolon, it compiles fine. It also compiles fine on 6.8.11
(the previous kernel). I have not tested earlier kernels than that, but
at worst it should add a pointless semicolon.
All other instances in the source tree are already terminated with
semicolons.
Signed-off-by: Daniel Berlin <dberlin@dberlin.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
This commit fixes what is probably a copy-paste mistake. The
`dracut.zfs` manpage claims that the `bootfs.rollback` option executes
`zfs snapshot -Rf`. `zfs snapshot` does not have a `-R` option. `zfs
rollback` does.
Signed-off-by: Alphan Yılmaz <alphanyilmaz@gmail.com>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
FreeBSD patchlevel versions are optional and, if present, in a different
location in the version string.
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
This freeuio() interface was introduced to FreeBSD recently. For now
it simply calls free(), so this change has no effect. However, this
may not always be true, and in CheriBSD this change is required.
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brooks Davis <brooks.davis@sri.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
You can use the UBSAN_SANITIZE_* Kbuild options to exclude certain
kernel objects from the UBSAN checks. We previously excluded
zap_micro.o with:
UBSAN_SANITIZE_zap_micro.o := n
For some reason that didn't work for the 6.9 kernel, which wants us
to use:
UBSAN_SANITIZE_zfs/zap_micro.o := n
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#16278Closes#16330
In several functions, we use a flag variable to track whether
zv_suspend_lock is held. This flag was not getting reset in a
particular case where we need to retry the underlying operation,
resulting in a lock leak. Make sure to update the flag where necessary.
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
The ZFS module parameter name is zfs_prefetch_disable, not
zfs_disable_prefetch.
Signed-off-by: Peter Doherty <peterd@acranox.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reason: nvlist_free() tries to free sth. which isn't allocted
Solution: init this variable with NULL
Closes#16311
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
This way we can avoid making assumptions about the SDT probe
implementation. No functional change intended.
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Mateusz Piotrowski <0mp@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Sponsored-by: Klara, Inc.
Sponsored-By: Wasabi Technology, Inc.
Signed-off-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
The 6.9 kernel behaves differently in how it releases block devices. In
the common case it will async release the device only after the return
to userspace. This is different from the 6.8 and older kernels which
release the block devices synchronously. To get around this, call
add_disk() from a workqueue so that the kernel uses a different
codepath to release our zvols in the way we expect. This stops
zfs_allow_010_pos from hanging.
Fixes: #16089
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Previously the dkms build left some unwanted files
in `/usr/lib/modules` which could cause package
managers to not properly clean up old kernels.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Martin Wagner <martin.wagner.dev@gmail.com>
Closes#16221Closes#16241
Otherwise if zfs is unloaded and reroot is being used it trips over a
stale pointer.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: Rubicon Communications, LLC ("Netgate")
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes#16242
Recent UMA changes repurposed the use of UMA_MD_SMALL_ALLOC in a way
that breaks arc_available_memory on -CURRENT. This change
ensures that arc_available_memory uses the new symbol
while maintaining compatibility with older FreeBSD releases.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Bojan Novković <bnovkov@FreeBSD.org>
Closes#16230
With seq x -1 z and x is less than z FreeBSD seq will print the error:
$ seq 1 -1 2
seq: needs positive increment
Hide this error. Alternatively $COMP_CWORD could be checked for < 2.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Derek Schrock <dereks@lifeofadishwasher.com>
Closes#16234
This fixes FreeBSD build failure with clang-18 after 23a489a got merged.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes#16252
If a pool is created with the cache file located in a non-default
path /etc/default/zpool.cache, removed, or the cachefile property
is set to none, zdb fails to show the pool unless we specify the
cache file or use the -e option. This PR automates this process.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes#16071
We don't build illumos-crypto for FreeBSD.
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes#16209
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes#16209
Nothing calls it through the KCF interface, so this is all unused.
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes#16209
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes#16209
sha2_mech_type_t serves double-duty, as the list of MAC providers and
also the algo type for direct callers to SHA2Init. Until we disentangle
that, reorganise it to make the separation more clear. While we're
there, remove the digest mechs we don't use.
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes#16209
For whatever reason, we call digest mechanisms directly, not through the
KCF digest provider. So we can remove those entry points entirely.
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes#16209
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes#16209
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes#16209
Still retaining the struture, for now.
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes#16209
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes#16209