For small objects the kernel's slab implementation is very fast and
space efficient. However, as the allocation size increases to
require multiple pages performance suffers. The SPL kmem cache
allocator was designed to better handle these large allocation
sizes. Therefore, on Linux the kmem_cache_* compatibility wrappers
prefer to use the kernel's slab allocator for small objects and
the custom SPL kmem cache allocator for larger objects.
This logic was effectively disabled for all architectures using
a non-4K page size which caused all kmem caches to only use the
SPL implementation. Functionally this is fine, but the SPL code
which calculates the target number of objects per-slab does not
take in to account that __vmalloc() always returns page-aligned
memory. This can result in a massive amount of wasted space when
allocating tiny objects on a platform using large pages (64k).
To resolve this issue we set the spl_kmem_cache_slab_limit cutoff
to 16K for all architectures.
This particular change does not attempt to update the logic used
to calculate the optimal number of pages per slab. This remains
an issue which should be addressed in a future change.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#12152Closes#11429Closes#11574Closes#12150
Also mark all printf-like funxions in libzfs_impl.h as printf-like
and add --no-show-locs to storeabi, in hopes diffs will make more sense
in future
This removes these symbols from libzfs:
D nfs_only
T SHA256Init
T SHA2Final
T SHA2Init
T SHA2Update
T SHA384Init
T SHA512Init
D share_all_proto
D smb_only
T zfs_is_shared_proto
W zpool_mount_datasets
W zpool_unmount_datasets
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#12048
Exporting names this short can easily cause nasty collisions with user code.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#12050
In `zpool_load_compat()`:
* initialize `l_features[]` with a loop rather than a static
initializer.
* don't redefine system constants; use private names instead
Rationale here:
When an array is initialized using a static {foo}, only the specified
members are initialized to the provided values, the rest are
initialized to zero. While B_FALSE is of course zero, it feels
unsafe to rely on this being true forever, so I'm inclined to sacrifice
a few microseconds of runtime here and initialize using a loop.
When looking for the correct combination of system constants to use
(in open() and mmap()), I prefer to use private constants rather than
redefining system ones; due to the small chance that the system
ones might be referenced later in the file. So rather than defining
O_PATH and MAP_POPULATE, I use distinct constant names.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Colm Buckley <colm@tuatha.org>
Closes#12156
zfs_arc_overflow_shift was never a parameter:
ca0bf58d65 ("Illumos 5497 - lock
contention on arcs_mtx") is the only result in
git log -Soverflow_shift, and it wasn't exposed then, nor is it now
zfs_read_chunk_size was renamed to zfs_vnops_read_chunk_size in
e53d678d4a ("Share zfs_fsync, zfs_read,
zfs_write, et al between Linux and FreeBSD")
zio_decompress_fail_fraction was never a parameter: it was added in
c3bd3fb4ac ("OpenZFS 9403 - assertion
failed in arc_buf_destroy()") as a developer aid for setting in zdb, but
it's a dangerous test tunable and has no place in public documentation,
(not to mention that it obviously doesn't work):
> Although this did uncover a few low priority issues, this
unfortuantely also causes ztest to ASSERT in many locations where the
code is working correctly since it is designed to fail on IO errors.
Developers can manually set this variable with the '-o' option to find
and debug issues.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#12157
Both were removed in 4fbdb10c7b ("remove
kmem_cache module parameter KMC_EXPIRE_AGE")
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#12157
On FreeBSD 14, these two tests started erroring out like the
objects they're attempting to examine don't exist.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes#12165
It turns out that sometimes, evidently only when run inside the
ZTS handler, arc_summary3 | head > /dev/null will die with ENOTCONN,
and ruin the test run.
Added handling for that.
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes#12160
Made function names start on a new line. Added a blank line between
functions. This helps when grepping for functions.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#12137
There used to be a warning after upgrading a zpool in FreeBSD, so users
won't forget to update the boot loader that pool is booted from.
This change brings this warning back, but only if the bootfs property
is set on the pool, which should be sufficient for the vast majority of
FreeBSD installations. People running something custom are most likely
aware of what to do after an upgrade in their specific environment.
Functionality is implemented in an OS specific helper function.
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Co-authored-by: Michael Gmelin <grembo@FreeBSD.org>
Signed-off-by: Michael Gmelin <grembo@FreeBSD.org>
Closes#12099Closes#12104
The additional iter advance is incorrect, as copy_from_iter() has
already done the right thing. This will result in the following
warning being printed to the console as of the 5.12 kernel.
Attempted to advance past end of bvec iter
This change should have been included with #11378 when a
similar change was made on the read side.
Suggested-by: @siebenmann
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Issue #11378Closes#12041Closes#12155
make_gitrev.sh actually breaks checkbashisms' parser,
which /insists/ that the end-of-line " is actually a string start
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#12101
This checks every file it checked (and a few more),
but explicitly instead of "if it works it works" best-effort
(which wasn't that good anyway)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#10512Closes#12101
This *will fail* when remounted by the real root
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#12148
Accidentally introduced by commit dd00925e8d.
Force-install the zstreamdump link, this is a supported configuration
and the install should not fail if it needs to overwrite an existing
file.
Also cd to work around some funny platforms as noted in AC_PROG_LN_S doc
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#12143
Also yeet pci_slot since it doesn't seem to exist?
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#12125
Also remove note about the OS/Net consolidation, now the illumos gate
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#12125
The spacing on zhack feature stat pool is a bit iffy(?)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#12125
Also rip out the section about potentially including in the OpenZFS
distribution and simplify -e description
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#12125
Also re-add articles left out by the slav who wrote this
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#12125
Also slim down the description a tad
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#12125
I fixed a few typos, but avoided changing anything beyond that;
the sould of the document should be preserved
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#12125
These are used by userspace, so should live in a public header
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#12116
This change introduces long options for ztest. It builds the usage
message as well as the long_options array from a single table. It also
adds #defines for the default values.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Manoj Joseph <manoj.joseph@delphix.com>
Closes#12117
While Libera doesn't yet have a webchat client, we should at least
direct them to the right network. Once a webchat client is available,
we can direct them to it.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes#12127
configure on s390x has a key check fail with an error about
a variable being used uninitialized. So let's initialize it.
Reviewed-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes#12126
wmsum counters are a reduced version of aggsum counters, optimized for
write-mostly scenarios. They do not provide optimized read functions,
but instead allow much cheaper add function. The primary usage is
infrequently read statistic counters, not requiring exact precision.
The Linux implementation is directly mapped into percpu_counter KPI.
The FreeBSD implementation is directly mapped into counter(9) KPI.
In user-space due to lack of better implementation mapped to aggsum.
Unfortunately neither Linux percpu_counter nor FreeBSD counter(9)
provide sufficient functionality to completelly replace aggsum, so
it still remains to be used for several hot counters.
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes#12114
Previously, ZFS scaled maxinflight_bytes based on total number of
disks in the pool. A 3-wide mirror was receiving a queue depth of 3
disks, which it should not, since it reads from all the disks inside.
For wide raidz the situation was slightly better, but still a 3-wide
raidz1 received a depth of 3 disks instead of 2.
The new code counts only unique data disks, i.e. 1 disk for mirrors
and non-parity disks for raidz/draid. For draid the math is still
imperfect, since vdev_get_nparity() returns number of parity disks
per group, not per vdev, but still some better than it was.
This should slightly reduce scrub influence on payload for some pool
topologies by avoiding excessive queuing.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closing #12046
Just like #12087, the set_acl signature changed with all the bolted-on
*userns parameters, which disabled set_acl usage, and caused #12076.
Turn zpl_set_acl into zpl_set_acl and zpl_set_acl_impl, and add a
new configure test for the new version.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes#12076Closes#12093
A plain rewrite of the shell version, and generates identical
units, save for replacing some empty lines with nothing, having fewer
meaningless spaces in After=s and different spacing in the lock scripts,
for a clean git diff -w
This is a gain of anywhere from 0m0.336s vs 0m0.022s (15.27x)
to 0m0.202s vs 0m0.006s (33.67x), depending on the hardware,
a.k.a. from "absolutely unusable" to "perfectly fine"
This also properly deals with canmount=noauto units across multiple
pools
See PR for detailed timings (of an early version) and diffs
Reviewed-by: Antonio Russo <aerusso@aerusso.net>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: InsanePrawn <insane.prawny@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #11915Closes#11917
In case of AIO failure, we should probably fallback to the old
behavior and still work.
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Alan Somers <asomers@gmail.com>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes#12032Closes#12040