Note that this only tracks sizes as requested by the caller.
Actual allocated space will almost always be bigger (e.g., rounded up to
the next power of 2 or page size). Additionally the allocated buffer may
be holding other areas hostage. Nonetheless, this is a starting point
for tracking memory usage in zstd.
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes#11129
While evaluating other assembler implementations it turns out that
the precomputed hash subkey tables vary in size, from 8*16 bytes
(avx2/avx512) up to 48*16 bytes (avx512-vaes), depending on the
implementation.
To be able to handle the size differences later, allocate
`gcm_Htable` dynamically rather then having a fixed size array, and
adapt consumers.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes#11102
While preparing #9749 some .cfi_{start,end}proc directives
were missed. Add the missing ones.
See upstream https://github.com/openssl/openssl/commit/275a048f
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes#11101
AT_BENEATH was merged to stable/12, where kern_unlinkat takes a
non-const path. DECONST the path passed to kern_unlinkat in the
case where AT_BENEATH is defined.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11139
The original xuio zero copy functionality has always been unused
on Linux and FreeBSD. Remove this disabled code to avoid any
confusion and improve readability.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes#11124
L2ARC devices of several terabytes filled with 4KB blocks may take 15
minutes to rebuild. Due to the way L2ARC log reading is implemented
it is quite likely that for all that time rebuild thread will never
sleep. At least on FreeBSD kernel threads have absolute priority and
can not be preempted by threads with lower priorities. If some thread
is also bound to that specific CPU it may not get any CPU time for all
the 15 minutes.
Reviewed-by: Cedric Berger <cedric@precidata.com>
Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes#11116
Refer to the correct section or alternative for FreeBSD and Linux.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11132
ZFS always waits for the write completion before flushing the cache.
That is why it does not require explicit ordering fences around it,
which are pretty difficult to implement for NVMe, since one has no
internal concept of strict request ordering.
This was already removed from FreeBSD once, but got resurrected
by mistake during OpenZFS merge.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes#11130
The port removed provisions for zfs_znode_move but the cleanup missed
this bit. To quote the original:
[snip]
list_insert_tail(&zfsvfs->z_all_znodes, zp);
membar_producer();
/*
* Everything else must be valid before assigning z_zfsvfs makes the
* znode eligible for zfs_znode_move().
*/
zp->z_zfsvfs = zfsvfs;
[/snip]
In the current code it is immediately followed by unlock which issues
the same fence, thus plays no role in correctness.
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes#11115
The allocator does not provide the functionality to begin with.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes#11114
The Linux kernel MODULE_LICENSE macro only recognizes a handful of
license strings and "MIT" is not one of the them. Update the macro
to use "Dual MIT/GPL" which is recognized and what the kernel expects
MIT licensed modules to use.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11112Closes#11113
These kstats are often expensive to compute so we want to avoid them
unless specifically requested.
The following kstats are affected by this change:
kstat.zfs.${pool}.multihost
kstat.zfs.${pool}.misc.state
kstat.zfs.${pool}.txgs
kstat.zfs.misc.fletcher_4_bench
kstat.zfs.misc.vdev_raidz_bench
kstat.zfs.misc.dbufs
kstat.zfs.misc.dbgmsg
In FreeBSD 13, sysctl(8) has been updated to still list the
names/description/type of skipped sysctls so they are still
discoverable.
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11099
It's even documented already.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11094
The zfs_fsync, zfs_read, and zfs_write function are almost identical
between Linux and FreeBSD. With a little refactoring they can be
moved to the common code which is what is done by this commit.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes#11078
The current l2_misses accounting behavior treats all reads to pools
without a configured l2arc as an l2arc miss, IFF there is at least
one other pool on the system which does have an l2arc configured.
This makes it extremely hard to tune for an improved l2arc hit/miss
ratio because this ratio will be modulated by reads from pools which
do not (and should not) have l2arc devices; its upper limit will
depend on the ratio of reads from l2arc'd pools and non-l2arc'd pools.
This PR prevents ARC reads affecting l2arc stats (n.b. l2_misses is
the only relevant one) where the target spa doesn't have an l2arc.
Includes new test - l2arc_l2miss_pos.ksh
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Adam Moss <c@yotes.com>
Closes#10921
This was removed in a reorganization of directories preparing for the
merge of FreeBSD support, 006e9a4088 by mmacy. While llvm is perfectly
happy with the nonexistent -I directory, the gcc6 and gcc9 we can elect
to use as cross-toolchains both trip over it.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Kyle Evans <kevans@FreeBSD.org>
Closes#11077
zfs_onexit_os.c was not deleted when it was removed from the build
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes#11079
Otherwise lookup can fail with EOPNOTSUPP or panic.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes#11066
The removal of a vdev in the normal class would fail if there was a
special or deup vdev that had a different ashift than the vdevs in
the normal class.
Moved the initialization of spa_min_ashift / spa_max_ashift from
vdev_open so that it occurs after the vdev allocation bias was
initialized (i.e. after vdev_load).
Caveat -- In order to remove a special/dedup vdev it must have the
same ashift as the normal pool vdevs. This could perhaps be lifted
in the future (i.e. for the case where there is ample space in any
surviving special class vdevs)
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes#9363Closes#9364Closes#11053
This is a follow up fix for commit 0fdd6106bb. The VERIFY is
only true when we haven't hit an error code path. See added
test case for a reproducer.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes#11048
After a side-effectful call like add or remove, references to range
segs stored in btrees can no longer be used safely. We move the
remove call to just before the reinsertion call so that the seg
remains valid for as long as we need it.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes#11044Closes#11056
The 32-bit counter eventually wraps to 0 which is a sentinel for invalid
id.
Make it 64-bit on LP64 platforms and 0-check otherwise.
Note: Linux counterpart uses id stored per queue instead of a global.
I did not check going that way is feasible with the goal being the
minimal fix doing the job.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes#11059
The acltype property is currently hidden on FreeBSD and does not
reflect the NFSv4 style ZFS ACLs used on the platform. This makes it
difficult to observe that a pool imported from FreeBSD on Linux has a
different type of ACL that is being ignored, and vice versa.
Add an nfsv4 acltype and expose the property on FreeBSD.
Make the default acltype nfsv4 on FreeBSD.
Setting acltype to an unhanded style is treated the same as setting
it to off. The ACLs will not be removed, but they will be ignored.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#10520
In FreeBSD, there are three compile environments that are supported:
user land, the kernel and the bootloader / standalone. Adjust the
headers to compile in the standalone environment. Limit kernel-only
items from view when _STANDALONE is defined.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Warner Losh <imp@FreeBSD.org>
Closes#10998
Currently streams are only freed when:
- They have no referencing zfetch and and their I/O references
go to zero.
- They are more than 2s old and a new I/O request comes in on
the same zfetch.
This means that we will leak unreferenced streams when their zfetch
structure is freed.
This change checks the reference count on a stream at zfetch free
time. If it is zero we free it immediately. If it has remaining
references we allow the prefetch callback to free it at I/O
completion time.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes#11052
The zstd code assumes that if you are on aarch64, you have NEON
instructions. This is not necessarily true. In a boot loader, where
you might not have the VFP properly initialized, these instructions
may not be available. It's also an error to include arm_neon.h when
the NEON insturctions aren't enabled. Change the guards for using the
NEON instructions from __aarch64__ to __ARM_NEON which is the standard
symbol for knowing if they are available.
__ARM_NEON is the proper symbol, defined in ARM C Language Extensions
Release 2.1 (https://developer.arm.com/documentation/ihi0053/d/). Some
sources suggest __ARM_NEON__, but that's the obsolete spelling from
prior versions of the standard.
Updated based on zstd pull request https://github.com/facebook/zstd/pull/2356
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Warner Losh <imp@bsdimp.com>
Closes#11055
FreeBSD had this value tunable before the switch to the new OpenZFS.
The tunable name has changed, breaking legacy compat.
Restore legacy compat for this tunable, properly expose the tunable
with the new name on all platforms, and document it in
zfs-module-parameters(5).
While here, clean up the documentation for zfetch_max_distance a bit.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11038
Code cleanup, a follow up commit to 4d55ea81.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Co-authored-by: Ryan Moeller <ryan@freqlabs.com>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes#11020
The value of zp is used without having been initialized under some
conditions. Initialize the pointer to NULL.
Add a regression test case using chown in acl/posix. However, this is
not enough because the setup sets xattr=sa, which means zfs_setattr_dir
will not be called. Create a second group of acl tests in acl/posix-sa
duplicating the acl/posix tests with symlinks, and remove xattr=sa from
the original acl/posix tests. This provides more coverage for the
default xattr=on code.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#10043Closes#11025
This change updates the documentation to refer to the project
as OpenZFS instead ZFS on Linux. Web links have been updated
to refer to https://github.com/openzfs/zfs. The extraneous
zfsonlinux.org web links in the ZED and SPL sources have been
dropped.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11007
A missing semicolon between kmoddir variable declaration and the
uninstall for loop caused modules_uninstall-Linux to fail with:
Syntax error: "do" unexpected
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Jacob Adams <jacob@tookmund.com>
Closes#11032
When running libzpool with the Undefined Behavior Sanitizer (ubsan)
enabled, a zpool create causes a run-time error:
module/zfs/vdev_label.c:600:14: runtime error: shift exponent 64 is
too large for 64-bit type 'long long unsigned int'`
in vdev_config_generate()
Fix is to convert vdev_removal_max_span to its base-2 logarithm, using
highbit64(), and then compare the "shifts".
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Chuck Tuffli <ctuffli@gmail.com>
Closes#9744Closes#11024
With procfs_list kstats implemented for FreeBSD, dbufs are now exposed
as kstat.zfs.misc.dbufs.
On FreeBSD, dbufstats can use the sysctl instead of procfs when no
input file has been given.
Enable the dbufstats tests on FreeBSD.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes#11008
Code cleanup. Sort includes, remove duplicates, and drop
some extra blank lines in kmod_core.c.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes#11000
Instead of relying on arbitrary timers after pool export/import or cache
device off/online rely on arcstats. This makes the L2ARC tests more
robust. Also cleanup some functions related to persistent L2ARC.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes#10983
We were missing an include for kernel FPU functions, breaking the build
on FreeBSD 12.1-RELEASE. This was apparently being pulled in from
elsewhere on stable/12 and head.
Sorted the other includes in these files while here.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#11005
In C, const indicates to the reader that mutation will not occur.
It can also serve as a hint about ownership.
Add const in a few places where it makes sense.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <freqlabs@FreeBSD.org>
Closes#10997
This causes "zfs send -vt ..." to fail with:
cannot resume send: Unknown error 1030
It turns out that some of the name/value pairs in the verification
list for zfs_ioc_send_space(), zfs_keys_send_space, had the wrong
name, so the ioctl got kicked out in zfs_check_input_nvpairs().
Update the names accordingly.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: John Poduska <jpoduska@datto.com>
Closes#10978
The kernel seq_read() helper function expects ->next() to update
the passed position even there are no more entries. Failure to
do so results in the following warning being logged.
seq_file: buggy .next function procfs_list_seq_next [spl]
did not update position index
Functionally there is no issue with the way procfs_list_seq_next()
is implemented and the warning is harmless. However, we want to
silence this some what scary incorrect warning. This commit
updates the Linux procfs code to advance the position even for
the last entry.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#10984Closes#10996
The request number is out of bounds of the platform table.
Subtract the starting offset to get the correct subscript.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#10994
`dbuf_stats_hash_table_data` can take much longer than it needs to
by repeatedly bzeroing its buffer when in fact the buffer only needs
to be NULL terminated.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes#10993
In non regular use cases allocated memory might stay persistent in memory
pool. This small patch checks every minute if there are old objects which
can be released from memory pool.
Right now with regular use, the pool is checked for old objects on each
allocation attempt from this pool. so basically polling by its use. Now
consider what happens if someone writes a lot of files and stops use of
the volume or even unmounts it. So the code will no longer check if
objects can be released from the pool. Already allocated objects will
still stay in pool cache. this is no big issue for common use. But
someone discovered this issue while doing tests. personally i know this
behavior and I'm aware of it. Its no big issue. just a enhancement
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Closes#10938Closes#10969
When an invalid incremental send is requested where the "to" ds is
before the "from" ds, make sure to drop the reference to the pool
and the dataset before returning the error.
Add an assert on FreeBSD to make sure we don't hold any locks after
returning from an ioctl.
Add some test coverage.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes#10919
The Linux kernel MODULE_LICENSE macro only recognizes a handful of
license strings and "BSD" is not one of the them. Update the macro
to use "Dual BSD/GPL" which is recognized and what the kernel expects
BSD licensed module to use.
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#10982Closes#10992
The current dmu_zfetch code implicitly assumes that I/Os complete
within min_sec_reap seconds. With async dmu and a readonly workload
(and thus no exponential backoff in operations from the "write
throttle") such as L2ARC rebuild it is possible to saturate the drives
with I/O requests. These are then effectively compounded with prefetch
requests.
This change reference counts streams and prevents them from being
recycled after their min_sec_reap timeout if they still have
outstanding I/Os.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes#10900
Prefetching of dnodes in dbuf_read() can cause significant mutex
contention for some workloads and isn't very helpful. This is
because we already get 32 dnodes for each block read, and when
iterating over a directory we prefetch the dnodes in the directory.
Disable this prefetching to prevent the lock contention.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Submitted-by: Adam Moss <c@yotes.com>
Submitted-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Adam Moss <c@yotes.com>
Closes#10877Closes#10953