Commit Graph

1613 Commits

Author SHA1 Message Date
Ryan Moeller
57940b435c
Share some code for spa deadman tunables
We need to do the same thing to update all spas on any OS for these
tunables, so let's share the code.

While here let's match the types of the literals initializing the
variables with the type of the variable.

Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9964
2020-02-10 13:11:30 -08:00
Attila Fülöp
31b160f0a6
ICP: Improve AES-GCM performance
Currently SIMD accelerated AES-GCM performance is limited by two
factors:

a. The need to disable preemption and interrupts and save the FPU
state before using it and to do the reverse when done. Due to the
way the code is organized (see (b) below) we have to pay this price
twice for each 16 byte GCM block processed.

b. Most processing is done in C, operating on single GCM blocks.
The use of SIMD instructions is limited to the AES encryption of the
counter block (AES-NI) and the Galois multiplication (PCLMULQDQ).
This leads to the FPU not being fully utilized for crypto
operations.

To solve (a) we do crypto processing in larger chunks while owning
the FPU. An `icp_gcm_avx_chunk_size` module parameter was introduced
to make this chunk size tweakable. It defaults to 32 KiB. This step
alone roughly doubles performance. (b) is tackled by porting and
using the highly optimized openssl AES-GCM assembler routines, which
do all the processing (CTR, AES, GMULT) in a single routine. Both
steps together result in up to 32x reduction of the time spend in
the en/decryption routines, leading up to approximately 12x
throughput increase for large (128 KiB) blocks.

Lastly, this commit changes the default encryption algorithm from
AES-CCM to AES-GCM when setting the `encryption=on` property.

Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-By: Jason King <jason.king@joyent.com>
Reviewed-By: Tom Caputi <tcaputi@datto.com>
Reviewed-By: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #9749
2020-02-10 12:59:50 -08:00
Brian Behlendorf
795699a6cc Linux 5.6 compat: timestamp_truncate()
The timestamp_truncate() function was added, it replaces the existing
timespec64_trunc() function.  This change renames our wrapper function
to be consistent with the upstream name and updates the compatibility
code for older kernels accordingly.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9956
Closes #9961
2020-02-07 11:04:32 -08:00
Brian Behlendorf
0dd7364853 Linux 5.6 compat: struct proc_ops
The proc_ops structure was introduced to replace the use of of the
file_operations structure when registering proc handlers.  This
change creates a new kstat_proc_op_t typedef for compatibility
which can be used to pass around the correct structure.

This change additionally adds the 'const' keyword to all of the
existing proc operations structures.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9961
2020-02-07 11:03:53 -08:00
Alexander Motin
e0ce98d57c
Reduce number of atomic_add() calls in aggsum
Previous code used 4 atomics to do aggsum_flush_bucket() and 2 more to
re-borrow after the flush.  But since asc_borrowed and asc_delta are
accessed only while holding asc_lock, it makes no any sense to modify
as_lower_bound and as_upper_bound in multiple steps.  Instead of that
the new code uses only 2 atomics in all the cases, one per as_*_bound
variable.  I think even that is overkill, simple atomic store and
load could be used here, since all modifications are done under the
as_lock, but there are no such primitives in ZFS code now.

While there, make borrow code consider previous borrow value, so that
on mixed request patterns reduce chance of needing to borrow again if
much larger request follows tiny one that needed borrow.

Also reduce as_numbuckets from uint64_t to u_int.  It makes no sense
to use so large division operation on every aggsum_add().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #9930
2020-02-06 13:21:06 -08:00
Alexander Motin
cbd8f5b759
Few microoptimizations to dbuf layer
Move db_link into the same cache line as db_blkid and db_level.
It allows significantly reduce avl_add() time in dbuf_create() on
systems with large RAM and huge number of dbufs per dnode.

Avoid few accesses to dbuf_caches[].size, which is highly congested
under high IOPS and never stays in cache for a long time.  Use local
value we are receiving from zfs_refcount_add_many() any way.

Remove cache_size_bytes_max bump from dbuf_evict_one().  I don't see
a point to do it on dbuf eviction after we done it on insertion in
dbuf_rele_and_unlock().

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #9931
2020-02-05 11:08:44 -08:00
Matthew Macy
cccbed9f98
Convert dbuf dirty record record list to a list_t
Additionally pull in state machine comments about
upcoming async cow work.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9902
2020-02-05 11:07:19 -08:00
Ryan Moeller
8c4987c489
Restore aclmode and remove acltype on FreeBSD
This replaces the placeholder ZFS_PROP_PRIVATE with ZFS_PROP_ACLMODE,
matching what is done in the NFSv4 ACLs PR (#9709).

On FreeBSD we hide ZFS_PROP_ACLTYPE, while on Linux we hide
ZFS_PROP_ACLMODE.

The tests already assume this arrangement.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #9913
2020-02-04 08:40:07 -08:00
Matthew Ahrens
ec21397127
async zvol minor node creation interferes with receive
When we finish a zfs receive, dmu_recv_end_sync() calls
zvol_create_minors(async=TRUE).  This kicks off some other threads that
create the minor device nodes (in /dev/zvol/poolname/...).  These async
threads call zvol_prefetch_minors_impl() and zvol_create_minor(), which
both call dmu_objset_own(), which puts a "long hold" on the dataset.
Since the zvol minor node creation is asynchronous, this can happen
after the `ZFS_IOC_RECV[_NEW]` ioctl and `zfs receive` process have
completed.

After the first receive ioctl has completed, userland may attempt to do
another receive into the same dataset (e.g. the next incremental
stream).  This second receive and the asynchronous minor node creation
can interfere with one another in several different ways, because they
both require exclusive access to the dataset:

1. When the second receive is finishing up, dmu_recv_end_check() does
dsl_dataset_handoff_check(), which can fail with EBUSY if the async
minor node creation already has a "long hold" on this dataset.  This
causes the 2nd receive to fail.

2. The async udev rule can fail if zvol_id and/or systemd-udevd try to
open the device while the the second receive's async attempt at minor
node creation owns the dataset (via zvol_prefetch_minors_impl).  This
causes the minor node (/dev/zd*) to exist, but the udev-generated
/dev/zvol/... to not exist.

3. The async minor node creation can silently fail with EBUSY if the
first receive's zvol_create_minor() trys to own the dataset while the
second receive's zvol_prefetch_minors_impl already owns the dataset.

To address these problems, this change synchronously creates the minor
node.  To avoid the lock ordering problems that the asynchrony was
introduced to fix (see #3681), we create the minor nodes from open
context, with no locks held, rather than from syncing contex as was
originally done.

Implementation notes:

We generally do not need to traverse children or prefetch anything (e.g.
when running the recv, snapshot, create, or clone subcommands of zfs).
We only need recursion when importing/opening a pool and when loading
encryption keys.  The existing recursive, asynchronous, prefetching code
is preserved for use in these cases.

Channel programs may need to create zvol minor nodes, when creating a
snapshot of a zvol with the snapdev property set.  We figure out what
snapshots are created when running the LUA program in syncing context.
In this case we need to remember what snapshots were created, and then
try to create their minor nodes from open context, after the LUA code
has completed.

There are additional zvol use cases that asynchronously own the dataset,
which can cause similar problems.  E.g. changing the volmode or snapdev
properties.  These are less problematic because they are not recursive
and don't touch datasets that are not involved in the operation, there
is still potential for interference with subsequent operations.  In the
future, these cases should be similarly converted to create the zvol
minor node synchronously from open context.

The async tasks of removing and renaming minors do not own the objset,
so they do not have this problem.  However, it may make sense to also
convert these operations to happen synchronously from open context, in
the future.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-65948
Closes #7863
Closes #9885
2020-02-03 09:33:14 -08:00
Romain Dolbeau
35b07497c6 Add AltiVec RAID-Z
Implements the RAID-Z function using AltiVec SIMD.
This is basically the NEON code translated to AltiVec.

Note that the 'fletcher' algorithm requires 64-bits
operations, and the initial implementations of AltiVec
(PPC74xx a.k.a. G4, PPC970 a.k.a. G5) only has up to
32-bits operations, so no 'fletcher'.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Romain Dolbeau <romain.dolbeau@european-processor-initiative.eu>
Closes #9539
2020-01-23 11:01:24 -08:00
Jason King
e2ef1cbf04 Support inheriting properties in channel programs
This adds support in channel programs to inherit properties analogous
to `zfs inherit` by adding `zfs.sync.inherit` and `zfs.check.inherit`
functions to the ZFS LUA API.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jason King <jason.king@joyent.com>
Closes #9738
2020-01-22 17:03:17 -08:00
Tom Caputi
61152d1069 Fix errata #4 handling for resuming streams
Currently, the handling for errata #4 has two issues which allow
the checks for this issue to be bypassed using resumable sends.
The first issue is that drc->drc_fromsnapobj is not set in the
resuming code as it is in the non-resuming code. This causes
dsl_crypto_recv_key_check() to skip its checks for the
from_ivset_guid. The second issue is that resumable sends do not
clean up their on-disk state if they fail the checks in
dmu_recv_stream() that happen before any data is received.

As a result of these two bugs, a user can attempt a resumable send
of a dataset without a from_ivset_guid. This will fail the initial
dmu_recv_stream() checks, leaving a valid resume state. The send
can then be resumed, which skips those checks, allowing the receive
to be completed.

This commit fixes these issues by setting drc->drc_fromsnapobj in
the resuming receive path and by ensuring that resumablereceives
are properly cleaned up if they fail the initial dmu_recv_stream()
checks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #9818 
Closes #9829
2020-01-14 12:25:20 -08:00
Kyle Evans
68a192e4b7 libzfs: add zfs_mount_at() function
zfs_mount_at() mounts a dataset at an arbitrary mountpoint rather than
at the configured mountpoint. This may be used by consumers that wish to
temporarily expose a dataset at another mountpoint without altering
dataset/pool properties.

This will be used by FreeBSD's libbe be_mount(), which mounts a boot
environment at an arbitrary mountpoint.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Kyle Evans <kevans@FreeBSD.org>
Closes #9833
2020-01-14 08:49:54 -08:00
Brian Behlendorf
e458fcca75
Change http://zfsonlinux.org links to https://zfsonlinux.org
Update the project website links contained in to repository to
reference the secure https://zfsonlinux.org address.

Reviewed-By: Richard Laager <rlaager@wiktel.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Garrett Fields <ghfields@gmail.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9837
2020-01-13 16:43:59 -08:00
Tom Caputi
ba0ba69e50 Add 'zfs send --saved' flag
This commit adds the --saved (-S) to the 'zfs send' command.
This flag allows a user to send a partially received dataset,
which can be useful when migrating a backup server to new
hardware. This flag is compatible with resumable receives, so
even if the saved send is interrupted, it can be resumed.
The flag does not require any user / kernel ABI changes or any
new feature flags in the send stream format.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Alek Pinchuk <apinchuk@datto.com>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Reviewed-by: Christian Schwarz <me@cschwarz.com>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #9007
2020-01-10 10:16:58 -08:00
Nick Black
4abd7d80b2 Remove parameter names from libzfs.h signatures
Most of libzfs.h doesn't provide names for the parameters
in its signatures. These few functions included them. That
wouldn't be a problem, per se, but the 'lines' parameter
conflicts with the 'lines' #define from terminfo's term.h,
present for at least a decade. This makes it difficult to
compile code making use of both ZFS and terminfo.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Nick Black <dankamongmen@gmail.com>
Closes #9821
2020-01-08 17:50:05 -08:00
loli10K
c24fa4b19a Fix "zpool add -n" for dedup, special and log devices
For dedup, special and log devices "zpool add -n" does not print
correctly their vdev type:

~# zpool add -n pool dedup /tmp/dedup special /tmp/special log /tmp/log
would update 'pool' to the following configuration:
	pool
	  /tmp/normal
	  /tmp/dedup
	  /tmp/special
	  /tmp/log

This could lead storage administrators to modify their ZFS pools to
unexpected and unintended vdev configurations.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9783 
Closes #9390
2020-01-06 15:40:06 -08:00
Tony Hutter
9fb2771aa5 Colorize zpool status output
If the ZFS_COLOR env variable is set, then use ANSI color
output in zpool status:

- Column headers are bold
- Degraded or offline pools/vdevs are yellow
- Non-zero error counters and faulted vdevs/pools are red
- The 'status:' and 'action:' sections are yellow if they're
  displaying a warning.

This also includes a new 'faketty' function in libtest.shlib that is
compatible with FreeBSD (code provided by @freqlabs).

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #9340
2019-12-19 16:26:07 -08:00
Matthew Macy
13a9a6f5e8 Make zfs_replay.c work on FreeBSD
FreeBSD's vfs currently doesn't permit file systems
to do their own locking. To avoid having to have
duplicate zfs functions with and without locking add
locking here. With luck these changes can be removed
in the future.

Reviewed-by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9715
2019-12-13 07:54:10 -08:00
Ryan Moeller
957c7aa23c Relocate common quota functions to shared code
The quota functions are common to all implementations and can be
moved to common code.  As a simplification they were moved to the
Linux platform code in the initial refactoring.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9710
2019-12-11 12:12:08 -08:00
Matthew Macy
4bc721965f Add FreeBSD jail support hooks
Add the 'zfs jail/unjail' subcommands along with the relevant 
documentation from FreeBSD.  This feature is not supported on
Linux and still requires the match kernel ioctls which will
be included when the FreeBSD platform code is integrated.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9686
2019-12-11 11:58:37 -08:00
Matthew Macy
657ce25357 Eliminate Linux specific inode usage from common code
Change many of the znops routines to take a znode rather
than an inode so that zfs_replay code can be largely shared
and in the future the much of the znops code may be shared.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9708
2019-12-11 11:53:57 -08:00
Fabian-Gruenbichler
b119e2c6f1 SIMD: Use alloc_pages_node to force alignment
fxsave and xsave require the target address to be 16-/64-byte aligned.

kmalloc(_node) does not (yet) offer such fine-grained control over
alignment[0,1], even though it does "the right thing" most of the time
for power-of-2 sizes. unfortunately, alignment is completely off when
using certain debugging or hardening features/configs, such as KASAN,
slub_debug=Z or the not-yet-upstream SLAB_CANARY.

Use alloc_pages_node() instead which allows us to allocate page-aligned
memory. Since fpregs_state is padded to a full page anyway, and this
code is only relevant for x86 which has 4k pages, this approach should
not allocate any unnecessary memory but still guarantee the needed
alignment.

0: https://lwn.net/Articles/787740/
1: https://lore.kernel.org/linux-block/20190826111627.7505-1-vbabka@suse.cz/

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9608 
Closes #9674
2019-12-10 12:53:25 -08:00
Matthew Macy
362ae8d11f Abstract away platform specific superblock references
The zfsvfs->z_sb field is Linux specified and should be abstracted.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9697
2019-12-10 09:21:07 -08:00
Matthew Macy
3c502d3b75 Exclude data from cores unconditionally and metadata conditionally
This change allows us to align the code dump logic across platforms.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9691
2019-12-09 12:29:56 -08:00
Matthew Macy
1f654753ba Remove stale ASSERTV comment
Followup for #9671, remove stale comment.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Issue #9671 
Closes #9682
2019-12-06 09:33:27 -08:00
Matthew Macy
f95704ca5e Disable EDONR on FreeBSD
FreeBSD uses its own crypto framework in-kernel which, at this time,
has no EDONR implementation.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9664
2019-12-05 13:10:29 -08:00
Matthew Macy
054a049841 Add ZFS_IOC offsets for FreeBSD
FreeBSD requires three additional ioctls, they are ZFS_IOC_NEXTBOOT,
ZFS_IOC_JAIL, and ZFS_IOC_UNJAIL.  These have been added after the
Linux-specific ioctls.  The range 0x80-0xFF has been reserved for 
future optional platform-specific ioctls.  Any platform may choose
to implement these as appropriate.

None of the existing ioctl numbers have been changed to maintain
compatibility.  For Linux no vectors have been registered for the
new ioctls and they are reported as unsupported.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9667
2019-12-05 13:06:51 -08:00
Matthew Macy
e64e84eca5 Refactor deadman set failmode to be cross platform
Update zfs_deadman_failmode to use the ZFS_MODULE_PARAM_CALL
wrapper, and split the common and platform specific portions.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9670
2019-12-05 12:40:45 -08:00
Matthew Macy
2a8ba608d3 Replace ASSERTV macro with compiler annotation
Remove the ASSERTV macro and handle suppressing unused 
compiler warnings for variables only in ASSERTs using the 
__attribute__((unused)) compiler annotation.  The annotation
is understood by both gcc and clang.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9671
2019-12-05 12:37:00 -08:00
Matthew Macy
be627fc847 Refactor zfs_context.h to build on FreeBSD
- on Linux move Linux specific headers to zfs_context_os.h
- on FreeBSD move FreeBSD specific definitions to zfs_context_os.h
- remove duplicate tsd_ definitions
- remove unused AT_TYPE

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9668
2019-12-04 13:12:57 -08:00
Matthew Macy
74d1d74959 Move linux qsort def to platform header
Moving qsort to the platform header allows each platform to
provide an appropriate sorting implementation.

Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9663
2019-12-03 09:49:40 -08:00
Matthew Macy
5142032106 Move zfs_cmd_t copyin/copyout to platform code
FreeBSD needs to cope with multiple version of the zfs_cmd_t
structure. Allowing the platform code to pre and post
process the cmd structure makes it possible to work with
legacy tooling.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9624
2019-12-02 10:08:27 -08:00
Matthew Macy
42a826eed3 Add FreeBSD required defines to mntent.h
Linux and FreeBSD use different names for suid / setuid.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9632
2019-11-30 15:49:09 -08:00
Matthew Macy
77323bcf53 Add FreeBSD support to zio_crypto.h
Minimal compatibility changes for FreeBSD.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9631
2019-11-30 15:38:16 -08:00
Matthew Macy
a5b762ab1d Resolve ZoF differences in zfs_ioctl.h
FreeBSD needs to be able to pass the jail id to the jail/unjail ioctls
and the struct file in the device structure is unused.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9625
2019-11-30 15:35:54 -08:00
Matthew Macy
d6f67df63c Minor diff reduction with ZoF in include/sys
- move linux/ includes to platform headers
- add void * io_bio to zio for tracking the underlying bio
- add freebsd specific fields to abd_scatter

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9615
2019-11-27 11:11:03 -08:00
Brian Behlendorf
9e17e6f254
Remove zfs_vdev_elevator module option
As described in commit f81d5ef6 the zfs_vdev_elevator module
option is being removed.  Users who require this functionality
should update their systems to set the disk scheduler using a
udev rule.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #8664
Closes #9417
Closes #9609
2019-11-27 10:35:49 -08:00
jwpoduska
3c819a2c7d Prevent unnecessary resilver restarts
If a device is participating in an active resilver, then it will have a
non-empty DTL. Operations like vdev_{open,reopen,probe}() can cause the
resilver to be restarted (or deferred to be restarted later), which is
unnecessary if the DTL is still covered by the current scan range. This
is similar to the logic in vdev_dtl_should_excise() where the DTL can
only be excised if it's max txg is in the resilvered range.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Gallagher <john.gallagher@delphix.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: John Poduska <jpoduska@datto.com>
Issue #840 
Closes #9155
Closes #9378
Closes #9551
Closes #9588
2019-11-27 10:15:01 -08:00
Matthew Macy
da92d5cbb3 Add zfs_file_* interface, remove vnodes
Provide a common zfs_file_* interface which can be implemented on all 
platforms to perform normal file access from either the kernel module
or the libzpool library.

This allows all non-portable vnode_t usage in the common code to be 
replaced by the new portable zfs_file_t.  The associated vnode and
kobj compatibility functions, types, and macros have been removed
from the SPL.  Moving forward, vnodes should only be used in platform
specific code when provided by the native operating system.

Reviewed-by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9556
2019-11-21 09:32:57 -08:00
Brian Behlendorf
7ae3f8dc8f
Partially revert 5a6ac4c
Reinstate the zpl_revalidate() functionality to resolve a regression
where dentries for open files during a rollback are not invalidated.

The unrelated functionality for automatically unmounting .zfs/snapshots
was not reverted.  Nor was the addition of shrink_dcache_sb() to the
zfs_resume_fs() function.

This issue was not immediately caught by the CI because the test case
intended to catch it was included in the list of ZTS tests which may
occasionally fail for unrelated reasons.  Remove all of the rollback
tests from this list to help identify the frequency of any spurious
failures.

The rollback_003_pos.ksh test case exposes a real issue with the
long standing code which needs to be investigated.  Regardless,
it has been enable with a small workaround in the test case itself.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9587
Closes #9592
2019-11-18 13:05:56 -08:00
Michael Niewöhner
6d948c3519 Add kmem_cache flag for forcing kvmalloc
This adds a new KMC_KVMEM flag was added to enforce use of the
kvmalloc allocator in kmem_cache_create even for large blocks, which
may also increase performance in some specific cases (e.g. zstd), too.

Default to KVMEM instead of VMEM in spl_kmem_cache_create.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes #9034
2019-11-13 10:05:23 -08:00
Michael Niewöhner
66955885e2 Make use of kvmalloc if available and fix vmem_alloc implementation
This patch implements use of kvmalloc for GFP_KERNEL allocations, which
may increase performance if the allocator is able to allocate physical
memory, if kvmalloc is available as a public kernel interface (since
v4.12). Otherwise it will simply fall back to virtual memory (vmalloc).

Also fix vmem_alloc implementation which can lead to slow allocations
since the first attempt with kmalloc does not make use of the noretry
flag but tells the linux kernel to retry several times before it fails.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes #9034
2019-11-13 10:05:10 -08:00
Matthew Macy
1f2f46be95 Add wrapper stub for zfs_cmd ioctl to libzpool
FreeBSD needs a wrapper for handling zfs_cmd ioctls.
In libzfs this is handled by zfs_ioctl. However, here
we need to wrap the call directly.

Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9511
2019-11-12 10:40:39 -08:00
Brian Behlendorf
066e825221
Linux compat: Minimum kernel version 3.10
Increase the minimum supported kernel version from 2.6.32 to 3.10.
This removes support for the following Linux enterprise distributions.

    Distribution     | Kernel | End of Life
    ---------------- | ------ | -------------
    Ubuntu 12.04 LTS | 3.2    | Apr 28, 2017
    SLES 11          | 3.0    | Mar 32, 2019
    RHEL / CentOS 6  | 2.6.32 | Nov 30, 2020

The following changes were made as part of removing support.

* Updated `configure` to enforce a minimum kernel version as
  specified in the META file (Linux-Minimum: 3.10).

    configure: error:
        *** Cannot build against kernel version 2.6.32.
        *** The minimum supported kernel version is 3.10.

* Removed all `configure` kABI checks and matching C code for
  interfaces which solely predate the Linux 3.10 kernel.

* Updated all `configure` kABI checks to fail when an interface is
  missing which was in the 3.10 kernel up to the latest 5.1 kernel.
  Removed the HAVE_* preprocessor defines for these checks and
  updated the code to unconditionally use the verified interface.

* Inverted the detection logic in several kABI checks to match
  the new interface as it appears in 3.10 and newer and not the
  legacy interface.

* Consolidated the following checks in to individual files. Due
  the large number of changes in the checks it made sense to handle
  this now.  It would be desirable to group other related checks in
  the same fashion, but this as left as future work.

  - config/kernel-blkdev.m4 - Block device kABI checks
  - config/kernel-blk-queue.m4 - Block queue kABI checks
  - config/kernel-bio.m4 - Bio interface kABI checks

* Removed the kABI checks for sops->nr_cached_objects() and
  sops->free_cached_objects().  These interfaces are currently unused.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9566
2019-11-12 08:59:06 -08:00
Ryan Moeller
035ebb3653 Allow platform dependent path stripping for vdevs
On Linux the full path preceding devices is stripped when formatting
vdev names. On FreeBSD we only want to strip "/dev/". Hide the
implementation details of path stripping behind zfs_strip_path().

Make zfs_strip_partition_path() static in Linux implementation while
here, since it is never used outside of the file it is defined in.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9565
2019-11-11 12:15:44 -08:00
Pavel Snajdr
5a6ac4cffc Remove zpl_revalidate
This patch removes the need for zpl_revalidate altogether.

There were 3 main reasons why we used d_revalidate:

1. periodic automounted snapshots umount deferral
2. negative dentries created before snapshot rollback
3. stale inodes referenced by dentry cache after snapshot rollback

Periodic snapshots deferral solution introduces zfs_exit_fs function,
which is called as a part of ZFS_EXIT(zfsvfs_t) macro.

Negative dentries and stale inodes are solved by flushing the dcache
for the particular dataset on zfs_resume_fs call.

This patch also removes now unused HAVE_S_D_OP configure test.

Reviewed-by: Aleksa Sarai <cyphar@cyphar.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #8774 
Closes #9549
2019-11-11 09:34:21 -08:00
Romain Dolbeau
4254e40729 Preliminary support for RV64G
This adds basic support for RISC-V, specifically RV64G.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Romain Dolbeau <romain.dolbeau@european-processor-initiative.eu>
Closes #9540
2019-11-06 10:56:09 -08:00
Matthew Macy
27ece2ee4d Move platform specific parts of zfs_znode.h to platform code
Some of the znode fields are different and functions
consuming an inode don't exist on FreeBSD.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9536
2019-11-06 10:54:25 -08:00
Prakash Surya
ae38e00968 Add tracepoints for taskq entry lifetime events
This adds some new DTRACE_PROBE* endpoints so that we can observe taskq
latencies on a system. Additionally, a new "taskqlatency.bt" script is
added to do this observation via "bpftrace". Lastly, a "zfs-trace.sh"
script is added to wrap "bpftrace" with the proper options required to
run and use "taskqlatency.bt".

For example, with these changes in place, a user can run the following:

    $ cd ./contrib/bpftrace
    $ sudo ./zfs-trace.sh taskqlatency.bt
    Attaching 6 probes...
    ^C

Here's some example output, showing latency information for time spent
executing the taskq entry's function:

    @exec_lat_us[dp_sync_taskq, userquota_updates_task]:
    [2, 4)                 5 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
    [4, 8)                 0 |                                                    |
    [8, 16)                1 |@@@@@@@@@@                                          |
    [16, 32)               2 |@@@@@@@@@@@@@@@@@@@@                                |

    @exec_lat_us[z_wr_int_h, zio_execute]:
    [8, 16)               16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
    [16, 32)               2 |@@@@@@                                              |

    @exec_lat_us[z_wr_iss_h, zio_execute]:
    [16, 32)               4 |@@@@@@@@@@@@@@@@                                    |
    [32, 64)              13 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
    [64, 128)              1 |@@@@                                                |

    @exec_lat_us[z_ioctl_int, zio_execute]:
    [2, 4)                 1 |@@@@                                                |
    [4, 8)                11 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
    [8, 16)                8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@               |

    @exec_lat_us[dp_sync_taskq, sync_dnodes_task]:
    [2, 4)                 1 |@@@@@@                                              |
    [4, 8)                 7 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@       |
    [8, 16)                8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
    [16, 32)               2 |@@@@@@@@@@@@@                                       |
    [32, 64)               4 |@@@@@@@@@@@@@@@@@@@@@@@@@@                          |
    [64, 128)              1 |@@@@@@                                              |
    [128, 256)             0 |                                                    |
    [256, 512)             1 |@@@@@@

Here's some example output, showing latency information for time spent
waiting on the taskq, prior to starting execution of entry's function:

    @queue_lat_us[dp_sync_taskq]:
    [2, 4)                 1 |@@@@                                                |
    [4, 8)                 7 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                      |
    [8, 16)                2 |@@@@@@@@                                            |
    [16, 32)               3 |@@@@@@@@@@@@@                                       |
    [32, 64)              12 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
    [64, 128)              6 |@@@@@@@@@@@@@@@@@@@@@@@@@@                          |
    [128, 256)             0 |                                                    |
    [256, 512)             1 |@@@@                                                |

    @queue_lat_us[z_wr_iss]:
    [4, 8)                 4 |@@@@                                                |
    [8, 16)               13 |@@@@@@@@@@@@@@@                                     |
    [16, 32)               6 |@@@@@@@                                             |
    [32, 64)               2 |@@                                                  |
    [64, 128)             12 |@@@@@@@@@@@@@@                                      |
    [128, 256)            15 |@@@@@@@@@@@@@@@@@@                                  |
    [256, 512)            33 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@             |
    [512, 1K)             27 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                    |
    [1K, 2K)               7 |@@@@@@@@                                            |
    [2K, 4K)              14 |@@@@@@@@@@@@@@@@                                    |
    [4K, 8K)              14 |@@@@@@@@@@@@@@@@                                    |
    [8K, 16K)             23 |@@@@@@@@@@@@@@@@@@@@@@@@@@@                         |
    [16K, 32K)            43 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

    @queue_lat_us[z_wr_int]:
    [2, 4)                10 |@@@@@                                               |
    [4, 8)                71 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@           |
    [8, 16)               88 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
    [16, 32)              50 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                       |
    [32, 64)              65 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@              |
    [64, 128)             43 |@@@@@@@@@@@@@@@@@@@@@@@@@                           |
    [128, 256)            19 |@@@@@@@@@@@                                         |
    [256, 512)             3 |@                                                   |
    [512, 1K)              1 |                                                    |

Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Prakash Surya <prakash.surya@delphix.com>
Closes #9525
2019-11-01 13:14:54 -07:00
Prakash Surya
e5d1c27e30 Enable use of DTRACE_PROBE* macros in "spl" module
This change modifies some of the infrastructure for enabling the use of
the DTRACE_PROBE* macros, such that we can use tehm in the "spl" module.

Currently, when the DTRACE_PROBE* macros are used, they get expanded to
create new functions, and these dynamically generated functions become
part of the "zfs" module.

Since the "spl" module does not depend on the "zfs" module, the use of
DTRACE_PROBE* in the "spl" module would result in undefined symbols
being used in the "spl" module. Specifically, DTRACE_PROBE* would turn
into a function call, and the function being called would be a symbol
only contained in the "zfs" module; which results in a linker and/or
runtime error.

Thus, this change adds the necessary logic to the "spl" module, to
mirror the tracing functionality available to the "zfs" module. After
this change, we'll have a "trace_zfs.h" header file which defines the
probes available only to the "zfs" module, and a "trace_spl.h" header
file which defines the probes available only to the "spl" module.

Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Prakash Surya <prakash.surya@delphix.com>
Closes #9525
2019-11-01 13:13:43 -07:00
Matthew Macy
4a2ed90013 Wrap Linux module macros
MODULE_VERSION is already defined on FreeBSD. Wrap all of the
used MODULE_* macros for the sake of consistency and portability.

Add a user space noop version to reduce the need for _KERNEL ifdefs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9542
2019-11-01 10:41:03 -07:00
Matthew Macy
bd4dde8ef7 Prefix struct rangelock
A struct rangelock already exists on FreeBSD.  Add a zfs_ prefix as
per our convention to prevent any conflict with existing symbols.
This change is a follow up to 2cc479d0.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9534
2019-11-01 10:37:33 -07:00
Matthew Macy
156f74fc03 Return an error code from zfs_acl_chmod_setattr
The FreeBSD implementation can fail, allow this function to
fail and add the required error handling for Linux.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9541
2019-11-01 10:19:11 -07:00
Matthew Macy
c4ae27c763 Move sha2.h to platform code
FreeBSD has its own sha routines that the port uses.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9530
2019-10-31 15:45:58 -07:00
Matthew Macy
a5308e252d Don't cast away const
Follow up to 511fce6b which missed a cast.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9533
2019-10-31 10:38:03 -07:00
Matthew Macy
2a3aa5a109 Factor Linux specific code out of spa_misc.c
Move these Linux module parameter get/set helpers in to
platform specific code.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9457
2019-10-31 09:52:22 -07:00
Romain Dolbeau
0b2a642351 Add AVX512BW variant of fletcher
It is much faster than AVX512F when byteswapping on Skylake-SP
and newer, as we can do the byteswap in a single vshufb instead
of many instructions.

Reviewed by: Gvozden Neskovic <neskovic@gmail.com>
Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Romain Dolbeau <romain.dolbeau@atos.net>
Closes #9517
2019-10-30 12:26:14 -07:00
Matthew Macy
d46f0deb03 Add wrapper for Linux BLKFLSBUF ioctl
FreeBSD has no analog. Buffered block devices were removed a decade
plus ago.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9508
2019-10-28 09:53:39 -07:00
Matthew Macy
4a22ba5be0 Minor spa portability fixes
- FreeBSD's rootpool import code uses spa_config_parse
- Move the zvol_create_minors call out from under the
  spa_namespace_lock in spa_import. It isn't needed and it causes
  a lock order reversal on FreeBSD.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9499
2019-10-28 09:51:53 -07:00
Chunwei Chen
7125a109dc Fix zpool history unbounded memory usage
In original implementation, zpool history will read the whole history
before printing anything, causing memory usage goes unbounded. We fix
this by breaking it into read-print iterations.

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #9516
2019-10-28 09:49:44 -07:00
loli10K
e35704647e Fix for ARC sysctls ignored at runtime
This change leverage module_param_call() to run arc_tuning_update()
immediately after the ARC tunable has been updated as suggested in
cffa8372 code review.

A simple test case is added to the ZFS Test Suite to prevent future
regressions in functionality.

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9487  
Closes #9489
2019-10-26 15:22:19 -07:00
Matthew Macy
1952fe0e25 Move platform dependent errno aliases
EBADE, EBADR, and ENOANO do not exist on FreeBSD

The libspl errno.h is similarly platform dependent.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9498
2019-10-25 13:40:50 -07:00
Matthew Macy
68a1b1589a Remove sdt.h
It's mostly a noop on ZoL and it conflicts with platforms that 
support dtrace.  Remove this header to resolve the conflict.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9497
2019-10-25 13:38:37 -07:00
Brian Behlendorf
10fa254539
Linux 4.14, 4.19, 5.0+ compat: SIMD save/restore
Contrary to initial testing we cannot rely on these kernels to
invalidate the per-cpu FPU state and restore the FPU registers.
Nor can we guarantee that the kernel won't modify the FPU state
which we saved in the task struck.

Therefore, the kfpu_begin() and kfpu_end() functions have been
updated to save and restore the FPU state using our own dedicated
per-cpu FPU state variables.

This has the additional advantage of allowing us to use the FPU
again in user threads.  So we remove the code which was added to
use task queues to ensure some functions ran in kernel threads.

Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #9346
Closes #9403
2019-10-24 10:17:33 -07:00
Serapheim Dimitropoulos
9f3c72a2a8 Name anonymous enum of KMC_BIT constants
Giving a name to this enum makes it discoverable from
debugging tools like DRGN and SDB. For example, with
the name proposed on this patch we can iterate over
these values in DRGN:
```
>>> prog.type('enum kmc_bit').enumerators
(('KMC_BIT_NOTOUCH', 0), ('KMC_BIT_NODEBUG', 1),
('KMC_BIT_NOMAGAZINE', 2), ('KMC_BIT_NOHASH', 3),
('KMC_BIT_QCACHE', 4), ('KMC_BIT_KMEM', 5),
('KMC_BIT_VMEM', 6), ('KMC_BIT_SLAB', 7),
...
```
This enables SDB to easily pretty-print the flags of
the spl_kmem_caches in the system like this:
```
> spl_kmem_caches -o "name,flags,total_memory"
name                                       flags total_memory
------------------------ ----------------------- ------------
abd_t                    KMC_NOMAGAZINE|KMC_SLAB        4.5MB
arc_buf_hdr_t_full       KMC_NOMAGAZINE|KMC_SLAB       12.3MB
... <cropped> ...
ddt_cache                               KMC_VMEM      583.7KB
ddt_entry_cache          KMC_NOMAGAZINE|KMC_SLAB         0.0B
... <cropped> ...
zio_buf_1048576             KMC_NODEBUG|KMC_VMEM         0.0B
... <cropped> ...
```

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #9478
2019-10-18 13:25:44 -04:00
Matthew Macy
c9c9c1e213 OpenZFS restructuring - ARC memory pressure
Factor Linux specific memory pressure handling out of ARC.  Each
platform will have different available interfaces for managing memory
pressure.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9472
2019-10-18 13:23:19 -04:00
Matthew Macy
08f530c699 Make zfsdev_getminor signature cross platform
Only pass the file descriptor to make zfsdev_get_miror() portable.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9466
2019-10-16 18:43:52 -07:00
Paul Dagnelie
511fce6b1f Don't call sizeof on void
We get the sizeof the appropriate type, and don't cast away const.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #9455
2019-10-13 19:21:51 -07:00
Paul Dagnelie
516a83f886 Function name and comment updates
Rename certain functions for more consistency when they share common 
features. Make comments clearer about what arguments should be passed 
to the insert and add functions.

Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #9441
2019-10-11 10:13:21 -07:00
Matthew Macy
af1698f59b Expose dmu_buf_hold_array_by_dnode to platform code
FreeBSD uses this in its pager ops routines

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9431
2019-10-11 10:06:18 -07:00
loli10K
715c996d3b Fix pool creation with feature@allocation_classes disabled
When "feature@allocation_classes" is not enabled on the pool no vdev
with "special" or "dedup" allocation type should be allowed to exist in
the vdev tree.

Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9427 
Closes #9429
2019-10-10 16:39:41 -07:00
Matthew Macy
2516a87821 Move get_temporary_prop to platform code
Temporary property handling at the VFS layer requires
platform specific code.

Reviewed-by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9401
2019-10-10 15:59:34 -07:00
Matthew Macy
6501906280 Add kmem cache accessors
Make the metaslab platform agnostic again by adding
accessor functions which can be implemented by each
platform.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9404
2019-10-10 15:45:52 -07:00
Matthew Macy
eedb3a62b9 Make zil_async_to_sync visible to platform code
FreeBSD's zvol platform code requires access to the
zil_async_to_sync() function.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9440
2019-10-10 15:39:44 -07:00
Matthew Macy
e4f5fa1229 Fix strdup conflict on other platforms
In the FreeBSD kernel the strdup signature is:

```
char	*strdup(const char *__restrict, struct malloc_type *);
```

It's unfortunate that the developers have chosen to change
the signature of libc functions - but it's what I have to
deal with.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9433
2019-10-10 09:47:06 -07:00
Paul Dagnelie
ca5777793e Reduce loaded range tree memory usage
This patch implements a new tree structure for ZFS, and uses it to 
store range trees more efficiently.

The new structure is approximately a B-tree, though there are some 
small differences from the usual characterizations. The tree has core 
nodes and leaf nodes; each contain data elements, which the elements 
in the core nodes acting as separators between its children. The 
difference between core and leaf nodes is that the core nodes have an 
array of children, while leaf nodes don't. Every node in the tree may 
be only partially full; in most cases, they are all at least 50% full 
(in terms of element count) except for the root node, which can be 
less full. Underfull nodes will steal from their neighbors or merge to 
remain full enough, while overfull nodes will split in two. The data 
elements are contained in tree-controlled buffers; they are copied 
into these on insertion, and overwritten on deletion. This means that 
the elements are not independently allocated, which reduces overhead, 
but also means they can't be shared between trees (and also that 
pointers to them are only valid until a side-effectful tree operation 
occurs). The overhead varies based on how dense the tree is, but is 
usually on the order of about 50% of the element size; the per-node 
overheads are very small, and so don't make a significant difference. 
The trees can accept arbitrary records; they accept a size and a 
comparator to allow them to be used for a variety of purposes.

The new trees replace the AVL trees used in the range trees today. 
Currently, the range_seg_t structure contains three 8 byte integers 
of payload and two 24 byte avl_tree_node_ts to handle its storage in 
both an offset-sorted tree and a size-sorted tree (total size: 64 
bytes). In the new model, the range seg structures are usually two 4 
byte integers, but a separate one needs to exist for the size-sorted 
and offset-sorted tree. Between the raw size, the 50% overhead, and 
the double storage, the new btrees are expected to use 8*1.5*2 = 24 
bytes per record, or 33.3% as much memory as the AVL trees (this is 
for the purposes of storing metaslab range trees; for other purposes, 
like scrubs, they use ~50% as much memory).

We reduced the size of the payload in the range segments by teaching 
range trees about starting offsets and shifts; since metaslabs have a 
fixed starting offset, and they all operate in terms of disk sectors, 
we can store the ranges using 4-byte integers as long as the size of 
the metaslab divided by the sector size is less than 2^32. For 512-byte
sectors, this is a 2^41 (or 2TB) metaslab, which with the default
settings corresponds to a 256PB disk. 4k sector disks can handle 
metaslabs up to 2^46 bytes, or 2^63 byte disks. Since we do not 
anticipate disks of this size in the near future, there should be 
almost no cases where metaslabs need 64-byte integers to store their 
ranges. We do still have the capability to store 64-byte integer ranges 
to account for cases where we are storing per-vdev (or per-dnode) trees, 
which could reasonably go above the limits discussed. We also do not 
store fill information in the compact version of the node, since it 
is only used for sorted scrub.

We also optimized the metaslab loading process in various other ways
to offset some inefficiencies in the btree model. While individual
operations (find, insert, remove_from) are faster for the btree than 
they are for the avl tree, remove usually requires a find operation, 
while in the AVL tree model the element itself suffices. Some clever 
changes actually caused an overall speedup in metaslab loading; we use 
approximately 40% less cpu to load metaslabs in our tests on Illumos.

Another memory and performance optimization was achieved by changing 
what is stored in the size-sorted trees. When a disk is heavily 
fragmented, the df algorithm used by default in ZFS will almost always 
find a number of small regions in its initial cursor-based search; it 
will usually only fall back to the size-sorted tree to find larger 
regions. If we increase the size of the cursor-based search slightly, 
and don't store segments that are smaller than a tunable size floor 
in the size-sorted tree, we can further cut memory usage down to 
below 20% of what the AVL trees store. This also results in further 
reductions in CPU time spent loading metaslabs.

The 16KiB size floor was chosen because it results in substantial memory 
usage reduction while not usually resulting in situations where we can't 
find an appropriate chunk with the cursor and are forced to use an 
oversized chunk from the size-sorted tree. In addition, even if we do 
have to use an oversized chunk from the size-sorted tree, the chunk 
would be too small to use for ZIL allocations, so it isn't as big of a 
loss as it might otherwise be. And often, more small allocations will 
follow the initial one, and the cursor search will now find the 
remainder of the chunk we didn't use all of and use it for subsequent 
allocations. Practical testing has shown little or no change in 
fragmentation as a result of this change.

If the size-sorted tree becomes empty while the offset sorted one still 
has entries, it will load all the entries from the offset sorted tree 
and disregard the size floor until it is unloaded again. This operation 
occurs rarely with the default setting, only on incredibly thoroughly 
fragmented pools.

There are some other small changes to zdb to teach it to handle btrees, 
but nothing major.
                                           
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed by: Sebastien Roy seb@delphix.com
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #9181
2019-10-09 10:36:03 -07:00
Matthew Macy
2cc479d049 Rename rangelock_ functions to zfs_rangelock_
A rangelock KPI already exists on FreeBSD.  Add a zfs_ prefix as
per our convention to prevent any conflict with existing symbols.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9402
2019-10-03 15:54:29 -07:00
Matthew Macy
73cdcc6323 OpenZFS restructuring - libzfs
Factor Linux specific functionality out of libzfs.

Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Macy <mmacy@FreeBSD.org>
Closes #9377
2019-10-03 10:33:16 -07:00
Matthew Macy
7c5eff9400 OpenZFS restructuring - libzutil
Factor Linux specific functionality out of libzutil.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9356
2019-10-03 10:20:44 -07:00
Matthew Macy
6360e2779e Add inode accessors to common code
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9389
2019-10-02 09:15:12 -07:00
Matthew Macy
13a4027a7c OpenZFS restructuring - arc_stats
Make arc_stats visible to platform code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9386
2019-10-01 16:35:05 -07:00
Matthew Macy
3283f137d7 OpenZFS restructuring - zpool
Factor Linux specific functions out of the zpool command.
    
Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9333
2019-09-30 12:16:06 -07:00
Matthew Macy
7bb0c29468 OpenZFS restructuring - zfs_ioctl
Refactor the zfs ioctls in to platform dependent and independent bits.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Sean Eric Fagan <sef@ixsystems.com>
Signed-off-by: Matthew Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9301
2019-09-27 10:46:28 -07:00
Tom Caputi
bb61cc3185 Fix encryption hierarchy issues with zfs recv -d
Currently, the recv_fix_encryption_hierarchy() function accepts
'destsnap' as one of its parameters. Originally, this was intended
to be the top-level dataset of a receive (whether or not the
receive was recursive). Unfortunately, this parameter actually is
simply the input that is passed in from the command line. When
the user specifies 'zfs recv -d', this string is actually only the
name of the receiving pool since the rest of the name is derived
from the send stream. This causes the function to fail, leaving
some datasets with an invalid encryption hierarchy.

This patch resolves this problem by passing in the top_zfs variable
instead. In order to make this work, this patch also includes some
changes that ensure the value is always present when we need it.

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #9273
Closes #9309
2019-09-25 17:02:32 -07:00
Matthew Macy
5df7e9d85c OpenZFS restructuring - zvol
Refactor the zvol in to platform dependent and independent bits.

Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9295
2019-09-25 09:20:30 -07:00
John Gallagher
e60e158eff Add subcommand to wait for background zfs activity to complete
Currently the best way to wait for the completion of a long-running
operation in a pool, like a scrub or device removal, is to poll 'zpool
status' and parse its output, which is neither efficient nor convenient.

This change adds a 'wait' subcommand to the zpool command. When invoked,
'zpool wait' will block until a specified type of background activity
completes. Currently, this subcommand can wait for any of the following:

 - Scrubs or resilvers to complete
 - Devices to initialized
 - Devices to be replaced
 - Devices to be removed
 - Checkpoints to be discarded
 - Background freeing to complete

For example, a scrub that is in progress could be waited for by running

    zpool wait -t scrub <pool>

This also adds a -w flag to the attach, checkpoint, initialize, replace,
remove, and scrub subcommands. When used, this flag makes the operations
kicked off by these subcommands synchronous instead of asynchronous.

This functionality is implemented using a new ioctl. The type of
activity to wait for is provided as input to the ioctl, and the ioctl
blocks until all activity of that type has completed. An ioctl was used
over other methods of kernel-userspace communiction primarily for the
sake of portability.

Porting Notes:
This is ported from Delphix OS change DLPX-44432. The following changes
were made while porting:

 - Added ZoL-style ioctl input declaration.
 - Reorganized error handling in zpool_initialize in libzfs to integrate
   better with changes made for TRIM support.
 - Fixed check for whether a checkpoint discard is in progress.
   Previously it also waited if the pool had a checkpoint, instead of
   just if a checkpoint was being discarded.
 - Exposed zfs_initialize_chunk_size as a ZoL-style tunable.
 - Updated more existing tests to make use of new 'zpool wait'
   functionality, tests that don't exist in Delphix OS.
 - Used existing ZoL tunable zfs_scan_suspend_progress, together with
   zinject, in place of a new tunable zfs_scan_max_blks_per_txg.
 - Added support for a non-integral interval argument to zpool wait.

Future work:
ZoL has support for trimming devices, which Delphix OS does not. In the
future, 'zpool wait' could be extended to add the ability to wait for
trim operations to complete.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: John Gallagher <john.gallagher@delphix.com>
Closes #9162
2019-09-13 18:09:06 -07:00
Chengfei ZHu
7238cbd4d3 QAT related bug fixes
1. Fix issue:  Kernel BUG with QAT during decompression  #9276.
   Now it is uninterruptible for a specific given QAT request,
   but Ctrl-C interrupt still works in user-space process.

2. Copy the digest result to the buffer only when doing encryption,
   and vise-versa for decryption.

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chengfei Zhu <chengfeix.zhu@intel.com>
Closes #9276 
Closes #9303
2019-09-12 13:33:44 -07:00
Matthew Macy
b01a6574ae Move objnode handling to common code
objnode is OS agnostic and used only by dmu_redact.c.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9315
2019-09-12 13:31:09 -07:00
Matthew Macy
74756182d2 Enable compiler to typecheck logging
Annotate spa logging declarations with printflike
Workaround gcc bug (non disable-able warning) by
replacing "" with " "

Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9316
2019-09-12 13:28:26 -07:00
Matthew Macy
d66620681d OpenZFS restructuring - move linux tracing code to platform directories
Move Linux specific tracing headers and source to platform directories
and update the build system.

Reviewed-by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #9290
2019-09-11 14:25:53 -07:00
Brian Behlendorf
25f06d677a
Fix /etc/hostid on root pool deadlock
Accidentally introduced by dc04a8c which now takes the SCL_VDEV lock
as a reader in zfs_blkptr_verify().  A deadlock can occur if the
/etc/hostid file resides on a dataset in the same pool.  This is
because reading the /etc/hostid file may occur while the caller is
holding the SCL_VDEV lock as a writer.  For example, to perform a
`zpool attach` as shown in the abbreviated stack below.

To resolve the issue we cache the system's hostid when initializing
the spa_t, or when modifying the multihost property.  The cached
value is then relied upon for subsequent accesses.

Call Trace:
    spa_config_enter+0x1e8/0x350 [zfs]
    zfs_blkptr_verify+0x33c/0x4f0 [zfs] <--- trying read lock
    zio_read+0x6c/0x140 [zfs]
    ...
    vfs_read+0xfc/0x1e0
    kernel_read+0x50/0x90
    ...
    spa_get_hostid+0x1c/0x38 [zfs]
    spa_config_generate+0x1a0/0x610 [zfs]
    vdev_label_init+0xa0/0xc80 [zfs]
    vdev_create+0x98/0xe0 [zfs]
    spa_vdev_attach+0x14c/0xb40 [zfs] <--- grabbed write lock

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9256 
Closes #9285
2019-09-10 13:42:30 -07:00
Brian Behlendorf
b88ca2acf5
Enable SIMD for encryption
When adding the SIMD compatibility code in e5db313 the decryption of a
dataset wrapping key was left in a user thread context.  This was done
intentionally since it's a relatively infrequent operation.  However,
this also meant that the encryption context templates were initialized
using the generic operations.  Therefore, subsequent encryption and
decryption operations would use the generic implementation even when
executed by an I/O pipeline thread.

Resolve the issue by initializing the context templates in an I/O
pipeline thread.  And by updating zio_do_crypt_uio() to dispatch any
encryption operations to a pipeline thread when called from the user
context.  For example, when performing a read from the ARC.

Tested-by: Attila Fülöp <attila@fueloep.org>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9215
Closes #9296
2019-09-10 10:45:46 -07:00
Matthew Macy
bced7e3aaa OpenZFS restructuring - move platform specific sources
Move platform specific Linux source under module/os/linux/
and update the build system accordingly.  Additional code
restructuring will follow to make the common code fully
portable.
    
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Macy <mmacy@FreeBSD.org>
Closes #9206
2019-09-06 11:26:26 -07:00
Matthew Macy
03fdcb9adc Make module tunables cross platform
Adds ZFS_MODULE_PARAM to abstract module parameter
setting to operating systems other than Linux.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@ixsystems.com>
Closes #9230
2019-09-05 14:49:49 -07:00
Matthew Macy
006e9a4088 OpenZFS restructuring - move platform specific headers
Move platform specific Linux headers under include/os/linux/.
Update the build system accordingly to detect the platform.
This lays some of the initial groundwork to supporting building
for other platforms.

As part of this change it was necessary to create both a user
and kernel space sys/simd.h header which can be included in
either context.  No functional change, the source has been
refactored and the relevant #include's updated.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Matthew Macy <mmacy@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9198
2019-09-05 09:34:54 -07:00
Andrea Gelmini
cf7c5a030e Fix typos in include/
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Closes #9238
2019-08-30 09:53:15 -07:00
Paul Dagnelie
eef0f4d84e Keep more metaslabs loaded
With the other metaslab changes loaded onto a system, we can 
significantly reduce the memory usage of each loaded metaslab and 
unload them on demand if there is memory pressure. However, none 
of those changes actually result in us keeping more metaslabs loaded. 
If we don't keep more metaslabs loaded, we will still have to wait 
for demand-loading to finish when no loaded metaslab can satisfy our 
allocation, which can cause ZIL performance issues. In addition,
performance is traditionally measured by IOs per unit time, while 
unloading is currently done on a txg-count basis. Txgs can take a 
widely varying range of times, from tenths of a second to several 
seconds. This can result in confusing, hard to predict behavior.

This change simply adds a time-based component to metaslab unloading. 
A metaslab will remain loaded for one minute and 8 txgs (by default) 
after it was last used, unless it is evicted due to memory pressure.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
External-issue: DLPX-65016
External-issue: DLPX-65047
Closes #9197
2019-08-29 10:20:36 -07:00
Chunwei Chen
035e96118b Fix zil replay panic when TX_REMOVE followed by TX_CREATE
If TX_REMOVE is followed by TX_CREATE on the same object id, we need to
make sure the object removal is completely finished before creation. The
current implementation relies on dnode_hold_impl with
DNODE_MUST_BE_ALLOCATED returning ENOENT. While this check seems to work
fine before, in current version it does not guarantee the object removal
is completed.

We fix this by checking if DNODE_MUST_BE_FREE returns successful
instead. Also add test and remove dead code in dnode_hold_impl.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #7151
Closes #8910
Closes #9123
Closes #9145
2019-08-28 10:42:02 -07:00
Tom Caputi
e7a2fa70c3 Fix deadlock in 'zfs rollback'
Currently, the 'zfs rollback' code can end up deadlocked due to
the way the kernel handles unreferenced inodes on a suspended fs.
Essentially, the zfs_resume_fs() code path may cause zfs to spawn
new threads as it reinstantiates the suspended fs's zil. When a
new thread is spawned, the kernel may attempt to free memory for
that thread by freeing some unreferenced inodes. If it happens to
select inodes that are a a part of the suspended fs a deadlock
will occur because freeing inodes requires holding the fs's
z_teardown_inactive_lock which is still held from the suspend.

This patch corrects this issue by adding an additional reference
to all inodes that are still present when a suspend is initiated.
This prevents them from being freed by the kernel for any reason.

Reviewed-by: Alek Pinchuk <apinchuk@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #9203
2019-08-27 09:55:51 -07:00