Commit Graph

287 Commits

Author SHA1 Message Date
Ryan Moeller
1c94345103 FreeBSD: upstream changes to VFS interface
Set VIRF_MOUNTPOINT flag on snapshot mountpoint.

Authored-by: Mateusz Guzik <mjg@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11458
2021-01-23 15:40:43 -08:00
Brian Behlendorf
83b91ae1a4
Linux 5.10 compat: restore custom uio_prefaultpages()
As part of commit 1c2358c1 the custom uio_prefaultpages() code
was removed in favor of using the generic kernel provided
iov_iter_fault_in_readable() interface.  Unfortunately, it
turns out that up until the Linux 4.7 kernel the function would
only ever fault in the first iovec of the iov_iter.  The result
being uiomove_iov() may hang waiting for the page.

This commit effectively restores the custom uio_prefaultpages()
pages code for Linux 4.9 and earlier kernels which contain the
troublesome version of iov_iter_fault_in_readable().

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11463 
Closes #11484
2021-01-21 10:43:39 -08:00
Brian Atkinson
d0cd9a5cc6
Extending FreeBSD UIO Struct
In FreeBSD the struct uio was just a typedef to uio_t. In order to
extend this struct, outside of the definition for the struct uio, the
struct uio has been embedded inside of a uio_t struct.

Also renamed all the uio_* interfaces to be zfs_uio_* to make it clear
this is a ZFS interface.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Closes #11438
2021-01-20 21:27:30 -08:00
Matthew Ahrens
e2af2acce3
allow callers to allocate and provide the abd_t struct
The `abd_get_offset_*()` routines create an abd_t that references
another abd_t, and doesn't allocate any pages/buffers of its own.  In
some workloads, these routines may be called frequently, to create many
abd_t's representing small pieces of a single large abd_t.  In
particular, the upcoming RAIDZ Expansion project makes heavy use of
these routines.

This commit adds the ability for the caller to allocate and provide the
abd_t struct to a variant of `abd_get_offset_*()`.  This eliminates the
cost of allocating the abd_t and performing the accounting associated
with it (`abdstat_struct_size`).  The RAIDZ/DRAID code uses this for
the `rc_abd`, which references the zio's abd.  The upcoming RAIDZ
Expansion project will leverage this infrastructure to increase
performance of reads post-expansion by around 50%.

Additionally, some of the interfaces around creating and destroying
abd_t's are cleaned up.  Most significantly, the distinction between
`abd_put()` and `abd_free()` is eliminated; all types of abd_t's are
now disposed of with `abd_free()`.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Issue #8853 
Closes #11439
2021-01-20 11:24:37 -08:00
Konstantin Khorenko
064c2cf40e
VZ 7 kernel compat: introduce ITER-enabled .direct_IO() via IOVECs
Virtuozzo 7 kernels starting 3.10.0-1127.18.2.vz7.163.46
have the following configuration:

  * no HAVE_VFS_RW_ITERATE
  * HAVE_VFS_DIRECT_IO_ITER_RW_OFFSET

=> let's add implementation of zpl_direct_IO() via
zpl_aio_{read,write}() in this case.

https://bugs.openvz.org/browse/OVZ-7243

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Closes #11410 
Closes #11411
2020-12-30 14:18:29 -08:00
Brian Behlendorf
c449d4b06d Linux 5.11 compat: blk_{un}register_region()
As of 5.11 the blk_register_region() and blk_unregister_region()
functions have been retired. This isn't a problem since add_disk()
has implicitly allocated minor numbers for a very long time.

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390
2020-12-27 16:20:46 -08:00
Brian Behlendorf
19697e4545 Linux 5.11 compat: revalidate_disk_size()
Both revalidate_disk_size() and revalidate_disk() have been removed.
Functionally this isn't a problem because we only relied on these
functions to call zvol_revalidate_disk() for us and to perform any
additional handling which might be needed for that kernel version.
When neither are available we know there's no additional handling
needed and we can directly call zvol_revalidate_disk().

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390
2020-12-27 16:20:40 -08:00
Brian Behlendorf
72ba4b2a4c Linux 5.11 compat: bdev_whole()
The bd_contains member was removed from the block_device structure.
Callers needing to determine if a vdev is a whole block device should
use the new bdev_whole() wrapper.  For older kernels we provide our
own bdev_whole() wrapper which relies on bd_contains for compatibility.

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390
2020-12-27 16:20:33 -08:00
Brian Behlendorf
a970f0594e Linux 5.11 compat: bio_start_io_acct() / bio_end_io_acct()
The generic IO accounting functions have been removed in favor of the
bio_start_io_acct() and bio_end_io_acct() functions which provide a
better interface.  These new functions were introduced in the 5.8
kernels but it wasn't until the 5.11 kernel that the previous generic
IO accounting interfaces were removed.

This commit updates the blk_generic_*_io_acct() wrappers to provide
and interface similar to the updated kernel interface.  It's slightly
different because for older kernels we need to pass the request queue
as well as the bio.

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390
2020-12-27 16:20:24 -08:00
Brian Behlendorf
b7281c88bc Linux 5.11 compat: lookup_bdev()
The lookup_bdev() function has been updated to require a dev_t
be passed as the second argument. This is actually pretty nice
since the major number stored in the dev_t was the only part we
were interested in. This allows to us avoid handling the bdev
entirely.  The vdev_lookup_bdev() wrapper was updated to emulate
the behavior of the new lookup_bdev() for all supported kernels.

Reviewed-by: Rafael Kitover <rkitover@gmail.com>
Reviewed-by: Coleman Kane <ckane@colemankane.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11387
Closes #11390
2020-12-27 16:20:08 -08:00
Brian Behlendorf
8947fa4495
Fix maybe uninitialized variable warning
Commit 1c2358c12 restructured this code and introduced a warning
about the variable maybe not being initialized.  This cannot happen
with the updated code but we should initialize the variable anyway
to silence the warning.

    zpl_file.c: In function ‘zpl_iter_write’:
    zpl_file.c:324:9: warning: ‘count’ may be used uninitialized
        in this function [-Wmaybe-uninitialized]

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11373
2020-12-20 09:50:13 -08:00
Brian Behlendorf
9ac535e662
Remove iov_iter_advance() from iter_read
There's no need to call iov_iter_advance() in zpl_iter_read().
This was preserved from the previous code where it wasn't needed
but also didn't cause any problems.  Now that the iter functions
also handle pipes that's no longer the case.  When fully reading a
pipe buffer iov_iter_advance() may results in the pipe buf release
function being called which will not be registered resulting in
a NULL dereference.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11375 
Closes #11378
2020-12-20 09:49:29 -08:00
Michael D Labriola
1c0bbd52c3
Linux 5.10 compat: also zvol_revalidate_disk()
Commit 59b68723 added a configure check for 5.10, which removed
revalidate_disk(), and conditionally replaced it's usage with a call to
the new revalidate_disk_size() function.  However, the old function also
invoked the device's registered callback, in our case
zvol_revalidate_disk().  This commit adds a call to zvol_revalidate_disk()
in zvol_update_volsize() to make sure the code path stays the same.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Michael D Labriola <michael.d.labriola@gmail.com>
Closes #11358
2020-12-18 09:36:19 -08:00
Brian Behlendorf
1c2358c12a
Linux 5.10 compat: use iov_iter in uio structure
As of the 5.10 kernel the generic splice compatibility code has been
removed.  All filesystems are now responsible for registering a
->splice_read and ->splice_write callback to support this operation.

The good news is the VFS provided generic_file_splice_read() and
iter_file_splice_write() callbacks can be used provided the ->iter_read
and ->iter_write callback support pipes.  However, this is currently
not the case and only iovecs and bvecs (not pipes) are ever attached
to the uio structure.

This commit changes that by allowing full iov_iter structures to be
attached to uios.  Ever since the 4.9 kernel the iov_iter structure
has supported iovecs, kvecs, bvevs, and pipes so it's desirable to
pass the entire thing when possible.  In conjunction with this the
uio helper functions (i.e uiomove(), uiocopy(), etc) have been
updated to understand the new UIO_ITER type.

Note that using the kernel provided uio_iter interfaces allowed the
existing Linux specific uio handling code to be simplified.  When
there's no longer a need to support kernel's older than 4.9, then
it will be possible to remove the iovec and bvec members from the
uio structure and always use a uio_iter.  Until then we need to
maintain all of the existing types for older kernels.

Some additional refactoring and cleanup was included in this change:

- Added checks to configure to detect available iov_iter interfaces.
  Some are available all the way back to the 3.10 kernel and are used
  when available.  In particular, uio_prefaultpages() now always uses
  iov_iter_fault_in_readable() which is available for all supported
  kernels.

- The unused UIO_USERISPACE type has been removed.  It is no longer
  needed now that the uio_seg enum is platform specific.

- Moved zfs_uio.c from the zcommon.ko module to the Linux specific
  platform code for the zfs.ko module.  This gets it out of libzfs
  where it was never needed and keeps this Linux specific code out
  of the common sources.

- Removed unnecessary O_APPEND handling from zfs_iter_write(), this
  is redundant and O_APPEND is already handled in zfs_write();

Reviewed-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11351
2020-12-18 08:48:26 -08:00
Ryan Moeller
8c5606ca0b FreeBSD: Fix format of vfs.zfs.arc_no_grow_shift
vfs.zfs.arc_no_grow_shift has an invalid type (15) and this causes
py-sysctl to format it as a bytearray when it should be an integer.

"U" is not a valid format, it should be "I" and the type should match
the variable type, int.  We can return EINVAL if the value is set below
zero.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11318
2020-12-10 15:28:56 -08:00
Brian Behlendorf
e5f732edbb
Fix possibly uninitialized 'root_inode' variable warning
Resolve an uninitialized variable warning when compiling.

    In function ‘zfs_domount’:
    warning: ‘root_inode’ may be used uninitialized in this
        function [-Wmaybe-uninitialized]
    sb->s_root = d_make_root(root_inode);

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11306
2020-12-10 15:23:26 -08:00
Paul Dagnelie
60a4c7d2a2
Implement memory and CPU hotplug
ZFS currently doesn't react to hotplugging cpu or memory into the 
system in any way. This patch changes that by adding logic to the ARC 
that allows the system to take advantage of new memory that is added 
for caching purposes. It also adds logic to the taskq infrastructure 
to support dynamically expanding the number of threads allocated to a 
taskq.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Matthew Ahrens <matthew.ahrens@delphix.com>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #11212
2020-12-10 14:09:23 -08:00
Alexander Motin
6366ef2240
Bring consistency to ABD chunk count types.
With both abd_size and abd_nents being uint_t it makes no sense for
abd_chunkcnt_for_bytes() to return size_t.  Random mix of different
types used to count chunks looks bad and makes compiler more difficult
to optimize the code.

In particular on FreeBSD this change allows compiler to completely
optimize out abd_verify_scatter() when built without debug, removing
pointless 64-bit division and even more pointless empty loop.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #11279
2020-12-06 09:53:40 -08:00
George Amanakis
d1d47691c2
Fix raw sends on encrypted datasets when copying back snapshots
When sending raw encrypted datasets the user space accounting is present
when it's not expected to be. This leads to the subsequent mount failure
due a checksum error when verifying the local mac.
Fix this by clearing the OBJSET_FLAG_USERACCOUNTING_COMPLETE and reset
the local mac. This allows the user accounting to be correctly updated
on first mount using the normal upgrade process.

Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-By: Tom Caputi <caputit1@tcnj.edu>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10523 
Closes #11221
2020-12-04 14:34:29 -08:00
Matthew Macy
0ca45cb310
Fix problems in zvol_set_volmode_impl
- Don't leave fstrans set when passed a snapshot
- Don't remove minor if volmode already matches new value
- (FreeBSD) Wait for GEOM ops to complete before trying
  remove (at create time GEOM will be "tasting" in parallel)
- (FreeBSD) Don't leak zvol_state_lock on open if zv == NULL
- (FreeBSD) Don't try to unlock zv->zv_state lock if zv == NULL

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11199
2020-11-17 09:50:52 -08:00
Brian Behlendorf
4352edaafb
Linux: Fix ZFS_ENTER/ZFS_EXIT/ZFS_VERFY_ZP usage
The ZFS_ENTER/ZFS_EXIT/ZFS_VERFY_ZP macros should not be used
in the Linux zpl_*.c source files.  They return a positive error
value which is correct for the common code, but not for the Linux
specific kernel code which expects a negative return value.  The
ZPL_ENTER/ZPL_EXIT/ZPL_VERFY_ZP macros should be used instead.

Furthermore, the ZPL_EXIT macro has been updated to not call the
zfs_exit_fs() function.  This prevents a possible deadlock which
can occur when a snapshot is automatically unmounted because the
zpl_show_devname() must never wait on in progress automatic
snapshot unmounts.

Reviewed-by: Adam Moss <c@yotes.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11169 
Closes #11201
2020-11-14 10:19:00 -08:00
Brian Behlendorf
b2255edcc0
Distributed Spare (dRAID) Feature
This patch adds a new top-level vdev type called dRAID, which stands
for Distributed parity RAID.  This pool configuration allows all dRAID
vdevs to participate when rebuilding to a distributed hot spare device.
This can substantially reduce the total time required to restore full
parity to pool with a failed device.

A dRAID pool can be created using the new top-level `draid` type.
Like `raidz`, the desired redundancy is specified after the type:
`draid[1,2,3]`.  No additional information is required to create the
pool and reasonable default values will be chosen based on the number
of child vdevs in the dRAID vdev.

    zpool create <pool> draid[1,2,3] <vdevs...>

Unlike raidz, additional optional dRAID configuration values can be
provided as part of the draid type as colon separated values. This
allows administrators to fully specify a layout for either performance
or capacity reasons.  The supported options include:

    zpool create <pool> \
        draid[<parity>][:<data>d][:<children>c][:<spares>s] \
        <vdevs...>

    - draid[parity]       - Parity level (default 1)
    - draid[:<data>d]     - Data devices per group (default 8)
    - draid[:<children>c] - Expected number of child vdevs
    - draid[:<spares>s]   - Distributed hot spares (default 0)

Abbreviated example `zpool status` output for a 68 disk dRAID pool
with two distributed spares using special allocation classes.

```
  pool: tank
 state: ONLINE
config:

    NAME                  STATE     READ WRITE CKSUM
    slag7                 ONLINE       0     0     0
      draid2:8d:68c:2s-0  ONLINE       0     0     0
        L0                ONLINE       0     0     0
        L1                ONLINE       0     0     0
        ...
        U25               ONLINE       0     0     0
        U26               ONLINE       0     0     0
        spare-53          ONLINE       0     0     0
          U27             ONLINE       0     0     0
          draid2-0-0      ONLINE       0     0     0
        U28               ONLINE       0     0     0
        U29               ONLINE       0     0     0
        ...
        U42               ONLINE       0     0     0
        U43               ONLINE       0     0     0
    special
      mirror-1            ONLINE       0     0     0
        L5                ONLINE       0     0     0
        U5                ONLINE       0     0     0
      mirror-2            ONLINE       0     0     0
        L6                ONLINE       0     0     0
        U6                ONLINE       0     0     0
    spares
      draid2-0-0          INUSE     currently in use
      draid2-0-1          AVAIL
```

When adding test coverage for the new dRAID vdev type the following
options were added to the ztest command.  These options are leverages
by zloop.sh to test a wide range of dRAID configurations.

    -K draid|raidz|random - kind of RAID to test
    -D <value>            - dRAID data drives per group
    -S <value>            - dRAID distributed hot spares
    -R <value>            - RAID parity (raidz or dRAID)

The zpool_create, zpool_import, redundancy, replacement and fault
test groups have all been updated provide test coverage for the
dRAID feature.

Co-authored-by: Isaac Huang <he.huang@intel.com>
Co-authored-by: Mark Maybee <mmaybee@cray.com>
Co-authored-by: Don Brady <don.brady@delphix.com>
Co-authored-by: Matthew Ahrens <mahrens@delphix.com>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mmaybee@cray.com>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10102
2020-11-13 13:51:51 -08:00
Brian Behlendorf
c08d442e45
Linux: Fix mount/unmount when dataset name has a space
The custom zpl_show_devname() helper should translate spaces in
to the octal escape sequence \040.  The getmntent(2) function
is aware of this convention and properly translates the escape
character back to a space when reading the fsname.

Without this change the `zfs mount` and `zfs unmount` commands
incorrectly detect when a dataset with a name containing spaces
is mounted.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11182 
Closes #11187
2020-11-11 17:14:24 -08:00
Tony Perkins
9bd14b8724 Start snapdir_iterate traversals to begin wtih the value of zero.
The microzap hash can sometimes be zero for single digit snapnames.
The zap cursor can then have a serialized value of two (for . and ..),
and skip the first entry in the avl tree for the .zfs/snapshot directory
listing, and therefore does not return all snapshots.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Cedric Berger <cedric@precidata.com>
Signed-off-by: Tony Perkins <tperkins@datto.com>
Closes #11039
2020-11-11 17:06:16 -08:00
Mateusz Guzik
1a0b4f566c
G/C struct znode -> z_moved
The field is yet another leftover from unsupported zfs_znode_move.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11186
2020-11-10 12:42:47 -08:00
Ryan Moeller
7b42f09049 FreeBSD: Simplify zvol_geom_open and zvol_cdev_open
We can consolidate the unlocking procedure into one place by starting
with drop_suspend set to B_FALSE and moving the open count check up.

While here, a little code cleanup. Match the out labels between
zvol_geom_open and zvol_cdev_open, and add a missing period in some
comments.

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11175
2020-11-10 11:08:10 -08:00
Ryan Moeller
2186ed33f1 FreeBSD: Avoid spurious EINTR in zvol_cdev_open
zvol_first_open can fail with EINTR if spa_namespace_lock is not held
and cannot be taken without waiting.

Apply the same logic that was done for zvol_geom_open to take
spa_namespace_lock if not already held on first open in zvol_cdev_open.

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11175
2020-11-10 11:07:25 -08:00
Ryan Moeller
8a9634e2f3 Remove redundant oid parameter to update_pages
The oid comes from the znode we are already passing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11176
2020-11-10 10:54:30 -08:00
Mariusz Zaborski
ae37ceadaa
FreeBSD: Prevent a NULL reference in zvol_cdev_open
Check if the ZVOL has been written before calling zil_async_to_sync.
The ZIL will be opened on the first write, not earlier.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org>
Closes #11152
2020-11-05 17:02:19 -08:00
khng300
a4246bce50
FreeBSD: Prevent NULL pointer dereference of resid
spa_config_load() passes NULL into resid when doing zfs_file_read().
This would trip over when vfs.zfs.autoimport_disable=0.

Sponsored by: The FreeBSD Foundation
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Ka Ho Ng <khng@freebsdfoundation.org>
Closes #11149
2020-11-04 16:50:08 -08:00
Ryan Moeller
181b2adc2a
FreeBSD: zvol_os: Use SET_ERROR more judiciously
SET_ERROR is useful to trace errors, so use it where the errors occur
rather than factored out to the end of a function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11146
2020-11-03 09:21:09 -08:00
Coleman Kane
59b6872327 Linux 5.10 compat: revalidate_disk_size() added
A new function was added named revalidate_disk_size() and the old
revalidate_disk() appears to have been deprecated. As the only ZFS
code that calls this function is zvol_update_volsize, swapping the
old function call out for the new one should be all that is required.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11085
2020-11-02 22:01:19 +00:00
Coleman Kane
ae15f1c1d8 Linux 5.10 compat: check_disk_change() removed
Kernel 5.10 removed check_disk_change() in favor of callers using
the faster bdev_check_media_change() instead, and explicitly forcing
bdev revalidation when they desire that behavior. To preserve prior
behavior, I have wrapped this into a zfs_check_media_change() macro
that calls an inline function for the new API that mimics the old
behavior when check_disk_change() doesn't exist, and just calls
check_disk_change() if it exists.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11085
2020-11-02 22:01:19 +00:00
Coleman Kane
838a249012 Linux 5.10 compat: percpu_ref added data member
Kernel commit 2b0d3d3e4fcfb brought in some changes to the struct
percpu_ref structure that moves most of its fields into a member
struct named "data" of type struct percpu_ref_data. This includes
the "count" member which is updated by vdev_blkg_tryget(), so update
this function to chase the API change, and detect it via configure.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Coleman Kane <ckane@colemankane.org>
Closes #11085
2020-11-02 22:01:19 +00:00
Christian Schwarz
ab8c935ea6
zfs_vnops: make zfs_get_data OS-independent
Move zfs_get_data() in to platform-independent code. The only
platform-specific aspect of it is the way we release an inode 
(Linux) / vnode_t (FreeBSD). I am not aware of a platform that
could be supported by ZFS that couldn't implement zfs_rele_async 
itself. It's sibling zvol_get_data already is platform-independent.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <me@cschwarz.com>
Closes #10979
2020-11-02 12:07:07 -08:00
Matthew Macy
8583540c6e
Consolidate zfs_holey and zfs_access
The zfs_holey() and zfs_access() functions can be made common
to both FreeBSD and Linux.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11125
2020-10-31 09:40:08 -07:00
Ryan Moeller
65a343bbd3 zvol_os: Fix handling of zvol private data
zvol private data is supposed to be nulled by zvol_clear_private before
zvol_free is called as an indicator that the zvol is going away.

Implement zvol_clear_private for volmode=dev.

Assert that zvol_clear_private has been called before zvol_free.

Check that zvol_clear_private has not been called when updating
volsize.  If it has, fail with ENXIO.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11117
2020-10-30 15:34:49 -07:00
Ryan Moeller
277884ab42 zvol_os: Don't leak doi in cdev error path
Make sure to free doi in zvol_create_minor impl when make_dev_s fails.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11117
2020-10-30 15:34:43 -07:00
Ryan Moeller
9a0ef216e5 zvol_os: Properly ignore error in volmode lookup
We fall back to a default volmode and continue when looking up a zvol's
volmode property fails.  After this we should set the error to 0 to
ensure we take the success paths in the out section.

While here, make sure we only log that the zvol was created on success.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11117
2020-10-30 15:34:36 -07:00
Ryan Moeller
1a6a75ac07 zvol_os: Code cleanup in zvol_create_minor_impl
Nonfunctional changes for readability and consistency.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11117
2020-10-30 15:34:30 -07:00
Ryan Moeller
260f6a28af zvol_os: Keep better track of open count in close
zvol_geom_close gets a count of the number of close operations to do.

Make sure we're always using this count to check if this will be the
last close operation performed on the zvol.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11117
2020-10-30 15:34:23 -07:00
Ryan Moeller
0b32d81783 zvol_os: Tidy up asserts
Using more specific assert variants gives better messages on failure.

No functional change.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11117
2020-10-30 15:34:15 -07:00
Mateusz Guzik
115216cc92
FreeBSD: catch up with 1300124 version bump
- use cache_vop_mkdir
- cache_rename -> cache_vop_rename

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11136
2020-10-30 15:22:04 -07:00
Ryan Moeller
d1e4ded7bc
FreeBSD: Fix 12.2-STABLE after AT_BENEATH MFC
AT_BENEATH was merged to stable/12, where kern_unlinkat takes a 
non-const path.  DECONST the path passed to kern_unlinkat in the 
case where AT_BENEATH is defined.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11139
2020-10-30 15:19:02 -07:00
Matthew Macy
5fa356ea44
Remove UIO_ZEROCOPY functions structures
The original xuio zero copy functionality has always been unused 
on Linux and FreeBSD.  Remove this disabled code to avoid any
confusion and improve readability.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11124
2020-10-30 10:00:33 -07:00
Ryan Moeller
76d04993a6
Update references to nonexistent man pages in code
Refer to the correct section or alternative for FreeBSD and Linux.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11132
2020-10-30 08:55:59 -07:00
Alexander Motin
e3a6ac8d06
FreeBSD: Remove BIO_ORDERED flag from BIO_FLUSH
ZFS always waits for the write completion before flushing the cache.
That is why it does not require explicit ordering fences around it,
which are pretty difficult to implement for NVMe, since one has no
internal concept of strict request ordering.

This was already removed from FreeBSD once, but got resurrected
by mistake during OpenZFS merge.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #11130
2020-10-30 08:50:57 -07:00
Mateusz Guzik
973ba682f5
Linux: g/c leftover fence in zfs_znode_alloc
The port removed provisions for zfs_znode_move but the cleanup missed
this bit. To quote the original:

[snip]
    list_insert_tail(&zfsvfs->z_all_znodes, zp);
    membar_producer();
    /*
     * Everything else must be valid before assigning z_zfsvfs makes the
     * znode eligible for zfs_znode_move().
     */
    zp->z_zfsvfs = zfsvfs;
[/snip]

In the current code it is immediately followed by unlock which issues
the same fence, thus plays no role in correctness.

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11115
2020-10-29 09:54:20 -07:00
Mateusz Guzik
082ff328f2
FreeBSD: g/c unused zfs_znode_move support
The allocator does not provide the functionality to begin with.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11114
2020-10-29 09:52:50 -07:00
Ryan Moeller
5c810ac499
FreeBSD: Skip RAW kstat sysctls by default
These kstats are often expensive to compute so we want to avoid them
unless specifically requested.

The following kstats are affected by this change:

kstat.zfs.${pool}.multihost
kstat.zfs.${pool}.misc.state
kstat.zfs.${pool}.txgs
kstat.zfs.misc.fletcher_4_bench
kstat.zfs.misc.vdev_raidz_bench
kstat.zfs.misc.dbufs
kstat.zfs.misc.dbgmsg

In FreeBSD 13, sysctl(8) has been updated to still list the
names/description/type of skipped sysctls so they are still
discoverable.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11099
2020-10-26 14:34:28 -07:00