2008-11-20 23:01:55 +03:00
/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
2022-07-12 00:16:13 +03:00
 * or https://opensource.org/licenses/CDDL-1.0.
2008-11-20 23:01:55 +03:00
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 *
 * CDDL HEADER END
 */
2012-12-14 03:24:15 +04:00

2008-11-20 23:01:55 +03:00
/*
2010-05-29 00:45:14 +04:00
 * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
2012-04-08 21:18:48 +04:00
 * Portions Copyright 2011 Martin Matuska
2016-06-09 22:24:29 +03:00
 * Copyright 2015, OmniTI Computer Consulting, Inc. All rights reserved.
2023-03-10 22:59:53 +03:00
 * Copyright (c) 2012 Pawel Jakub Dawidek
2017-01-31 21:24:23 +03:00
 * Copyright (c) 2014, 2016 Joyent, Inc. All rights reserved.
2016-07-12 20:53:53 +03:00
 * Copyright 2016 Nexenta Systems, Inc. All rights reserved.
2015-04-01 16:07:48 +03:00
 * Copyright (c) 2014, Joyent, Inc. All rights reserved.
2020-04-23 20:06:57 +03:00
 * Copyright (c) 2011, 2020 by Delphix. All rights reserved.
2013-01-23 13:54:30 +04:00
 * Copyright (c) 2013 by Saso Kiselkov. All rights reserved.
2013-05-25 06:06:23 +04:00
 * Copyright (c) 2013 Steven Hartland. All rights reserved.
2017-04-13 19:40:00 +03:00
 * Copyright (c) 2014 Integros [integros.com]
 * Copyright 2016 Toomas Soome <tsoome@me.com>
2014-03-22 13:07:14 +04:00
 * Copyright (c) 2016 Actifio, Inc. All rights reserved.
2019-02-09 02:44:15 +03:00
 * Copyright (c) 2018, loli10K <ezomori.nozomu@gmail.com>. All rights reserved.
2017-06-27 02:56:09 +03:00
 * Copyright 2017 RackTop Systems.
2017-10-26 22:26:09 +03:00
 * Copyright (c) 2017 Open-E, Inc. All Rights Reserved.
2019-03-12 23:13:22 +03:00
 * Copyright (c) 2019 Datto Inc.
2019-11-11 10:24:14 +03:00
 * Copyright (c) 2019, 2020 by Christian Schwarz. All rights reserved.
2021-11-30 17:46:25 +03:00
 * Copyright (c) 2019, 2021, Klara Inc.
Add zstd support to zfs
This PR adds two new compression types, based on ZStandard:
- zstd: A basic ZStandard compression algorithm. Available compression
levels for zstd are zstd-1 through zstd-19, where the compression
increases with every level, but speed decreases.
- zstd-fast: A faster version of the ZStandard compression algorithm
zstd-fast is basically a "negative" level of zstd. The compression
decreases with every level, but speed increases.
Available compression levels for zstd-fast:
- zstd-fast-1 through zstd-fast-10
- zstd-fast-20 through zstd-fast-100 (in increments of 10)
- zstd-fast-500 and zstd-fast-1000
For more information check the man page.
Implementation details:
Rather than treat each level of zstd as a different algorithm (as was
done historically with gzip), the block pointer `enum zio_compress`
value is simply zstd for all levels, including zstd-fast, since they all
use the same decompression function.
The compress= property (a 64bit unsigned integer) uses the lower 7 bits
to store the compression algorithm (matching the number of bits used in
a block pointer, as the 8th bit was borrowed for embedded block
pointers). The upper bits are used to store the compression level.
It is necessary to be able to determine what compression level was used
when later reading a block back, so the concept used in LZ4, where the
first 32bits of the on-disk value are the size of the compressed data
(since the allocation is rounded up to the nearest ashift), was
extended, and we store the version of ZSTD and the level as well as the
compressed size. This value is returned when decompressing a block, so
that if the block needs to be recompressed (L2ARC, nop-write, etc.), the
same parameters are used and the result has a matching checksum.
All of the internal ZFS code (`arc_buf_hdr_t`, `objset_t`,
`zio_prop_t`, etc.) uses the separated _compress and _complevel
variables. Only the properties ZAP contains the combined/bit-shifted
value. The combined value is split when the compression_changed_cb()
callback is called, and sets both objset members (os_compress and
os_complevel).
The userspace tools all use the combined/bit-shifted value.
Additional notes:
zdb can now also decode the ZSTD compression header (flag -Z) and
inspect the size, version and compression level saved in that header.
For each record, if it is ZSTD compressed, the parameters of the decoded
compression header get printed.
ZSTD is included with all current tests and new tests are added
as-needed.
Per-dataset feature flags now get activated when the property is set.
If a compression algorithm requires a feature flag, zfs activates the
feature when the property is set, rather than waiting for the first
block to be born. This is currently only used by zstd but can be
extended as needed.
Portions-Sponsored-By: The FreeBSD Foundation
Co-authored-by: Allan Jude <allanjude@freebsd.org>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Co-authored-by: Michael Niewöhner <foss@mniewoehner.de>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes #6247
Closes #9024
Closes #10277
Closes #10278
2020-08-18 20:10:17 +03:00
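The bit layout described in the commit message above (lower 7 bits for the algorithm, upper bits for the level) can be sketched as follows. This is a minimal illustration only: the `sketch_*` names and the `SKETCH_ALGO_BITS` macro are hypothetical and are not the identifiers used in the ZFS sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for this sketch; not the ZFS macros. */
#define	SKETCH_ALGO_BITS	7
#define	SKETCH_ALGO_MASK	((UINT64_C(1) << SKETCH_ALGO_BITS) - 1)

/* Pack an algorithm id and a level into one 64-bit property value. */
uint64_t
sketch_combine(uint64_t algo, uint64_t level)
{
	/* The algorithm must fit in the low 7 bits. */
	assert(algo <= SKETCH_ALGO_MASK);
	/* The level must survive the left shift without losing bits. */
	assert(level <= (UINT64_MAX >> SKETCH_ALGO_BITS));
	return ((level << SKETCH_ALGO_BITS) | algo);
}

/* Recover the algorithm id from the low 7 bits. */
uint64_t
sketch_algo(uint64_t prop)
{
	return (prop & SKETCH_ALGO_MASK);
}

/* Recover the level from the remaining upper bits. */
uint64_t
sketch_level(uint64_t prop)
{
	return (prop >> SKETCH_ALGO_BITS);
}
```

Splitting a combined value with `sketch_algo()` and `sketch_level()` mirrors what the message says happens when the property callback sets the separate compress and complevel members.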
 * Copyright (c) 2019, Allan Jude
2013-08-28 15:45:09 +04:00
 */

/*
 * ZFS ioctls.
 *
 * This file handles the ioctls to /dev/zfs, used for configuring ZFS storage
 * pools and filesystems, e.g. with /sbin/zfs and /sbin/zpool.
 *
 * There are two ways that we handle ioctls: the legacy way where almost
 * all of the logic is in the ioctl callback, and the new way where most
 * of the marshalling is handled in the common entry point, zfsdev_ioctl().
 *
 * Non-legacy ioctls should be registered by calling
 * zfs_ioctl_register() from zfs_ioctl_init(). The ioctl is invoked
 * from userland by lzc_ioctl().
 *
 * The registration arguments are as follows:
 *
 * const char *name
 *   The name of the ioctl. This is used for history logging. If the
 *   ioctl returns successfully (the callback returns 0), and allow_log
 *   is true, then a history log entry will be recorded with the input &
 *   output nvlists. The log entry can be printed with "zpool history -i".
 *
 * zfs_ioc_t ioc
 *   The ioctl request number, which userland will pass to ioctl(2).
Add basic zfs ioc input nvpair validation
We want newer versions of libzfs_core to run against an existing
zfs kernel module (i.e. a deferred reboot or module reload after
an update).
Programmatically document, via a zfs_ioc_key_t, the valid arguments
for the ioc commands that rely on nvpair input arguments (i.e. non
legacy commands from libzfs_core). Automatically verify the expected
pairs before dispatching a command.
This initial phase focuses on the non-legacy ioctls. A follow-on
change can address the legacy ioctl input from the zfs_cmd_t.
The zfs_ioc_key_t for zfs_keys_channel_program looks like:
static const zfs_ioc_key_t zfs_keys_channel_program[] = {
{"program", DATA_TYPE_STRING, 0},
{"arg", DATA_TYPE_UNKNOWN, 0},
{"sync", DATA_TYPE_BOOLEAN_VALUE, ZK_OPTIONAL},
{"instrlimit", DATA_TYPE_UINT64, ZK_OPTIONAL},
{"memlimit", DATA_TYPE_UINT64, ZK_OPTIONAL},
};
Introduce four input errors to identify specific input failures
(in addition to generic argument value errors like EINVAL, ERANGE,
EBADF, and E2BIG).
ZFS_ERR_IOC_CMD_UNAVAIL the ioctl number is not supported by kernel
ZFS_ERR_IOC_ARG_UNAVAIL an input argument is not supported by kernel
ZFS_ERR_IOC_ARG_REQUIRED a required input argument is missing
ZFS_ERR_IOC_ARG_BADTYPE an input argument has an invalid type
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #7780
2018-09-02 22:14:01 +03:00
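The key-table check described in the commit message above can be sketched in plain C. This is a simplified model, not the kernel code: the `sk_*` types stand in for `zfs_ioc_key_t` and nvpairs, the error names are local placeholders for the `ZFS_ERR_IOC_ARG_*` codes, and treating the table's `DATA_TYPE_UNKNOWN` entry as "any type" is an assumption of this sketch. `ZFS_ERR_IOC_CMD_UNAVAIL` concerns the ioctl number rather than the keys, so it is not modeled here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the nvpair data types. */
typedef enum {
	SK_TYPE_ANY, SK_TYPE_STRING, SK_TYPE_UINT64, SK_TYPE_BOOLEAN
} sk_type_t;

#define	SK_OPTIONAL	0x1

typedef struct { const char *name; sk_type_t type; int flags; } sk_key_t;
typedef struct { const char *name; sk_type_t type; } sk_pair_t;

/* Local placeholders for the ZFS_ERR_IOC_ARG_* codes. */
typedef enum {
	SK_OK = 0, SK_ERR_ARG_UNAVAIL, SK_ERR_ARG_REQUIRED, SK_ERR_ARG_BADTYPE
} sk_err_t;

/* Modeled on the zfs_keys_channel_program table from the message. */
const sk_key_t sk_keys_channel_program[] = {
	{"program",	SK_TYPE_STRING,		0},
	{"arg",		SK_TYPE_ANY,		0},
	{"sync",	SK_TYPE_BOOLEAN,	SK_OPTIONAL},
	{"instrlimit",	SK_TYPE_UINT64,		SK_OPTIONAL},
	{"memlimit",	SK_TYPE_UINT64,		SK_OPTIONAL},
};

sk_err_t
sk_check_input(const sk_key_t *keys, size_t nkeys,
    const sk_pair_t *in, size_t nin)
{
	assert(keys != NULL && (in != NULL || nin == 0));

	/* Every supplied pair must name a known key of the right type. */
	for (size_t i = 0; i < nin; i++) {
		const sk_key_t *k = NULL;
		for (size_t j = 0; j < nkeys; j++) {
			if (strcmp(keys[j].name, in[i].name) == 0) {
				k = &keys[j];
				break;
			}
		}
		if (k == NULL)
			return (SK_ERR_ARG_UNAVAIL);
		if (k->type != SK_TYPE_ANY && k->type != in[i].type)
			return (SK_ERR_ARG_BADTYPE);
	}

	/* Every key not marked optional must be present in the input. */
	for (size_t j = 0; j < nkeys; j++) {
		int found = 0;
		if (keys[j].flags & SK_OPTIONAL)
			continue;
		for (size_t i = 0; i < nin; i++) {
			if (strcmp(keys[j].name, in[i].name) == 0)
				found = 1;
		}
		if (!found)
			return (SK_ERR_ARG_REQUIRED);
	}
	return (SK_OK);
}
```

Checking the supplied pairs before the required keys means an unknown or mistyped argument is reported ahead of a missing one; that ordering is a choice made for this sketch.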
 *   We want newer versions of libzfs and libzfs_core to run against
 *   existing zfs kernel modules (i.e. a deferred reboot after an update).
 *   Therefore the ioctl numbers cannot change from release to release.
2013-08-28 15:45:09 +04:00
 *
 * zfs_secpolicy_func_t *secpolicy
 *   This function will be called before the zfs_ioc_func_t, to
 *   determine if this operation is permitted. It should return EPERM
 *   on failure, and 0 on success. Checks include determining if the
 *   dataset is visible in this zone, and if the user has either all
 *   zfs privileges in the zone (SYS_MOUNT), or has been granted permission
 *   to do this operation on this dataset with "zfs allow".
 *
 * zfs_ioc_namecheck_t namecheck
 *   This specifies what to expect in the zfs_cmd_t:zc_name -- a pool
 *   name, a dataset name, or nothing. If the name is not well-formed,
 *   the ioctl will fail and the callback will not be called.
 *   Therefore, the callback can assume that the name is well-formed
 *   (e.g. is null-terminated, doesn't have more than one '@' character,
 *   doesn't have invalid characters).
 *
 * zfs_ioc_poolcheck_t pool_check
 *   This specifies requirements on the pool state. If the pool does
 *   not meet them (is suspended or is readonly), the ioctl will fail
 *   and the callback will not be called. If any checks are specified
 *   (i.e. it is not POOL_CHECK_NONE), namecheck must not be NO_NAME.
 *   Multiple checks can be or-ed together (e.g. POOL_CHECK_SUSPENDED |
 *   POOL_CHECK_READONLY).
 *
2018-09-02 22:14:01 +03:00
 * zfs_ioc_key_t *nvl_keys
 *   The list of expected/allowable innvl input keys. This list is used
 *   to validate the nvlist input to the ioctl.
 *
2013-08-28 15:45:09 +04:00
 * boolean_t smush_outnvlist
 *   If smush_outnvlist is true, then the output is presumed to be a
 *   list of errors, and it will be "smushed" down to fit into the
 *   caller's buffer, by removing some entries and replacing them with a
 *   single "N_MORE_ERRORS" entry indicating how many were removed. See
 *   nvlist_smush() for details. If smush_outnvlist is false, and the
 *   outnvlist does not fit into the userland-provided buffer, then the
 *   ioctl will fail with ENOMEM.
 *
 * zfs_ioc_func_t *func
 *   The callback function that will perform the operation.
 *
 *   The callback should return 0 on success, or an error number on
 *   failure. If the function fails, the userland ioctl will return -1,
 *   and errno will be set to the callback's return value. The callback
 *   will be called with the following arguments:
 *
 *     const char *name
 *       The name of the pool or dataset to operate on, from
 *       zfs_cmd_t:zc_name. The 'namecheck' argument specifies the
 *       expected type (pool, dataset, or none).
 *
 *     nvlist_t *innvl
 *       The input nvlist, deserialized from zfs_cmd_t:zc_nvlist_src. Or
 *       NULL if no input nvlist was provided. Changes to this nvlist are
 *       ignored. If the input nvlist could not be deserialized, the
 *       ioctl will fail and the callback will not be called.
 *
 *     nvlist_t *outnvl
 *       The output nvlist, initially empty. The callback can fill it in,
 *       and it will be returned to userland by serializing it into
 *       zfs_cmd_t:zc_nvlist_dst. If it is non-empty, and serialization
 *       fails (e.g. because the caller didn't supply a large enough
 *       buffer), then the overall ioctl will fail. See the
 *       'smush_outnvlist' argument above for additional behaviors.
 *
 *       There are two typical uses of the output nvlist:
 *         - To return state, e.g. property values. In this case,
 *           smush_outnvlist should be false. If the buffer was not large
 *           enough, the caller will reallocate a larger buffer and try
 *           the ioctl again.
 *
 *         - To return multiple errors from an ioctl which makes on-disk
 *           changes. In this case, smush_outnvlist should be true.
 *           Ioctls which make on-disk modifications should generally not
 *           use the outnvl if they succeed, because the caller can not
 *           distinguish between the operation failing, and
 *           deserialization failing.
2018-09-02 22:14:01 +03:00
 *
 * IOCTL Interface Errors
 *
 * The following ioctl input errors can be returned:
 *   ZFS_ERR_IOC_CMD_UNAVAIL    the ioctl number is not supported by kernel
 *   ZFS_ERR_IOC_ARG_UNAVAIL    an input argument is not supported by kernel
 *   ZFS_ERR_IOC_ARG_REQUIRED   a required input argument is missing
 *   ZFS_ERR_IOC_ARG_BADTYPE    an input argument has an invalid type
2011-11-12 02:07:54 +04:00
 */
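The block comment above says that the name check, the pool-state check, and the security policy all run before the callback, and that the callback is skipped if any of them fails. A minimal sketch of that gating follows. Everything here is hypothetical: the `sk_*` types, the `SK_POOL_CHECK_*` flags, the specific errno values, and the relative order of the three checks are assumptions for illustration, not the behaviour of `zfsdev_ioctl()`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical or-able pool checks, modeled on POOL_CHECK_*. */
#define	SK_POOL_CHECK_NONE	0x0
#define	SK_POOL_CHECK_SUSPENDED	0x1
#define	SK_POOL_CHECK_READONLY	0x2

typedef struct { int suspended; int readonly; } sk_pool_t;
typedef int (*sk_secpolicy_t)(const char *name);
typedef int (*sk_func_t)(const char *name);

/* One registered ioctl: policy, pool requirements, and callback. */
typedef struct {
	const char *name;
	sk_secpolicy_t secpolicy;
	int pool_check;
	sk_func_t func;
} sk_ioc_vec_t;

int sk_calls;	/* counts how often the sample callback actually ran */

int sk_secpolicy_allow(const char *name) { (void) name; return (0); }
int sk_secpolicy_deny(const char *name) { (void) name; return (EPERM); }
int sk_func_count(const char *name) { (void) name; sk_calls++; return (0); }

/* Toy name check: non-empty and at most one '@' character. */
int
sk_name_ok(const char *name)
{
	const char *at = strchr(name, '@');
	return (name[0] != '\0' && (at == NULL || strchr(at + 1, '@') == NULL));
}

int
sk_dispatch(const sk_ioc_vec_t *vec, const sk_pool_t *pool, const char *name)
{
	int err;

	assert(vec != NULL && pool != NULL && name != NULL);
	if (!sk_name_ok(name))
		return (EINVAL);
	if ((vec->pool_check & SK_POOL_CHECK_SUSPENDED) && pool->suspended)
		return (EAGAIN);
	if ((vec->pool_check & SK_POOL_CHECK_READONLY) && pool->readonly)
		return (EROFS);
	if ((err = vec->secpolicy(name)) != 0)
		return (err);
	/* Only now is the callback invoked. */
	return (vec->func(name));
}
```

Because every check returns before `vec->func` is reached, the callback can rely on a well-formed name and an acceptable pool state, which is the guarantee the comment describes.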
2008-11-20 23:01:55 +03:00

#include <sys/types.h>
#include <sys/param.h>
#include <sys/errno.h>
2021-02-21 07:16:50 +03:00
#include <sys/uio_impl.h>
2008-11-20 23:01:55 +03:00
#include <sys/file.h>
#include <sys/kmem.h>
#include <sys/cmn_err.h>
#include <sys/stat.h>
#include <sys/zfs_ioctl.h>
2019-12-11 23:12:08 +03:00
#include <sys/zfs_quota.h>
2010-05-29 00:45:14 +04:00
#include <sys/zfs_vfsops.h>
2008-11-20 23:01:55 +03:00
#include <sys/zfs_znode.h>
#include <sys/zap.h>
#include <sys/spa.h>
#include <sys/spa_impl.h>
#include <sys/vdev.h>
2017-05-19 22:30:16 +03:00
#include <sys/vdev_impl.h>
2008-11-20 23:01:55 +03:00
#include <sys/dmu.h>
#include <sys/dsl_dir.h>
#include <sys/dsl_dataset.h>
#include <sys/dsl_prop.h>
#include <sys/dsl_deleg.h>
#include <sys/dmu_objset.h>
2012-05-10 02:05:14 +04:00
#include <sys/dmu_impl.h>
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
#include <sys/dmu_redact.h>
2013-09-04 16:00:57 +04:00
#include <sys/dmu_tx.h>
2008-11-20 23:01:55 +03:00
#include <sys/sunddi.h>
#include <sys/policy.h>
#include <sys/zone.h>
#include <sys/nvpair.h>
#include <sys/pathname.h>
#include <sys/fs/zfs.h>
2011-11-11 11:15:53 +04:00
#include <sys/zfs_ctldir.h>
2008-11-20 23:01:55 +03:00
#include <sys/zfs_dir.h>
2010-08-27 01:24:34 +04:00
#include <sys/zfs_onexit.h>
2008-11-20 23:01:55 +03:00
#include <sys/zvol.h>
2010-05-29 00:45:14 +04:00
#include <sys/dsl_scan.h>
2010-08-26 22:44:39 +04:00
#include <sys/fm/util.h>
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decrypt, and authenticate protected datasets.
Each object set maintains a Merkle tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 20:36:48 +03:00
#include <sys/dsl_crypt.h>
2019-09-27 20:46:28 +03:00
#include <sys/rrwlock.h>
2019-11-21 20:32:57 +03:00
#include <sys/zfs_file.h>
2010-08-26 22:44:39 +04:00

2018-10-10 00:05:13 +03:00
#include <sys/dmu_recv.h>
2013-09-04 16:00:57 +04:00
#include <sys/dmu_send.h>
2019-06-19 19:48:13 +03:00
#include <sys/dmu_recv.h>
2013-09-04 16:00:57 +04:00
#include <sys/dsl_destroy.h>
2013-12-12 02:33:41 +04:00
#include <sys/dsl_bookmark.h>
2013-09-04 16:00:57 +04:00
#include <sys/dsl_userhold.h>
2013-01-23 13:54:30 +04:00
#include <sys/zfeature.h>
2018-02-08 19:16:23 +03:00
#include <sys/zcp.h>
2016-06-16 01:47:05 +03:00
#include <sys/zio_checksum.h>
OpenZFS 7614, 9064 - zfs device evacuation/removal
OpenZFS 7614 - zfs device evacuation/removal
OpenZFS 9064 - remove_mirror should wait for device removal to complete
This project allows top-level vdevs to be removed from the storage pool
with "zpool remove", reducing the total amount of storage in the pool.
This operation copies all allocated regions of the device to be removed
onto other devices, recording the mapping from old to new location.
After the removal is complete, read and free operations to the removed
(now "indirect") vdev must be remapped and performed at the new location
on disk. The indirect mapping table is kept in memory whenever the pool
is loaded, so there is minimal performance overhead when doing operations
on the indirect vdev.
The size of the in-memory mapping table will be reduced when its entries
become "obsolete" because they are no longer used by any block pointers
in the pool. An entry becomes obsolete when all the blocks that use
it are freed. An entry can also become obsolete when all the snapshots
that reference it are deleted, and the block pointers that reference it
have been "remapped" in all filesystems/zvols (and clones). Whenever an
indirect block is written, all the block pointers in it will be "remapped"
to their new (concrete) locations if possible. This process can be
accelerated by using the "zfs remap" command to proactively rewrite all
indirect blocks that reference indirect (removed) vdevs.
Note that when a device is removed, we do not verify the checksum of
the data that is copied. This makes the process much faster, but if it
were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be
possible to copy the wrong data, when we have the correct data on e.g.
the other side of the mirror.
At the moment, only mirrors and simple top-level vdevs can be removed
and no removal is allowed if any of the top-level vdevs are raidz.
Porting Notes:
* Avoid zero-sized kmem_alloc() in vdev_compact_children().
The device evacuation code adds a dependency that
vdev_compact_children() be able to properly empty the vdev_child
array by setting it to NULL and zeroing vdev_children. Under Linux,
kmem_alloc() and related functions return a sentinel pointer rather
than NULL for zero-sized allocations.
* Remove comment regarding "mpt" driver where zfs_remove_max_segment
is initialized to SPA_MAXBLOCKSIZE.
Change zfs_condense_indirect_commit_entry_delay_ticks to
zfs_condense_indirect_commit_entry_delay_ms for consistency with
most other tunables in which delays are specified in ms.
* ZTS changes:
Use set_tunable rather than mdb
Use zpool sync as appropriate
Use sync_pool instead of sync
Kill jobs during test_removal_with_operation to allow unmount/export
Don't add non-disk names such as "mirror" or "raidz" to $DISKS
Use $TEST_BASE_DIR instead of /tmp
Increase HZ from 100 to 1000 which is more common on Linux
removal_multiple_indirection.ksh
Reduce iterations in order to not time out on the code
coverage builders.
removal_resume_export:
Functionally, the test case is correct but there exists a race
where the kernel thread hasn't been fully started yet and is
not visible. Wait for up to 1 second for the removal thread
to be started before giving up on it. Also, increase the
amount of data copied in order that the removal not finish
before the export has a chance to fail.
* MMP compatibility, the concept of concrete versus non-concrete devices
has slightly changed the semantics of vdev_writeable(). Update
mmp_random_leaf_impl() accordingly.
* Updated dbuf_remap() to handle the org.zfsonlinux:large_dnode pool
feature which is not supported by OpenZFS.
* Added support for new vdev removal tracepoints.
* Test cases removal_with_zdb and removal_condense_export have been
intentionally disabled. When run manually they pass as intended,
but when running in the automated test environment they produce
unreliable results on the latest Fedora release.
They may work better once the upstream pool import refactoring is
merged into ZoL at which point they will be re-enabled.
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Alex Reece <alex@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Tim Chase <tim@chase2k.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Garrett D'Amore <garrett@damore.org>
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
OpenZFS-issue: https://www.illumos.org/issues/7614
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f539f1eb
Closes #6900
2016-09-22 19:30:13 +03:00
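The commit message above describes an in-memory mapping table that redirects offsets on a removed ("indirect") vdev to their new locations. A minimal sketch of such a lookup is shown here, assuming a sorted, non-overlapping array and a binary search. The `sk_map_entry_t` layout and `sk_remap()` are hypothetical and are not the actual vdev indirect mapping structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping entry: one copied region of the removed vdev. */
typedef struct {
	uint64_t src_offset;	/* start of the region on the removed vdev */
	uint64_t size;		/* length of the region in bytes */
	uint64_t dst_vdev;	/* id of the vdev now holding the data */
	uint64_t dst_offset;	/* start of the region on that vdev */
} sk_map_entry_t;

/*
 * Translate an offset on the removed vdev to its new location.
 * The table must be sorted by src_offset with no overlapping entries.
 * Returns 0 and fills in the outputs on success, or -1 if the offset
 * is not covered by any entry (it was never allocated).
 */
int
sk_remap(const sk_map_entry_t *map, size_t n, uint64_t off,
    uint64_t *vdev, uint64_t *new_off)
{
	size_t lo = 0, hi = n;

	assert(vdev != NULL && new_off != NULL);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		const sk_map_entry_t *e = &map[mid];

		if (off < e->src_offset) {
			hi = mid;		/* look in the lower half */
		} else if (off - e->src_offset >= e->size) {
			lo = mid + 1;		/* look in the upper half */
		} else {
			*vdev = e->dst_vdev;
			*new_off = e->dst_offset + (off - e->src_offset);
			return (0);
		}
	}
	return (-1);
}
```

Keeping the table sorted makes each translation logarithmic in the number of entries, which fits the message's remark that the overhead of operating on an indirect vdev is small.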
#include <sys/vdev_removal.h>
OpenZFS 9102 - zfs should be able to initialize storage devices
PROBLEM
========
The first access to a block incurs a performance penalty on some platforms
(e.g. AWS's EBS, VMware VMDKs). Therefore we recommend that volumes are
"thick provisioned", where supported by the platform (VMware). This can
create a large delay in getting new virtual machines up and running (or
adding storage to an existing Engine). If the thick provision step is
omitted, write performance will be suboptimal until all blocks on the LUN
have been written.
SOLUTION
=========
This feature introduces a way to 'initialize' the disks at install or in the
background to make sure we don't incur this first read penalty.
When an entire LUN is added to ZFS, we make all space available immediately,
and allow ZFS to find unallocated space and zero it out. This works with
concurrent writes to arbitrary offsets, ensuring that we don't zero out
something that has been (or is in the middle of being) written. This scheme
can also be applied to existing pools (affecting only free regions on the
vdev). Detailed design:
- new subcommand: zpool initialize [-cs] <pool> [<vdev> ...]
- start, suspend, or cancel initialization
- Creates new open-context thread for each vdev
- Thread iterates through all metaslabs in this vdev
- Each metaslab:
- select a metaslab
- load the metaslab
- mark the metaslab as being zeroed
- walk all free ranges within that metaslab and translate
them to ranges on the leaf vdev
- issue a "zeroing" I/O on the leaf vdev that corresponds to
a free range on the metaslab we're working on
- continue until all free ranges for this metaslab have been
"zeroed"
- reset/unmark the metaslab being zeroed
- if more metaslabs exist, then repeat above tasks.
- if no more metaslabs, then we're done.
- progress for the initialization is stored on-disk in the vdev’s
leaf zap object. The following information is stored:
- the last offset that has been initialized
- the state of the initialization process (i.e. active,
suspended, or canceled)
- the start time for the initialization
- progress is reported via the zpool status command and shows
information for each of the vdevs that are initializing
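The per-vdev loop described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the `toy_*` and `free_range_t` types are hypothetical stand-ins rather than the real ZFS structures, the "disk" is an in-memory buffer, and there is no concurrency, metaslab loading, or on-disk progress record.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; not the real ZFS structures. */
typedef struct free_range {
	uint64_t fr_start;	/* offset on the leaf vdev */
	uint64_t fr_size;	/* length in bytes */
} free_range_t;

typedef struct toy_metaslab {
	const free_range_t *tm_ranges;	/* free ranges in this metaslab */
	size_t tm_nranges;
	int tm_initializing;		/* the "being zeroed" mark */
} toy_metaslab_t;

typedef struct toy_vdev {
	uint8_t *tv_data;		/* buffer standing in for the disk */
	uint64_t tv_size;
	toy_metaslab_t *tv_ms;
	size_t tv_ms_count;
	uint64_t tv_last_offset;	/* progress that would be persisted */
} toy_vdev_t;

/*
 * Walk every metaslab, write the pattern over each free range, and
 * leave allocated space untouched. Returns the number of bytes written.
 */
static uint64_t
toy_vdev_initialize(toy_vdev_t *vd, uint8_t pattern)
{
	uint64_t bytes_done = 0;

	assert(vd != NULL && vd->tv_data != NULL);
	for (size_t m = 0; m < vd->tv_ms_count; m++) {
		toy_metaslab_t *ms = &vd->tv_ms[m];

		ms->tm_initializing = 1;	/* mark the metaslab */
		for (size_t r = 0; r < ms->tm_nranges; r++) {
			const free_range_t *fr = &ms->tm_ranges[r];

			assert(fr->fr_start + fr->fr_size <= vd->tv_size);
			/* the "zeroing" I/O for this free range */
			memset(vd->tv_data + fr->fr_start, pattern,
			    (size_t)fr->fr_size);
			vd->tv_last_offset = fr->fr_start + fr->fr_size;
			bytes_done += fr->fr_size;
		}
		ms->tm_initializing = 0;	/* reset/unmark */
	}
	return (bytes_done);
}
```

The pattern is a parameter here because the porting notes mention a tunable for the value written; the toy model simply uses whatever byte the caller passes.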
Porting notes:
- Added zfs_initialize_value module parameter to set the pattern
written by "zpool initialize".
- Added zfs_vdev_{initializing,removal}_{min,max}_active module options.
Authored by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: loli10K <ezomori.nozomu@gmail.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Richard Lowe <richlowe@richlowe.net>
Signed-off-by: Tim Chase <tim@chase2k.com>
Ported-by: Tim Chase <tim@chase2k.com>
OpenZFS-issue: https://www.illumos.org/issues/9102
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c3963210eb
Closes #8230
2018-12-19 17:54:59 +03:00
|
|
|
#include <sys/vdev_impl.h>
|
|
|
|
#include <sys/vdev_initialize.h>
|
2019-03-29 19:13:20 +03:00
|
|
|
#include <sys/vdev_trim.h>
|
2013-01-23 13:54:30 +04:00
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
#include "zfs_namecheck.h"
|
|
|
|
#include "zfs_prop.h"
|
|
|
|
#include "zfs_deleg.h"
|
2010-05-29 00:45:14 +04:00
|
|
|
#include "zfs_comutil.h"
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2018-02-08 19:16:23 +03:00
|
|
|
#include <sys/lua/lua.h>
|
|
|
|
#include <sys/lua/lauxlib.h>
|
2019-09-27 20:46:28 +03:00
|
|
|
#include <sys/zfs_ioctl_impl.h>
|
2016-06-07 19:16:52 +03:00
|
|
|
|
2010-08-26 22:44:39 +04:00
|
|
|
kmutex_t zfsdev_state_lock;
|
2023-02-07 11:23:45 +03:00
|
|
|
static zfsdev_state_t zfsdev_state_listhead;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
Add basic zfs ioc input nvpair validation
We want newer versions of libzfs_core to run against an existing
zfs kernel module (i.e. a deferred reboot or module reload after
an update).
Programmatically document, via a zfs_ioc_key_t, the valid arguments
for the ioc commands that rely on nvpair input arguments (i.e. non
legacy commands from libzfs_core). Automatically verify the expected
pairs before dispatching a command.
This initial phase focuses on the non-legacy ioctls. A follow-on
change can address the legacy ioctl input from the zfs_cmd_t.
The zfs_ioc_key_t for zfs_keys_channel_program looks like:
static const zfs_ioc_key_t zfs_keys_channel_program[] = {
{"program", DATA_TYPE_STRING, 0},
{"arg", DATA_TYPE_UNKNOWN, 0},
{"sync", DATA_TYPE_BOOLEAN_VALUE, ZK_OPTIONAL},
{"instrlimit", DATA_TYPE_UINT64, ZK_OPTIONAL},
{"memlimit", DATA_TYPE_UINT64, ZK_OPTIONAL},
};
Introduce four input errors to identify specific input failures
(in addition to generic argument value errors like EINVAL, ERANGE,
EBADF, and E2BIG).
ZFS_ERR_IOC_CMD_UNAVAIL the ioctl number is not supported by kernel
ZFS_ERR_IOC_ARG_UNAVAIL an input argument is not supported by kernel
ZFS_ERR_IOC_ARG_REQUIRED a required input argument is missing
ZFS_ERR_IOC_ARG_BADTYPE an input argument has an invalid type
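As a rough illustration of the check this describes, the sketch below validates a list of input pairs against a key table. Everything named `toy_*` or `TOY_*` is a hypothetical stand-in (the real code works on nvlists and the `ZFS_ERR_IOC_*` codes), and the order in which the three argument errors are reported is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for nvpair data types and the new error codes. */
typedef enum {
	TOY_TYPE_ANY,		/* plays the role of "type can vary" */
	TOY_TYPE_STRING,
	TOY_TYPE_UINT64,
	TOY_TYPE_BOOLEAN
} toy_type_t;

enum {
	TOY_OK = 0,
	TOY_ERR_ARG_UNAVAIL,	/* input argument not in the key table */
	TOY_ERR_ARG_REQUIRED,	/* required argument missing */
	TOY_ERR_ARG_BADTYPE	/* argument present with the wrong type */
};

#define	TOY_OPTIONAL	0x1

typedef struct toy_key {
	const char *k_name;
	toy_type_t k_type;
	int k_flags;
} toy_key_t;

typedef struct toy_pair {
	const char *p_name;
	toy_type_t p_type;
} toy_pair_t;

static int
toy_check_input(const toy_key_t *keys, size_t nkeys,
    const toy_pair_t *in, size_t nin)
{
	assert(keys != NULL || nkeys == 0);
	assert(in != NULL || nin == 0);

	/* Every supplied pair must be a known key of the expected type. */
	for (size_t i = 0; i < nin; i++) {
		const toy_key_t *match = NULL;

		for (size_t k = 0; k < nkeys; k++) {
			if (strcmp(keys[k].k_name, in[i].p_name) == 0) {
				match = &keys[k];
				break;
			}
		}
		if (match == NULL)
			return (TOY_ERR_ARG_UNAVAIL);
		if (match->k_type != TOY_TYPE_ANY &&
		    match->k_type != in[i].p_type)
			return (TOY_ERR_ARG_BADTYPE);
	}

	/* Every key not flagged optional must be present. */
	for (size_t k = 0; k < nkeys; k++) {
		int found = 0;

		if (keys[k].k_flags & TOY_OPTIONAL)
			continue;
		for (size_t i = 0; i < nin; i++) {
			if (strcmp(keys[k].k_name, in[i].p_name) == 0) {
				found = 1;
				break;
			}
		}
		if (!found)
			return (TOY_ERR_ARG_REQUIRED);
	}
	return (TOY_OK);
}
```

Because the table is data rather than code, an older module can reject an argument it has never heard of with a specific error instead of silently ignoring it.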
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #7780
2018-09-02 22:14:01 +03:00
|
|
|
/*
|
2019-09-27 20:46:28 +03:00
|
|
|
* Limit maximum nvlist size. We don't want users passing in insane values
|
|
|
|
* for zc->zc_nvlist_src_size, since we will need to allocate that much memory.
|
2020-08-18 19:33:55 +03:00
|
|
|
* Defaults to 0=auto which is handled by platform code.
|
2018-09-02 22:14:01 +03:00
|
|
|
*/
|
Cleanup: 64-bit kernel module parameters should use fixed width types
Various module parameters such as `zfs_arc_max` were originally
`uint64_t` on OpenSolaris/Illumos, but were changed to `unsigned long`
for Linux compatibility because Linux's kernel default module parameter
implementation did not support 64-bit types on 32-bit platforms. This
caused problems when porting OpenZFS to Windows because its LLP64 memory
model made `unsigned long` a 32-bit type on 64-bit, which created the
undesirable situation that parameters that should accept 64-bit values
could not on 64-bit Windows.
Upon inspection, it turns out that the Linux kernel module parameter
interface is extensible, such that we are allowed to define our own
types. Rather than maintaining the original type change via hacks to
continue shrinking module parameters on 32-bit Linux, we implement
support for 64-bit module parameters on Linux.
After doing a review of all 64-bit kernel parameters (found via the man
page and also proposed changes by Andrew Innes), the kernel module
parameters fell into a few groups:
Parameters that were originally 64-bit on Illumos:
* dbuf_cache_max_bytes
* dbuf_metadata_cache_max_bytes
* l2arc_feed_min_ms
* l2arc_feed_secs
* l2arc_headroom
* l2arc_headroom_boost
* l2arc_write_boost
* l2arc_write_max
* metaslab_aliquot
* metaslab_force_ganging
* zfetch_array_rd_sz
* zfs_arc_max
* zfs_arc_meta_limit
* zfs_arc_meta_min
* zfs_arc_min
* zfs_async_block_max_blocks
* zfs_condense_max_obsolete_bytes
* zfs_condense_min_mapping_bytes
* zfs_deadman_checktime_ms
* zfs_deadman_synctime_ms
* zfs_initialize_chunk_size
* zfs_initialize_value
* zfs_lua_max_instrlimit
* zfs_lua_max_memlimit
* zil_slog_bulk
Parameters that were originally 32-bit on Illumos:
* zfs_per_txg_dirty_frees_percent
Parameters that were originally `ssize_t` on Illumos:
* zfs_immediate_write_sz
Note that `ssize_t` is `int32_t` on 32-bit and `int64_t` on 64-bit. It
has been upgraded to 64-bit.
Parameters that were `long`/`unsigned long` because of Linux/FreeBSD
influence:
* l2arc_rebuild_blocks_min_l2size
* zfs_key_max_salt_uses
* zfs_max_log_walking
* zfs_max_logsm_summary_length
* zfs_metaslab_max_size_cache_sec
* zfs_min_metaslabs_to_flush
* zfs_multihost_interval
* zfs_unflushed_log_block_max
* zfs_unflushed_log_block_min
* zfs_unflushed_log_block_pct
* zfs_unflushed_max_mem_amt
* zfs_unflushed_max_mem_ppm
New parameters that do not exist in Illumos:
* l2arc_trim_ahead
* vdev_file_logical_ashift
* vdev_file_physical_ashift
* zfs_arc_dnode_limit
* zfs_arc_dnode_limit_percent
* zfs_arc_dnode_reduce_percent
* zfs_arc_meta_limit_percent
* zfs_arc_sys_free
* zfs_deadman_ziotime_ms
* zfs_delete_blocks
* zfs_history_output_max
* zfs_livelist_max_entries
* zfs_max_async_dedup_frees
* zfs_max_nvlist_src_size
* zfs_rebuild_max_segment
* zfs_rebuild_vdev_limit
* zfs_unflushed_log_txg_max
* zfs_vdev_max_auto_ashift
* zfs_vdev_min_auto_ashift
* zfs_vnops_read_chunk_size
* zvol_max_discard_blocks
Rather than clutter the lists with commentary, the module parameters
that need comments are repeated below.
A few parameters were defined in Linux/FreeBSD specific code, where the
use of ulong/long is not an issue for portability, so we leave them
alone:
* zfs_delete_blocks
* zfs_key_max_salt_uses
* zvol_max_discard_blocks
The documentation for a few parameters was found to be incorrect:
* zfs_deadman_checktime_ms - incorrectly documented as int
* zfs_delete_blocks - not documented as Linux only
* zfs_history_output_max - incorrectly documented as int
* zfs_vnops_read_chunk_size - incorrectly documented as long
* zvol_max_discard_blocks - incorrectly documented as ulong
The documentation for these has been fixed, alongside the changes to
document the switch to fixed width types.
In addition, several kernel module parameters were percentages or held
ashift values, so being 64-bit never made sense for them. They have been
downgraded to 32-bit:
* vdev_file_logical_ashift
* vdev_file_physical_ashift
* zfs_arc_dnode_limit_percent
* zfs_arc_dnode_reduce_percent
* zfs_arc_meta_limit_percent
* zfs_per_txg_dirty_frees_percent
* zfs_unflushed_log_block_pct
* zfs_vdev_max_auto_ashift
* zfs_vdev_min_auto_ashift
Of special note are `zfs_vdev_max_auto_ashift` and
`zfs_vdev_min_auto_ashift`, which were already defined as `uint64_t`,
and passed to the kernel as `ulong`. This is inherently buggy on big
endian 32-bit Linux, since the values would not be written to the
correct locations. 32-bit FreeBSD was unaffected because its sysctl code
correctly treated this as a `uint64_t`.
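The byte-order problem can be illustrated with a small simulation. This is a sketch only: the `toy_*` helpers are hypothetical and model an 8-byte variable as a byte array, with a 32-bit writer storing into its first four bytes the way a 32-bit `unsigned long` store at the variable's address would.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Store a 32-bit value into the first four bytes of an 8-byte slot,
 * using the given byte order. This models a 32-bit writer being handed
 * the address of a 64-bit variable.
 */
static void
toy_store_u32(uint8_t slot[8], uint32_t v, int big_endian)
{
	assert(slot != NULL);
	for (int i = 0; i < 4; i++) {
		int shift = big_endian ? (24 - 8 * i) : (8 * i);

		slot[i] = (uint8_t)(v >> shift);
	}
}

/* Read the whole slot back as the 64-bit value the reader sees. */
static uint64_t
toy_load_u64(const uint8_t slot[8], int big_endian)
{
	uint64_t v = 0;

	assert(slot != NULL);
	for (int i = 0; i < 8; i++) {
		int shift = big_endian ? (56 - 8 * i) : (8 * i);

		v |= (uint64_t)slot[i] << shift;
	}
	return (v);
}
```

With little-endian layout the low half happens to land where the 64-bit reader expects it, so storing 12 reads back as 12. With big-endian layout the same store fills the high half, and the reader sees 12 shifted left by 32 bits.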
Lastly, a code comment suggests that `zfs_arc_sys_free` is
Linux-specific, but there is nothing to indicate to me that it is
Linux-specific. Nothing was done about that.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Original-patch-by: Andrew Innes <andrew.c12@gmail.com>
Original-patch-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13984
Closes #14004
2022-10-03 22:06:54 +03:00
|
|
|
uint64_t zfs_max_nvlist_src_size = 0;
|
2019-09-27 20:46:28 +03:00
|
|
|
|
2020-11-14 21:17:16 +03:00
|
|
|
/*
|
|
|
|
* When logging the output nvlist of an ioctl in the on-disk history, limit
|
2021-04-03 04:38:53 +03:00
|
|
|
* the logged size to this many bytes. This must be less than DMU_MAX_ACCESS.
|
2020-11-14 21:17:16 +03:00
|
|
|
* This applies primarily to zfs_ioc_channel_program().
|
|
|
|
*/
|
2022-10-03 22:06:54 +03:00
|
|
|
static uint64_t zfs_history_output_max = 1024 * 1024;
|
2020-11-14 21:17:16 +03:00
|
|
|
|
2019-09-27 20:46:28 +03:00
|
|
|
uint_t zfs_fsyncer_key;
|
|
|
|
uint_t zfs_allow_log_key;
|
2018-09-02 22:14:01 +03:00
|
|
|
|
|
|
|
/* DATA_TYPE_ANY is used when zkey_type can vary. */
|
|
|
|
#define DATA_TYPE_ANY DATA_TYPE_UNKNOWN
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
typedef struct zfs_ioc_vec {
|
2013-08-28 15:45:09 +04:00
|
|
|
zfs_ioc_legacy_func_t *zvec_legacy_func;
|
2008-11-20 23:01:55 +03:00
|
|
|
zfs_ioc_func_t *zvec_func;
|
|
|
|
zfs_secpolicy_func_t *zvec_secpolicy;
|
2009-07-03 02:44:48 +04:00
|
|
|
zfs_ioc_namecheck_t zvec_namecheck;
|
2013-08-28 15:45:09 +04:00
|
|
|
boolean_t zvec_allow_log;
|
2010-08-27 01:24:34 +04:00
|
|
|
zfs_ioc_poolcheck_t zvec_pool_check;
|
2013-08-28 15:45:09 +04:00
|
|
|
boolean_t zvec_smush_outnvlist;
|
|
|
|
const char *zvec_name;
|
2018-09-02 22:14:01 +03:00
|
|
|
const zfs_ioc_key_t *zvec_nvl_keys;
|
|
|
|
size_t zvec_nvl_key_count;
|
2008-11-20 23:01:55 +03:00
|
|
|
} zfs_ioc_vec_t;
|
|
|
|
|
2009-07-03 02:44:48 +04:00
|
|
|
/* This array is indexed by zfs_userquota_prop_t */
|
|
|
|
static const char *userquota_perms[] = {
|
|
|
|
ZFS_DELEG_PERM_USERUSED,
|
|
|
|
ZFS_DELEG_PERM_USERQUOTA,
|
|
|
|
ZFS_DELEG_PERM_GROUPUSED,
|
|
|
|
ZFS_DELEG_PERM_GROUPQUOTA,
|
2016-10-04 21:46:10 +03:00
|
|
|
ZFS_DELEG_PERM_USEROBJUSED,
|
|
|
|
ZFS_DELEG_PERM_USEROBJQUOTA,
|
|
|
|
ZFS_DELEG_PERM_GROUPOBJUSED,
|
|
|
|
ZFS_DELEG_PERM_GROUPOBJQUOTA,
|
2018-02-14 01:54:54 +03:00
|
|
|
ZFS_DELEG_PERM_PROJECTUSED,
|
|
|
|
ZFS_DELEG_PERM_PROJECTQUOTA,
|
|
|
|
ZFS_DELEG_PERM_PROJECTOBJUSED,
|
|
|
|
ZFS_DELEG_PERM_PROJECTOBJQUOTA,
|
2009-07-03 02:44:48 +04:00
|
|
|
};
|
|
|
|
|
|
|
|
static int zfs_ioc_userspace_upgrade(zfs_cmd_t *zc);
|
2018-02-14 01:54:54 +03:00
|
|
|
static int zfs_ioc_id_quota_upgrade(zfs_cmd_t *zc);
|
2010-05-29 00:45:14 +04:00
|
|
|
static int zfs_check_settable(const char *name, nvpair_t *property,
|
|
|
|
cred_t *cr);
|
2020-10-03 03:44:10 +03:00
|
|
|
static int zfs_check_clearable(const char *dataset, nvlist_t *props,
|
2010-05-29 00:45:14 +04:00
|
|
|
nvlist_t **errors);
|
2008-12-03 23:09:06 +03:00
|
|
|
static int zfs_fill_zplprops_root(uint64_t, nvlist_t *, nvlist_t *,
|
|
|
|
boolean_t *);
|
2013-08-28 15:45:09 +04:00
|
|
|
int zfs_set_prop_nvlist(const char *, zprop_source_t, nvlist_t *, nvlist_t *);
|
|
|
|
static int get_nvlist(uint64_t nvl, uint64_t size, int iflag, nvlist_t **nvp);
|
2008-12-03 23:09:06 +03:00
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
static void
|
|
|
|
history_str_free(char *buf)
|
|
|
|
{
|
|
|
|
kmem_free(buf, HIS_MAX_RECORD_LEN);
|
|
|
|
}
|
|
|
|
|
|
|
|
static char *
|
|
|
|
history_str_get(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
char *buf;
|
|
|
|
|
2010-08-26 20:52:39 +04:00
|
|
|
if (zc->zc_history == 0)
|
2008-11-20 23:01:55 +03:00
|
|
|
return (NULL);
|
|
|
|
|
2014-12-03 22:56:32 +03:00
|
|
|
buf = kmem_alloc(HIS_MAX_RECORD_LEN, KM_SLEEP);
|
2008-11-20 23:01:55 +03:00
|
|
|
if (copyinstr((void *)(uintptr_t)zc->zc_history,
|
|
|
|
buf, HIS_MAX_RECORD_LEN, NULL) != 0) {
|
|
|
|
history_str_free(buf);
|
|
|
|
return (NULL);
|
|
|
|
}
|
|
|
|
|
|
|
|
buf[HIS_MAX_RECORD_LEN - 1] = '\0';
|
|
|
|
|
|
|
|
return (buf);
|
|
|
|
}
|
|
|
|
|
2008-12-03 23:09:06 +03:00
|
|
|
/*
|
2013-06-11 21:12:34 +04:00
|
|
|
* Return non-zero if the spa version is less than requested version.
|
2008-11-20 23:01:55 +03:00
|
|
|
*/
|
|
|
|
static int
|
2008-12-03 23:09:06 +03:00
|
|
|
zfs_earlier_version(const char *name, int version)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
|
|
|
spa_t *spa;
|
|
|
|
|
|
|
|
if (spa_open(name, &spa, FTAG) == 0) {
|
|
|
|
if (spa_version(spa) < version) {
|
|
|
|
spa_close(spa, FTAG);
|
|
|
|
return (1);
|
|
|
|
}
|
|
|
|
spa_close(spa, FTAG);
|
|
|
|
}
|
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2008-12-03 23:09:06 +03:00
|
|
|
* Return TRUE if the ZPL version is less than requested version.
|
2008-11-20 23:01:55 +03:00
|
|
|
*/
|
2008-12-03 23:09:06 +03:00
|
|
|
static boolean_t
|
|
|
|
zpl_earlier_version(const char *name, int version)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
|
|
|
objset_t *os;
|
2008-12-03 23:09:06 +03:00
|
|
|
boolean_t rc = B_TRUE;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
if (dmu_objset_hold(name, FTAG, &os) == 0) {
|
2008-12-03 23:09:06 +03:00
|
|
|
uint64_t zplversion;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
if (dmu_objset_type(os) != DMU_OST_ZFS) {
|
|
|
|
dmu_objset_rele(os, FTAG);
|
|
|
|
return (B_TRUE);
|
|
|
|
}
|
|
|
|
/* XXX reading from non-owned objset */
|
2008-12-03 23:09:06 +03:00
|
|
|
if (zfs_get_zplprop(os, ZFS_PROP_VERSION, &zplversion) == 0)
|
|
|
|
rc = zplversion < version;
|
2010-05-29 00:45:14 +04:00
|
|
|
dmu_objset_rele(os, FTAG);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
return (rc);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void
|
|
|
|
zfs_log_history(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
spa_t *spa;
|
|
|
|
char *buf;
|
|
|
|
|
|
|
|
if ((buf = history_str_get(zc)) == NULL)
|
|
|
|
return;
|
|
|
|
|
|
|
|
if (spa_open(zc->zc_name, &spa, FTAG) == 0) {
|
|
|
|
if (spa_version(spa) >= SPA_VERSION_ZPOOL_HISTORY)
|
2013-08-28 15:45:09 +04:00
|
|
|
(void) spa_history_log(spa, buf);
|
2008-11-20 23:01:55 +03:00
|
|
|
spa_close(spa, FTAG);
|
|
|
|
}
|
|
|
|
history_str_free(buf);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Policy for top-level read operations (list pools). Requires no privileges,
|
|
|
|
* and can be used in the local zone, as there is no associated dataset.
|
|
|
|
*/
|
|
|
|
static int
|
2013-08-28 15:45:09 +04:00
|
|
|
zfs_secpolicy_none(zfs_cmd_t *zc, nvlist_t *innvl, cred_t *cr)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2022-02-16 04:38:43 +03:00
|
|
|
(void) zc, (void) innvl, (void) cr;
|
2008-11-20 23:01:55 +03:00
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Policy for dataset read operations (list children, get statistics). Requires
|
|
|
|
* no privileges, but must be visible in the local zone.
|
|
|
|
*/
|
|
|
|
static int
|
2013-08-28 15:45:09 +04:00
|
|
|
zfs_secpolicy_read(zfs_cmd_t *zc, nvlist_t *innvl, cred_t *cr)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2022-02-16 04:38:43 +03:00
|
|
|
(void) innvl, (void) cr;
|
2008-11-20 23:01:55 +03:00
|
|
|
if (INGLOBALZONE(curproc) ||
|
|
|
|
zone_dataset_visible(zc->zc_name, NULL))
|
|
|
|
return (0);
|
|
|
|
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(ENOENT));
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2010-08-27 01:24:34 +04:00
|
|
|
zfs_dozonecheck_impl(const char *dataset, uint64_t zoned, cred_t *cr)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
|
|
|
int writable = 1;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The dataset must be visible by this zone -- check this first
|
|
|
|
* so they don't see EPERM on something they shouldn't know about.
|
|
|
|
*/
|
|
|
|
if (!INGLOBALZONE(curproc) &&
|
|
|
|
!zone_dataset_visible(dataset, &writable))
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(ENOENT));
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
if (INGLOBALZONE(curproc)) {
|
|
|
|
/*
|
|
|
|
* If the fs is zoned, only root can access it from the
|
|
|
|
* global zone.
|
|
|
|
*/
|
|
|
|
if (secpolicy_zfs(cr) && zoned)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EPERM));
|
2008-11-20 23:01:55 +03:00
|
|
|
} else {
|
|
|
|
/*
|
|
|
|
* If we are in a local zone, the 'zoned' property must be set.
|
|
|
|
*/
|
|
|
|
if (!zoned)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EPERM));
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
/* must be writable by this zone */
|
|
|
|
if (!writable)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EPERM));
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
2010-08-27 01:24:34 +04:00
|
|
|
static int
|
|
|
|
zfs_dozonecheck(const char *dataset, cred_t *cr)
|
|
|
|
{
|
|
|
|
uint64_t zoned;
|
|
|
|
|
2019-09-27 20:46:28 +03:00
|
|
|
if (dsl_prop_get_integer(dataset, zfs_prop_to_name(ZFS_PROP_ZONED),
|
|
|
|
&zoned, NULL))
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(ENOENT));
|
2010-08-27 01:24:34 +04:00
|
|
|
|
|
|
|
return (zfs_dozonecheck_impl(dataset, zoned, cr));
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
zfs_dozonecheck_ds(const char *dataset, dsl_dataset_t *ds, cred_t *cr)
|
|
|
|
{
|
|
|
|
uint64_t zoned;
|
|
|
|
|
2019-09-27 20:46:28 +03:00
|
|
|
if (dsl_prop_get_int_ds(ds, zfs_prop_to_name(ZFS_PROP_ZONED), &zoned))
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(ENOENT));
|
2010-08-27 01:24:34 +04:00
|
|
|
|
|
|
|
return (zfs_dozonecheck_impl(dataset, zoned, cr));
|
|
|
|
}
|
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
static int
|
2013-09-04 16:00:57 +04:00
|
|
|
zfs_secpolicy_write_perms_ds(const char *name, dsl_dataset_t *ds,
|
|
|
|
const char *perm, cred_t *cr)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
|
|
|
int error;
|
|
|
|
|
2011-11-17 22:14:36 +04:00
|
|
|
error = zfs_dozonecheck_ds(name, ds, cr);
|
2008-11-20 23:01:55 +03:00
|
|
|
if (error == 0) {
|
|
|
|
error = secpolicy_zfs(cr);
|
2013-09-04 16:00:57 +04:00
|
|
|
if (error != 0)
|
2013-08-28 15:45:09 +04:00
|
|
|
error = dsl_deleg_access_impl(ds, perm, cr);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
static int
|
2013-09-04 16:00:57 +04:00
|
|
|
zfs_secpolicy_write_perms(const char *name, const char *perm, cred_t *cr)
|
2010-08-27 01:24:34 +04:00
|
|
|
{
|
|
|
|
int error;
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_t *ds;
|
|
|
|
dsl_pool_t *dp;
|
2010-08-27 01:24:34 +04:00
|
|
|
|
2017-01-18 01:52:17 +03:00
|
|
|
/*
|
|
|
|
* First do a quick check for root in the global zone, which
|
|
|
|
* is allowed to do all write_perms. This ensures that zfs_ioc_*
|
|
|
|
* will get to handle nonexistent datasets.
|
|
|
|
*/
|
|
|
|
if (INGLOBALZONE(curproc) && secpolicy_zfs(cr) == 0)
|
|
|
|
return (0);
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
error = dsl_pool_hold(name, FTAG, &dp);
|
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
|
|
|
|
|
|
|
error = dsl_dataset_hold(dp, name, FTAG, &ds);
|
|
|
|
if (error != 0) {
|
|
|
|
dsl_pool_rele(dp, FTAG);
|
|
|
|
return (error);
|
2010-08-27 01:24:34 +04:00
|
|
|
}
|
2013-09-04 16:00:57 +04:00
|
|
|
|
|
|
|
error = zfs_secpolicy_write_perms_ds(name, ds, perm, cr);
|
|
|
|
|
|
|
|
dsl_dataset_rele(ds, FTAG);
|
|
|
|
dsl_pool_rele(dp, FTAG);
|
2010-08-27 01:24:34 +04:00
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
/*
|
|
|
|
* Policy for setting the security label property.
|
|
|
|
*
|
|
|
|
* Returns 0 for success, non-zero for access and other errors.
|
|
|
|
*/
|
2008-11-20 23:01:55 +03:00
|
|
|
static int
|
2020-10-03 03:44:10 +03:00
|
|
|
zfs_set_slabel_policy(const char *name, const char *strval, cred_t *cr)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2010-08-26 22:43:42 +04:00
|
|
|
#ifdef HAVE_MLSLABEL
|
2010-05-29 00:45:14 +04:00
|
|
|
char ds_hexsl[MAXNAMELEN];
|
|
|
|
bslabel_t ds_sl, new_sl;
|
|
|
|
boolean_t new_default = FALSE;
|
|
|
|
uint64_t zoned;
|
|
|
|
int needed_priv = -1;
|
|
|
|
int error;
|
|
|
|
|
|
|
|
/* First get the existing dataset label. */
|
|
|
|
error = dsl_prop_get(name, zfs_prop_to_name(ZFS_PROP_MLSLABEL),
|
|
|
|
1, sizeof (ds_hexsl), &ds_hexsl, NULL);
|
2013-09-04 16:00:57 +04:00
|
|
|
if (error != 0)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EPERM));
|
2010-05-29 00:45:14 +04:00
|
|
|
|
|
|
|
if (strcasecmp(strval, ZFS_MLSLABEL_DEFAULT) == 0)
|
|
|
|
new_default = TRUE;
|
|
|
|
|
|
|
|
/* The label must be translatable */
|
|
|
|
if (!new_default && (hexstr_to_label(strval, &new_sl) != 0))
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EINVAL));
|
2010-05-29 00:45:14 +04:00
|
|
|
|
|
|
|
/*
|
|
|
|
* In a non-global zone, disallow attempts to set a label that
|
|
|
|
* doesn't match that of the zone; otherwise no other checks
|
|
|
|
* are needed.
|
|
|
|
*/
|
|
|
|
if (!INGLOBALZONE(curproc)) {
|
|
|
|
if (new_default || !blequal(&new_sl, CR_SL(CRED())))
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EPERM));
|
2010-05-29 00:45:14 +04:00
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* For global-zone datasets (i.e., those whose zoned property is
|
|
|
|
* "off"), verify that the specified new label is valid for the
|
|
|
|
* global zone.
|
|
|
|
*/
|
|
|
|
if (dsl_prop_get_integer(name,
|
|
|
|
zfs_prop_to_name(ZFS_PROP_ZONED), &zoned, NULL))
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EPERM));
|
2010-05-29 00:45:14 +04:00
|
|
|
if (!zoned) {
|
|
|
|
if (zfs_check_global_label(name, strval) != 0)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EPERM));
|
2010-05-29 00:45:14 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If the existing dataset label is nondefault, check if the
|
|
|
|
* dataset is mounted (label cannot be changed while mounted).
|
2017-03-08 03:21:37 +03:00
|
|
|
* Get the zfsvfs_t; if there isn't one, then the dataset isn't
|
2010-05-29 00:45:14 +04:00
|
|
|
* mounted (or isn't a dataset, doesn't exist, ...).
|
|
|
|
*/
|
|
|
|
if (strcasecmp(ds_hexsl, ZFS_MLSLABEL_DEFAULT) != 0) {
|
|
|
|
objset_t *os;
|
2020-10-03 03:44:10 +03:00
|
|
|
static const char *setsl_tag = "setsl_tag";
|
2010-05-29 00:45:14 +04:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Try to own the dataset; abort if there is any error
|
|
|
|
* (e.g., already mounted, in use, or other error).
|
|
|
|
*/
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decrypt, and authenticate protected datasets.
Each object set maintains a Merkle tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 20:36:48 +03:00
|
|
|
error = dmu_objset_own(name, DMU_OST_ZFS, B_TRUE, B_TRUE,
|
2010-05-29 00:45:14 +04:00
|
|
|
setsl_tag, &os);
|
2013-09-04 16:00:57 +04:00
|
|
|
if (error != 0)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EPERM));
|
2010-05-29 00:45:14 +04:00
|
|
|
|
2017-08-14 20:36:48 +03:00
|
|
|
dmu_objset_disown(os, B_TRUE, setsl_tag);
|
2010-05-29 00:45:14 +04:00
|
|
|
|
|
|
|
if (new_default) {
|
|
|
|
needed_priv = PRIV_FILE_DOWNGRADE_SL;
|
|
|
|
goto out_check;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (hexstr_to_label(strval, &new_sl) != 0)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EPERM));
|
2010-05-29 00:45:14 +04:00
|
|
|
|
|
|
|
if (blstrictdom(&ds_sl, &new_sl))
|
|
|
|
needed_priv = PRIV_FILE_DOWNGRADE_SL;
|
|
|
|
else if (blstrictdom(&new_sl, &ds_sl))
|
|
|
|
needed_priv = PRIV_FILE_UPGRADE_SL;
|
|
|
|
} else {
|
|
|
|
/* dataset currently has a default label */
|
|
|
|
if (!new_default)
|
|
|
|
needed_priv = PRIV_FILE_UPGRADE_SL;
|
|
|
|
}
|
|
|
|
|
|
|
|
out_check:
|
|
|
|
if (needed_priv != -1)
|
|
|
|
return (PRIV_POLICY(cr, needed_priv, B_FALSE, EPERM, NULL));
|
|
|
|
return (0);
|
2010-08-26 22:43:42 +04:00
|
|
|
#else
|
2017-08-03 07:16:12 +03:00
|
|
|
return (SET_ERROR(ENOTSUP));
|
2010-08-26 22:43:42 +04:00
|
|
|
#endif /* HAVE_MLSLABEL */
|
2010-05-29 00:45:14 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
zfs_secpolicy_setprop(const char *dsname, zfs_prop_t prop, nvpair_t *propval,
|
|
|
|
cred_t *cr)
|
|
|
|
{
|
2023-03-11 21:39:24 +03:00
|
|
|
const char *strval;
|
2010-05-29 00:45:14 +04:00
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
/*
|
|
|
|
* Check permissions for special properties.
|
|
|
|
*/
|
|
|
|
switch (prop) {
|
2010-08-26 20:52:41 +04:00
|
|
|
default:
|
|
|
|
break;
|
2008-11-20 23:01:55 +03:00
|
|
|
case ZFS_PROP_ZONED:
|
|
|
|
/*
|
|
|
|
* Disallow setting of 'zoned' from within a local zone.
|
|
|
|
*/
|
|
|
|
if (!INGLOBALZONE(curproc))
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EPERM));
|
2008-11-20 23:01:55 +03:00
|
|
|
break;
|
|
|
|
|
|
|
|
case ZFS_PROP_QUOTA:
|
2015-04-01 16:07:48 +03:00
|
|
|
case ZFS_PROP_FILESYSTEM_LIMIT:
|
|
|
|
case ZFS_PROP_SNAPSHOT_LIMIT:
|
2008-11-20 23:01:55 +03:00
|
|
|
if (!INGLOBALZONE(curproc)) {
|
|
|
|
uint64_t zoned;
|
2016-06-16 00:28:36 +03:00
|
|
|
char setpoint[ZFS_MAX_DATASET_NAME_LEN];
|
2008-11-20 23:01:55 +03:00
|
|
|
/*
|
|
|
|
* Unprivileged users are allowed to modify the
|
2015-04-01 16:07:48 +03:00
|
|
|
* limit on things *under* (i.e., contained by)
|
2008-11-20 23:01:55 +03:00
|
|
|
* the thing they own.
|
|
|
|
*/
|
2019-09-27 20:46:28 +03:00
|
|
|
if (dsl_prop_get_integer(dsname,
|
|
|
|
zfs_prop_to_name(ZFS_PROP_ZONED), &zoned, setpoint))
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EPERM));
|
2010-05-29 00:45:14 +04:00
|
|
|
if (!zoned || strlen(dsname) <= strlen(setpoint))
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EPERM));
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
break;
|
2010-05-29 00:45:14 +04:00
|
|
|
|
|
|
|
case ZFS_PROP_MLSLABEL:
|
|
|
|
if (!is_system_labeled())
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EPERM));
|
2010-05-29 00:45:14 +04:00
|
|
|
|
|
|
|
if (nvpair_value_string(propval, &strval) == 0) {
|
|
|
|
int err;
|
|
|
|
|
|
|
|
err = zfs_set_slabel_policy(dsname, strval, CRED());
|
|
|
|
if (err != 0)
|
|
|
|
return (err);
|
|
|
|
}
|
|
|
|
break;
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
return (zfs_secpolicy_write_perms(dsname, zfs_prop_to_name(prop), cr));
|
2008-11-20 23:01:55 +03:00
|
|
|
}
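/*
 * Illustrative sketch, not part of this file. The ZFS_PROP_QUOTA case above
 * lets a caller in a non-global zone change a limit only on datasets that sit
 * strictly below the dataset where "zoned" was set. The helper name
 * limit_change_allowed() and its plain-int "zoned" argument are assumptions
 * made for this example; the length comparison mirrors the strlen() test
 * above.
 */

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: zoned is the value of the "zoned" property and
 * setpoint is the dataset the property was inherited from.
 */
static int
limit_change_allowed(int zoned, const char *dsname, const char *setpoint)
{
	assert(dsname != NULL && setpoint != NULL);

	/* Only datasets strictly below the setpoint have a longer name. */
	if (!zoned || strlen(dsname) <= strlen(setpoint))
		return (EPERM);
	return (0);
}
```

/*
 * Under these assumptions, "tank/zone/fs" with setpoint "tank/zone" is
 * allowed, while the setpoint itself and any non-zoned dataset are refused.
 */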
|
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
static int
|
|
|
|
zfs_secpolicy_set_fsacl(zfs_cmd_t *zc, nvlist_t *innvl, cred_t *cr)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
|
|
|
/*
|
|
|
|
* permission to set permissions will be evaluated later in
|
|
|
|
* dsl_deleg_can_allow()
|
|
|
|
*/
|
2022-02-16 04:38:43 +03:00
|
|
|
(void) innvl;
|
|
|
|
return (zfs_dozonecheck(zc->zc_name, cr));
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
static int
|
|
|
|
zfs_secpolicy_rollback(zfs_cmd_t *zc, nvlist_t *innvl, cred_t *cr)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2022-02-16 04:38:43 +03:00
|
|
|
(void) innvl;
|
2010-05-29 00:45:14 +04:00
|
|
|
return (zfs_secpolicy_write_perms(zc->zc_name,
|
|
|
|
ZFS_DELEG_PERM_ROLLBACK, cr));
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
static int
|
|
|
|
zfs_secpolicy_send(zfs_cmd_t *zc, nvlist_t *innvl, cred_t *cr)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2022-02-16 04:38:43 +03:00
|
|
|
(void) innvl;
|
2010-08-27 01:24:34 +04:00
|
|
|
dsl_pool_t *dp;
|
|
|
|
dsl_dataset_t *ds;
|
2020-10-03 03:44:10 +03:00
|
|
|
const char *cp;
|
2010-08-27 01:24:34 +04:00
|
|
|
int error;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Generate the current snapshot name from the given objsetid, then
|
|
|
|
* use that name for the secpolicy/zone checks.
|
|
|
|
*/
|
|
|
|
cp = strchr(zc->zc_name, '@');
|
|
|
|
if (cp == NULL)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EINVAL));
|
2013-09-04 16:00:57 +04:00
|
|
|
error = dsl_pool_hold(zc->zc_name, FTAG, &dp);
|
|
|
|
if (error != 0)
|
2010-08-27 01:24:34 +04:00
|
|
|
return (error);
|
|
|
|
|
|
|
|
error = dsl_dataset_hold_obj(dp, zc->zc_sendobj, FTAG, &ds);
|
2013-09-04 16:00:57 +04:00
|
|
|
if (error != 0) {
|
|
|
|
dsl_pool_rele(dp, FTAG);
|
2010-08-27 01:24:34 +04:00
|
|
|
return (error);
|
2013-09-04 16:00:57 +04:00
|
|
|
}
|
2010-08-27 01:24:34 +04:00
|
|
|
|
|
|
|
dsl_dataset_name(ds, zc->zc_name);
|
|
|
|
|
|
|
|
error = zfs_secpolicy_write_perms_ds(zc->zc_name, ds,
|
|
|
|
ZFS_DELEG_PERM_SEND, cr);
|
|
|
|
dsl_dataset_rele(ds, FTAG);
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_pool_rele(dp, FTAG);
|
2010-08-27 01:24:34 +04:00
|
|
|
|
|
|
|
return (error);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
static int
|
|
|
|
zfs_secpolicy_send_new(zfs_cmd_t *zc, nvlist_t *innvl, cred_t *cr)
|
|
|
|
{
|
2022-02-16 04:38:43 +03:00
|
|
|
(void) innvl;
|
2013-08-28 15:45:09 +04:00
|
|
|
return (zfs_secpolicy_write_perms(zc->zc_name,
|
|
|
|
ZFS_DELEG_PERM_SEND, cr));
|
|
|
|
}
|
|
|
|
|
2020-06-15 21:30:37 +03:00
|
|
|
static int
|
2013-08-28 15:45:09 +04:00
|
|
|
zfs_secpolicy_share(zfs_cmd_t *zc, nvlist_t *innvl, cred_t *cr)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2022-02-16 04:38:43 +03:00
|
|
|
(void) zc, (void) innvl, (void) cr;
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(ENOTSUP));
|
2009-07-03 02:44:48 +04:00
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2020-06-15 21:30:37 +03:00
|
|
|
static int
|
2013-08-28 15:45:09 +04:00
|
|
|
zfs_secpolicy_smb_acl(zfs_cmd_t *zc, nvlist_t *innvl, cred_t *cr)
|
2009-07-03 02:44:48 +04:00
|
|
|
{
|
2022-02-16 04:38:43 +03:00
|
|
|
(void) zc, (void) innvl, (void) cr;
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(ENOTSUP));
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
zfs_get_parent(const char *datasetname, char *parent, int parentsize)
|
|
|
|
{
|
|
|
|
char *cp;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Remove the @bla or /bla from the end of the name to get the parent.
|
|
|
|
*/
|
Cleanup: Switch to strlcpy from strncpy
Coverity found a bug in `zfs_secpolicy_create_clone()` where it is
possible for us to pass an unterminated string when `zfs_get_parent()`
returns an error. Upon inspection, it is clear that using `strlcpy()`
would have avoided this issue.
Looking at the codebase, there are a number of other uses of `strncpy()`
that are unsafe and even when it is used safely, switching to
`strlcpy()` would make the code more readable. Therefore, we switch all
instances where we use `strncpy()` to use `strlcpy()`.
Unfortunately, we do not portably have access to `strlcpy()` in
tests/zfs-tests/cmd/zfs_diff-socket.c because it does not link to
libspl. Modifying the appropriate Makefile.am to try to link to it
resulted in an error from the naming choice used in the file. Trying to
disable the check on the file did not work on FreeBSD because Clang
ignores `#undef` when a definition is provided by `-Dstrncpy(...)=...`.
We work around that by explicitly including the C file from libspl into
the test. This makes things build correctly everywhere.
We add a deprecation warning to `config/Rules.am` and suppress it on the
remaining `strncpy()` usage. `strlcpy()` is not portably available in
tests/zfs-tests/cmd/zfs_diff-socket.c, so we use `snprintf()` there as a
substitute.
This patch does not tackle the related problem of `strcpy()`, which is
even less safe. Thankfully, a quick inspection found that it is used far
more correctly than strncpy() was used. A quick inspection did not find
any problems with `strcpy()` usage outside of zhack, but it should be
said that I only checked around 90% of them.
Lastly, some of the fields in kstat_t varied in size by 1 depending on
whether they were in userspace or in the kernel. The origin of this
discrepancy appears to be 04a479f7066ccdaa23a6546955303b172f4a6909 where
it was made for no apparent reason. It conflicts with the comment on
KSTAT_STRLEN, so we shrink the kernel field sizes to match the userspace
field sizes.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13876
2022-09-28 02:35:29 +03:00
|
|
|
(void) strlcpy(parent, datasetname, parentsize);
|
2008-11-20 23:01:55 +03:00
|
|
|
cp = strrchr(parent, '@');
|
|
|
|
if (cp != NULL) {
|
|
|
|
cp[0] = '\0';
|
|
|
|
} else {
|
|
|
|
cp = strrchr(parent, '/');
|
|
|
|
if (cp == NULL)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(ENOENT));
|
2008-11-20 23:01:55 +03:00
|
|
|
cp[0] = '\0';
|
|
|
|
}
|
|
|
|
|
|
|
|
return (0);
|
|
|
|
}
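/*
 * Illustrative sketch, not part of this file. It shows the same "cut at the
 * last '@', otherwise at the last '/'" rule as zfs_get_parent() in a form
 * that builds in userspace. The name get_parent_name() is an assumption, and
 * snprintf() stands in for strlcpy() only because strlcpy() is not available
 * in every libc.
 */

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace analogue of zfs_get_parent(). */
static int
get_parent_name(const char *name, char *parent, size_t size)
{
	char *cp;

	assert(size > 0);
	/* snprintf() always NUL-terminates, like strlcpy(). */
	(void) snprintf(parent, size, "%s", name);

	/* A snapshot's parent is the dataset before the '@'. */
	cp = strrchr(parent, '@');
	if (cp == NULL)
		cp = strrchr(parent, '/');
	if (cp == NULL)
		return (ENOENT);	/* a bare pool name has no parent */
	*cp = '\0';
	return (0);
}
```

/*
 * With this sketch, "pool/fs@snap" yields "pool/fs", "pool/a/b" yields
 * "pool/a", and "pool" returns ENOENT.
 */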
|
|
|
|
|
|
|
|
int
|
|
|
|
zfs_secpolicy_destroy_perms(const char *name, cred_t *cr)
|
|
|
|
{
|
|
|
|
int error;
|
|
|
|
|
|
|
|
if ((error = zfs_secpolicy_write_perms(name,
|
|
|
|
ZFS_DELEG_PERM_MOUNT, cr)) != 0)
|
|
|
|
return (error);
|
|
|
|
|
|
|
|
return (zfs_secpolicy_write_perms(name, ZFS_DELEG_PERM_DESTROY, cr));
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2013-08-28 15:45:09 +04:00
|
|
|
zfs_secpolicy_destroy(zfs_cmd_t *zc, nvlist_t *innvl, cred_t *cr)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2022-02-16 04:38:43 +03:00
|
|
|
(void) innvl;
|
2008-11-20 23:01:55 +03:00
|
|
|
return (zfs_secpolicy_destroy_perms(zc->zc_name, cr));
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2010-05-29 00:45:14 +04:00
|
|
|
* Destroying snapshots with delegated permissions requires
|
2013-08-28 15:45:09 +04:00
|
|
|
* descendant mount and destroy permissions.
|
2008-11-20 23:01:55 +03:00
|
|
|
*/
|
|
|
|
static int
|
2013-08-28 15:45:09 +04:00
|
|
|
zfs_secpolicy_destroy_snaps(zfs_cmd_t *zc, nvlist_t *innvl, cred_t *cr)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2022-02-16 04:38:43 +03:00
|
|
|
(void) zc;
|
2013-08-28 15:45:09 +04:00
|
|
|
nvlist_t *snaps;
|
|
|
|
nvpair_t *pair, *nextpair;
|
|
|
|
int error = 0;
|
2010-05-29 00:45:14 +04:00
|
|
|
|
Add basic zfs ioc input nvpair validation
We want newer versions of libzfs_core to run against an existing
zfs kernel module (i.e. a deferred reboot or module reload after
an update).
Programmatically document, via a zfs_ioc_key_t, the valid arguments
for the ioc commands that rely on nvpair input arguments (i.e. non
legacy commands from libzfs_core). Automatically verify the expected
pairs before dispatching a command.
This initial phase focuses on the non-legacy ioctls. A follow-on
change can address the legacy ioctl input from the zfs_cmd_t.
The zfs_ioc_key_t for zfs_keys_channel_program looks like:
static const zfs_ioc_key_t zfs_keys_channel_program[] = {
{"program", DATA_TYPE_STRING, 0},
{"arg", DATA_TYPE_UNKNOWN, 0},
{"sync", DATA_TYPE_BOOLEAN_VALUE, ZK_OPTIONAL},
{"instrlimit", DATA_TYPE_UINT64, ZK_OPTIONAL},
{"memlimit", DATA_TYPE_UINT64, ZK_OPTIONAL},
};
Introduce four input errors to identify specific input failures
(in addition to generic argument value errors like EINVAL, ERANGE,
EBADF, and E2BIG).
ZFS_ERR_IOC_CMD_UNAVAIL the ioctl number is not supported by kernel
ZFS_ERR_IOC_ARG_UNAVAIL an input argument is not supported by kernel
ZFS_ERR_IOC_ARG_REQUIRED a required input argument is missing
ZFS_ERR_IOC_ARG_BADTYPE an input argument has an invalid type
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #7780
2018-09-02 22:14:01 +03:00
|
|
|
snaps = fnvlist_lookup_nvlist(innvl, "snaps");
|
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
for (pair = nvlist_next_nvpair(snaps, NULL); pair != NULL;
|
|
|
|
pair = nextpair) {
|
|
|
|
nextpair = nvlist_next_nvpair(snaps, pair);
|
2013-12-12 02:33:41 +04:00
|
|
|
error = zfs_secpolicy_destroy_perms(nvpair_name(pair), cr);
|
|
|
|
if (error == ENOENT) {
|
2013-08-28 15:45:09 +04:00
|
|
|
/*
|
|
|
|
* Ignore any snapshots that don't exist (we consider
|
|
|
|
* them "already destroyed"). Remove the name from the
|
|
|
|
* nvl here in case the snapshot is created between
|
|
|
|
* now and when we try to destroy it (in which case
|
|
|
|
* we don't want to destroy it since we haven't
|
|
|
|
* checked for permission).
|
|
|
|
*/
|
|
|
|
fnvlist_remove_nvpair(snaps, pair);
|
|
|
|
error = 0;
|
|
|
|
}
|
|
|
|
if (error != 0)
|
|
|
|
break;
|
|
|
|
}
|
2010-05-29 00:45:14 +04:00
|
|
|
|
|
|
|
return (error);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
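/*
 * Illustrative sketch, not part of this file. The loop above fetches the
 * next pair before it may remove the current one, so iteration never
 * continues from a removed element. The same pattern is shown here on a
 * plain singly linked list. name_node_t, filter_names() and sample_check()
 * are assumptions for this example and are not nvlist APIs.
 */

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

typedef struct name_node {
	const char *name;
	struct name_node *next;
} name_node_t;

typedef int (*perm_check_t)(const char *name);

/* Drop entries whose check returns ENOENT; stop on any other error. */
static int
filter_names(name_node_t **headp, perm_check_t check)
{
	name_node_t **linkp = headp;
	name_node_t *node, *next;
	int error = 0;

	assert(headp != NULL);
	for (node = *headp; node != NULL; node = next) {
		next = node->next;	/* capture before any unlink */
		error = check(node->name);
		if (error == ENOENT) {
			*linkp = next;	/* treat as "already destroyed" */
			node->next = NULL;
			error = 0;
			continue;
		}
		if (error != 0)
			break;
		linkp = &node->next;
	}
	return (error);
}

/* Hypothetical stand-in for a permission check. */
static int
sample_check(const char *name)
{
	if (strstr(name, "gone") != NULL)
		return (ENOENT);
	if (strstr(name, "locked") != NULL)
		return (EPERM);
	return (0);
}
```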
|
|
|
|
|
|
|
|
int
|
|
|
|
zfs_secpolicy_rename_perms(const char *from, const char *to, cred_t *cr)
|
|
|
|
{
|
2016-06-16 00:28:36 +03:00
|
|
|
char parentname[ZFS_MAX_DATASET_NAME_LEN];
|
2008-11-20 23:01:55 +03:00
|
|
|
int error;
|
|
|
|
|
|
|
|
if ((error = zfs_secpolicy_write_perms(from,
|
|
|
|
ZFS_DELEG_PERM_RENAME, cr)) != 0)
|
|
|
|
return (error);
|
|
|
|
|
|
|
|
if ((error = zfs_secpolicy_write_perms(from,
|
|
|
|
ZFS_DELEG_PERM_MOUNT, cr)) != 0)
|
|
|
|
return (error);
|
|
|
|
|
|
|
|
if ((error = zfs_get_parent(to, parentname,
|
|
|
|
sizeof (parentname))) != 0)
|
|
|
|
return (error);
|
|
|
|
|
|
|
|
if ((error = zfs_secpolicy_write_perms(parentname,
|
|
|
|
ZFS_DELEG_PERM_CREATE, cr)) != 0)
|
|
|
|
return (error);
|
|
|
|
|
|
|
|
if ((error = zfs_secpolicy_write_perms(parentname,
|
|
|
|
ZFS_DELEG_PERM_MOUNT, cr)) != 0)
|
|
|
|
return (error);
|
|
|
|
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2013-08-28 15:45:09 +04:00
|
|
|
zfs_secpolicy_rename(zfs_cmd_t *zc, nvlist_t *innvl, cred_t *cr)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2022-02-16 04:38:43 +03:00
|
|
|
(void) innvl;
|
2008-11-20 23:01:55 +03:00
|
|
|
return (zfs_secpolicy_rename_perms(zc->zc_name, zc->zc_value, cr));
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2013-08-28 15:45:09 +04:00
|
|
|
zfs_secpolicy_promote(zfs_cmd_t *zc, nvlist_t *innvl, cred_t *cr)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2022-02-16 04:38:43 +03:00
|
|
|
(void) innvl;
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_pool_t *dp;
|
|
|
|
dsl_dataset_t *clone;
|
2008-11-20 23:01:55 +03:00
|
|
|
int error;
|
|
|
|
|
|
|
|
error = zfs_secpolicy_write_perms(zc->zc_name,
|
|
|
|
ZFS_DELEG_PERM_PROMOTE, cr);
|
2013-09-04 16:00:57 +04:00
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
|
|
|
|
|
|
|
error = dsl_pool_hold(zc->zc_name, FTAG, &dp);
|
|
|
|
if (error != 0)
|
2008-11-20 23:01:55 +03:00
|
|
|
return (error);
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
error = dsl_dataset_hold(dp, zc->zc_name, FTAG, &clone);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
if (error == 0) {
|
2016-06-16 00:28:36 +03:00
|
|
|
char parentname[ZFS_MAX_DATASET_NAME_LEN];
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_t *origin = NULL;
|
2008-11-20 23:01:55 +03:00
|
|
|
dsl_dir_t *dd;
|
2013-09-04 16:00:57 +04:00
|
|
|
dd = clone->ds_dir;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2008-12-03 23:09:06 +03:00
|
|
|
error = dsl_dataset_hold_obj(dd->dd_pool,
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dir_phys(dd)->dd_origin_obj, FTAG, &origin);
|
2013-09-04 16:00:57 +04:00
|
|
|
if (error != 0) {
|
|
|
|
dsl_dataset_rele(clone, FTAG);
|
|
|
|
dsl_pool_rele(dp, FTAG);
|
2008-11-20 23:01:55 +03:00
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
error = zfs_secpolicy_write_perms_ds(zc->zc_name, clone,
|
2008-11-20 23:01:55 +03:00
|
|
|
ZFS_DELEG_PERM_MOUNT, cr);
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_name(origin, parentname);
|
|
|
|
if (error == 0) {
|
|
|
|
error = zfs_secpolicy_write_perms_ds(parentname, origin,
|
2008-11-20 23:01:55 +03:00
|
|
|
ZFS_DELEG_PERM_PROMOTE, cr);
|
2013-09-04 16:00:57 +04:00
|
|
|
}
|
|
|
|
dsl_dataset_rele(clone, FTAG);
|
|
|
|
dsl_dataset_rele(origin, FTAG);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_pool_rele(dp, FTAG);
|
2008-11-20 23:01:55 +03:00
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2013-08-28 15:45:09 +04:00
|
|
|
zfs_secpolicy_recv(zfs_cmd_t *zc, nvlist_t *innvl, cred_t *cr)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2022-02-16 04:38:43 +03:00
|
|
|
(void) innvl;
|
2008-11-20 23:01:55 +03:00
|
|
|
int error;
|
|
|
|
|
|
|
|
if ((error = zfs_secpolicy_write_perms(zc->zc_name,
|
|
|
|
ZFS_DELEG_PERM_RECEIVE, cr)) != 0)
|
|
|
|
return (error);
|
|
|
|
|
|
|
|
if ((error = zfs_secpolicy_write_perms(zc->zc_name,
|
|
|
|
ZFS_DELEG_PERM_MOUNT, cr)) != 0)
|
|
|
|
return (error);
|
|
|
|
|
|
|
|
return (zfs_secpolicy_write_perms(zc->zc_name,
|
|
|
|
ZFS_DELEG_PERM_CREATE, cr));
|
|
|
|
}
|
|
|
|
|
|
|
|
int
|
|
|
|
zfs_secpolicy_snapshot_perms(const char *name, cred_t *cr)
|
|
|
|
{
|
2010-05-29 00:45:14 +04:00
|
|
|
return (zfs_secpolicy_write_perms(name,
|
|
|
|
ZFS_DELEG_PERM_SNAPSHOT, cr));
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
/*
|
|
|
|
* Check for permission to create each snapshot in the nvlist.
|
|
|
|
*/
|
2008-11-20 23:01:55 +03:00
|
|
|
static int
|
2013-08-28 15:45:09 +04:00
|
|
|
zfs_secpolicy_snapshot(zfs_cmd_t *zc, nvlist_t *innvl, cred_t *cr)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2022-02-16 04:38:43 +03:00
|
|
|
(void) zc;
|
2013-08-28 15:45:09 +04:00
|
|
|
nvlist_t *snaps;
|
|
|
|
int error = 0;
|
|
|
|
nvpair_t *pair;
|
|
|
|
|
2018-09-02 22:14:01 +03:00
|
|
|
snaps = fnvlist_lookup_nvlist(innvl, "snaps");
|
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
for (pair = nvlist_next_nvpair(snaps, NULL); pair != NULL;
|
|
|
|
pair = nvlist_next_nvpair(snaps, pair)) {
|
2023-03-11 21:39:24 +03:00
|
|
|
char *name = (char *)nvpair_name(pair);
|
2013-08-28 15:45:09 +04:00
|
|
|
char *atp = strchr(name, '@');
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
if (atp == NULL) {
|
2013-03-08 22:41:28 +04:00
|
|
|
error = SET_ERROR(EINVAL);
|
2013-08-28 15:45:09 +04:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
*atp = '\0';
|
|
|
|
error = zfs_secpolicy_snapshot_perms(name, cr);
|
|
|
|
*atp = '@';
|
|
|
|
if (error != 0)
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
return (error);
|
|
|
|
}
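/*
 * Illustrative sketch, not part of this file. zfs_secpolicy_snapshot()
 * temporarily terminates each "fs@snap" name at the '@', checks permission
 * on the filesystem part, and then restores the '@'. The helper below shows
 * that pattern in isolation. check_snapshot_parent() and allow_under_tank()
 * are assumptions for this example.
 */

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

typedef int (*perm_check_t)(const char *dsname);

/* Check permission on the dataset that would hold the snapshot. */
static int
check_snapshot_parent(char *snapname, perm_check_t check)
{
	char *atp;
	int error;

	assert(snapname != NULL);
	atp = strchr(snapname, '@');
	if (atp == NULL)
		return (EINVAL);	/* not a snapshot name */

	*atp = '\0';			/* view only the filesystem part */
	error = check(snapname);
	*atp = '@';			/* restore the caller's string */
	return (error);
}

/* Hypothetical policy: allow only "tank" and datasets below it. */
static int
allow_under_tank(const char *dsname)
{
	if (strncmp(dsname, "tank", 4) == 0 &&
	    (dsname[4] == '\0' || dsname[4] == '/'))
		return (0);
	return (EPERM);
}
```

/*
 * The name must be writable, which is why the function above casts away
 * const on the result of nvpair_name() before editing it in place.
 */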
|
|
|
|
|
2013-12-12 02:33:41 +04:00
|
|
|
/*
|
2018-09-02 22:14:01 +03:00
|
|
|
* Check for permission to create each bookmark in the nvlist.
|
2013-12-12 02:33:41 +04:00
|
|
|
*/
|
|
|
|
static int
|
|
|
|
zfs_secpolicy_bookmark(zfs_cmd_t *zc, nvlist_t *innvl, cred_t *cr)
|
|
|
|
{
|
2022-02-16 04:38:43 +03:00
|
|
|
(void) zc;
|
2013-12-12 02:33:41 +04:00
|
|
|
int error = 0;
|
|
|
|
|
2017-11-04 23:25:13 +03:00
|
|
|
for (nvpair_t *pair = nvlist_next_nvpair(innvl, NULL);
|
2013-12-12 02:33:41 +04:00
|
|
|
pair != NULL; pair = nvlist_next_nvpair(innvl, pair)) {
|
2023-03-11 21:39:24 +03:00
|
|
|
char *name = (char *)nvpair_name(pair);
|
2013-12-12 02:33:41 +04:00
|
|
|
char *hashp = strchr(name, '#');
|
|
|
|
|
|
|
|
if (hashp == NULL) {
|
|
|
|
error = SET_ERROR(EINVAL);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
*hashp = '\0';
|
|
|
|
error = zfs_secpolicy_write_perms(name,
|
|
|
|
ZFS_DELEG_PERM_BOOKMARK, cr);
|
|
|
|
*hashp = '#';
|
|
|
|
if (error != 0)
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
zfs_secpolicy_destroy_bookmarks(zfs_cmd_t *zc, nvlist_t *innvl, cred_t *cr)
|
|
|
|
{
|
2022-02-16 04:38:43 +03:00
|
|
|
(void) zc;
|
2013-12-12 02:33:41 +04:00
|
|
|
nvpair_t *pair, *nextpair;
|
|
|
|
int error = 0;
|
|
|
|
|
|
|
|
for (pair = nvlist_next_nvpair(innvl, NULL); pair != NULL;
|
|
|
|
pair = nextpair) {
|
2023-03-11 21:39:24 +03:00
|
|
|
char *name = (char *)nvpair_name(pair);
|
2013-12-12 02:33:41 +04:00
|
|
|
char *hashp = strchr(name, '#');
|
|
|
|
nextpair = nvlist_next_nvpair(innvl, pair);
|
|
|
|
|
|
|
|
if (hashp == NULL) {
|
|
|
|
error = SET_ERROR(EINVAL);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
*hashp = '\0';
|
|
|
|
error = zfs_secpolicy_write_perms(name,
|
|
|
|
ZFS_DELEG_PERM_DESTROY, cr);
|
|
|
|
*hashp = '#';
|
|
|
|
if (error == ENOENT) {
|
|
|
|
/*
|
|
|
|
* Ignore any filesystems that don't exist (we consider
|
|
|
|
* their bookmarks "already destroyed"). Remove
|
|
|
|
* the name from the nvl here in case the filesystem
|
|
|
|
* is created between now and when we try to destroy
|
|
|
|
* the bookmark (in which case we don't want to
|
|
|
|
* destroy it since we haven't checked for permission).
|
|
|
|
*/
|
|
|
|
fnvlist_remove_nvpair(innvl, pair);
|
|
|
|
error = 0;
|
|
|
|
}
|
|
|
|
if (error != 0)
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
static int
|
|
|
|
zfs_secpolicy_log_history(zfs_cmd_t *zc, nvlist_t *innvl, cred_t *cr)
|
|
|
|
{
|
2022-02-16 04:38:43 +03:00
|
|
|
(void) zc, (void) innvl, (void) cr;
|
2013-08-28 15:45:09 +04:00
|
|
|
/*
|
|
|
|
* Even root must have a proper TSD so that we know what pool
|
|
|
|
* to log to.
|
|
|
|
*/
|
|
|
|
if (tsd_get(zfs_allow_log_key) == NULL)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EPERM));
|
2013-08-28 15:45:09 +04:00
|
|
|
return (0);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2013-08-28 15:45:09 +04:00
|
|
|
zfs_secpolicy_create_clone(zfs_cmd_t *zc, nvlist_t *innvl, cred_t *cr)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2023-03-11 21:39:24 +03:00
|
|
|
char parentname[ZFS_MAX_DATASET_NAME_LEN];
|
|
|
|
int error;
|
|
|
|
const char *origin;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
if ((error = zfs_get_parent(zc->zc_name, parentname,
|
|
|
|
sizeof (parentname))) != 0)
|
|
|
|
return (error);
|
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
if (nvlist_lookup_string(innvl, "origin", &origin) == 0 &&
|
|
|
|
(error = zfs_secpolicy_write_perms(origin,
|
|
|
|
ZFS_DELEG_PERM_CLONE, cr)) != 0)
|
|
|
|
return (error);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
if ((error = zfs_secpolicy_write_perms(parentname,
|
|
|
|
ZFS_DELEG_PERM_CREATE, cr)) != 0)
|
|
|
|
return (error);
|
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
return (zfs_secpolicy_write_perms(parentname,
|
|
|
|
ZFS_DELEG_PERM_MOUNT, cr));
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Policy for pool operations - create/destroy pools, add vdevs, etc. Requires
|
|
|
|
* SYS_CONFIG privilege, which is not available in a local zone.
|
|
|
|
*/
|
2019-09-27 20:46:28 +03:00
|
|
|
int
|
2013-08-28 15:45:09 +04:00
|
|
|
zfs_secpolicy_config(zfs_cmd_t *zc, nvlist_t *innvl, cred_t *cr)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2022-02-16 04:38:43 +03:00
|
|
|
(void) zc, (void) innvl;
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
if (secpolicy_sys_config(cr, B_FALSE) != 0)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EPERM));
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
2010-08-27 01:24:34 +04:00
|
|
|
/*
|
|
|
|
* Policy for object to name lookups.
|
|
|
|
*/
|
|
|
|
static int
|
2013-08-28 15:45:09 +04:00
|
|
|
zfs_secpolicy_diff(zfs_cmd_t *zc, nvlist_t *innvl, cred_t *cr)
|
2010-08-27 01:24:34 +04:00
|
|
|
{
|
2022-02-16 04:38:43 +03:00
|
|
|
(void) innvl;
|
2010-08-27 01:24:34 +04:00
|
|
|
int error;
|
|
|
|
|
Cleanup of dead code suggested by Clang Static Analyzer (#14380)
I recently gained the ability to run Clang's static analyzer on the
linux kernel modules via a few hacks. This extended coverage to code
that was previously missed since Clang's static analyzer only looked at
code that we built in userspace. Running it against the Linux kernel
modules built from my local branch produced a total of 72 reports
against my local branch. Of those, 50 were reports of logic errors and
22 were reports of dead code. Since we already had cleaned up all of
the previous dead code reports, I felt it would be a good next step to
clean up these dead code reports. Clang did a further breakdown of the
dead code reports into:
Dead assignment 15
Dead increment 2
Dead nested assignment 5
The benefit of cleaning these up, especially in the case of dead nested
assignment, is that they can expose places where our error handling is
incorrect. A number of them were fairly straightforward. However,
several were not:
In vdev_disk_physio_completion(), not only were we not using the return
value from the static function vdev_disk_dio_put(), but nothing used it,
so I changed it to return void and removed the existing (void) cast in
the other area where we call it in addition to no longer storing it to a
stack value.
In FSE_createDTable(), the function is dead code. Its helper function
FSE_freeDTable() is also dead code, as are the CPP definitions in
`module/zstd/include/zstd_compat_wrapper.h`. We just delete it all.
In zfs_zevent_wait(), we have an optimization opportunity. cv_wait_sig()
returns 0 if there are waiting signals and 1 if there are none. The
Linux SPL version literally returns `signal_pending(current) ? 0 : 1`
and FreeBSD implements the same semantics, so we can just do
`!cv_wait_sig()` in place of `signal_pending(current)` to avoid
unnecessarily calling it again.
The FreeBSD version of zfs_setattr() did not have the error handling issue
because the code was removed entirely from the FreeBSD version. The error is
from updating the attribute directory's files. After some thought, I
decided to propagate errors on it to userspace.
In zfs_secpolicy_tmp_snapshot(), we ignore a lack of permission from the
first check in favor of checking three other permissions. I assume this
is intentional.
In zfs_create_fs(), the return value of zap_update() was not checked
despite setting an important version number. I see no backward
compatibility reason to permit failures, so we add an assertion to catch
failures. Interestingly, Linux is still using ASSERT(error == 0) from
OpenSolaris while FreeBSD has switched to the improved ASSERT0(error)
from illumos, although illumos has yet to adopt it here. ASSERT(error ==
0) was used on Linux while ASSERT0(error) was used on FreeBSD since the
entire file needs conversion and that should be the subject of
another patch.
dnode_move()'s issue was caused by us not having implemented
POINTER_IS_VALID() on Linux. We have a stub in
`include/os/linux/spl/sys/kmem_cache.h` for it, when it really should be
in `include/os/linux/spl/sys/kmem.h` to be consistent with
Illumos/OpenSolaris. FreeBSD put both `POINTER_IS_VALID()` and
`POINTER_INVALIDATE()` in `include/os/freebsd/spl/sys/kmem.h`, so we
copy what it did.
Whenever a report was in platform-specific code, I checked the FreeBSD
version to see if it also applied to FreeBSD, but it was only relevant a
few times.
Lastly, the patch that enabled Clang's static analyzer to be run on the
Linux kernel modules needs more work before it can be put into a PR. I
plan to do that in the future as part of the on-going static analysis
work that I am doing.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14380
2023-01-17 20:57:12 +03:00
|
|
|
if (secpolicy_sys_config(cr, B_FALSE) == 0)
|
2010-08-27 01:24:34 +04:00
|
|
|
return (0);
|
|
|
|
|
|
|
|
error = zfs_secpolicy_write_perms(zc->zc_name, ZFS_DELEG_PERM_DIFF, cr);
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
/*
|
|
|
|
* Policy for fault injection. Requires all privileges.
|
|
|
|
*/
|
|
|
|
static int
|
2013-08-28 15:45:09 +04:00
|
|
|
zfs_secpolicy_inject(zfs_cmd_t *zc, nvlist_t *innvl, cred_t *cr)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2022-02-16 04:38:43 +03:00
|
|
|
(void) zc, (void) innvl;
|
2008-11-20 23:01:55 +03:00
|
|
|
return (secpolicy_zinject(cr));
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2013-08-28 15:45:09 +04:00
|
|
|
zfs_secpolicy_inherit_prop(zfs_cmd_t *zc, nvlist_t *innvl, cred_t *cr)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2022-02-16 04:38:43 +03:00
|
|
|
(void) innvl;
|
2008-11-20 23:01:55 +03:00
|
|
|
zfs_prop_t prop = zfs_name_to_prop(zc->zc_value);
|
|
|
|
|
2022-06-14 21:27:53 +03:00
|
|
|
if (prop == ZPROP_USERPROP) {
|
2008-11-20 23:01:55 +03:00
|
|
|
if (!zfs_prop_user(zc->zc_value))
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EINVAL));
|
2008-11-20 23:01:55 +03:00
|
|
|
return (zfs_secpolicy_write_perms(zc->zc_name,
|
|
|
|
ZFS_DELEG_PERM_USERPROP, cr));
|
|
|
|
} else {
|
2010-05-29 00:45:14 +04:00
|
|
|
return (zfs_secpolicy_setprop(zc->zc_name, prop,
|
|
|
|
NULL, cr));
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2009-07-03 02:44:48 +04:00
|
|
|
static int
|
2013-08-28 15:45:09 +04:00
|
|
|
zfs_secpolicy_userspace_one(zfs_cmd_t *zc, nvlist_t *innvl, cred_t *cr)
|
2009-07-03 02:44:48 +04:00
|
|
|
{
|
2013-08-28 15:45:09 +04:00
|
|
|
int err = zfs_secpolicy_read(zc, innvl, cr);
|
2009-07-03 02:44:48 +04:00
|
|
|
if (err)
|
|
|
|
return (err);
|
|
|
|
|
|
|
|
if (zc->zc_objset_type >= ZFS_NUM_USERQUOTA_PROPS)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EINVAL));
|
2009-07-03 02:44:48 +04:00
|
|
|
|
|
|
|
if (zc->zc_value[0] == 0) {
|
|
|
|
/*
|
|
|
|
* They are asking about a POSIX uid/gid. If it's
|
|
|
|
* themselves, allow it.
|
|
|
|
*/
|
|
|
|
if (zc->zc_objset_type == ZFS_PROP_USERUSED ||
|
2016-10-04 21:46:10 +03:00
|
|
|
zc->zc_objset_type == ZFS_PROP_USERQUOTA ||
|
|
|
|
zc->zc_objset_type == ZFS_PROP_USEROBJUSED ||
|
|
|
|
zc->zc_objset_type == ZFS_PROP_USEROBJQUOTA) {
|
2009-07-03 02:44:48 +04:00
|
|
|
if (zc->zc_guid == crgetuid(cr))
|
|
|
|
return (0);
|
2018-02-14 01:54:54 +03:00
|
|
|
} else if (zc->zc_objset_type == ZFS_PROP_GROUPUSED ||
|
|
|
|
zc->zc_objset_type == ZFS_PROP_GROUPQUOTA ||
|
|
|
|
zc->zc_objset_type == ZFS_PROP_GROUPOBJUSED ||
|
|
|
|
zc->zc_objset_type == ZFS_PROP_GROUPOBJQUOTA) {
|
2009-07-03 02:44:48 +04:00
|
|
|
if (groupmember(zc->zc_guid, cr))
|
|
|
|
return (0);
|
|
|
|
}
|
2018-02-14 01:54:54 +03:00
|
|
|
/* else is for project quota/used */
|
2009-07-03 02:44:48 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
return (zfs_secpolicy_write_perms(zc->zc_name,
|
|
|
|
userquota_perms[zc->zc_objset_type], cr));
|
|
|
|
}
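The range check on zc_objset_type before it indexes userquota_perms can be shown in isolation. The following is a minimal userspace sketch, not OpenZFS code: the table `perm_names` and the helper `perm_for_type()` are hypothetical names introduced only for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical permission table; the entries are illustrative only. */
static const char *const perm_names[] = {
	"userused", "userquota", "groupused", "groupquota",
};
#define	NUM_PERMS	(sizeof (perm_names) / sizeof (perm_names[0]))

/*
 * Reject an out-of-range type before it is used as an array index,
 * in the same spirit as the ZFS_NUM_USERQUOTA_PROPS check above.
 */
static int
perm_for_type(uint64_t type, const char **out)
{
	if (type >= NUM_PERMS)
		return (EINVAL);
	*out = perm_names[type];
	return (0);
}
```

Validating the index first means a caller-controlled value can never read past the end of the table.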
|
|
|
|
|
|
|
|
static int
|
2013-08-28 15:45:09 +04:00
|
|
|
zfs_secpolicy_userspace_many(zfs_cmd_t *zc, nvlist_t *innvl, cred_t *cr)
|
2009-07-03 02:44:48 +04:00
|
|
|
{
|
2013-08-28 15:45:09 +04:00
|
|
|
int err = zfs_secpolicy_read(zc, innvl, cr);
|
2009-07-03 02:44:48 +04:00
|
|
|
if (err)
|
|
|
|
return (err);
|
|
|
|
|
|
|
|
if (zc->zc_objset_type >= ZFS_NUM_USERQUOTA_PROPS)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EINVAL));
|
2009-07-03 02:44:48 +04:00
|
|
|
|
|
|
|
return (zfs_secpolicy_write_perms(zc->zc_name,
|
|
|
|
userquota_perms[zc->zc_objset_type], cr));
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2013-08-28 15:45:09 +04:00
|
|
|
zfs_secpolicy_userspace_upgrade(zfs_cmd_t *zc, nvlist_t *innvl, cred_t *cr)
|
2009-07-03 02:44:48 +04:00
|
|
|
{
|
2022-02-16 04:38:43 +03:00
|
|
|
(void) innvl;
|
2010-05-29 00:45:14 +04:00
|
|
|
return (zfs_secpolicy_setprop(zc->zc_name, ZFS_PROP_VERSION,
|
|
|
|
NULL, cr));
|
2009-07-03 02:44:48 +04:00
|
|
|
}
|
|
|
|
|
2009-08-18 22:43:27 +04:00
|
|
|
static int
|
2013-08-28 15:45:09 +04:00
|
|
|
zfs_secpolicy_hold(zfs_cmd_t *zc, nvlist_t *innvl, cred_t *cr)
|
2009-08-18 22:43:27 +04:00
|
|
|
{
|
2022-02-16 04:38:43 +03:00
|
|
|
(void) zc;
|
2013-09-04 16:00:57 +04:00
|
|
|
nvpair_t *pair;
|
|
|
|
nvlist_t *holds;
|
|
|
|
int error;
|
|
|
|
|
Add basic zfs ioc input nvpair validation
We want newer versions of libzfs_core to run against an existing
zfs kernel module (i.e. a deferred reboot or module reload after
an update).
Programmatically document, via a zfs_ioc_key_t, the valid arguments
for the ioc commands that rely on nvpair input arguments (i.e. non
legacy commands from libzfs_core). Automatically verify the expected
pairs before dispatching a command.
This initial phase focuses on the non-legacy ioctls. A follow-on
change can address the legacy ioctl input from the zfs_cmd_t.
The zfs_ioc_key_t for zfs_keys_channel_program looks like:
static const zfs_ioc_key_t zfs_keys_channel_program[] = {
{"program", DATA_TYPE_STRING, 0},
{"arg", DATA_TYPE_UNKNOWN, 0},
{"sync", DATA_TYPE_BOOLEAN_VALUE, ZK_OPTIONAL},
{"instrlimit", DATA_TYPE_UINT64, ZK_OPTIONAL},
{"memlimit", DATA_TYPE_UINT64, ZK_OPTIONAL},
};
Introduce four input errors to identify specific input failures
(in addition to generic argument value errors like EINVAL, ERANGE,
EBADF, and E2BIG).
ZFS_ERR_IOC_CMD_UNAVAIL the ioctl number is not supported by kernel
ZFS_ERR_IOC_ARG_UNAVAIL an input argument is not supported by kernel
ZFS_ERR_IOC_ARG_REQUIRED a required input argument is missing
ZFS_ERR_IOC_ARG_BADTYPE an input argument has an invalid type
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #7780
2018-09-02 22:14:01 +03:00
|
|
|
holds = fnvlist_lookup_nvlist(innvl, "holds");
|
2013-09-04 16:00:57 +04:00
|
|
|
|
|
|
|
for (pair = nvlist_next_nvpair(holds, NULL); pair != NULL;
|
|
|
|
pair = nvlist_next_nvpair(holds, pair)) {
|
2016-06-16 00:28:36 +03:00
|
|
|
char fsname[ZFS_MAX_DATASET_NAME_LEN];
|
2013-09-04 16:00:57 +04:00
|
|
|
error = dmu_fsname(nvpair_name(pair), fsname);
|
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
|
|
|
error = zfs_secpolicy_write_perms(fsname,
|
|
|
|
ZFS_DELEG_PERM_HOLD, cr);
|
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
return (0);
|
2009-08-18 22:43:27 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2013-08-28 15:45:09 +04:00
|
|
|
zfs_secpolicy_release(zfs_cmd_t *zc, nvlist_t *innvl, cred_t *cr)
|
2009-08-18 22:43:27 +04:00
|
|
|
{
|
2022-02-16 04:38:43 +03:00
|
|
|
(void) zc;
|
2013-09-04 16:00:57 +04:00
|
|
|
nvpair_t *pair;
|
|
|
|
int error;
|
|
|
|
|
|
|
|
for (pair = nvlist_next_nvpair(innvl, NULL); pair != NULL;
|
|
|
|
pair = nvlist_next_nvpair(innvl, pair)) {
|
2016-06-16 00:28:36 +03:00
|
|
|
char fsname[ZFS_MAX_DATASET_NAME_LEN];
|
2013-09-04 16:00:57 +04:00
|
|
|
error = dmu_fsname(nvpair_name(pair), fsname);
|
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
|
|
|
error = zfs_secpolicy_write_perms(fsname,
|
|
|
|
ZFS_DELEG_PERM_RELEASE, cr);
|
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
return (0);
|
2009-08-18 22:43:27 +04:00
|
|
|
}
|
|
|
|
|
2010-08-27 01:24:34 +04:00
|
|
|
/*
|
|
|
|
* Policy for allowing temporary snapshots to be taken or released
|
|
|
|
*/
|
|
|
|
static int
|
2013-08-28 15:45:09 +04:00
|
|
|
zfs_secpolicy_tmp_snapshot(zfs_cmd_t *zc, nvlist_t *innvl, cred_t *cr)
|
2010-08-27 01:24:34 +04:00
|
|
|
{
|
|
|
|
/*
|
|
|
|
* A temporary snapshot is the same as a snapshot,
|
|
|
|
* hold, destroy and release all rolled into one.
|
|
|
|
* The delegated diff permission alone is sufficient to allow this.
|
|
|
|
*/
|
|
|
|
int error;
|
|
|
|
|
2023-01-17 20:57:12 +03:00
|
|
|
if (zfs_secpolicy_write_perms(zc->zc_name,
|
|
|
|
ZFS_DELEG_PERM_DIFF, cr) == 0)
|
2010-08-27 01:24:34 +04:00
|
|
|
return (0);
|
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
error = zfs_secpolicy_snapshot_perms(zc->zc_name, cr);
|
2018-09-02 22:14:01 +03:00
|
|
|
|
|
|
|
if (innvl != NULL) {
|
|
|
|
if (error == 0)
|
|
|
|
error = zfs_secpolicy_hold(zc, innvl, cr);
|
|
|
|
if (error == 0)
|
|
|
|
error = zfs_secpolicy_release(zc, innvl, cr);
|
|
|
|
if (error == 0)
|
|
|
|
error = zfs_secpolicy_destroy(zc, innvl, cr);
|
|
|
|
}
|
2010-08-27 01:24:34 +04:00
|
|
|
return (error);
|
|
|
|
}
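The control flow of zfs_secpolicy_tmp_snapshot() is "one permission is enough on its own; otherwise every check in a chain must pass". The following is a simplified userspace sketch of that shape, under stated assumptions: `check_fast_or_all()`, `allow()` and `deny()` are hypothetical helpers, and the sketch ignores the `innvl != NULL` condition that guards part of the real chain.

```c
#include <errno.h>
#include <stddef.h>

typedef int (*check_fn)(void);

/*
 * Return 0 if the fast-path check passes. Otherwise run every check
 * in 'chain' in order and return the first nonzero error, or 0 if
 * all of them pass.
 */
static int
check_fast_or_all(check_fn fast, const check_fn *chain, size_t n)
{
	if (fast() == 0)
		return (0);
	for (size_t i = 0; i < n; i++) {
		int error = chain[i]();
		if (error != 0)
			return (error);
	}
	return (0);
}

/* Stand-ins for real permission checks. */
static int allow(void) { return (0); }
static int deny(void) { return (EPERM); }
```

A failure of the fast-path check is deliberately not reported; only a failure in the chain reaches the caller.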
|
|
|
|
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decrypt, and authenticate protected datasets.
Each object set maintains a Merkle tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 20:36:48 +03:00
|
|
|
static int
|
|
|
|
zfs_secpolicy_load_key(zfs_cmd_t *zc, nvlist_t *innvl, cred_t *cr)
|
|
|
|
{
|
|
|
|
return (zfs_secpolicy_write_perms(zc->zc_name,
|
|
|
|
ZFS_DELEG_PERM_LOAD_KEY, cr));
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
zfs_secpolicy_change_key(zfs_cmd_t *zc, nvlist_t *innvl, cred_t *cr)
|
|
|
|
{
|
|
|
|
return (zfs_secpolicy_write_perms(zc->zc_name,
|
|
|
|
ZFS_DELEG_PERM_CHANGE_KEY, cr));
|
|
|
|
}
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
/*
|
|
|
|
* Returns the nvlist as specified by the user in the zfs_cmd_t.
|
|
|
|
*/
|
|
|
|
static int
|
2009-07-03 02:44:48 +04:00
|
|
|
get_nvlist(uint64_t nvl, uint64_t size, int iflag, nvlist_t **nvp)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
|
|
|
char *packed;
|
|
|
|
int error;
|
|
|
|
nvlist_t *list = NULL;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Read in and unpack the user-supplied nvlist.
|
|
|
|
*/
|
|
|
|
if (size == 0)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EINVAL));
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2015-02-05 23:43:37 +03:00
|
|
|
packed = vmem_alloc(size, KM_SLEEP);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2023-01-17 20:57:12 +03:00
|
|
|
if (ddi_copyin((void *)(uintptr_t)nvl, packed, size, iflag) != 0) {
|
2015-02-05 23:43:37 +03:00
|
|
|
vmem_free(packed, size);
|
2015-07-03 19:20:17 +03:00
|
|
|
return (SET_ERROR(EFAULT));
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
if ((error = nvlist_unpack(packed, size, &list, 0)) != 0) {
|
2015-02-05 23:43:37 +03:00
|
|
|
vmem_free(packed, size);
|
2008-11-20 23:01:55 +03:00
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2015-02-05 23:43:37 +03:00
|
|
|
vmem_free(packed, size);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
*nvp = list;
|
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
/*
|
|
|
|
* Reduce the size of this nvlist until it can be serialized in 'max' bytes.
|
|
|
|
* Entries will be removed from the end of the nvlist, and one int32 entry
|
|
|
|
* named "N_MORE_ERRORS" will be added indicating how many entries were
|
|
|
|
* removed.
|
|
|
|
*/
|
2010-05-29 00:45:14 +04:00
|
|
|
static int
|
2013-08-28 15:45:09 +04:00
|
|
|
nvlist_smush(nvlist_t *errors, size_t max)
|
2010-05-29 00:45:14 +04:00
|
|
|
{
|
|
|
|
size_t size;
|
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
size = fnvlist_size(errors);
|
2010-05-29 00:45:14 +04:00
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
if (size > max) {
|
2010-05-29 00:45:14 +04:00
|
|
|
nvpair_t *more_errors;
|
|
|
|
int n = 0;
|
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
if (max < 1024)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(ENOMEM));
|
2010-05-29 00:45:14 +04:00
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
fnvlist_add_int32(errors, ZPROP_N_MORE_ERRORS, 0);
|
|
|
|
more_errors = nvlist_prev_nvpair(errors, NULL);
|
2010-05-29 00:45:14 +04:00
|
|
|
|
|
|
|
do {
|
2013-08-28 15:45:09 +04:00
|
|
|
nvpair_t *pair = nvlist_prev_nvpair(errors,
|
2010-05-29 00:45:14 +04:00
|
|
|
more_errors);
|
2013-08-28 15:45:09 +04:00
|
|
|
fnvlist_remove_nvpair(errors, pair);
|
2010-05-29 00:45:14 +04:00
|
|
|
n++;
|
2013-08-28 15:45:09 +04:00
|
|
|
size = fnvlist_size(errors);
|
|
|
|
} while (size > max);
|
2010-05-29 00:45:14 +04:00
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
fnvlist_remove_nvpair(errors, more_errors);
|
|
|
|
fnvlist_add_int32(errors, ZPROP_N_MORE_ERRORS, n);
|
|
|
|
ASSERT3U(fnvlist_size(errors), <=, max);
|
2010-05-29 00:45:14 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
return (0);
|
|
|
|
}
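The trimming strategy of nvlist_smush() can be modelled without libnvpair. The following is a toy userspace sketch under stated assumptions: entries are represented only by their serialized sizes, `counter_size` stands in for the cost of the "N_MORE_ERRORS" entry, and `trim_to_fit()` is a hypothetical name, not an OpenZFS function.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Toy model: sizes[i] is the serialized size of entry i. Entries are
 * dropped from the end until the total, including one counter entry
 * of 'counter_size' bytes, fits in 'max'. On success *n holds the
 * remaining entry count and *removed the number of entries dropped.
 */
static int
trim_to_fit(const size_t *sizes, size_t *n, size_t counter_size,
    size_t max, size_t *removed)
{
	size_t total = 0;

	for (size_t i = 0; i < *n; i++)
		total += sizes[i];
	*removed = 0;
	if (total <= max)
		return (0);
	if (max < counter_size)
		return (ENOMEM);

	/* Account for the counter entry before trimming. */
	total += counter_size;
	while (total > max && *n > 0) {
		total -= sizes[--(*n)];
		(*removed)++;
	}
	return (0);
}
```

Reserving room for the counter before the loop starts guarantees that the final list, counter included, still fits in the budget.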
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
static int
|
|
|
|
put_nvlist(zfs_cmd_t *zc, nvlist_t *nvl)
|
|
|
|
{
|
|
|
|
char *packed = NULL;
|
2010-05-29 00:45:14 +04:00
|
|
|
int error = 0;
|
2008-11-20 23:01:55 +03:00
|
|
|
size_t size;
|
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
size = fnvlist_size(nvl);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
if (size > zc->zc_nvlist_dst_size) {
|
2013-03-08 22:41:28 +04:00
|
|
|
error = SET_ERROR(ENOMEM);
|
2008-11-20 23:01:55 +03:00
|
|
|
} else {
|
2013-08-28 15:45:09 +04:00
|
|
|
packed = fnvlist_pack(nvl, &size);
|
2010-05-29 00:45:14 +04:00
|
|
|
if (ddi_copyout(packed, (void *)(uintptr_t)zc->zc_nvlist_dst,
|
|
|
|
size, zc->zc_iflags) != 0)
|
2013-03-08 22:41:28 +04:00
|
|
|
error = SET_ERROR(EFAULT);
|
2013-08-28 15:45:09 +04:00
|
|
|
fnvlist_pack_free(packed, size);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
zc->zc_nvlist_dst_size = size;
|
2013-08-28 15:45:09 +04:00
|
|
|
zc->zc_nvlist_dst_filled = B_TRUE;
|
2008-11-20 23:01:55 +03:00
|
|
|
return (error);
|
|
|
|
}
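put_nvlist() writes the required size back into zc_nvlist_dst_size even when it fails with ENOMEM. The following is a minimal userspace sketch of that contract, assuming a plain memcpy() in place of ddi_copyout(); `copy_out()` is a hypothetical helper introduced for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy 'src' into 'dst' if it fits. *dst_size is the buffer capacity
 * on entry and the required size on return, so a caller that gets
 * ENOMEM can allocate a larger buffer and retry.
 */
static int
copy_out(const void *src, size_t src_size, void *dst, size_t *dst_size)
{
	int error = 0;

	if (src_size > *dst_size)
		error = ENOMEM;
	else
		memcpy(dst, src, src_size);
	*dst_size = src_size;
	return (error);
}
```

Reporting the needed size on failure lets the caller size the retry buffer in one step instead of guessing.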
|
|
|
|
|
2018-02-08 19:16:23 +03:00
|
|
|
int
|
|
|
|
getzfsvfs_impl(objset_t *os, zfsvfs_t **zfvp)
|
2009-07-03 02:44:48 +04:00
|
|
|
{
|
2018-02-08 19:16:23 +03:00
|
|
|
int error = 0;
|
2010-05-29 00:45:14 +04:00
|
|
|
if (dmu_objset_type(os) != DMU_OST_ZFS) {
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EINVAL));
|
2010-05-29 00:45:14 +04:00
|
|
|
}
|
2009-07-03 02:44:48 +04:00
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
mutex_enter(&os->os_user_ptr_lock);
|
2017-03-08 03:21:37 +03:00
|
|
|
*zfvp = dmu_objset_get_user(os);
|
2016-07-09 02:59:54 +03:00
|
|
|
/* bump s_active only when non-zero to prevent umount race */
|
2019-09-27 20:46:28 +03:00
|
|
|
error = zfs_vfs_ref(zfvp);
|
2010-05-29 00:45:14 +04:00
|
|
|
mutex_exit(&os->os_user_ptr_lock);
|
2018-02-08 19:16:23 +03:00
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2018-02-08 19:32:45 +03:00
|
|
|
int
|
2018-02-08 19:16:23 +03:00
|
|
|
getzfsvfs(const char *dsname, zfsvfs_t **zfvp)
|
|
|
|
{
|
|
|
|
objset_t *os;
|
|
|
|
int error;
|
|
|
|
|
|
|
|
error = dmu_objset_hold(dsname, FTAG, &os);
|
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
|
|
|
|
|
|
|
error = getzfsvfs_impl(os, zfvp);
|
2010-05-29 00:45:14 +04:00
|
|
|
dmu_objset_rele(os, FTAG);
|
2009-07-03 02:44:48 +04:00
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2017-03-08 03:21:37 +03:00
|
|
|
* Find a zfsvfs_t for a mounted filesystem, or create our own, in which
|
2011-05-19 22:44:07 +04:00
|
|
|
* case its z_sb will be NULL, and it will be opened as the owner.
|
2012-12-14 03:24:15 +04:00
|
|
|
* If 'writer' is set, the z_teardown_lock will be held for RW_WRITER,
|
|
|
|
* which prevents all inode ops from running.
|
2009-07-03 02:44:48 +04:00
|
|
|
*/
|
|
|
|
static int
|
2022-04-19 21:49:30 +03:00
|
|
|
zfsvfs_hold(const char *name, const void *tag, zfsvfs_t **zfvp,
|
|
|
|
boolean_t writer)
|
2009-07-03 02:44:48 +04:00
|
|
|
{
|
|
|
|
int error = 0;
|
|
|
|
|
2017-03-09 01:56:19 +03:00
|
|
|
if (getzfsvfs(name, zfvp) != 0)
|
2018-02-21 03:27:31 +03:00
|
|
|
error = zfsvfs_create(name, B_FALSE, zfvp);
|
2009-07-03 02:44:48 +04:00
|
|
|
if (error == 0) {
|
2020-11-05 01:23:48 +03:00
|
|
|
if (writer)
|
|
|
|
ZFS_TEARDOWN_ENTER_WRITE(*zfvp, tag);
|
|
|
|
else
|
|
|
|
ZFS_TEARDOWN_ENTER_READ(*zfvp, tag);
|
2017-03-08 03:21:37 +03:00
|
|
|
if ((*zfvp)->z_unmounted) {
|
2009-07-03 02:44:48 +04:00
|
|
|
/*
|
|
|
|
* XXX we could probably try again, since the unmounting
|
|
|
|
* thread should be just about to disassociate the
|
2017-03-08 03:21:37 +03:00
|
|
|
* objset from the zfsvfs.
|
2009-07-03 02:44:48 +04:00
|
|
|
*/
|
2020-11-05 01:23:48 +03:00
|
|
|
ZFS_TEARDOWN_EXIT(*zfvp, tag);
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EBUSY));
|
2009-07-03 02:44:48 +04:00
|
|
|
}
|
|
|
|
}
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void
|
2022-04-19 21:49:30 +03:00
|
|
|
zfsvfs_rele(zfsvfs_t *zfsvfs, const void *tag)
|
2009-07-03 02:44:48 +04:00
|
|
|
{
|
2020-11-05 01:23:48 +03:00
|
|
|
ZFS_TEARDOWN_EXIT(zfsvfs, tag);
|
2009-07-03 02:44:48 +04:00
|
|
|
|
2019-12-10 20:21:07 +03:00
|
|
|
if (zfs_vfs_held(zfsvfs)) {
|
|
|
|
zfs_vfs_rele(zfsvfs);
|
2009-07-03 02:44:48 +04:00
|
|
|
} else {
|
2017-08-14 20:36:48 +03:00
|
|
|
dmu_objset_disown(zfsvfs->z_os, B_TRUE, zfsvfs);
|
2017-03-09 01:56:19 +03:00
|
|
|
zfsvfs_free(zfsvfs);
|
2009-07-03 02:44:48 +04:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
static int
|
|
|
|
zfs_ioc_pool_create(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
int error;
|
|
|
|
nvlist_t *config, *props = NULL;
|
2008-12-03 23:09:06 +03:00
|
|
|
nvlist_t *rootprops = NULL;
|
|
|
|
nvlist_t *zplprops = NULL;
|
2017-08-14 20:36:48 +03:00
|
|
|
dsl_crypto_params_t *dcp = NULL;
|
2020-10-03 03:44:10 +03:00
|
|
|
const char *spa_name = zc->zc_name;
|
2019-05-29 01:19:50 +03:00
|
|
|
boolean_t unload_wkey = B_TRUE;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2010-08-26 20:52:42 +04:00
|
|
|
if ((error = get_nvlist(zc->zc_nvlist_conf, zc->zc_nvlist_conf_size,
|
|
|
|
zc->zc_iflags, &config)))
|
2008-11-20 23:01:55 +03:00
|
|
|
return (error);
|
|
|
|
|
|
|
|
if (zc->zc_nvlist_src_size != 0 && (error =
|
2009-07-03 02:44:48 +04:00
|
|
|
get_nvlist(zc->zc_nvlist_src, zc->zc_nvlist_src_size,
|
|
|
|
zc->zc_iflags, &props))) {
|
2008-11-20 23:01:55 +03:00
|
|
|
nvlist_free(config);
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2008-12-03 23:09:06 +03:00
|
|
|
if (props) {
|
|
|
|
nvlist_t *nvl = NULL;
|
2017-08-14 20:36:48 +03:00
|
|
|
nvlist_t *hidden_args = NULL;
|
2008-12-03 23:09:06 +03:00
|
|
|
uint64_t version = SPA_VERSION;
|
2023-03-11 21:39:24 +03:00
|
|
|
const char *tname;
|
2008-12-03 23:09:06 +03:00
|
|
|
|
|
|
|
(void) nvlist_lookup_uint64(props,
|
|
|
|
zpool_prop_to_name(ZPOOL_PROP_VERSION), &version);
|
2012-12-14 03:24:15 +04:00
|
|
|
if (!SPA_VERSION_IS_SUPPORTED(version)) {
|
2013-03-08 22:41:28 +04:00
|
|
|
error = SET_ERROR(EINVAL);
|
2008-12-03 23:09:06 +03:00
|
|
|
goto pool_props_bad;
|
|
|
|
}
|
|
|
|
(void) nvlist_lookup_nvlist(props, ZPOOL_ROOTFS_PROPS, &nvl);
|
|
|
|
if (nvl) {
|
|
|
|
error = nvlist_dup(nvl, &rootprops, KM_SLEEP);
|
2019-05-29 01:19:50 +03:00
|
|
|
if (error != 0)
|
|
|
|
goto pool_props_bad;
|
2008-12-03 23:09:06 +03:00
|
|
|
(void) nvlist_remove_all(props, ZPOOL_ROOTFS_PROPS);
|
|
|
|
}
|
2017-08-14 20:36:48 +03:00
|
|
|
|
|
|
|
(void) nvlist_lookup_nvlist(props, ZPOOL_HIDDEN_ARGS,
|
|
|
|
&hidden_args);
|
|
|
|
error = dsl_crypto_params_create_nvlist(DCP_CMD_NONE,
|
|
|
|
rootprops, hidden_args, &dcp);
|
2019-05-29 01:19:50 +03:00
|
|
|
if (error != 0)
|
|
|
|
goto pool_props_bad;
|
2017-08-14 20:36:48 +03:00
|
|
|
(void) nvlist_remove_all(props, ZPOOL_HIDDEN_ARGS);
|
|
|
|
|
2008-12-03 23:09:06 +03:00
|
|
|
VERIFY(nvlist_alloc(&zplprops, NV_UNIQUE_NAME, KM_SLEEP) == 0);
|
|
|
|
error = zfs_fill_zplprops_root(version, rootprops,
|
|
|
|
zplprops, NULL);
|
2013-09-04 16:00:57 +04:00
|
|
|
if (error != 0)
|
2008-12-03 23:09:06 +03:00
|
|
|
goto pool_props_bad;
|
2018-05-08 07:11:59 +03:00
|
|
|
|
|
|
|
if (nvlist_lookup_string(props,
|
|
|
|
zpool_prop_to_name(ZPOOL_PROP_TNAME), &tname) == 0)
|
|
|
|
spa_name = tname;
|
2008-12-03 23:09:06 +03:00
|
|
|
}
|
|
|
|
|
2017-08-14 20:36:48 +03:00
|
|
|
error = spa_create(zc->zc_name, config, props, zplprops, dcp);
|
2008-12-03 23:09:06 +03:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Set the remaining root properties
|
|
|
|
*/
|
2018-05-08 07:11:59 +03:00
|
|
|
if (!error && (error = zfs_set_prop_nvlist(spa_name,
|
2019-05-29 01:19:50 +03:00
|
|
|
ZPROP_SRC_LOCAL, rootprops, NULL)) != 0) {
|
2018-05-08 07:11:59 +03:00
|
|
|
(void) spa_destroy(spa_name);
|
2019-05-29 01:19:50 +03:00
|
|
|
unload_wkey = B_FALSE; /* spa_destroy() unloads wrapping keys */
|
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2008-12-03 23:09:06 +03:00
|
|
|
pool_props_bad:
|
|
|
|
nvlist_free(rootprops);
|
|
|
|
nvlist_free(zplprops);
|
2008-11-20 23:01:55 +03:00
|
|
|
nvlist_free(config);
|
2008-12-03 23:09:06 +03:00
|
|
|
nvlist_free(props);
|
2019-05-29 01:19:50 +03:00
|
|
|
dsl_crypto_params_free(dcp, unload_wkey && !!error);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
zfs_ioc_pool_destroy(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
int error;
|
|
|
|
zfs_log_history(zc);
|
|
|
|
error = spa_destroy(zc->zc_name);
|
2014-03-22 13:07:14 +04:00
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
zfs_ioc_pool_import(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
nvlist_t *config, *props = NULL;
|
|
|
|
uint64_t guid;
|
2010-05-29 00:45:14 +04:00
|
|
|
int error;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
if ((error = get_nvlist(zc->zc_nvlist_conf, zc->zc_nvlist_conf_size,
|
2009-07-03 02:44:48 +04:00
|
|
|
zc->zc_iflags, &config)) != 0)
|
2008-11-20 23:01:55 +03:00
|
|
|
return (error);
|
|
|
|
|
|
|
|
if (zc->zc_nvlist_src_size != 0 && (error =
|
2009-07-03 02:44:48 +04:00
|
|
|
get_nvlist(zc->zc_nvlist_src, zc->zc_nvlist_src_size,
|
|
|
|
zc->zc_iflags, &props))) {
|
2008-11-20 23:01:55 +03:00
|
|
|
nvlist_free(config);
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (nvlist_lookup_uint64(config, ZPOOL_CONFIG_POOL_GUID, &guid) != 0 ||
|
|
|
|
guid != zc->zc_guid)
|
2013-03-08 22:41:28 +04:00
|
|
|
error = SET_ERROR(EINVAL);
|
2008-11-20 23:01:55 +03:00
|
|
|
else
|
2010-08-27 01:24:34 +04:00
|
|
|
error = spa_import(zc->zc_name, config, props, zc->zc_cookie);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2010-08-27 01:24:34 +04:00
|
|
|
if (zc->zc_nvlist_dst != 0) {
|
|
|
|
int err;
|
|
|
|
|
|
|
|
if ((err = put_nvlist(zc, config)) != 0)
|
|
|
|
error = err;
|
|
|
|
}
|
2010-05-29 00:45:14 +04:00
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
nvlist_free(config);
|
2016-04-01 06:54:07 +03:00
|
|
|
nvlist_free(props);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
zfs_ioc_pool_export(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
int error;
|
2008-12-03 23:09:06 +03:00
|
|
|
boolean_t force = (boolean_t)zc->zc_cookie;
|
2009-01-16 00:59:39 +03:00
|
|
|
boolean_t hardforce = (boolean_t)zc->zc_guid;
|
2008-12-03 23:09:06 +03:00
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
zfs_log_history(zc);
|
2009-01-16 00:59:39 +03:00
|
|
|
error = spa_export(zc->zc_name, NULL, force, hardforce);
|
2014-03-22 13:07:14 +04:00
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
zfs_ioc_pool_configs(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
nvlist_t *configs;
|
|
|
|
int error;
|
|
|
|
|
|
|
|
if ((configs = spa_all_configs(&zc->zc_cookie)) == NULL)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EEXIST));
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
error = put_nvlist(zc, configs);
|
|
|
|
|
|
|
|
nvlist_free(configs);
|
|
|
|
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2012-12-14 03:24:15 +04:00
|
|
|
/*
|
|
|
|
* inputs:
|
|
|
|
* zc_name name of the pool
|
|
|
|
*
|
|
|
|
* outputs:
|
|
|
|
* zc_cookie real errno
|
|
|
|
* zc_nvlist_dst config nvlist
|
|
|
|
* zc_nvlist_dst_size size of config nvlist
|
|
|
|
*/
|
2008-11-20 23:01:55 +03:00
|
|
|
static int
|
|
|
|
zfs_ioc_pool_stats(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
nvlist_t *config;
|
|
|
|
int error;
|
|
|
|
int ret = 0;
|
|
|
|
|
|
|
|
error = spa_get_stats(zc->zc_name, &config, zc->zc_value,
|
|
|
|
sizeof (zc->zc_value));
|
|
|
|
|
|
|
|
if (config != NULL) {
|
|
|
|
ret = put_nvlist(zc, config);
|
|
|
|
nvlist_free(config);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The config may be present even if 'error' is non-zero.
|
|
|
|
* In this case we return success, and preserve the real errno
|
|
|
|
* in 'zc_cookie'.
|
|
|
|
*/
|
|
|
|
zc->zc_cookie = error;
|
|
|
|
} else {
|
|
|
|
ret = error;
|
|
|
|
}
|
|
|
|
|
|
|
|
return (ret);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Try to import the given pool, returning pool stats as appropriate so that
|
|
|
|
* userland knows which devices are available and overall pool health.
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
zfs_ioc_pool_tryimport(zfs_cmd_t *zc)
|
|
|
|
{
|
Multi-modifier protection (MMP)
Add multihost=on|off pool property to control MMP. When enabled
a new thread writes uberblocks to the last slot in each label, at a
set frequency, to indicate to other hosts the pool is actively imported.
These uberblocks are the last synced uberblock with an updated
timestamp. Property defaults to off.
During tryimport, find the "best" uberblock (newest txg and timestamp)
repeatedly, checking for change in the found uberblock. Include the
results of the activity test in the config returned by tryimport.
These results are reported to user in "zpool import".
Allow the user to control the period between MMP writes, and the
duration of the activity test on import, via a new module parameter
zfs_multihost_interval. The period is specified in milliseconds. The
activity test duration is calculated from this value, and from the
mmp_delay in the "best" uberblock found initially.
Add a kstat interface to export statistics about Multiple Modifier
Protection (MMP) updates. Include the last synced txg number, the
timestamp, the delay since the last MMP update, the VDEV GUID, the VDEV
label that received the last MMP update, and the VDEV path. Abbreviated
output below.
$ cat /proc/spl/kstat/zfs/mypool/multihost
31 0 0x01 10 880 105092382393521 105144180101111
txg timestamp mmp_delay vdev_guid vdev_label vdev_path
20468 261337 250274925 68396651780 3 /dev/sda
20468 261339 252023374 6267402363293 1 /dev/sdc
20468 261340 252000858 6698080955233 1 /dev/sdx
20468 261341 251980635 783892869810 2 /dev/sdy
20468 261342 253385953 8923255792467 3 /dev/sdd
20468 261344 253336622 042125143176 0 /dev/sdab
20468 261345 253310522 1200778101278 2 /dev/sde
20468 261346 253286429 0950576198362 2 /dev/sdt
20468 261347 253261545 96209817917 3 /dev/sds
20468 261349 253238188 8555725937673 3 /dev/sdb
Add a new tunable zfs_multihost_history to specify the number of MMP
updates to store history for. By default it is set to zero meaning that
no MMP statistics are stored.
When using ztest to generate activity, for automated tests of the MMP
function, some test functions interfere with the test. For example, the
pool is exported to run zdb and then imported again. Add a new ztest
function, "-M", to alter ztest behavior to prevent this.
Add new tests to verify the new functionality. Tests provided by
Giuseppe Di Natale.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Ned Bass <bass6@llnl.gov>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #745
Closes #6279
2017-07-08 06:20:35 +03:00
|
|
|
nvlist_t *tryconfig, *config = NULL;
|
2008-11-20 23:01:55 +03:00
|
|
|
int error;
|
|
|
|
|
|
|
|
if ((error = get_nvlist(zc->zc_nvlist_conf, zc->zc_nvlist_conf_size,
|
2009-07-03 02:44:48 +04:00
|
|
|
zc->zc_iflags, &tryconfig)) != 0)
|
2008-11-20 23:01:55 +03:00
|
|
|
return (error);
|
|
|
|
|
|
|
|
config = spa_tryimport(tryconfig);
|
|
|
|
|
|
|
|
nvlist_free(tryconfig);
|
|
|
|
|
|
|
|
if (config == NULL)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EINVAL));
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
error = put_nvlist(zc, config);
|
|
|
|
nvlist_free(config);
|
|
|
|
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
/*
|
|
|
|
* inputs:
|
|
|
|
* zc_name name of the pool
|
|
|
|
* zc_cookie scan func (pool_scan_func_t)
|
2017-07-07 08:16:13 +03:00
|
|
|
* zc_flags scrub pause/resume flag (pool_scrub_cmd_t)
|
2010-05-29 00:45:14 +04:00
|
|
|
*/
|
2008-11-20 23:01:55 +03:00
|
|
|
static int
|
2010-05-29 00:45:14 +04:00
|
|
|
zfs_ioc_pool_scan(zfs_cmd_t *zc)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
|
|
|
spa_t *spa;
|
|
|
|
int error;
|
|
|
|
|
2017-07-07 08:16:13 +03:00
|
|
|
if (zc->zc_flags >= POOL_SCRUB_FLAGS_END)
|
|
|
|
return (SET_ERROR(EINVAL));
|
|
|
|
|
2018-04-04 03:31:30 +03:00
|
|
|
if ((error = spa_open(zc->zc_name, &spa, FTAG)) != 0)
|
|
|
|
return (error);
|
|
|
|
|
2017-07-07 08:16:13 +03:00
|
|
|
if (zc->zc_flags == POOL_SCRUB_PAUSE)
|
|
|
|
error = spa_scrub_pause_resume(spa, POOL_SCRUB_PAUSE);
|
|
|
|
else if (zc->zc_cookie == POOL_SCAN_NONE)
|
2010-05-29 00:45:14 +04:00
|
|
|
error = spa_scan_stop(spa);
|
|
|
|
else
|
|
|
|
error = spa_scan(spa, zc->zc_cookie);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
spa_close(spa, FTAG);
|
|
|
|
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
zfs_ioc_pool_freeze(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
spa_t *spa;
|
|
|
|
int error;
|
|
|
|
|
|
|
|
error = spa_open(zc->zc_name, &spa, FTAG);
|
|
|
|
if (error == 0) {
|
|
|
|
spa_freeze(spa);
|
|
|
|
spa_close(spa, FTAG);
|
|
|
|
}
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
zfs_ioc_pool_upgrade(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
spa_t *spa;
|
|
|
|
int error;
|
|
|
|
|
|
|
|
if ((error = spa_open(zc->zc_name, &spa, FTAG)) != 0)
|
|
|
|
return (error);
|
|
|
|
|
2012-12-14 03:24:15 +04:00
|
|
|
if (zc->zc_cookie < spa_version(spa) ||
|
|
|
|
!SPA_VERSION_IS_SUPPORTED(zc->zc_cookie)) {
|
2008-11-20 23:01:55 +03:00
|
|
|
spa_close(spa, FTAG);
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EINVAL));
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
spa_upgrade(spa, zc->zc_cookie);
|
|
|
|
spa_close(spa, FTAG);
|
|
|
|
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
zfs_ioc_pool_get_history(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
spa_t *spa;
|
|
|
|
char *hist_buf;
|
|
|
|
uint64_t size;
|
|
|
|
int error;
|
|
|
|
|
|
|
|
if ((size = zc->zc_history_len) == 0)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EINVAL));
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
if ((error = spa_open(zc->zc_name, &spa, FTAG)) != 0)
|
|
|
|
return (error);
|
|
|
|
|
|
|
|
if (spa_version(spa) < SPA_VERSION_ZPOOL_HISTORY) {
|
|
|
|
spa_close(spa, FTAG);
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(ENOTSUP));
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
2011-05-06 20:59:52 +04:00
|
|
|
hist_buf = vmem_alloc(size, KM_SLEEP);
|
2008-11-20 23:01:55 +03:00
|
|
|
if ((error = spa_history_get(spa, &zc->zc_history_offset,
|
|
|
|
&zc->zc_history_len, hist_buf)) == 0) {
|
2009-07-03 02:44:48 +04:00
|
|
|
error = ddi_copyout(hist_buf,
|
|
|
|
(void *)(uintptr_t)zc->zc_history,
|
|
|
|
zc->zc_history_len, zc->zc_iflags);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
spa_close(spa, FTAG);
|
2011-05-06 20:59:52 +04:00
|
|
|
vmem_free(hist_buf, size);
|
2008-11-20 23:01:55 +03:00
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2011-11-12 02:07:54 +04:00
|
|
|
static int
|
|
|
|
zfs_ioc_pool_reguid(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
spa_t *spa;
|
|
|
|
int error;
|
|
|
|
|
|
|
|
error = spa_open(zc->zc_name, &spa, FTAG);
|
|
|
|
if (error == 0) {
|
|
|
|
error = spa_change_guid(spa);
|
|
|
|
spa_close(spa, FTAG);
|
|
|
|
}
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
static int
|
|
|
|
zfs_ioc_dsobj_to_dsname(zfs_cmd_t *zc)
|
|
|
|
{
|
2013-09-04 16:00:57 +04:00
|
|
|
return (dsl_dsobj_to_dsname(zc->zc_name, zc->zc_obj, zc->zc_value));
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
/*
|
|
|
|
* inputs:
|
|
|
|
* zc_name name of filesystem
|
|
|
|
* zc_obj object to find
|
|
|
|
*
|
|
|
|
* outputs:
|
|
|
|
* zc_value name of object
|
|
|
|
*/
|
2008-11-20 23:01:55 +03:00
|
|
|
static int
|
|
|
|
zfs_ioc_obj_to_path(zfs_cmd_t *zc)
|
|
|
|
{
|
2010-05-29 00:45:14 +04:00
|
|
|
objset_t *os;
|
2008-11-20 23:01:55 +03:00
|
|
|
int error;
|
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
/* XXX reading from objset not owned */
|
2017-08-14 20:36:48 +03:00
|
|
|
if ((error = dmu_objset_hold_flags(zc->zc_name, B_TRUE,
|
|
|
|
FTAG, &os)) != 0)
|
2008-11-20 23:01:55 +03:00
|
|
|
return (error);
|
2010-05-29 00:45:14 +04:00
|
|
|
if (dmu_objset_type(os) != DMU_OST_ZFS) {
|
2017-08-14 20:36:48 +03:00
|
|
|
dmu_objset_rele_flags(os, B_TRUE, FTAG);
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EINVAL));
|
2010-05-29 00:45:14 +04:00
|
|
|
}
|
|
|
|
error = zfs_obj_to_path(os, zc->zc_obj, zc->zc_value,
|
2008-11-20 23:01:55 +03:00
|
|
|
sizeof (zc->zc_value));
|
2017-08-14 20:36:48 +03:00
|
|
|
dmu_objset_rele_flags(os, B_TRUE, FTAG);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2010-08-27 01:24:34 +04:00
|
|
|
/*
|
|
|
|
* inputs:
|
|
|
|
* zc_name name of filesystem
|
|
|
|
* zc_obj object to find
|
|
|
|
*
|
|
|
|
* outputs:
|
|
|
|
* zc_stat stats on object
|
|
|
|
* zc_value path to object
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
zfs_ioc_obj_to_stats(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
objset_t *os;
|
|
|
|
int error;
|
|
|
|
|
|
|
|
/* XXX reading from objset not owned */
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decrypt, and authenticate protected datasets.
Each object set maintains a Merkle tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 20:36:48 +03:00
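The key indirection described in the commit message above (a per-dataset key protected by a user's key, so the user key can change without re-encrypting data) can be illustrated with a toy sketch. Everything here is hypothetical: the `toy_*` names are not ZFS APIs, the struct is not the on-disk DSL Crypto Key layout, and XOR merely stands in for a real authenticated key-wrapping cipher.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	KEY_LEN	32

/* Hypothetical toy type; not the ZFS DSL Crypto Key object. */
typedef struct toy_dataset {
	uint8_t wrapped_master[KEY_LEN];	/* master key, stored wrapped */
} toy_dataset_t;

/* XOR is a placeholder for a real key-wrap cipher; do not use it for real. */
static void
toy_xor(uint8_t *dst, const uint8_t *a, const uint8_t *b)
{
	for (size_t i = 0; i < KEY_LEN; i++)
		dst[i] = a[i] ^ b[i];
}

/* Store the master key wrapped with the user's key. */
static void
toy_create(toy_dataset_t *ds, const uint8_t *master, const uint8_t *user)
{
	toy_xor(ds->wrapped_master, master, user);
}

/* Recover the master key given the user's key. */
static void
toy_unwrap(const toy_dataset_t *ds, const uint8_t *user, uint8_t *master_out)
{
	toy_xor(master_out, ds->wrapped_master, user);
}

/*
 * Change the user key: only the wrapped copy of the master key is
 * rewritten. Data encrypted with the master key is never touched.
 */
static void
toy_change_key(toy_dataset_t *ds, const uint8_t *old_user,
    const uint8_t *new_user)
{
	uint8_t master[KEY_LEN];

	toy_unwrap(ds, old_user, master);
	toy_xor(ds->wrapped_master, master, new_user);
	memset(master, 0, sizeof (master));	/* scrub the temporary copy */
}
```

The point of the sketch is only the shape of the operation: a key change costs one small rewrap, independent of how much data the dataset holds.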
|
|
|
if ((error = dmu_objset_hold_flags(zc->zc_name, B_TRUE,
|
|
|
|
FTAG, &os)) != 0)
|
2010-08-27 01:24:34 +04:00
|
|
|
return (error);
|
|
|
|
if (dmu_objset_type(os) != DMU_OST_ZFS) {
|
2017-08-14 20:36:48 +03:00
|
|
|
dmu_objset_rele_flags(os, B_TRUE, FTAG);
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EINVAL));
|
2010-08-27 01:24:34 +04:00
|
|
|
}
|
|
|
|
error = zfs_obj_to_stats(os, zc->zc_obj, &zc->zc_stat, zc->zc_value,
|
|
|
|
sizeof (zc->zc_value));
|
2017-08-14 20:36:48 +03:00
|
|
|
dmu_objset_rele_flags(os, B_TRUE, FTAG);
|
2010-08-27 01:24:34 +04:00
|
|
|
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
static int
|
|
|
|
zfs_ioc_vdev_add(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
spa_t *spa;
|
|
|
|
int error;
|
2013-11-15 02:22:52 +04:00
|
|
|
nvlist_t *config;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
error = spa_open(zc->zc_name, &spa, FTAG);
|
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
|
|
|
|
|
|
|
error = get_nvlist(zc->zc_nvlist_conf, zc->zc_nvlist_conf_size,
|
2009-07-03 02:44:48 +04:00
|
|
|
zc->zc_iflags, &config);
|
2008-11-20 23:01:55 +03:00
|
|
|
if (error == 0) {
|
|
|
|
error = spa_vdev_add(spa, config);
|
|
|
|
nvlist_free(config);
|
|
|
|
}
|
|
|
|
spa_close(spa, FTAG);
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
/*
|
|
|
|
* inputs:
|
|
|
|
* zc_name name of the pool
|
OpenZFS 7614, 9064 - zfs device evacuation/removal
OpenZFS 7614 - zfs device evacuation/removal
OpenZFS 9064 - remove_mirror should wait for device removal to complete
This project allows top-level vdevs to be removed from the storage pool
with "zpool remove", reducing the total amount of storage in the pool.
This operation copies all allocated regions of the device to be removed
onto other devices, recording the mapping from old to new location.
After the removal is complete, read and free operations to the removed
(now "indirect") vdev must be remapped and performed at the new location
on disk. The indirect mapping table is kept in memory whenever the pool
is loaded, so there is minimal performance overhead when doing operations
on the indirect vdev.
The size of the in-memory mapping table will be reduced when its entries
become "obsolete" because they are no longer used by any block pointers
in the pool. An entry becomes obsolete when all the blocks that use
it are freed. An entry can also become obsolete when all the snapshots
that reference it are deleted, and the block pointers that reference it
have been "remapped" in all filesystems/zvols (and clones). Whenever an
indirect block is written, all the block pointers in it will be "remapped"
to their new (concrete) locations if possible. This process can be
accelerated by using the "zfs remap" command to proactively rewrite all
indirect blocks that reference indirect (removed) vdevs.
Note that when a device is removed, we do not verify the checksum of
the data that is copied. This makes the process much faster, but if it
were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be
possible to copy the wrong data, when we have the correct data on e.g.
the other side of the mirror.
At the moment, only mirrors and simple top-level vdevs can be removed
and no removal is allowed if any of the top-level vdevs are raidz.
Porting Notes:
* Avoid zero-sized kmem_alloc() in vdev_compact_children().
The device evacuation code adds a dependency that
vdev_compact_children() be able to properly empty the vdev_child
array by setting it to NULL and zeroing vdev_children. Under Linux,
kmem_alloc() and related functions return a sentinel pointer rather
than NULL for zero-sized allocations.
* Remove comment regarding "mpt" driver where zfs_remove_max_segment
is initialized to SPA_MAXBLOCKSIZE.
Change zfs_condense_indirect_commit_entry_delay_ticks to
zfs_condense_indirect_commit_entry_delay_ms for consistency with
most other tunables in which delays are specified in ms.
* ZTS changes:
Use set_tunable rather than mdb
Use zpool sync as appropriate
Use sync_pool instead of sync
Kill jobs during test_removal_with_operation to allow unmount/export
Don't add non-disk names such as "mirror" or "raidz" to $DISKS
Use $TEST_BASE_DIR instead of /tmp
Increase HZ from 100 to 1000 which is more common on Linux
removal_multiple_indirection.ksh
Reduce iterations in order to not time out on the code
coverage builders.
removal_resume_export:
Functionally, the test case is correct but there exists a race
where the kernel thread hasn't been fully started yet and is
not visible. Wait for up to 1 second for the removal thread
to be started before giving up on it. Also, increase the
amount of data copied in order that the removal not finish
before the export has a chance to fail.
* MMP compatibility, the concept of concrete versus non-concrete devices
has slightly changed the semantics of vdev_writeable(). Update
mmp_random_leaf_impl() accordingly.
* Updated dbuf_remap() to handle the org.zfsonlinux:large_dnode pool
feature which is not supported by OpenZFS.
* Added support for new vdev removal tracepoints.
* Test cases removal_with_zdb and removal_condense_export have been
intentionally disabled. When run manually they pass as intended,
but when running in the automated test environment they produce
unreliable results on the latest Fedora release.
They may work better once the upstream pool import refactoring is
merged into ZoL at which point they will be re-enabled.
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Alex Reece <alex@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Tim Chase <tim@chase2k.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Garrett D'Amore <garrett@damore.org>
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
OpenZFS-issue: https://www.illumos.org/issues/7614
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f539f1eb
Closes #6900
2016-09-22 19:30:13 +03:00
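The remapping step described in the commit message above (reads to a removed, now "indirect" vdev are redirected through an in-memory mapping from old to new location) can be sketched as a lookup in a table sorted by source offset. This is an illustrative sketch only: `toy_map_entry_t` and `toy_remap()` are hypothetical names, and the real mapping structures in OpenZFS differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping entry: one copied region of the removed vdev. */
typedef struct toy_map_entry {
	uint64_t src_offset;	/* start of region on the removed vdev */
	uint64_t size;		/* length of the region in bytes */
	uint64_t dst_vdev;	/* vdev that now holds the data */
	uint64_t dst_offset;	/* start of region on dst_vdev */
} toy_map_entry_t;

/*
 * Translate an offset on the removed vdev to its new location.
 * Assumes map[] is sorted by src_offset and regions do not overlap.
 * Returns 0 on success, -1 if the offset was never mapped.
 */
static int
toy_remap(const toy_map_entry_t *map, size_t n, uint64_t off,
    uint64_t *dst_vdev, uint64_t *dst_off)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		const toy_map_entry_t *e = &map[mid];

		if (off < e->src_offset) {
			hi = mid;
		} else if (off >= e->src_offset + e->size) {
			lo = mid + 1;
		} else {
			*dst_vdev = e->dst_vdev;
			*dst_off = e->dst_offset + (off - e->src_offset);
			return (0);
		}
	}
	return (-1);
}
```

Because the table is kept in memory and searched in logarithmic time, the extra cost per I/O stays small, which matches the "minimal performance overhead" claim in the message.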
|
|
|
* zc_guid guid of vdev to remove
|
|
|
|
* zc_cookie cancel removal
|
2010-05-29 00:45:14 +04:00
|
|
|
*/
|
2008-11-20 23:01:55 +03:00
|
|
|
static int
|
|
|
|
zfs_ioc_vdev_remove(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
spa_t *spa;
|
|
|
|
int error;
|
|
|
|
|
|
|
|
error = spa_open(zc->zc_name, &spa, FTAG);
|
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
2016-09-22 19:30:13 +03:00
|
|
|
if (zc->zc_cookie != 0) {
|
|
|
|
error = spa_vdev_remove_cancel(spa);
|
|
|
|
} else {
|
|
|
|
error = spa_vdev_remove(spa, zc->zc_guid, B_FALSE);
|
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
spa_close(spa, FTAG);
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
zfs_ioc_vdev_set_state(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
spa_t *spa;
|
|
|
|
int error;
|
|
|
|
vdev_state_t newstate = VDEV_STATE_UNKNOWN;
|
|
|
|
|
|
|
|
if ((error = spa_open(zc->zc_name, &spa, FTAG)) != 0)
|
|
|
|
return (error);
|
|
|
|
switch (zc->zc_cookie) {
|
|
|
|
case VDEV_STATE_ONLINE:
|
|
|
|
error = vdev_online(spa, zc->zc_guid, zc->zc_obj, &newstate);
|
|
|
|
break;
|
|
|
|
|
|
|
|
case VDEV_STATE_OFFLINE:
|
|
|
|
error = vdev_offline(spa, zc->zc_guid, zc->zc_obj);
|
|
|
|
break;
|
|
|
|
|
|
|
|
case VDEV_STATE_FAULTED:
|
2010-05-29 00:45:14 +04:00
|
|
|
if (zc->zc_obj != VDEV_AUX_ERR_EXCEEDED &&
|
2017-05-19 22:30:16 +03:00
|
|
|
zc->zc_obj != VDEV_AUX_EXTERNAL &&
|
|
|
|
zc->zc_obj != VDEV_AUX_EXTERNAL_PERSIST)
|
2010-05-29 00:45:14 +04:00
|
|
|
zc->zc_obj = VDEV_AUX_ERR_EXCEEDED;
|
|
|
|
|
|
|
|
error = vdev_fault(spa, zc->zc_guid, zc->zc_obj);
|
2008-11-20 23:01:55 +03:00
|
|
|
break;
|
|
|
|
|
|
|
|
case VDEV_STATE_DEGRADED:
|
2010-05-29 00:45:14 +04:00
|
|
|
if (zc->zc_obj != VDEV_AUX_ERR_EXCEEDED &&
|
|
|
|
zc->zc_obj != VDEV_AUX_EXTERNAL)
|
|
|
|
zc->zc_obj = VDEV_AUX_ERR_EXCEEDED;
|
|
|
|
|
|
|
|
error = vdev_degrade(spa, zc->zc_guid, zc->zc_obj);
|
2008-11-20 23:01:55 +03:00
|
|
|
break;
|
|
|
|
|
2022-09-28 19:48:46 +03:00
|
|
|
case VDEV_STATE_REMOVED:
|
|
|
|
error = vdev_remove_wanted(spa, zc->zc_guid);
|
|
|
|
break;
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
default:
|
2013-03-08 22:41:28 +04:00
|
|
|
error = SET_ERROR(EINVAL);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
zc->zc_cookie = newstate;
|
|
|
|
spa_close(spa, FTAG);
|
|
|
|
return (error);
|
|
|
|
}
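The FAULTED and DEGRADED cases in the function above coerce any unrecognized aux reason passed in `zc_obj` to `VDEV_AUX_ERR_EXCEEDED` before calling into the vdev layer. A standalone sketch of that normalization follows; the `toy_aux_t` enum and its values are stand-ins invented for illustration, not the real `vdev_aux_t` definitions.

```c
/* Hypothetical stand-in for vdev_aux_t; values are illustrative only. */
typedef enum toy_aux {
	TOY_AUX_NONE,
	TOY_AUX_ERR_EXCEEDED,
	TOY_AUX_EXTERNAL,
	TOY_AUX_EXTERNAL_PERSIST
} toy_aux_t;

/* Faulting accepts three reasons; anything else becomes ERR_EXCEEDED. */
static toy_aux_t
toy_fault_reason(toy_aux_t requested)
{
	switch (requested) {
	case TOY_AUX_ERR_EXCEEDED:
	case TOY_AUX_EXTERNAL:
	case TOY_AUX_EXTERNAL_PERSIST:
		return (requested);
	default:
		return (TOY_AUX_ERR_EXCEEDED);
	}
}

/* Degrading accepts two reasons; anything else becomes ERR_EXCEEDED. */
static toy_aux_t
toy_degrade_reason(toy_aux_t requested)
{
	if (requested == TOY_AUX_ERR_EXCEEDED ||
	    requested == TOY_AUX_EXTERNAL)
		return (requested);
	return (TOY_AUX_ERR_EXCEEDED);
}
```

Note the asymmetry the sketch preserves: the persistent external reason is accepted when faulting but is coerced when degrading.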
|
|
|
|
|
|
|
|
static int
|
|
|
|
zfs_ioc_vdev_attach(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
spa_t *spa;
|
|
|
|
nvlist_t *config;
|
2020-07-03 21:05:50 +03:00
|
|
|
int replacing = zc->zc_cookie;
|
|
|
|
int rebuild = zc->zc_simple;
|
2008-11-20 23:01:55 +03:00
|
|
|
int error;
|
|
|
|
|
|
|
|
if ((error = spa_open(zc->zc_name, &spa, FTAG)) != 0)
|
|
|
|
return (error);
|
|
|
|
|
|
|
|
if ((error = get_nvlist(zc->zc_nvlist_conf, zc->zc_nvlist_conf_size,
|
2009-07-03 02:44:48 +04:00
|
|
|
zc->zc_iflags, &config)) == 0) {
|
2020-07-03 21:05:50 +03:00
|
|
|
error = spa_vdev_attach(spa, zc->zc_guid, config, replacing,
|
|
|
|
rebuild);
|
2008-11-20 23:01:55 +03:00
|
|
|
nvlist_free(config);
|
|
|
|
}
|
|
|
|
|
|
|
|
spa_close(spa, FTAG);
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
zfs_ioc_vdev_detach(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
spa_t *spa;
|
|
|
|
int error;
|
|
|
|
|
|
|
|
if ((error = spa_open(zc->zc_name, &spa, FTAG)) != 0)
|
|
|
|
return (error);
|
|
|
|
|
2009-01-16 00:59:39 +03:00
|
|
|
error = spa_vdev_detach(spa, zc->zc_guid, 0, B_FALSE);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
spa_close(spa, FTAG);
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
static int
|
|
|
|
zfs_ioc_vdev_split(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
spa_t *spa;
|
|
|
|
nvlist_t *config, *props = NULL;
|
|
|
|
int error;
|
|
|
|
boolean_t exp = !!(zc->zc_cookie & ZPOOL_EXPORT_AFTER_SPLIT);
|
|
|
|
|
|
|
|
if ((error = spa_open(zc->zc_name, &spa, FTAG)) != 0)
|
|
|
|
return (error);
|
|
|
|
|
2010-08-26 20:52:42 +04:00
|
|
|
if ((error = get_nvlist(zc->zc_nvlist_conf, zc->zc_nvlist_conf_size,
|
|
|
|
zc->zc_iflags, &config))) {
|
2010-05-29 00:45:14 +04:00
|
|
|
spa_close(spa, FTAG);
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (zc->zc_nvlist_src_size != 0 && (error =
|
|
|
|
get_nvlist(zc->zc_nvlist_src, zc->zc_nvlist_src_size,
|
|
|
|
zc->zc_iflags, &props))) {
|
|
|
|
spa_close(spa, FTAG);
|
|
|
|
nvlist_free(config);
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
error = spa_vdev_split_mirror(spa, zc->zc_string, config, props, exp);
|
|
|
|
|
|
|
|
spa_close(spa, FTAG);
|
|
|
|
|
|
|
|
nvlist_free(config);
|
|
|
|
nvlist_free(props);
|
|
|
|
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
static int
|
|
|
|
zfs_ioc_vdev_setpath(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
spa_t *spa;
|
2020-10-03 03:44:10 +03:00
|
|
|
const char *path = zc->zc_value;
|
2008-11-20 23:01:55 +03:00
|
|
|
uint64_t guid = zc->zc_guid;
|
|
|
|
int error;
|
|
|
|
|
|
|
|
error = spa_open(zc->zc_name, &spa, FTAG);
|
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
|
|
|
|
|
|
|
error = spa_vdev_setpath(spa, guid, path);
|
|
|
|
spa_close(spa, FTAG);
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2009-07-03 02:44:48 +04:00
|
|
|
static int
|
|
|
|
zfs_ioc_vdev_setfru(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
spa_t *spa;
|
2020-10-03 03:44:10 +03:00
|
|
|
const char *fru = zc->zc_value;
|
2009-07-03 02:44:48 +04:00
|
|
|
uint64_t guid = zc->zc_guid;
|
|
|
|
int error;
|
|
|
|
|
|
|
|
error = spa_open(zc->zc_name, &spa, FTAG);
|
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
|
|
|
|
|
|
|
error = spa_vdev_setfru(spa, guid, fru);
|
|
|
|
spa_close(spa, FTAG);
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
static int
|
2010-08-27 01:24:34 +04:00
|
|
|
zfs_ioc_objset_stats_impl(zfs_cmd_t *zc, objset_t *os)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2010-08-27 01:24:34 +04:00
|
|
|
int error = 0;
|
2008-11-20 23:01:55 +03:00
|
|
|
nvlist_t *nv;
|
|
|
|
|
|
|
|
dmu_objset_fast_stat(os, &zc->zc_objset_stats);
|
|
|
|
|
2022-12-14 04:27:54 +03:00
|
|
|
if (!zc->zc_simple && zc->zc_nvlist_dst != 0 &&
|
2010-05-29 00:45:14 +04:00
|
|
|
(error = dsl_prop_get_all(os, &nv)) == 0) {
|
2008-11-20 23:01:55 +03:00
|
|
|
dmu_objset_stats(os, nv);
|
|
|
|
/*
|
|
|
|
* NB: zvol_get_stats() will read the objset contents,
|
|
|
|
* which we aren't supposed to do with a
|
2008-12-03 23:09:06 +03:00
|
|
|
* DS_MODE_USER hold, because it could be
|
2008-11-20 23:01:55 +03:00
|
|
|
* inconsistent. So this is a bit of a workaround...
|
2019-09-03 03:56:41 +03:00
|
|
|
* XXX reading without owning
|
2008-11-20 23:01:55 +03:00
|
|
|
*/
|
2011-11-17 22:14:36 +04:00
|
|
|
if (!zc->zc_objset_stats.dds_inconsistent &&
|
|
|
|
dmu_objset_type(os) == DMU_OST_ZVOL) {
|
|
|
|
error = zvol_get_stats(os, nv);
|
2016-11-02 22:34:10 +03:00
|
|
|
if (error == EIO) {
|
|
|
|
nvlist_free(nv);
|
2011-11-17 22:14:36 +04:00
|
|
|
return (error);
|
2016-11-02 22:34:10 +03:00
|
|
|
}
|
2013-05-11 01:17:03 +04:00
|
|
|
VERIFY0(error);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
2010-08-26 21:34:33 +04:00
|
|
|
if (error == 0)
|
|
|
|
error = put_nvlist(zc, nv);
|
2008-11-20 23:01:55 +03:00
|
|
|
nvlist_free(nv);
|
|
|
|
}
|
|
|
|
|
2010-08-27 01:24:34 +04:00
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* inputs:
|
|
|
|
* zc_name name of filesystem
|
|
|
|
* zc_nvlist_dst_size size of buffer for property nvlist
|
|
|
|
*
|
|
|
|
* outputs:
|
|
|
|
* zc_objset_stats stats
|
|
|
|
* zc_nvlist_dst property nvlist
|
|
|
|
* zc_nvlist_dst_size size of property nvlist
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
zfs_ioc_objset_stats(zfs_cmd_t *zc)
|
|
|
|
{
|
2013-09-04 16:00:57 +04:00
|
|
|
objset_t *os;
|
2010-08-27 01:24:34 +04:00
|
|
|
int error;
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
error = dmu_objset_hold(zc->zc_name, FTAG, &os);
|
|
|
|
if (error == 0) {
|
|
|
|
error = zfs_ioc_objset_stats_impl(zc, os);
|
|
|
|
dmu_objset_rele(os, FTAG);
|
|
|
|
}
|
2010-08-27 01:24:34 +04:00
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* inputs:
|
|
|
|
* zc_name name of filesystem
|
|
|
|
* zc_nvlist_dst_size size of buffer for property nvlist
|
|
|
|
*
|
|
|
|
* outputs:
|
|
|
|
* zc_nvlist_dst received property nvlist
|
|
|
|
* zc_nvlist_dst_size size of received property nvlist
|
|
|
|
*
|
|
|
|
* Gets received properties (distinct from local properties on or after
|
|
|
|
* SPA_VERSION_RECVD_PROPS) for callers who want to differentiate received from
|
|
|
|
* local property values.
|
|
|
|
*/
|
|
|
|
static int
|
2010-08-26 22:42:43 +04:00
|
|
|
zfs_ioc_objset_recvd_props(zfs_cmd_t *zc)
|
2010-05-29 00:45:14 +04:00
|
|
|
{
|
2013-09-04 16:00:57 +04:00
|
|
|
int error = 0;
|
2010-05-29 00:45:14 +04:00
|
|
|
nvlist_t *nv;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Without this check, we would return local property values if the
|
|
|
|
* caller has not already received properties on or after
|
|
|
|
* SPA_VERSION_RECVD_PROPS.
|
|
|
|
*/
|
2013-09-04 16:00:57 +04:00
|
|
|
if (!dsl_prop_get_hasrecvd(zc->zc_name))
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(ENOTSUP));
|
2010-05-29 00:45:14 +04:00
|
|
|
|
|
|
|
if (zc->zc_nvlist_dst != 0 &&
|
2013-09-04 16:00:57 +04:00
|
|
|
(error = dsl_prop_get_received(zc->zc_name, &nv)) == 0) {
|
2010-05-29 00:45:14 +04:00
|
|
|
error = put_nvlist(zc, nv);
|
|
|
|
nvlist_free(nv);
|
|
|
|
}
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
nvl_add_zplprop(objset_t *os, nvlist_t *props, zfs_prop_t prop)
|
|
|
|
{
|
|
|
|
uint64_t value;
|
|
|
|
int error;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* zfs_get_zplprop() will either find a value or give us
|
|
|
|
* the default value (if there is one).
|
|
|
|
*/
|
|
|
|
if ((error = zfs_get_zplprop(os, prop, &value)) != 0)
|
|
|
|
return (error);
|
|
|
|
VERIFY(nvlist_add_uint64(props, zfs_prop_to_name(prop), value) == 0);
|
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* inputs:
|
|
|
|
* zc_name name of filesystem
|
|
|
|
* zc_nvlist_dst_size size of buffer for zpl property nvlist
|
|
|
|
*
|
|
|
|
* outputs:
|
|
|
|
* zc_nvlist_dst zpl property nvlist
|
|
|
|
* zc_nvlist_dst_size size of zpl property nvlist
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
zfs_ioc_objset_zplprops(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
objset_t *os;
|
|
|
|
int err;
|
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
/* XXX reading without owning */
|
2010-08-26 20:52:42 +04:00
|
|
|
if ((err = dmu_objset_hold(zc->zc_name, FTAG, &os)))
|
2008-11-20 23:01:55 +03:00
|
|
|
return (err);
|
|
|
|
|
|
|
|
dmu_objset_fast_stat(os, &zc->zc_objset_stats);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* NB: nvl_add_zplprop() will read the objset contents,
|
2008-12-03 23:09:06 +03:00
|
|
|
* which we aren't supposed to do with a DS_MODE_USER
|
|
|
|
* hold, because it could be inconsistent.
|
2008-11-20 23:01:55 +03:00
|
|
|
*/
|
2010-08-26 20:52:39 +04:00
|
|
|
if (zc->zc_nvlist_dst != 0 &&
|
2008-11-20 23:01:55 +03:00
|
|
|
!zc->zc_objset_stats.dds_inconsistent &&
|
|
|
|
dmu_objset_type(os) == DMU_OST_ZFS) {
|
|
|
|
nvlist_t *nv;
|
|
|
|
|
|
|
|
VERIFY(nvlist_alloc(&nv, NV_UNIQUE_NAME, KM_SLEEP) == 0);
|
|
|
|
if ((err = nvl_add_zplprop(os, nv, ZFS_PROP_VERSION)) == 0 &&
|
|
|
|
(err = nvl_add_zplprop(os, nv, ZFS_PROP_NORMALIZE)) == 0 &&
|
|
|
|
(err = nvl_add_zplprop(os, nv, ZFS_PROP_UTF8ONLY)) == 0 &&
|
|
|
|
(err = nvl_add_zplprop(os, nv, ZFS_PROP_CASE)) == 0)
|
|
|
|
err = put_nvlist(zc, nv);
|
|
|
|
nvlist_free(nv);
|
|
|
|
} else {
|
2013-03-08 22:41:28 +04:00
|
|
|
err = SET_ERROR(ENOENT);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
2010-05-29 00:45:14 +04:00
|
|
|
dmu_objset_rele(os, FTAG);
|
2008-11-20 23:01:55 +03:00
|
|
|
return (err);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* inputs:
|
|
|
|
* zc_name name of filesystem
|
|
|
|
* zc_cookie zap cursor
|
|
|
|
* zc_nvlist_dst_size size of buffer for property nvlist
|
|
|
|
*
|
|
|
|
* outputs:
|
|
|
|
* zc_name name of next filesystem
|
2009-07-03 02:44:48 +04:00
|
|
|
* zc_cookie zap cursor
|
2008-11-20 23:01:55 +03:00
|
|
|
* zc_objset_stats stats
|
|
|
|
* zc_nvlist_dst property nvlist
|
|
|
|
* zc_nvlist_dst_size size of property nvlist
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
zfs_ioc_dataset_list_next(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
objset_t *os;
|
|
|
|
int error;
|
|
|
|
char *p;
|
2010-05-29 00:45:14 +04:00
|
|
|
size_t orig_len = strlen(zc->zc_name);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
top:
|
2010-08-26 20:52:42 +04:00
|
|
|
if ((error = dmu_objset_hold(zc->zc_name, FTAG, &os))) {
|
2008-11-20 23:01:55 +03:00
|
|
|
if (error == ENOENT)
|
2013-03-08 22:41:28 +04:00
|
|
|
error = SET_ERROR(ESRCH);
|
2008-11-20 23:01:55 +03:00
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
p = strrchr(zc->zc_name, '/');
|
|
|
|
if (p == NULL || p[1] != '\0')
|
|
|
|
(void) strlcat(zc->zc_name, "/", sizeof (zc->zc_name));
|
|
|
|
p = zc->zc_name + strlen(zc->zc_name);
|
|
|
|
|
|
|
|
do {
|
|
|
|
error = dmu_dir_list_next(os,
|
|
|
|
sizeof (zc->zc_name) - (p - zc->zc_name), p,
|
|
|
|
NULL, &zc->zc_cookie);
|
|
|
|
if (error == ENOENT)
|
2013-03-08 22:41:28 +04:00
|
|
|
error = SET_ERROR(ESRCH);
|
2018-07-10 20:49:50 +03:00
|
|
|
} while (error == 0 && zfs_dataset_name_hidden(zc->zc_name));
|
2010-05-29 00:45:14 +04:00
|
|
|
dmu_objset_rele(os, FTAG);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
/*
|
|
|
|
 * If it's an internal dataset (i.e. with a '$' in its name),
|
|
|
|
* don't try to get stats for it, otherwise we'll return ENOENT.
|
|
|
|
*/
|
|
|
|
if (error == 0 && strchr(zc->zc_name, '$') == NULL) {
|
2008-11-20 23:01:55 +03:00
|
|
|
error = zfs_ioc_objset_stats(zc); /* fill in the stats */
|
2010-05-29 00:45:14 +04:00
|
|
|
if (error == ENOENT) {
|
|
|
|
/* We lost a race with destroy, get the next one. */
|
|
|
|
zc->zc_name[orig_len] = '\0';
|
|
|
|
goto top;
|
|
|
|
}
|
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
return (error);
|
|
|
|
}
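The name handling in the function above appends a '/' to `zc_name` when it does not already end in one, then points `p` at the end of the string so the next child's name can be written in place. A minimal standalone sketch of that step is shown below. `toy_child_slot()` is a hypothetical helper, and unlike the original (which relies on `strlcat()` and does not report truncation) it returns NULL when the slash does not fit.

```c
#include <stddef.h>
#include <string.h>

/*
 * Make sure `name` ends with '/', then return a pointer to its
 * terminating NUL, where a child component can be written.
 * *avail receives the bytes left in the buffer from that point,
 * including room for the NUL. Returns NULL if the '/' does not fit.
 */
static char *
toy_child_slot(char *name, size_t cap, size_t *avail)
{
	size_t len = strlen(name);

	if (len == 0 || name[len - 1] != '/') {
		if (len + 1 >= cap)
			return (NULL);
		name[len++] = '/';
		name[len] = '\0';
	}
	*avail = cap - len;
	return (name + len);
}
```

Writing the child name directly after the prefix is what lets the retry path in the original restore the parent name by storing a NUL at the saved original length.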
|
|
|
|
|
|
|
|
/*
|
|
|
|
* inputs:
|
|
|
|
* zc_name name of filesystem
|
|
|
|
* zc_cookie zap cursor
|
2019-03-12 23:13:22 +03:00
|
|
|
* zc_nvlist_src iteration range nvlist
|
|
|
|
* zc_nvlist_src_size size of iteration range nvlist
|
2008-11-20 23:01:55 +03:00
|
|
|
*
|
|
|
|
* outputs:
|
|
|
|
* zc_name name of next snapshot
|
|
|
|
* zc_objset_stats stats
|
|
|
|
* zc_nvlist_dst property nvlist
|
|
|
|
* zc_nvlist_dst_size size of property nvlist
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
zfs_ioc_snapshot_list_next(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
int error;
|
2019-03-12 23:13:22 +03:00
|
|
|
objset_t *os, *ossnap;
|
|
|
|
dsl_dataset_t *ds;
|
|
|
|
uint64_t min_txg = 0, max_txg = 0;
|
|
|
|
|
|
|
|
if (zc->zc_nvlist_src_size != 0) {
|
|
|
|
nvlist_t *props = NULL;
|
|
|
|
error = get_nvlist(zc->zc_nvlist_src, zc->zc_nvlist_src_size,
|
|
|
|
zc->zc_iflags, &props);
|
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
|
|
|
(void) nvlist_lookup_uint64(props, SNAP_ITER_MIN_TXG,
|
|
|
|
&min_txg);
|
|
|
|
(void) nvlist_lookup_uint64(props, SNAP_ITER_MAX_TXG,
|
|
|
|
&max_txg);
|
|
|
|
nvlist_free(props);
|
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
error = dmu_objset_hold(zc->zc_name, FTAG, &os);
|
2013-09-04 16:00:57 +04:00
|
|
|
if (error != 0) {
|
2020-02-27 03:09:17 +03:00
|
|
|
return (error == ENOENT ? SET_ERROR(ESRCH) : error);
|
2013-09-04 16:00:57 +04:00
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
/*
|
|
|
|
* A dataset name of maximum length cannot have any snapshots,
|
|
|
|
* so exit immediately.
|
|
|
|
*/
|
2016-06-16 00:28:36 +03:00
|
|
|
if (strlcat(zc->zc_name, "@", sizeof (zc->zc_name)) >=
|
|
|
|
ZFS_MAX_DATASET_NAME_LEN) {
|
2010-05-29 00:45:14 +04:00
|
|
|
dmu_objset_rele(os, FTAG);
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(ESRCH));
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
2019-03-12 23:13:22 +03:00
|
|
|
while (error == 0) {
|
|
|
|
if (issig(JUSTLOOKING) && issig(FORREAL)) {
|
|
|
|
error = SET_ERROR(EINTR);
|
|
|
|
break;
|
|
|
|
}
|
2010-08-27 01:24:34 +04:00
|
|
|
|
2019-03-12 23:13:22 +03:00
|
|
|
error = dmu_snapshot_list_next(os,
|
|
|
|
sizeof (zc->zc_name) - strlen(zc->zc_name),
|
|
|
|
zc->zc_name + strlen(zc->zc_name), &zc->zc_obj,
|
|
|
|
&zc->zc_cookie, NULL);
|
|
|
|
if (error == ENOENT) {
|
|
|
|
error = SET_ERROR(ESRCH);
|
|
|
|
break;
|
|
|
|
} else if (error != 0) {
|
|
|
|
break;
|
|
|
|
}
|
2010-08-27 01:24:34 +04:00
|
|
|
|
2019-03-12 23:13:22 +03:00
|
|
|
error = dsl_dataset_hold_obj(dmu_objset_pool(os), zc->zc_obj,
|
|
|
|
FTAG, &ds);
|
|
|
|
if (error != 0)
|
|
|
|
break;
|
2010-08-27 01:24:34 +04:00
|
|
|
|
2019-03-12 23:13:22 +03:00
|
|
|
if ((min_txg != 0 && dsl_get_creationtxg(ds) < min_txg) ||
|
|
|
|
(max_txg != 0 && dsl_get_creationtxg(ds) > max_txg)) {
|
2010-08-27 01:24:34 +04:00
|
|
|
dsl_dataset_rele(ds, FTAG);
|
2019-03-12 23:13:22 +03:00
|
|
|
/* undo snapshot name append */
|
|
|
|
*(strchr(zc->zc_name, '@') + 1) = '\0';
|
|
|
|
/* skip snapshot */
|
|
|
|
continue;
|
2010-05-29 00:45:14 +04:00
|
|
|
}
|
2019-03-12 23:13:22 +03:00
|
|
|
|
|
|
|
if (zc->zc_simple) {
|
2022-12-14 04:27:54 +03:00
|
|
|
dsl_dataset_fast_stat(ds, &zc->zc_objset_stats);
|
2019-03-12 23:13:22 +03:00
|
|
|
dsl_dataset_rele(ds, FTAG);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
if ((error = dmu_objset_from_ds(ds, &ossnap)) != 0) {
|
|
|
|
dsl_dataset_rele(ds, FTAG);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
if ((error = zfs_ioc_objset_stats_impl(zc, ossnap)) != 0) {
|
|
|
|
dsl_dataset_rele(ds, FTAG);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
dsl_dataset_rele(ds, FTAG);
|
|
|
|
break;
|
2010-05-29 00:45:14 +04:00
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2010-08-27 01:24:34 +04:00
|
|
|
dmu_objset_rele(os, FTAG);
|
2008-11-20 23:01:55 +03:00
|
|
|
/* if we failed, undo the @ that we tacked on to zc_name */
|
2013-09-04 16:00:57 +04:00
|
|
|
if (error != 0)
|
2008-11-20 23:01:55 +03:00
|
|
|
*strchr(zc->zc_name, '@') = '\0';
|
|
|
|
return (error);
|
|
|
|
}
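The name handling in zfs_ioc_snapshot_list_next() can be illustrated outside the kernel. What follows is a minimal userland sketch, not the OpenZFS implementation: it appends "@" to a dataset name, writes a snapshot component after it, truncates back to "fs@" when a candidate is skipped, and removes the "@" on failure. The helper names and NAME_BUF_LEN are hypothetical, and unlike the strlcat() check above, the sketch leaves the buffer untouched when the "@" does not fit.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer size; a stand-in for the real name length limit. */
#define	NAME_BUF_LEN	256

/*
 * Append "@" to the NUL-terminated dataset name in a buffer of buflen
 * bytes. Returns 0 on success, or -1 (buffer unchanged) if the '@' and
 * the terminating NUL would not fit.
 */
int
name_begin_snapshot(char *name, size_t buflen)
{
	size_t len = strlen(name);

	if (len + 2 > buflen)
		return (-1);
	name[len] = '@';
	name[len + 1] = '\0';
	return (0);
}

/*
 * Write a snapshot component directly after the '@'. Returns -1 if the
 * name has no '@' or the component does not fit in the buffer.
 */
int
name_set_snapshot(char *name, size_t buflen, const char *snap)
{
	char *at = strchr(name, '@');
	size_t used;

	if (at == NULL)
		return (-1);
	used = (size_t)(at - name) + 1;
	if (used + strlen(snap) + 1 > buflen)
		return (-1);
	(void) strcpy(at + 1, snap);
	return (0);
}

/* Skip a candidate: drop the component but keep "fs@" for the next one. */
void
name_skip_snapshot(char *name)
{
	char *at = strchr(name, '@');

	if (at != NULL)
		at[1] = '\0';
}

/* Failure path: remove the '@' so the caller sees the original name. */
void
name_abort_snapshot(char *name)
{
	char *at = strchr(name, '@');

	if (at != NULL)
		*at = '\0';
}
```

Keeping the "@" in place between iterations means each candidate snapshot name is written at the same offset, which is why the skip path truncates one byte after the "@" while the failure path truncates at the "@" itself.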
|
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
static int
|
|
|
|
zfs_prop_set_userquota(const char *dsname, nvpair_t *pair)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2010-05-29 00:45:14 +04:00
|
|
|
const char *propname = nvpair_name(pair);
|
|
|
|
uint64_t *valary;
|
|
|
|
unsigned int vallen;
|
2020-10-03 03:44:10 +03:00
|
|
|
const char *dash, *domain;
|
2010-05-29 00:45:14 +04:00
|
|
|
zfs_userquota_prop_t type;
|
|
|
|
uint64_t rid;
|
|
|
|
uint64_t quota;
|
2017-03-08 03:21:37 +03:00
|
|
|
zfsvfs_t *zfsvfs;
|
2010-05-29 00:45:14 +04:00
|
|
|
int err;
|
|
|
|
|
|
|
|
if (nvpair_type(pair) == DATA_TYPE_NVLIST) {
|
|
|
|
nvlist_t *attrs;
|
|
|
|
VERIFY(nvpair_value_nvlist(pair, &attrs) == 0);
|
|
|
|
if (nvlist_lookup_nvpair(attrs, ZPROP_VALUE,
|
|
|
|
&pair) != 0)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EINVAL));
|
2010-05-29 00:45:14 +04:00
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
/*
|
2010-05-29 00:45:14 +04:00
|
|
|
* A correctly constructed propname is encoded as
|
|
|
|
* userquota@<rid>-<domain>.
|
2008-11-20 23:01:55 +03:00
|
|
|
*/
|
2010-05-29 00:45:14 +04:00
|
|
|
if ((dash = strchr(propname, '-')) == NULL ||
|
|
|
|
nvpair_value_uint64_array(pair, &valary, &vallen) != 0 ||
|
|
|
|
vallen != 3)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EINVAL));
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
domain = dash + 1;
|
|
|
|
type = valary[0];
|
|
|
|
rid = valary[1];
|
|
|
|
quota = valary[2];
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2017-03-09 01:56:19 +03:00
|
|
|
err = zfsvfs_hold(dsname, FTAG, &zfsvfs, B_FALSE);
|
2010-05-29 00:45:14 +04:00
|
|
|
if (err == 0) {
|
2017-03-08 03:21:37 +03:00
|
|
|
err = zfs_set_userquota(zfsvfs, type, domain, rid, quota);
|
2017-03-09 01:56:19 +03:00
|
|
|
zfsvfs_rele(zfsvfs, FTAG);
|
2010-05-29 00:45:14 +04:00
|
|
|
}
|
2009-07-03 02:44:48 +04:00
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
return (err);
|
|
|
|
}
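The decoding step in zfs_prop_set_userquota() can be sketched in plain C as follows. This is an illustration under stated assumptions rather than the kernel code: quota_request_t and decode_userquota() are hypothetical names, and a bare uint64_t array stands in for the nvpair value. It mirrors the checks above: the property name must contain a '-', the text after the first '-' is taken as the domain, and the value array must hold exactly three elements (type, rid, quota).

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the decoded request; not an OpenZFS type. */
typedef struct quota_request {
	uint64_t	qr_type;	/* vals[0] */
	uint64_t	qr_rid;		/* vals[1] */
	uint64_t	qr_quota;	/* vals[2] */
	const char	*qr_domain;	/* points into propname */
} quota_request_t;

/*
 * Decode a property name and a three-element value array. Returns 0 on
 * success or EINVAL if the name has no '-' or the array is not exactly
 * three elements long.
 */
int
decode_userquota(const char *propname, const uint64_t *vals, size_t nvals,
    quota_request_t *out)
{
	const char *dash = strchr(propname, '-');

	if (dash == NULL || vals == NULL || nvals != 3)
		return (EINVAL);

	out->qr_domain = dash + 1;
	out->qr_type = vals[0];
	out->qr_rid = vals[1];
	out->qr_quota = vals[2];
	return (0);
}
```

Note that the rid is taken from the value array, not parsed out of the name; the name only supplies the domain.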
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
/*
|
|
|
|
* If the named property is one that has a special function to set its value,
|
|
|
|
* return 0 on success and a positive error code on failure; otherwise if it is
|
|
|
|
* not one of the special properties handled by this function, return -1.
|
|
|
|
*
|
|
|
|
* XXX: It would be better for callers of the property interface if we handled
|
|
|
|
* these special cases in dsl_prop.c (in the dsl layer).
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
zfs_prop_set_special(const char *dsname, zprop_source_t source,
|
|
|
|
nvpair_t *pair)
|
|
|
|
{
|
|
|
|
const char *propname = nvpair_name(pair);
|
|
|
|
zfs_prop_t prop = zfs_name_to_prop(propname);
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decrypt, and authenticate protected datasets.
Each object set maintains a Merkle tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 20:36:48 +03:00
|
|
|
uint64_t intval = 0;
|
2020-10-03 03:44:10 +03:00
|
|
|
const char *strval = NULL;
|
2014-11-03 23:15:08 +03:00
|
|
|
int err = -1;
|
2009-07-03 02:44:48 +04:00
|
|
|
|
2022-06-14 21:27:53 +03:00
|
|
|
if (prop == ZPROP_USERPROP) {
|
2010-05-29 00:45:14 +04:00
|
|
|
if (zfs_prop_userquota(propname))
|
|
|
|
return (zfs_prop_set_userquota(dsname, pair));
|
|
|
|
return (-1);
|
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
if (nvpair_type(pair) == DATA_TYPE_NVLIST) {
|
|
|
|
nvlist_t *attrs;
|
|
|
|
VERIFY(nvpair_value_nvlist(pair, &attrs) == 0);
|
|
|
|
VERIFY(nvlist_lookup_nvpair(attrs, ZPROP_VALUE,
|
|
|
|
&pair) == 0);
|
|
|
|
}
|
2008-12-03 23:09:06 +03:00
|
|
|
|
2017-08-14 20:36:48 +03:00
|
|
|
/* all special properties are numeric except for keylocation */
|
|
|
|
if (zfs_prop_get_type(prop) == PROP_TYPE_STRING) {
|
|
|
|
strval = fnvpair_value_string(pair);
|
|
|
|
} else {
|
|
|
|
intval = fnvpair_value_uint64(pair);
|
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
switch (prop) {
|
|
|
|
case ZFS_PROP_QUOTA:
|
|
|
|
err = dsl_dir_set_quota(dsname, source, intval);
|
|
|
|
break;
|
|
|
|
case ZFS_PROP_REFQUOTA:
|
2013-09-04 16:00:57 +04:00
|
|
|
err = dsl_dataset_set_refquota(dsname, source, intval);
|
2010-05-29 00:45:14 +04:00
|
|
|
break;
|
2015-04-01 16:07:48 +03:00
|
|
|
case ZFS_PROP_FILESYSTEM_LIMIT:
|
|
|
|
case ZFS_PROP_SNAPSHOT_LIMIT:
|
|
|
|
if (intval == UINT64_MAX) {
|
|
|
|
/* clearing the limit, just do it */
|
|
|
|
err = 0;
|
|
|
|
} else {
|
|
|
|
err = dsl_dir_activate_fs_ss_limit(dsname);
|
|
|
|
}
|
2017-08-14 20:36:48 +03:00
|
|
|
/*
|
|
|
|
* Set err to -1 to force the zfs_set_prop_nvlist code down the
|
|
|
|
* default path to set the value in the nvlist.
|
|
|
|
*/
|
|
|
|
if (err == 0)
|
|
|
|
err = -1;
|
|
|
|
break;
|
|
|
|
case ZFS_PROP_KEYLOCATION:
|
|
|
|
err = dsl_crypto_can_set_keylocation(dsname, strval);
|
|
|
|
|
2015-04-01 16:07:48 +03:00
|
|
|
/*
|
|
|
|
* Set err to -1 to force the zfs_set_prop_nvlist code down the
|
|
|
|
* default path to set the value in the nvlist.
|
|
|
|
*/
|
|
|
|
if (err == 0)
|
|
|
|
err = -1;
|
|
|
|
break;
|
2010-05-29 00:45:14 +04:00
|
|
|
case ZFS_PROP_RESERVATION:
|
|
|
|
err = dsl_dir_set_reservation(dsname, source, intval);
|
|
|
|
break;
|
|
|
|
case ZFS_PROP_REFRESERVATION:
|
2013-09-04 16:00:57 +04:00
|
|
|
err = dsl_dataset_set_refreservation(dsname, source, intval);
|
2010-05-29 00:45:14 +04:00
|
|
|
break;
|
Add zstd support to zfs
This PR adds two new compression types, based on ZStandard:
- zstd: A basic ZStandard compression algorithm. Available compression
levels for zstd are zstd-1 through zstd-19, where the compression
increases with every level, but speed decreases.
- zstd-fast: A faster version of the ZStandard compression algorithm
zstd-fast is basically a "negative" level of zstd. The compression
decreases with every level, but speed increases.
Available compression levels for zstd-fast:
- zstd-fast-1 through zstd-fast-10
- zstd-fast-20 through zstd-fast-100 (in increments of 10)
- zstd-fast-500 and zstd-fast-1000
For more information check the man page.
Implementation details:
Rather than treat each level of zstd as a different algorithm (as was
done historically with gzip), the block pointer `enum zio_compress`
value is simply zstd for all levels, including zstd-fast, since they all
use the same decompression function.
The compress= property (a 64bit unsigned integer) uses the lower 7 bits
to store the compression algorithm (matching the number of bits used in
a block pointer, as the 8th bit was borrowed for embedded block
pointers). The upper bits are used to store the compression level.
It is necessary to be able to determine what compression level was used
when later reading a block back, so the concept used in LZ4, where the
first 32bits of the on-disk value are the size of the compressed data
(since the allocation is rounded up to the nearest ashift), was
extended, and we store the version of ZSTD and the level as well as the
compressed size. This value is returned when decompressing a block, so
that if the block needs to be recompressed (L2ARC, nop-write, etc), that
the same parameters will be used to result in the matching checksum.
All of the internal ZFS code ( `arc_buf_hdr_t`, `objset_t`,
`zio_prop_t`, etc.) uses the separated _compress and _complevel
variables. Only the properties ZAP contains the combined/bit-shifted
value. The combined value is split when the compression_changed_cb()
callback is called, and sets both objset members (os_compress and
os_complevel).
The userspace tools all use the combined/bit-shifted value.
Additional notes:
zdb can now also decode the ZSTD compression header (flag -Z) and
inspect the size, version and compression level saved in that header.
For each record, if it is ZSTD compressed, the parameters of the decoded
compression header get printed.
ZSTD is included with all current tests and new tests are added
as-needed.
Per-dataset feature flags now get activated when the property is set.
If a compression algorithm requires a feature flag, zfs activates the
feature when the property is set, rather than waiting for the first
block to be born. This is currently only used by zstd but can be
extended as needed.
Portions-Sponsored-By: The FreeBSD Foundation
Co-authored-by: Allan Jude <allanjude@freebsd.org>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Co-authored-by: Michael Niewöhner <foss@mniewoehner.de>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes #6247
Closes #9024
Closes #10277
Closes #10278
2020-08-18 20:10:17 +03:00
|
|
|
case ZFS_PROP_COMPRESSION:
|
|
|
|
err = dsl_dataset_set_compression(dsname, source, intval);
|
|
|
|
/*
|
|
|
|
* Set err to -1 to force the zfs_set_prop_nvlist code down the
|
|
|
|
* default path to set the value in the nvlist.
|
|
|
|
*/
|
|
|
|
if (err == 0)
|
|
|
|
err = -1;
|
|
|
|
break;
|
2010-05-29 00:45:14 +04:00
|
|
|
case ZFS_PROP_VOLSIZE:
|
2010-08-26 22:45:02 +04:00
|
|
|
err = zvol_set_volsize(dsname, intval);
|
2010-05-29 00:45:14 +04:00
|
|
|
break;
|
2013-02-14 03:11:59 +04:00
|
|
|
case ZFS_PROP_SNAPDEV:
|
2014-03-22 13:07:14 +04:00
|
|
|
err = zvol_set_snapdev(dsname, source, intval);
|
2013-02-14 03:11:59 +04:00
|
|
|
break;
|
2017-07-12 23:05:37 +03:00
|
|
|
case ZFS_PROP_VOLMODE:
|
|
|
|
err = zvol_set_volmode(dsname, source, intval);
|
|
|
|
break;
|
2010-05-29 00:45:14 +04:00
|
|
|
case ZFS_PROP_VERSION:
|
|
|
|
{
|
2017-03-08 03:21:37 +03:00
|
|
|
zfsvfs_t *zfsvfs;
|
2010-05-29 00:45:14 +04:00
|
|
|
|
2017-03-09 01:56:19 +03:00
|
|
|
if ((err = zfsvfs_hold(dsname, FTAG, &zfsvfs, B_TRUE)) != 0)
|
2008-11-20 23:01:55 +03:00
|
|
|
break;
|
2008-12-03 23:09:06 +03:00
|
|
|
|
2017-03-08 03:21:37 +03:00
|
|
|
err = zfs_set_version(zfsvfs, intval);
|
2017-03-09 01:56:19 +03:00
|
|
|
zfsvfs_rele(zfsvfs, FTAG);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
if (err == 0 && intval >= ZPL_VERSION_USERSPACE) {
|
|
|
|
zfs_cmd_t *zc;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2014-12-03 22:56:32 +03:00
|
|
|
zc = kmem_zalloc(sizeof (zfs_cmd_t), KM_SLEEP);
|
2020-06-07 21:42:12 +03:00
|
|
|
(void) strlcpy(zc->zc_name, dsname,
|
|
|
|
sizeof (zc->zc_name));
|
2010-05-29 00:45:14 +04:00
|
|
|
(void) zfs_ioc_userspace_upgrade(zc);
|
2018-02-14 01:54:54 +03:00
|
|
|
(void) zfs_ioc_id_quota_upgrade(zc);
|
2010-05-29 00:45:14 +04:00
|
|
|
kmem_free(zc, sizeof (zfs_cmd_t));
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
2010-05-29 00:45:14 +04:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
default:
|
|
|
|
err = -1;
|
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
return (err);
|
|
|
|
}
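The return convention of zfs_prop_set_special() (0 on success, a positive errno on failure, -1 for "not handled here, use the generic path") can be shown with a small standalone sketch. Everything in it is hypothetical: the property names, the toy validation rules, and the prop_t type are assumptions made for illustration, and the retry pass of the real caller is omitted. The "limit" case imitates the cases above that validate a value and then deliberately return -1 so the value is still stored by the generic path.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sentinel meaning "not a special property; set it generically". */
#define	NOT_SPECIAL	(-1)

/* Hypothetical property record; not an OpenZFS type. */
typedef struct prop {
	const char	*name;
	uint64_t	val;
} prop_t;

/*
 * Toy special-case handler. "quota" is handled completely here and
 * rejects 0; "limit" is only validated and then falls through to the
 * generic path on success; anything else is not special.
 */
int
set_special(const char *name, uint64_t val)
{
	if (strcmp(name, "quota") == 0)
		return (val == 0 ? EINVAL : 0);

	if (strcmp(name, "limit") == 0) {
		int err = (val > 1000) ? ERANGE : 0;

		/* On success, force the caller down the generic path. */
		return (err == 0 ? NOT_SPECIAL : err);
	}

	return (NOT_SPECIAL);
}

/*
 * Best-effort caller: queue every NOT_SPECIAL property in generic[]
 * for a later batch and remember the last real error encountered.
 * generic[] must have room for n entries.
 */
int
set_props(const prop_t *props, size_t n, const prop_t **generic,
    size_t *ngeneric)
{
	int rv = 0;

	*ngeneric = 0;
	for (size_t i = 0; i < n; i++) {
		int err = set_special(props[i].name, props[i].val);

		if (err == NOT_SPECIAL)
			generic[(*ngeneric)++] = &props[i];
		else if (err != 0)
			rv = err;
	}
	return (rv);
}
```

Using a negative sentinel works because real errors are positive errno values, so the three outcomes never collide.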
|
2008-11-20 23:01:55 +03:00
|
|
|
|
libzfs: On FreeBSD, use MNT_NOWAIT with getfsstat
`getfsstat(2)` is used to retrieve the list of mounted file systems,
which libzfs uses when fetching properties like mountpoint, atime,
setuid, etc. The `mode` parameter may be `MNT_NOWAIT`, which uses
information in the VFS's cache, or `MNT_WAIT`, which effectively does a
`statfs` on every single mounted file system in order to fetch the most
up-to-date information. As far as I can tell, the only fields that
libzfs cares about are the filesystem's name, mountpoint, fstypename,
and mount flags. Those things are always updated on mount and unmount,
so they will always be accurate in the VFS's mount cache except in two
circumstances:
1) When a file system is busy unmounting
2) When a ZFS file system changes the value of a mount-overridable
property like atime or setuid, but doesn't remount the file system.
Right now that only happens when the property is changed by an
unprivileged user who has delegated authority to change the property
but not to mount the dataset. But perhaps libzfs could choose to do
it for other reasons in the future.
Switching to `MNT_NOWAIT` will greatly improve speed with no downside,
as long as we explicitly update the mount cache whenever we change a
mount-overridable property.
For comparison, Illumos gets this information using the native
`getmntany` and `getmntent` functions, which also use cached
information. The illumos function that would refresh the cache,
`resetmnttab`, is never called by libzfs.
And on GNU/Linux, `getmntany` and `getmntent` don't even communicate
with the kernel directly. They simply parse the file they are given,
which is usually /etc/mtab or /proc/mounts. Perhaps the implementation
of /proc/mounts is synchronous, à la MNT_WAIT; I don't know.
Sponsored-by: Axcient
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes: #12091
2021-06-08 16:36:43 +03:00
|
|
|
static boolean_t
|
|
|
|
zfs_is_namespace_prop(zfs_prop_t prop)
|
|
|
|
{
|
|
|
|
switch (prop) {
|
|
|
|
|
|
|
|
case ZFS_PROP_ATIME:
|
|
|
|
case ZFS_PROP_RELATIME:
|
|
|
|
case ZFS_PROP_DEVICES:
|
|
|
|
case ZFS_PROP_EXEC:
|
|
|
|
case ZFS_PROP_SETUID:
|
|
|
|
case ZFS_PROP_READONLY:
|
|
|
|
case ZFS_PROP_XATTR:
|
|
|
|
case ZFS_PROP_NBMAND:
|
|
|
|
return (B_TRUE);
|
|
|
|
|
|
|
|
default:
|
|
|
|
return (B_FALSE);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
/*
|
|
|
|
* This function is best effort. If it fails to set any of the given properties,
|
2013-08-28 15:45:09 +04:00
|
|
|
* it continues to set as many as it can and returns the last error
|
|
|
|
* encountered. If the caller provides a non-NULL errlist, it will be filled in
|
|
|
|
* with the list of names of all the properties that failed along with the
|
|
|
|
* corresponding error numbers.
|
2010-05-29 00:45:14 +04:00
|
|
|
*
|
2013-08-28 15:45:09 +04:00
|
|
|
* If every property is set successfully, zero is returned and errlist is not
|
|
|
|
* modified.
|
2010-05-29 00:45:14 +04:00
|
|
|
*/
|
|
|
|
int
|
|
|
|
zfs_set_prop_nvlist(const char *dsname, zprop_source_t source, nvlist_t *nvl,
|
2013-08-28 15:45:09 +04:00
|
|
|
nvlist_t *errlist)
|
2010-05-29 00:45:14 +04:00
|
|
|
{
|
|
|
|
nvpair_t *pair;
|
|
|
|
nvpair_t *propval;
|
|
|
|
int rv = 0;
|
2022-02-04 22:52:10 +03:00
|
|
|
int err;
|
2010-05-29 00:45:14 +04:00
|
|
|
uint64_t intval;
|
2020-10-03 03:44:10 +03:00
|
|
|
const char *strval;
|
2021-06-08 16:36:43 +03:00
|
|
|
boolean_t should_update_mount_cache = B_FALSE;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
nvlist_t *genericnvl = fnvlist_alloc();
|
|
|
|
nvlist_t *retrynvl = fnvlist_alloc();
|
2010-05-29 00:45:14 +04:00
|
|
|
retry:
|
|
|
|
pair = NULL;
|
|
|
|
while ((pair = nvlist_next_nvpair(nvl, pair)) != NULL) {
|
|
|
|
const char *propname = nvpair_name(pair);
|
|
|
|
zfs_prop_t prop = zfs_name_to_prop(propname);
|
2022-02-04 22:52:10 +03:00
|
|
|
err = 0;
|
2010-05-29 00:45:14 +04:00
|
|
|
|
|
|
|
/* decode the property value */
|
|
|
|
propval = pair;
|
|
|
|
if (nvpair_type(pair) == DATA_TYPE_NVLIST) {
|
|
|
|
nvlist_t *attrs;
|
2013-08-28 15:45:09 +04:00
|
|
|
attrs = fnvpair_value_nvlist(pair);
|
2010-05-29 00:45:14 +04:00
|
|
|
if (nvlist_lookup_nvpair(attrs, ZPROP_VALUE,
|
|
|
|
&propval) != 0)
|
2013-03-08 22:41:28 +04:00
|
|
|
err = SET_ERROR(EINVAL);
|
2009-07-03 02:44:48 +04:00
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
/* Validate value type */
|
2017-05-10 02:21:09 +03:00
|
|
|
if (err == 0 && source == ZPROP_SRC_INHERITED) {
|
|
|
|
/* inherited properties are expected to be booleans */
|
|
|
|
if (nvpair_type(propval) != DATA_TYPE_BOOLEAN)
|
|
|
|
err = SET_ERROR(EINVAL);
|
2022-06-14 21:27:53 +03:00
|
|
|
} else if (err == 0 && prop == ZPROP_USERPROP) {
|
2010-05-29 00:45:14 +04:00
|
|
|
if (zfs_prop_user(propname)) {
|
|
|
|
if (nvpair_type(propval) != DATA_TYPE_STRING)
|
2013-03-08 22:41:28 +04:00
|
|
|
err = SET_ERROR(EINVAL);
|
2010-05-29 00:45:14 +04:00
|
|
|
} else if (zfs_prop_userquota(propname)) {
|
|
|
|
if (nvpair_type(propval) !=
|
|
|
|
DATA_TYPE_UINT64_ARRAY)
|
2013-03-08 22:41:28 +04:00
|
|
|
err = SET_ERROR(EINVAL);
|
2011-11-17 22:14:36 +04:00
|
|
|
} else {
|
2013-03-08 22:41:28 +04:00
|
|
|
err = SET_ERROR(EINVAL);
|
2010-05-29 00:45:14 +04:00
|
|
|
}
|
|
|
|
} else if (err == 0) {
|
|
|
|
if (nvpair_type(propval) == DATA_TYPE_STRING) {
|
|
|
|
if (zfs_prop_get_type(prop) != PROP_TYPE_STRING)
|
2013-03-08 22:41:28 +04:00
|
|
|
err = SET_ERROR(EINVAL);
|
2010-05-29 00:45:14 +04:00
|
|
|
} else if (nvpair_type(propval) == DATA_TYPE_UINT64) {
|
2008-11-20 23:01:55 +03:00
|
|
|
const char *unused;
|
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
intval = fnvpair_value_uint64(propval);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
switch (zfs_prop_get_type(prop)) {
|
|
|
|
case PROP_TYPE_NUMBER:
|
|
|
|
break;
|
|
|
|
case PROP_TYPE_STRING:
|
2013-03-08 22:41:28 +04:00
|
|
|
err = SET_ERROR(EINVAL);
|
2010-05-29 00:45:14 +04:00
|
|
|
break;
|
2008-11-20 23:01:55 +03:00
|
|
|
case PROP_TYPE_INDEX:
|
|
|
|
if (zfs_prop_index_to_string(prop,
|
2010-05-29 00:45:14 +04:00
|
|
|
intval, &unused) != 0)
|
2020-08-01 18:41:31 +03:00
|
|
|
err =
|
|
|
|
SET_ERROR(ZFS_ERR_BADPROP);
|
2008-11-20 23:01:55 +03:00
|
|
|
break;
|
|
|
|
default:
|
|
|
|
cmn_err(CE_PANIC,
|
|
|
|
"unknown property type");
|
|
|
|
}
|
|
|
|
} else {
|
2013-03-08 22:41:28 +04:00
|
|
|
err = SET_ERROR(EINVAL);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
}
|
2010-05-29 00:45:14 +04:00
|
|
|
|
|
|
|
/* Validate permissions */
|
|
|
|
if (err == 0)
|
|
|
|
err = zfs_check_settable(dsname, pair, CRED());
|
|
|
|
|
|
|
|
if (err == 0) {
|
2017-05-10 02:21:09 +03:00
|
|
|
if (source == ZPROP_SRC_INHERITED)
|
|
|
|
err = -1; /* does not need special handling */
|
|
|
|
else
|
|
|
|
err = zfs_prop_set_special(dsname, source,
|
|
|
|
pair);
|
2010-05-29 00:45:14 +04:00
|
|
|
if (err == -1) {
|
|
|
|
/*
|
|
|
|
* For better performance we build up a list of
|
|
|
|
* properties to set in a single transaction.
|
|
|
|
*/
|
|
|
|
err = nvlist_add_nvpair(genericnvl, pair);
|
|
|
|
} else if (err != 0 && nvl != retrynvl) {
|
|
|
|
/*
|
|
|
|
* This may be a spurious error caused by
|
|
|
|
* receiving quota and reservation out of order.
|
|
|
|
* Try again in a second pass.
|
|
|
|
*/
|
|
|
|
err = nvlist_add_nvpair(retrynvl, pair);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
if (err != 0) {
|
|
|
|
if (errlist != NULL)
|
|
|
|
fnvlist_add_int32(errlist, propname, err);
|
|
|
|
rv = err;
|
|
|
|
}
|
2021-06-08 16:36:43 +03:00
|
|
|
|
|
|
|
if (zfs_is_namespace_prop(prop))
|
|
|
|
should_update_mount_cache = B_TRUE;
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
if (nvl != retrynvl && !nvlist_empty(retrynvl)) {
|
|
|
|
nvl = retrynvl;
|
|
|
|
goto retry;
|
|
|
|
}
|
|
|
|
|
2022-02-04 22:52:10 +03:00
|
|
|
if (nvlist_empty(genericnvl))
|
|
|
|
goto out;
|
2010-05-29 00:45:14 +04:00
|
|
|
|
2022-02-04 22:52:10 +03:00
|
|
|
/*
|
|
|
|
* Try to set them all in one batch.
|
|
|
|
*/
|
|
|
|
err = dsl_props_set(dsname, source, genericnvl);
|
|
|
|
if (err == 0)
|
|
|
|
goto out;
|
2010-05-29 00:45:14 +04:00
|
|
|
|
2022-02-04 22:52:10 +03:00
|
|
|
/*
|
|
|
|
* If batching fails, we still want to set as many properties as we
|
|
|
|
* can, so try setting them individually.
|
|
|
|
*/
|
|
|
|
pair = NULL;
|
|
|
|
while ((pair = nvlist_next_nvpair(genericnvl, pair)) != NULL) {
|
|
|
|
const char *propname = nvpair_name(pair);
|
|
|
|
|
|
|
|
propval = pair;
|
|
|
|
if (nvpair_type(pair) == DATA_TYPE_NVLIST) {
|
|
|
|
nvlist_t *attrs;
|
|
|
|
attrs = fnvpair_value_nvlist(pair);
|
|
|
|
propval = fnvlist_lookup_nvpair(attrs, ZPROP_VALUE);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (nvpair_type(propval) == DATA_TYPE_STRING) {
|
|
|
|
strval = fnvpair_value_string(propval);
|
|
|
|
err = dsl_prop_set_string(dsname, propname,
|
|
|
|
source, strval);
|
|
|
|
} else if (nvpair_type(propval) == DATA_TYPE_BOOLEAN) {
|
|
|
|
err = dsl_prop_inherit(dsname, propname, source);
|
|
|
|
} else {
|
|
|
|
intval = fnvpair_value_uint64(propval);
|
|
|
|
err = dsl_prop_set_int(dsname, propname, source,
|
|
|
|
intval);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (err != 0) {
|
|
|
|
if (errlist != NULL) {
|
|
|
|
fnvlist_add_int32(errlist, propname, err);
|
2010-05-29 00:45:14 +04:00
|
|
|
}
|
2022-02-04 22:52:10 +03:00
|
|
|
rv = err;
|
2010-05-29 00:45:14 +04:00
|
|
|
}
|
2009-07-03 02:44:48 +04:00
|
|
|
}
|
2022-02-04 22:52:10 +03:00
|
|
|
|
|
|
|
out:
|
libzfs: On FreeBSD, use MNT_NOWAIT with getfsstat
`getfsstat(2)` is used to retrieve the list of mounted file systems,
which libzfs uses when fetching properties like mountpoint, atime,
setuid, etc. The `mode` parameter may be `MNT_NOWAIT`, which uses
information in the VFS's cache, or `MNT_WAIT`, which effectively does a
`statfs` on every single mounted file system in order to fetch the most
up-to-date information. As far as I can tell, the only fields that
libzfs cares about are the filesystem's name, mountpoint, fstypename,
and mount flags. Those things are always updated on mount and unmount,
so they will always be accurate in the VFS's mount cache except in two
circumstances:
1) When a file system is busy unmounting
2) When a ZFS file system changes the value of a mount-overridable
property like atime or setuid, but doesn't remount the file system.
Right now that only happens when the property is changed by an
unprivileged user who has delegated authority to change the property
but not to mount the dataset. But perhaps libzfs could choose to do
it for other reasons in the future.
Switching to `MNT_NOWAIT` will greatly improve speed with no downside,
as long as we explicitly update the mount cache whenever we change a
mount-overridable property.
For comparison, illumos gets this information using the native
`getmntany` and `getmntent` functions, which also use cached
information. The illumos function that would refresh the cache,
`resetmnttab`, is never called by libzfs.
And on GNU/Linux, `getmntany` and `getmntent` don't even communicate
with the kernel directly. They simply parse the file they are given,
which is usually /etc/mtab or /proc/mounts. Perhaps the implementation
of /proc/mounts is synchronous, à la MNT_WAIT; I don't know.
Sponsored-by: Axcient
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes: #12091
2021-06-08 16:36:43 +03:00
|
|
|
if (should_update_mount_cache)
|
|
|
|
zfs_ioctl_update_mount_cache(dsname);
|
|
|
|
|
2009-07-03 02:44:48 +04:00
|
|
|
nvlist_free(genericnvl);
|
2010-05-29 00:45:14 +04:00
|
|
|
nvlist_free(retrynvl);
|
|
|
|
|
|
|
|
return (rv);
|
2009-07-03 02:44:48 +04:00
|
|
|
}
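The tail of the function above follows a "batch first, then one at a time" pattern: if the batched set fails, each property is retried individually so that valid ones still land, and the last error is returned. A minimal, self-contained sketch of that pattern follows. Every `demo_*` name is hypothetical and stands in for the real `dsl_props_set()` / `dsl_prop_set_*()` calls, and the "negative values are invalid" rule is an assumption made only so the example has something to reject.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one property; negative values are rejected. */
typedef struct demo_prop {
	const char *name;
	long value;
	int applied;
} demo_prop_t;

/* Apply a single property, or report why it cannot be applied. */
static int
demo_set_one(demo_prop_t *p)
{
	if (p->value < 0)
		return (EINVAL);
	p->applied = 1;
	return (0);
}

/* All-or-nothing batch: one bad property fails the whole batch. */
static int
demo_set_batch(demo_prop_t *props, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (props[i].value < 0)
			return (EINVAL);
	}
	for (size_t i = 0; i < n; i++)
		props[i].applied = 1;
	return (0);
}

/*
 * Try the batch first; on failure fall back to one-at-a-time so that
 * every valid property is still applied. When errs is not NULL, errs[i]
 * receives the error for props[i]. Returns the last error seen, or 0.
 */
static int
demo_set_props(demo_prop_t *props, size_t n, int *errs)
{
	int rv = 0;

	assert(props != NULL || n == 0);
	if (n == 0 || demo_set_batch(props, n) == 0)
		return (0);

	for (size_t i = 0; i < n; i++) {
		int err = demo_set_one(&props[i]);
		if (err != 0) {
			if (errs != NULL)
				errs[i] = err;
			rv = err;
		}
	}
	return (rv);
}
```

As in the original, a non-zero return does not mean nothing was applied; callers that care need the per-property error list.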
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Check that all the properties are valid user properties.
|
|
|
|
*/
|
|
|
|
static int
|
2019-08-27 23:45:53 +03:00
|
|
|
zfs_check_userprops(nvlist_t *nvl)
|
2009-07-03 02:44:48 +04:00
|
|
|
{
|
2010-05-29 00:45:14 +04:00
|
|
|
nvpair_t *pair = NULL;
|
2009-07-03 02:44:48 +04:00
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
while ((pair = nvlist_next_nvpair(nvl, pair)) != NULL) {
|
|
|
|
const char *propname = nvpair_name(pair);
|
2009-07-03 02:44:48 +04:00
|
|
|
|
|
|
|
if (!zfs_prop_user(propname) ||
|
2010-05-29 00:45:14 +04:00
|
|
|
nvpair_type(pair) != DATA_TYPE_STRING)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EINVAL));
|
2009-07-03 02:44:48 +04:00
|
|
|
|
|
|
|
if (strlen(propname) >= ZAP_MAXNAMELEN)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(ENAMETOOLONG));
|
2009-07-03 02:44:48 +04:00
|
|
|
|
2013-12-12 02:33:41 +04:00
|
|
|
if (strlen(fnvpair_value_string(pair)) >= ZAP_MAXVALUELEN)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(E2BIG));
|
2009-07-03 02:44:48 +04:00
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
return (0);
|
|
|
|
}
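The three checks in `zfs_check_userprops()` each map to a distinct errno, which lets callers tell a malformed name from an oversized name or value. The sketch below illustrates that mapping in isolation. It is a simplification under stated assumptions: the "name contains a colon" rule only approximates what `zfs_prop_user()` enforces, and the `DEMO_*` limits are illustrative stand-ins rather than the real `ZAP_MAXNAMELEN` / `ZAP_MAXVALUELEN` values.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Illustrative limits only; assumed for this sketch. */
#define	DEMO_MAXNAMELEN		256
#define	DEMO_MAXVALUELEN	8192

/*
 * Validate one user property. Returns 0 if acceptable, otherwise:
 *   EINVAL        name does not look like a user property (no ':')
 *   ENAMETOOLONG  name does not fit in DEMO_MAXNAMELEN
 *   E2BIG         value does not fit in DEMO_MAXVALUELEN
 */
static int
demo_check_userprop(const char *name, const char *value)
{
	assert(name != NULL && value != NULL);

	if (strchr(name, ':') == NULL)
		return (EINVAL);
	if (strlen(name) >= DEMO_MAXNAMELEN)
		return (ENAMETOOLONG);
	if (strlen(value) >= DEMO_MAXVALUELEN)
		return (E2BIG);
	return (0);
}
```

The checks use `>=` because, as in the original, the limit includes the terminating NUL.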
|
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
static void
|
|
|
|
props_skip(nvlist_t *props, nvlist_t *skipped, nvlist_t **newprops)
|
|
|
|
{
|
|
|
|
nvpair_t *pair;
|
|
|
|
|
|
|
|
VERIFY(nvlist_alloc(newprops, NV_UNIQUE_NAME, KM_SLEEP) == 0);
|
|
|
|
|
|
|
|
pair = NULL;
|
|
|
|
while ((pair = nvlist_next_nvpair(props, pair)) != NULL) {
|
|
|
|
if (nvlist_exists(skipped, nvpair_name(pair)))
|
|
|
|
continue;
|
|
|
|
|
|
|
|
VERIFY(nvlist_add_nvpair(*newprops, pair) == 0);
|
|
|
|
}
|
|
|
|
}
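`props_skip()` is effectively a set difference: it copies every pair from `props` whose name does not appear in `skipped`. A rough, self-contained sketch of the same idea over plain string arrays is below. The `demo_*` helpers are hypothetical; the real function works on nvlists and allocates the output list itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if name occurs in names[0..n), else 0. */
static int
demo_contains(const char *const *names, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(names[i], name) == 0)
			return (1);
	}
	return (0);
}

/*
 * Copy into out every entry of props that is not listed in skipped,
 * preserving order. out must have room for nprops entries. Returns the
 * number of entries written.
 */
static size_t
demo_props_skip(const char *const *props, size_t nprops,
    const char *const *skipped, size_t nskipped, const char **out)
{
	size_t kept = 0;

	assert(out != NULL || nprops == 0);
	for (size_t i = 0; i < nprops; i++) {
		if (demo_contains(skipped, nskipped, props[i]))
			continue;
		out[kept++] = props[i];
	}
	return (kept);
}
```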
|
|
|
|
|
|
|
|
static int
|
2013-09-04 16:00:57 +04:00
|
|
|
clear_received_props(const char *dsname, nvlist_t *props,
|
2010-05-29 00:45:14 +04:00
|
|
|
nvlist_t *skipped)
|
|
|
|
{
|
|
|
|
int err = 0;
|
|
|
|
nvlist_t *cleared_props = NULL;
|
|
|
|
props_skip(props, skipped, &cleared_props);
|
|
|
|
if (!nvlist_empty(cleared_props)) {
|
|
|
|
/*
|
|
|
|
* Acts on local properties until the dataset has received
|
|
|
|
* properties at least once on or after SPA_VERSION_RECVD_PROPS.
|
|
|
|
*/
|
|
|
|
zprop_source_t flags = (ZPROP_SRC_NONE |
|
2013-09-04 16:00:57 +04:00
|
|
|
(dsl_prop_get_hasrecvd(dsname) ? ZPROP_SRC_RECEIVED : 0));
|
|
|
|
err = zfs_set_prop_nvlist(dsname, flags, cleared_props, NULL);
|
2010-05-29 00:45:14 +04:00
|
|
|
}
|
|
|
|
nvlist_free(cleared_props);
|
|
|
|
return (err);
|
|
|
|
}
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
/*
|
|
|
|
* inputs:
|
|
|
|
* zc_name name of filesystem
|
2009-07-03 02:44:48 +04:00
|
|
|
* zc_value name of property to set
|
2008-11-20 23:01:55 +03:00
|
|
|
* zc_nvlist_src{_size} nvlist of properties to apply
|
2010-05-29 00:45:14 +04:00
|
|
|
* zc_cookie received properties flag
|
2008-11-20 23:01:55 +03:00
|
|
|
*
|
2010-05-29 00:45:14 +04:00
|
|
|
* outputs:
|
|
|
|
* zc_nvlist_dst{_size} error for each unapplied received property
|
2008-11-20 23:01:55 +03:00
|
|
|
*/
|
|
|
|
static int
|
|
|
|
zfs_ioc_set_prop(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
nvlist_t *nvl;
|
2010-05-29 00:45:14 +04:00
|
|
|
boolean_t received = zc->zc_cookie;
|
|
|
|
zprop_source_t source = (received ? ZPROP_SRC_RECEIVED :
|
|
|
|
ZPROP_SRC_LOCAL);
|
2013-08-28 15:45:09 +04:00
|
|
|
nvlist_t *errors;
|
2008-11-20 23:01:55 +03:00
|
|
|
int error;
|
|
|
|
|
|
|
|
if ((error = get_nvlist(zc->zc_nvlist_src, zc->zc_nvlist_src_size,
|
2009-07-03 02:44:48 +04:00
|
|
|
zc->zc_iflags, &nvl)) != 0)
|
2008-11-20 23:01:55 +03:00
|
|
|
return (error);
|
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
if (received) {
|
2008-12-03 23:09:06 +03:00
|
|
|
nvlist_t *origprops;
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
if (dsl_prop_get_received(zc->zc_name, &origprops) == 0) {
|
|
|
|
(void) clear_received_props(zc->zc_name,
|
|
|
|
origprops, nvl);
|
|
|
|
nvlist_free(origprops);
|
2010-05-29 00:45:14 +04:00
|
|
|
}
|
2013-09-04 16:00:57 +04:00
|
|
|
|
|
|
|
error = dsl_prop_set_hasrecvd(zc->zc_name);
|
2008-12-03 23:09:06 +03:00
|
|
|
}
|
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
errors = fnvlist_alloc();
|
2013-09-04 16:00:57 +04:00
|
|
|
if (error == 0)
|
|
|
|
error = zfs_set_prop_nvlist(zc->zc_name, source, nvl, errors);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2010-08-26 20:52:39 +04:00
|
|
|
if (zc->zc_nvlist_dst != 0 && errors != NULL) {
|
2010-05-29 00:45:14 +04:00
|
|
|
(void) put_nvlist(zc, errors);
|
|
|
|
}
|
|
|
|
|
|
|
|
nvlist_free(errors);
|
2008-11-20 23:01:55 +03:00
|
|
|
nvlist_free(nvl);
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* inputs:
|
|
|
|
* zc_name name of filesystem
|
|
|
|
* zc_value name of property to inherit
|
2010-05-29 00:45:14 +04:00
|
|
|
* zc_cookie revert to received value if TRUE
|
2008-11-20 23:01:55 +03:00
|
|
|
*
|
|
|
|
* outputs: none
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
zfs_ioc_inherit_prop(zfs_cmd_t *zc)
|
|
|
|
{
|
2010-05-29 00:45:14 +04:00
|
|
|
const char *propname = zc->zc_value;
|
|
|
|
zfs_prop_t prop = zfs_name_to_prop(propname);
|
|
|
|
boolean_t received = zc->zc_cookie;
|
|
|
|
zprop_source_t source = (received
|
|
|
|
? ZPROP_SRC_NONE /* revert to received value, if any */
|
|
|
|
: ZPROP_SRC_INHERITED); /* explicitly inherit */
|
2017-06-02 17:17:00 +03:00
|
|
|
nvlist_t *dummy;
|
|
|
|
nvpair_t *pair;
|
|
|
|
zprop_type_t type;
|
|
|
|
int err;
|
2010-05-29 00:45:14 +04:00
|
|
|
|
2017-06-02 17:17:00 +03:00
|
|
|
if (!received) {
|
2017-05-26 21:40:44 +03:00
|
|
|
/*
|
|
|
|
* Only check this in the non-received case. We want to allow
|
|
|
|
* 'inherit -S' to revert non-inheritable properties like quota
|
|
|
|
* and reservation to the received or default values even though
|
|
|
|
* they are not considered inheritable.
|
|
|
|
*/
|
2022-06-14 21:27:53 +03:00
|
|
|
if (prop != ZPROP_USERPROP && !zfs_prop_inheritable(prop))
|
2017-05-26 21:40:44 +03:00
|
|
|
return (SET_ERROR(EINVAL));
|
2017-05-26 02:43:46 +03:00
|
|
|
}
|
|
|
|
|
2022-06-14 21:27:53 +03:00
|
|
|
if (prop == ZPROP_USERPROP) {
|
2017-06-02 17:17:00 +03:00
|
|
|
if (!zfs_prop_user(propname))
|
|
|
|
return (SET_ERROR(EINVAL));
|
|
|
|
|
|
|
|
type = PROP_TYPE_STRING;
|
|
|
|
} else if (prop == ZFS_PROP_VOLSIZE || prop == ZFS_PROP_VERSION) {
|
|
|
|
return (SET_ERROR(EINVAL));
|
|
|
|
} else {
|
|
|
|
type = zfs_prop_get_type(prop);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* zfs_prop_set_special() expects properties in the form of an
|
|
|
|
* nvpair with type info.
|
|
|
|
*/
|
|
|
|
dummy = fnvlist_alloc();
|
|
|
|
|
|
|
|
switch (type) {
|
|
|
|
case PROP_TYPE_STRING:
|
|
|
|
VERIFY(0 == nvlist_add_string(dummy, propname, ""));
|
|
|
|
break;
|
|
|
|
case PROP_TYPE_NUMBER:
|
|
|
|
case PROP_TYPE_INDEX:
|
|
|
|
VERIFY(0 == nvlist_add_uint64(dummy, propname, 0));
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
err = SET_ERROR(EINVAL);
|
|
|
|
goto errout;
|
|
|
|
}
|
|
|
|
|
|
|
|
pair = nvlist_next_nvpair(dummy, NULL);
|
|
|
|
if (pair == NULL) {
|
|
|
|
err = SET_ERROR(EINVAL);
|
|
|
|
} else {
|
|
|
|
err = zfs_prop_set_special(zc->zc_name, source, pair);
|
|
|
|
if (err == -1) /* property is not "special", needs handling */
|
|
|
|
err = dsl_prop_inherit(zc->zc_name, zc->zc_value,
|
|
|
|
source);
|
|
|
|
}
|
|
|
|
|
|
|
|
errout:
|
|
|
|
nvlist_free(dummy);
|
|
|
|
return (err);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
zfs_ioc_pool_set_props(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
nvlist_t *props;
|
|
|
|
spa_t *spa;
|
|
|
|
int error;
|
2010-05-29 00:45:14 +04:00
|
|
|
nvpair_t *pair;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2010-08-26 20:52:42 +04:00
|
|
|
if ((error = get_nvlist(zc->zc_nvlist_src, zc->zc_nvlist_src_size,
|
|
|
|
zc->zc_iflags, &props)))
|
2008-11-20 23:01:55 +03:00
|
|
|
return (error);
|
|
|
|
|
2009-02-18 23:51:31 +03:00
|
|
|
/*
|
|
|
|
* If the only property is the configfile, then just do a spa_lookup()
|
|
|
|
* to handle the faulted case.
|
|
|
|
*/
|
2010-05-29 00:45:14 +04:00
|
|
|
pair = nvlist_next_nvpair(props, NULL);
|
|
|
|
if (pair != NULL && strcmp(nvpair_name(pair),
|
2009-02-18 23:51:31 +03:00
|
|
|
zpool_prop_to_name(ZPOOL_PROP_CACHEFILE)) == 0 &&
|
2010-05-29 00:45:14 +04:00
|
|
|
nvlist_next_nvpair(props, pair) == NULL) {
|
2009-02-18 23:51:31 +03:00
|
|
|
mutex_enter(&spa_namespace_lock);
|
|
|
|
if ((spa = spa_lookup(zc->zc_name)) != NULL) {
|
|
|
|
spa_configfile_set(spa, props, B_FALSE);
|
2022-09-28 19:48:46 +03:00
|
|
|
spa_write_cachefile(spa, B_FALSE, B_TRUE, B_FALSE);
|
2009-02-18 23:51:31 +03:00
|
|
|
}
|
|
|
|
mutex_exit(&spa_namespace_lock);
|
2010-05-29 00:45:14 +04:00
|
|
|
if (spa != NULL) {
|
|
|
|
nvlist_free(props);
|
2009-02-18 23:51:31 +03:00
|
|
|
return (0);
|
2010-05-29 00:45:14 +04:00
|
|
|
}
|
2009-02-18 23:51:31 +03:00
|
|
|
}
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
if ((error = spa_open(zc->zc_name, &spa, FTAG)) != 0) {
|
|
|
|
nvlist_free(props);
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
error = spa_prop_set(spa, props);
|
|
|
|
|
|
|
|
nvlist_free(props);
|
|
|
|
spa_close(spa, FTAG);
|
|
|
|
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
zfs_ioc_pool_get_props(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
spa_t *spa;
|
|
|
|
int error;
|
|
|
|
nvlist_t *nvp = NULL;
|
|
|
|
|
2009-02-18 23:51:31 +03:00
|
|
|
if ((error = spa_open(zc->zc_name, &spa, FTAG)) != 0) {
|
|
|
|
/*
|
|
|
|
* If the pool is faulted, there may be properties we can still
|
|
|
|
* get (such as altroot and cachefile), so attempt to get them
|
|
|
|
* anyway.
|
|
|
|
*/
|
|
|
|
mutex_enter(&spa_namespace_lock);
|
|
|
|
if ((spa = spa_lookup(zc->zc_name)) != NULL)
|
|
|
|
error = spa_prop_get(spa, &nvp);
|
|
|
|
mutex_exit(&spa_namespace_lock);
|
|
|
|
} else {
|
|
|
|
error = spa_prop_get(spa, &nvp);
|
|
|
|
spa_close(spa, FTAG);
|
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2010-08-26 20:52:39 +04:00
|
|
|
if (error == 0 && zc->zc_nvlist_dst != 0)
|
2008-11-20 23:01:55 +03:00
|
|
|
error = put_nvlist(zc, nvp);
|
|
|
|
else
|
2013-03-08 22:41:28 +04:00
|
|
|
error = SET_ERROR(EFAULT);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2009-02-18 23:51:31 +03:00
|
|
|
nvlist_free(nvp);
|
2008-11-20 23:01:55 +03:00
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2021-11-30 17:46:25 +03:00
|
|
|
/*
|
|
|
|
* innvl: {
|
|
|
|
* "vdevprops_set_vdev" -> guid
|
|
|
|
* "vdevprops_set_props" -> { prop -> value }
|
|
|
|
* }
|
|
|
|
*
|
|
|
|
* outnvl: propname -> error code (int32)
|
|
|
|
*/
|
|
|
|
static const zfs_ioc_key_t zfs_keys_vdev_set_props[] = {
|
|
|
|
{ZPOOL_VDEV_PROPS_SET_VDEV, DATA_TYPE_UINT64, 0},
|
|
|
|
{ZPOOL_VDEV_PROPS_SET_PROPS, DATA_TYPE_NVLIST, 0}
|
|
|
|
};
|
|
|
|
|
|
|
|
static int
|
|
|
|
zfs_ioc_vdev_set_props(const char *poolname, nvlist_t *innvl, nvlist_t *outnvl)
|
|
|
|
{
|
|
|
|
spa_t *spa;
|
|
|
|
int error;
|
|
|
|
vdev_t *vd;
|
|
|
|
uint64_t vdev_guid;
|
|
|
|
|
|
|
|
/* Early validation */
|
|
|
|
if (nvlist_lookup_uint64(innvl, ZPOOL_VDEV_PROPS_SET_VDEV,
|
|
|
|
&vdev_guid) != 0)
|
|
|
|
return (SET_ERROR(EINVAL));
|
|
|
|
|
|
|
|
if (outnvl == NULL)
|
|
|
|
return (SET_ERROR(EINVAL));
|
|
|
|
|
|
|
|
if ((error = spa_open(poolname, &spa, FTAG)) != 0)
|
|
|
|
return (error);
|
|
|
|
|
|
|
|
ASSERT(spa_writeable(spa));
|
|
|
|
|
|
|
|
if ((vd = spa_lookup_by_guid(spa, vdev_guid, B_TRUE)) == NULL) {
|
|
|
|
spa_close(spa, FTAG);
|
|
|
|
return (SET_ERROR(ENOENT));
|
|
|
|
}
|
|
|
|
|
|
|
|
error = vdev_prop_set(vd, innvl, outnvl);
|
|
|
|
|
|
|
|
spa_close(spa, FTAG);
|
|
|
|
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* innvl: {
|
|
|
|
* "vdevprops_get_vdev" -> guid
|
|
|
|
* (optional) "vdevprops_get_props" -> { propname -> propid }
|
|
|
|
* }
|
|
|
|
*
|
|
|
|
* outnvl: propname -> value
|
|
|
|
*/
|
|
|
|
static const zfs_ioc_key_t zfs_keys_vdev_get_props[] = {
|
|
|
|
{ZPOOL_VDEV_PROPS_GET_VDEV, DATA_TYPE_UINT64, 0},
|
|
|
|
{ZPOOL_VDEV_PROPS_GET_PROPS, DATA_TYPE_NVLIST, ZK_OPTIONAL}
|
|
|
|
};
|
|
|
|
|
|
|
|
static int
|
|
|
|
zfs_ioc_vdev_get_props(const char *poolname, nvlist_t *innvl, nvlist_t *outnvl)
|
|
|
|
{
|
|
|
|
spa_t *spa;
|
|
|
|
int error;
|
|
|
|
vdev_t *vd;
|
|
|
|
uint64_t vdev_guid;
|
|
|
|
|
|
|
|
/* Early validation */
|
|
|
|
if (nvlist_lookup_uint64(innvl, ZPOOL_VDEV_PROPS_GET_VDEV,
|
|
|
|
&vdev_guid) != 0)
|
|
|
|
return (SET_ERROR(EINVAL));
|
|
|
|
|
|
|
|
if (outnvl == NULL)
|
|
|
|
return (SET_ERROR(EINVAL));
|
|
|
|
|
|
|
|
if ((error = spa_open(poolname, &spa, FTAG)) != 0)
|
|
|
|
return (error);
|
|
|
|
|
|
|
|
if ((vd = spa_lookup_by_guid(spa, vdev_guid, B_TRUE)) == NULL) {
|
|
|
|
spa_close(spa, FTAG);
|
|
|
|
return (SET_ERROR(ENOENT));
|
|
|
|
}
|
|
|
|
|
|
|
|
error = vdev_prop_get(vd, innvl, outnvl);
|
|
|
|
|
|
|
|
spa_close(spa, FTAG);
|
|
|
|
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
/*
|
|
|
|
* inputs:
|
|
|
|
* zc_name name of filesystem
|
|
|
|
* zc_nvlist_src{_size} nvlist of delegated permissions
|
|
|
|
* zc_perm_action allow/unallow flag
|
|
|
|
*
|
|
|
|
* outputs: none
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
zfs_ioc_set_fsacl(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
int error;
|
|
|
|
nvlist_t *fsaclnv = NULL;
|
|
|
|
|
|
|
|
if ((error = get_nvlist(zc->zc_nvlist_src, zc->zc_nvlist_src_size,
|
2009-07-03 02:44:48 +04:00
|
|
|
zc->zc_iflags, &fsaclnv)) != 0)
|
2008-11-20 23:01:55 +03:00
|
|
|
return (error);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Verify nvlist is constructed correctly
|
|
|
|
*/
|
Cleanup of dead code suggested by Clang Static Analyzer (#14380)
I recently gained the ability to run Clang's static analyzer on the
Linux kernel modules via a few hacks. This extended coverage to code
that was previously missed since Clang's static analyzer only looked at
code that we built in userspace. Running it against the Linux kernel
modules built from my local branch produced a total of 72 reports
against my local branch. Of those, 50 were reports of logic errors and
22 were reports of dead code. Since we already had cleaned up all of
the previous dead code reports, I felt it would be a good next step to
clean up these dead code reports. Clang did a further breakdown of the
dead code reports into:
Dead assignment 15
Dead increment 2
Dead nested assignment 5
The benefit of cleaning these up, especially in the case of dead nested
assignment, is that they can expose places where our error handling is
incorrect. A number of them were fairly straightforward. However,
several were not:
In vdev_disk_physio_completion(), not only were we not using the return
value from the static function vdev_disk_dio_put(), but nothing used it,
so I changed it to return void and removed the existing (void) cast in
the other area where we call it in addition to no longer storing it to a
stack value.
In FSE_createDTable(), the function is dead code. Its helper function
FSE_freeDTable() is also dead code, as are the CPP definitions in
`module/zstd/include/zstd_compat_wrapper.h`. We just delete it all.
In zfs_zevent_wait(), we have an optimization opportunity. cv_wait_sig()
returns 0 if there are waiting signals and 1 if there are none. The
Linux SPL version literally returns `signal_pending(current) ? 0 : 1`
and FreeBSD implements the same semantics, so we can just do
`!cv_wait_sig()` in place of `signal_pending(current)` to avoid
unnecessarily calling it again.
The FreeBSD version of zfs_setattr() did not have the error handling
issue because the code was removed entirely from that version. The error
is from updating the attribute directory's files. After some thought, I
decided to propagate errors on it to userspace.
In zfs_secpolicy_tmp_snapshot(), we ignore a lack of permission from the
first check in favor of checking three other permissions. I assume this
is intentional.
In zfs_create_fs(), the return value of zap_update() was not checked
despite setting an important version number. I see no backward
compatibility reason to permit failures, so we add an assertion to catch
failures. Interestingly, Linux is still using ASSERT(error == 0) from
OpenSolaris while FreeBSD has switched to the improved ASSERT0(error)
from illumos, although illumos has yet to adopt it here. ASSERT(error ==
0) was used on Linux while ASSERT0(error) was used on FreeBSD since the
entire file needs conversion and that should be the subject of
another patch.
dnode_move()'s issue was caused by us not having implemented
POINTER_IS_VALID() on Linux. We have a stub in
`include/os/linux/spl/sys/kmem_cache.h` for it, when it really should be
in `include/os/linux/spl/sys/kmem.h` to be consistent with
Illumos/OpenSolaris. FreeBSD put both `POINTER_IS_VALID()` and
`POINTER_INVALIDATE()` in `include/os/freebsd/spl/sys/kmem.h`, so we
copy what it did.
Whenever a report was in platform-specific code, I checked the FreeBSD
version to see if it also applied to FreeBSD, but it was only relevant a
few times.
Lastly, the patch that enabled Clang's static analyzer to be run on the
Linux kernel modules needs more work before it can be put into a PR. I
plan to do that in the future as part of the on-going static analysis
work that I am doing.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14380
2023-01-17 20:57:12 +03:00
|
|
|
if (zfs_deleg_verify_nvlist(fsaclnv) != 0) {
|
2008-11-20 23:01:55 +03:00
|
|
|
nvlist_free(fsaclnv);
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EINVAL));
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If we don't have PRIV_SYS_MOUNT, then validate
|
|
|
|
* that user is allowed to hand out each permission in
|
|
|
|
* the nvlist(s)
|
|
|
|
*/
|
|
|
|
|
|
|
|
error = secpolicy_zfs(CRED());
|
2013-09-04 16:00:57 +04:00
|
|
|
if (error != 0) {
|
2008-11-20 23:01:55 +03:00
|
|
|
if (zc->zc_perm_action == B_FALSE) {
|
|
|
|
error = dsl_deleg_can_allow(zc->zc_name,
|
|
|
|
fsaclnv, CRED());
|
|
|
|
} else {
|
|
|
|
error = dsl_deleg_can_unallow(zc->zc_name,
|
|
|
|
fsaclnv, CRED());
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
if (error == 0)
|
|
|
|
error = dsl_deleg_set(zc->zc_name, fsaclnv, zc->zc_perm_action);
|
|
|
|
|
|
|
|
nvlist_free(fsaclnv);
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* inputs:
|
|
|
|
* zc_name name of filesystem
|
|
|
|
*
|
|
|
|
* outputs:
|
|
|
|
* zc_nvlist_dst{_size} nvlist of delegated permissions
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
zfs_ioc_get_fsacl(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
nvlist_t *nvp;
|
|
|
|
int error;
|
|
|
|
|
|
|
|
if ((error = dsl_deleg_get(zc->zc_name, &nvp)) == 0) {
|
|
|
|
error = put_nvlist(zc, nvp);
|
|
|
|
nvlist_free(nvp);
|
|
|
|
}
|
|
|
|
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void
|
|
|
|
zfs_create_cb(objset_t *os, void *arg, cred_t *cr, dmu_tx_t *tx)
|
|
|
|
{
|
|
|
|
zfs_creat_t *zct = arg;
|
|
|
|
|
|
|
|
zfs_create_fs(os, cr, zct->zct_zplprops, tx);
|
|
|
|
}
|
|
|
|
|
|
|
|
#define ZFS_PROP_UNDEFINED ((uint64_t)-1)
|
|
|
|
|
|
|
|
/*
|
|
|
|
* inputs:
|
2008-12-03 23:09:06 +03:00
|
|
|
* os parent objset pointer (NULL if root fs)
|
2013-06-11 21:12:34 +04:00
|
|
|
* fuids_ok fuids allowed in this version of the spa?
|
|
|
|
* sa_ok SAs allowed in this version of the spa?
|
|
|
|
* createprops list of properties requested by creator
|
2008-11-20 23:01:55 +03:00
|
|
|
*
|
|
|
|
* outputs:
|
|
|
|
* zplprops values for the zplprops we attach to the master node object
|
2008-12-03 23:09:06 +03:00
|
|
|
* is_ci true if requested file system will be purely case-insensitive
|
2008-11-20 23:01:55 +03:00
|
|
|
*
|
|
|
|
* Determine the settings for utf8only, normalization and
|
|
|
|
* casesensitivity. Specific values may have been requested by the
|
|
|
|
* creator and/or we can inherit values from the parent dataset. If
|
|
|
|
* the file system is of too early a vintage, a creator can not
|
|
|
|
* request settings for these properties, even if the requested
|
|
|
|
* setting is the default value. We don't actually want to create dsl
|
|
|
|
* properties for these, so remove them from the source nvlist after
|
|
|
|
* processing.
|
|
|
|
*/
|
|
|
|
static int
|
2009-07-03 02:44:48 +04:00
|
|
|
zfs_fill_zplprops_impl(objset_t *os, uint64_t zplver,
|
2010-05-29 00:45:14 +04:00
|
|
|
boolean_t fuids_ok, boolean_t sa_ok, nvlist_t *createprops,
|
|
|
|
nvlist_t *zplprops, boolean_t *is_ci)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
|
|
|
uint64_t sense = ZFS_PROP_UNDEFINED;
|
|
|
|
uint64_t norm = ZFS_PROP_UNDEFINED;
|
|
|
|
uint64_t u8 = ZFS_PROP_UNDEFINED;
|
2012-04-08 21:18:48 +04:00
|
|
|
int error;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
ASSERT(zplprops != NULL);
|
|
|
|
|
2019-02-09 02:44:15 +03:00
|
|
|
/* parent dataset must be a filesystem */
|
2017-04-14 00:32:08 +03:00
|
|
|
if (os != NULL && os->os_phys->os_type != DMU_OST_ZFS)
|
2019-02-09 02:44:15 +03:00
|
|
|
return (SET_ERROR(ZFS_ERR_WRONG_PARENT));
|
2017-04-14 00:32:08 +03:00
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
/*
|
|
|
|
* Pull out creator prop choices, if any.
|
|
|
|
*/
|
|
|
|
if (createprops) {
|
2008-12-03 23:09:06 +03:00
|
|
|
(void) nvlist_lookup_uint64(createprops,
|
|
|
|
zfs_prop_to_name(ZFS_PROP_VERSION), &zplver);
|
2008-11-20 23:01:55 +03:00
|
|
|
(void) nvlist_lookup_uint64(createprops,
|
|
|
|
zfs_prop_to_name(ZFS_PROP_NORMALIZE), &norm);
|
|
|
|
(void) nvlist_remove_all(createprops,
|
|
|
|
zfs_prop_to_name(ZFS_PROP_NORMALIZE));
|
|
|
|
(void) nvlist_lookup_uint64(createprops,
|
|
|
|
zfs_prop_to_name(ZFS_PROP_UTF8ONLY), &u8);
|
|
|
|
(void) nvlist_remove_all(createprops,
|
|
|
|
zfs_prop_to_name(ZFS_PROP_UTF8ONLY));
|
|
|
|
(void) nvlist_lookup_uint64(createprops,
|
|
|
|
zfs_prop_to_name(ZFS_PROP_CASE), &sense);
|
|
|
|
(void) nvlist_remove_all(createprops,
|
|
|
|
zfs_prop_to_name(ZFS_PROP_CASE));
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2008-12-03 23:09:06 +03:00
|
|
|
* If the zpl version requested is whacky or the file system
|
|
|
|
* or pool version is too "young" to support normalization
|
|
|
|
* and the creator tried to set a value for one of the props,
|
|
|
|
* error out.
|
2008-11-20 23:01:55 +03:00
|
|
|
*/
|
2008-12-03 23:09:06 +03:00
|
|
|
if ((zplver < ZPL_VERSION_INITIAL || zplver > ZPL_VERSION) ||
|
|
|
|
(zplver >= ZPL_VERSION_FUID && !fuids_ok) ||
|
2010-05-29 00:45:14 +04:00
|
|
|
(zplver >= ZPL_VERSION_SA && !sa_ok) ||
|
2008-12-03 23:09:06 +03:00
|
|
|
(zplver < ZPL_VERSION_NORMALIZATION &&
|
2008-11-20 23:01:55 +03:00
|
|
|
(norm != ZFS_PROP_UNDEFINED || u8 != ZFS_PROP_UNDEFINED ||
|
2008-12-03 23:09:06 +03:00
|
|
|
sense != ZFS_PROP_UNDEFINED)))
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(ENOTSUP));
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Put the version in the zplprops
|
|
|
|
*/
|
|
|
|
VERIFY(nvlist_add_uint64(zplprops,
|
|
|
|
zfs_prop_to_name(ZFS_PROP_VERSION), zplver) == 0);
|
|
|
|
|
2012-04-08 21:18:48 +04:00
|
|
|
if (norm == ZFS_PROP_UNDEFINED &&
|
|
|
|
(error = zfs_get_zplprop(os, ZFS_PROP_NORMALIZE, &norm)) != 0)
|
|
|
|
return (error);
|
2008-11-20 23:01:55 +03:00
|
|
|
VERIFY(nvlist_add_uint64(zplprops,
|
|
|
|
zfs_prop_to_name(ZFS_PROP_NORMALIZE), norm) == 0);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If we're normalizing, names must always be valid UTF-8 strings.
|
|
|
|
*/
|
|
|
|
if (norm)
|
|
|
|
u8 = 1;
|
2012-04-08 21:18:48 +04:00
|
|
|
if (u8 == ZFS_PROP_UNDEFINED &&
|
|
|
|
(error = zfs_get_zplprop(os, ZFS_PROP_UTF8ONLY, &u8)) != 0)
|
|
|
|
return (error);
|
2008-11-20 23:01:55 +03:00
|
|
|
VERIFY(nvlist_add_uint64(zplprops,
|
|
|
|
zfs_prop_to_name(ZFS_PROP_UTF8ONLY), u8) == 0);
|
|
|
|
|
2012-04-08 21:18:48 +04:00
|
|
|
if (sense == ZFS_PROP_UNDEFINED &&
|
|
|
|
(error = zfs_get_zplprop(os, ZFS_PROP_CASE, &sense)) != 0)
|
|
|
|
return (error);
|
2008-11-20 23:01:55 +03:00
|
|
|
VERIFY(nvlist_add_uint64(zplprops,
|
|
|
|
zfs_prop_to_name(ZFS_PROP_CASE), sense) == 0);
|
|
|
|
|
|
|
|
if (is_ci)
|
|
|
|
*is_ci = (sense == ZFS_CASE_INSENSITIVE);
|
|
|
|
|
|
|
|
return (0);
|
|
|
|
}
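The resolution order in `zfs_fill_zplprops_impl()` is: take the creator's value if one was requested, otherwise inherit from the parent, and force `utf8only` on whenever normalization is enabled. The sketch below isolates that order for two of the properties. `demo_zpl_t` and the `demo_*` functions are hypothetical; the parent lookup that the real code performs through `zfs_get_zplprop()` is replaced by a plain struct, and version checks are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Sentinel meaning "the creator did not request a value". */
#define	DEMO_PROP_UNDEFINED	((uint64_t)-1)

/* Hypothetical holder for two of the ZPL properties. */
typedef struct demo_zpl {
	uint64_t norm;	/* normalization mode, 0 means none */
	uint64_t u8;	/* utf8only flag */
} demo_zpl_t;

/* Prefer the requested value; fall back to the parent's value. */
static uint64_t
demo_resolve(uint64_t requested, uint64_t parent)
{
	return (requested != DEMO_PROP_UNDEFINED ? requested : parent);
}

/*
 * Resolve normalization first, then utf8only. If normalization ends up
 * enabled, utf8only is forced to 1 regardless of request or parent.
 */
static demo_zpl_t
demo_fill(uint64_t norm_req, uint64_t u8_req, demo_zpl_t parent)
{
	demo_zpl_t out;

	out.norm = demo_resolve(norm_req, parent.norm);
	if (out.norm != 0)
		u8_req = 1;
	out.u8 = demo_resolve(u8_req, parent.u8);
	assert(out.norm == 0 || out.u8 == 1);
	return (out);
}
```

Resolving normalization before `utf8only` matters: the forced value depends on the final normalization setting, including one inherited from the parent.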
|
|
|
|
|
2008-12-03 23:09:06 +03:00
|
|
|
static int
|
|
|
|
zfs_fill_zplprops(const char *dataset, nvlist_t *createprops,
|
|
|
|
nvlist_t *zplprops, boolean_t *is_ci)
|
|
|
|
{
|
2010-05-29 00:45:14 +04:00
|
|
|
boolean_t fuids_ok, sa_ok;
|
2008-12-03 23:09:06 +03:00
|
|
|
uint64_t zplver = ZPL_VERSION;
|
|
|
|
objset_t *os = NULL;
|
2016-06-16 00:28:36 +03:00
|
|
|
char parentname[ZFS_MAX_DATASET_NAME_LEN];
|
2010-05-29 00:45:14 +04:00
|
|
|
spa_t *spa;
|
|
|
|
uint64_t spa_vers;
|
2008-12-03 23:09:06 +03:00
|
|
|
int error;
|
|
|
|
|
2019-02-09 02:44:15 +03:00
|
|
|
zfs_get_parent(dataset, parentname, sizeof (parentname));
|
2008-12-03 23:09:06 +03:00
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
if ((error = spa_open(dataset, &spa, FTAG)) != 0)
|
|
|
|
return (error);
|
|
|
|
|
|
|
|
spa_vers = spa_version(spa);
|
|
|
|
spa_close(spa, FTAG);
|
|
|
|
|
|
|
|
zplver = zfs_zpl_version_map(spa_vers);
|
|
|
|
fuids_ok = (zplver >= ZPL_VERSION_FUID);
|
|
|
|
sa_ok = (zplver >= ZPL_VERSION_SA);
|
2008-12-03 23:09:06 +03:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Open parent object set so we can inherit zplprop values.
|
|
|
|
*/
|
2010-05-29 00:45:14 +04:00
|
|
|
if ((error = dmu_objset_hold(parentname, FTAG, &os)) != 0)
|
2008-12-03 23:09:06 +03:00
|
|
|
return (error);
|
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
error = zfs_fill_zplprops_impl(os, zplver, fuids_ok, sa_ok, createprops,
|
2008-12-03 23:09:06 +03:00
|
|
|
zplprops, is_ci);
|
2010-05-29 00:45:14 +04:00
|
|
|
dmu_objset_rele(os, FTAG);
|
2008-12-03 23:09:06 +03:00
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
zfs_fill_zplprops_root(uint64_t spa_vers, nvlist_t *createprops,
|
|
|
|
nvlist_t *zplprops, boolean_t *is_ci)
|
|
|
|
{
|
2010-05-29 00:45:14 +04:00
|
|
|
boolean_t fuids_ok;
|
|
|
|
boolean_t sa_ok;
|
2008-12-03 23:09:06 +03:00
|
|
|
uint64_t zplver = ZPL_VERSION;
|
|
|
|
int error;
|
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
zplver = zfs_zpl_version_map(spa_vers);
|
|
|
|
fuids_ok = (zplver >= ZPL_VERSION_FUID);
|
|
|
|
sa_ok = (zplver >= ZPL_VERSION_SA);
|
2008-12-03 23:09:06 +03:00
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
error = zfs_fill_zplprops_impl(NULL, zplver, fuids_ok, sa_ok,
|
|
|
|
createprops, zplprops, is_ci);
|
2008-12-03 23:09:06 +03:00
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
/*
|
2013-08-28 15:45:09 +04:00
|
|
|
* innvl: {
|
|
|
|
* "type" -> dmu_objset_type_t (int32)
|
|
|
|
* (optional) "props" -> { prop -> value }
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decrypt, and authenticate protected datasets.
Each object set maintains a Merkle tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 20:36:48 +03:00
|
|
|
* (optional) "hidden_args" -> { "wkeydata" -> value }
|
|
|
|
* raw uint8_t array of encryption wrapping key data (32 bytes)
|
2013-08-28 15:45:09 +04:00
|
|
|
* }
|
2008-11-20 23:01:55 +03:00
|
|
|
*
|
2013-08-28 15:45:09 +04:00
|
|
|
* outnvl: propname -> error code (int32)
|
2008-11-20 23:01:55 +03:00
|
|
|
*/
|
Add basic zfs ioc input nvpair validation
We want newer versions of libzfs_core to run against an existing
zfs kernel module (i.e. a deferred reboot or module reload after
an update).
Programmatically document, via a zfs_ioc_key_t, the valid arguments
for the ioc commands that rely on nvpair input arguments (i.e. non
legacy commands from libzfs_core). Automatically verify the expected
pairs before dispatching a command.
This initial phase focuses on the non-legacy ioctls. A follow-on
change can address the legacy ioctl input from the zfs_cmd_t.
The zfs_ioc_key_t for zfs_keys_channel_program looks like:
static const zfs_ioc_key_t zfs_keys_channel_program[] = {
{"program", DATA_TYPE_STRING, 0},
{"arg", DATA_TYPE_UNKNOWN, 0},
{"sync", DATA_TYPE_BOOLEAN_VALUE, ZK_OPTIONAL},
{"instrlimit", DATA_TYPE_UINT64, ZK_OPTIONAL},
{"memlimit", DATA_TYPE_UINT64, ZK_OPTIONAL},
};
Introduce four input errors to identify specific input failures
(in addition to generic argument value errors like EINVAL, ERANGE,
EBADF, and E2BIG).
ZFS_ERR_IOC_CMD_UNAVAIL the ioctl number is not supported by kernel
ZFS_ERR_IOC_ARG_UNAVAIL an input argument is not supported by kernel
ZFS_ERR_IOC_ARG_REQUIRED a required input argument is missing
ZFS_ERR_IOC_ARG_BADTYPE an input argument has an invalid type
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #7780
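The input check described in the commit message above can be sketched in plain C. This is a minimal, user-space illustration only: the `sk_*` types, the `SK_*` constants, and `sk_check_input()` are hypothetical stand-ins for the kernel's nvpair types and its real validation routine, and treating `SK_TYPE_ANY` as a wildcard is an assumption of this sketch. It covers three of the four errors listed (the per-argument ones), since the fourth concerns the ioctl number rather than its arguments.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for nvpair data types; not the kernel's enum. */
typedef enum {
	SK_TYPE_ANY,	/* assumption: matches any supplied type */
	SK_TYPE_INT32,
	SK_TYPE_STRING,
	SK_TYPE_NVLIST
} sk_type_t;

#define	SK_OPTIONAL	0x1	/* key may be omitted by the caller */

/* One row of a key table: expected name, expected type, flags. */
typedef struct {
	const char *name;
	sk_type_t type;
	int flags;
} sk_key_t;

/* One supplied input pair, reduced to its name and type. */
typedef struct {
	const char *name;
	sk_type_t type;
} sk_pair_t;

/* Illustrative result codes, named after the errors in the message. */
enum {
	SK_OK = 0,
	SK_ERR_ARG_UNAVAIL,	/* supplied pair is not in the table */
	SK_ERR_ARG_REQUIRED,	/* non-optional key is missing */
	SK_ERR_ARG_BADTYPE	/* supplied pair has the wrong type */
};

/* Table modelled on the zfs_keys_create entries shown in this file. */
const sk_key_t sk_keys_create[] = {
	{"type",	SK_TYPE_INT32,	0},
	{"props",	SK_TYPE_NVLIST,	SK_OPTIONAL},
	{"hidden_args",	SK_TYPE_NVLIST,	SK_OPTIONAL},
};
#define	SK_CREATE_NKEYS	(sizeof (sk_keys_create) / sizeof (sk_keys_create[0]))

int
sk_check_input(const sk_pair_t *in, size_t nin,
    const sk_key_t *keys, size_t nkeys)
{
	size_t i, j;

	/* Every supplied pair must name a known key of the expected type. */
	for (i = 0; i < nin; i++) {
		const sk_key_t *k = NULL;

		for (j = 0; j < nkeys; j++) {
			if (strcmp(in[i].name, keys[j].name) == 0) {
				k = &keys[j];
				break;
			}
		}
		if (k == NULL)
			return (SK_ERR_ARG_UNAVAIL);
		if (k->type != SK_TYPE_ANY && k->type != in[i].type)
			return (SK_ERR_ARG_BADTYPE);
	}

	/* Every key not flagged optional must have been supplied. */
	for (j = 0; j < nkeys; j++) {
		int found = 0;

		if (keys[j].flags & SK_OPTIONAL)
			continue;
		for (i = 0; i < nin; i++) {
			if (strcmp(in[i].name, keys[j].name) == 0) {
				found = 1;
				break;
			}
		}
		if (!found)
			return (SK_ERR_ARG_REQUIRED);
	}
	return (SK_OK);
}
```

Running the check before dispatch means a handler can look up required keys without re-validating them, which is presumably why the handlers in this file use the non-failing `fnvlist_lookup_*` calls for required arguments.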
2018-09-02 22:14:01 +03:00
|
|
|
|
|
|
|
static const zfs_ioc_key_t zfs_keys_create[] = {
|
|
|
|
{"type", DATA_TYPE_INT32, 0},
|
|
|
|
{"props", DATA_TYPE_NVLIST, ZK_OPTIONAL},
|
|
|
|
{"hidden_args", DATA_TYPE_NVLIST, ZK_OPTIONAL},
|
|
|
|
};
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
static int
|
2013-08-28 15:45:09 +04:00
|
|
|
zfs_ioc_create(const char *fsname, nvlist_t *innvl, nvlist_t *outnvl)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
|
|
|
int error = 0;
|
2013-08-28 15:45:09 +04:00
|
|
|
zfs_creat_t zct = { 0 };
|
2008-11-20 23:01:55 +03:00
|
|
|
nvlist_t *nvprops = NULL;
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decrypt, and authenticate protected datasets.
Each object set maintains a Merkle tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 20:36:48 +03:00
|
|
|
nvlist_t *hidden_args = NULL;
|
2008-11-20 23:01:55 +03:00
|
|
|
void (*cbfunc)(objset_t *os, void *arg, cred_t *cr, dmu_tx_t *tx);
|
2013-08-28 15:45:09 +04:00
|
|
|
dmu_objset_type_t type;
|
|
|
|
boolean_t is_insensitive = B_FALSE;
|
2017-08-14 20:36:48 +03:00
|
|
|
dsl_crypto_params_t *dcp = NULL;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2018-09-02 22:14:01 +03:00
|
|
|
type = (dmu_objset_type_t)fnvlist_lookup_int32(innvl, "type");
|
2013-08-28 15:45:09 +04:00
|
|
|
(void) nvlist_lookup_nvlist(innvl, "props", &nvprops);
|
2017-08-14 20:36:48 +03:00
|
|
|
(void) nvlist_lookup_nvlist(innvl, ZPOOL_HIDDEN_ARGS, &hidden_args);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
switch (type) {
|
2008-11-20 23:01:55 +03:00
|
|
|
case DMU_OST_ZFS:
|
|
|
|
cbfunc = zfs_create_cb;
|
|
|
|
break;
|
|
|
|
|
|
|
|
case DMU_OST_ZVOL:
|
|
|
|
cbfunc = zvol_create_cb;
|
|
|
|
break;
|
|
|
|
|
|
|
|
default:
|
|
|
|
cbfunc = NULL;
|
|
|
|
break;
|
|
|
|
}
|
2013-08-28 15:45:09 +04:00
|
|
|
if (strchr(fsname, '@') ||
|
|
|
|
strchr(fsname, '%'))
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EINVAL));
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
zct.zct_props = nvprops;
|
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
if (cbfunc == NULL)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EINVAL));
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
if (type == DMU_OST_ZVOL) {
|
|
|
|
uint64_t volsize, volblocksize;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
if (nvprops == NULL)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EINVAL));
|
2013-08-28 15:45:09 +04:00
|
|
|
if (nvlist_lookup_uint64(nvprops,
|
|
|
|
zfs_prop_to_name(ZFS_PROP_VOLSIZE), &volsize) != 0)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EINVAL));
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
if ((error = nvlist_lookup_uint64(nvprops,
|
|
|
|
zfs_prop_to_name(ZFS_PROP_VOLBLOCKSIZE),
|
|
|
|
&volblocksize)) != 0 && error != ENOENT)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EINVAL));
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
if (error != 0)
|
|
|
|
volblocksize = zfs_prop_default_numeric(
|
|
|
|
ZFS_PROP_VOLBLOCKSIZE);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2015-08-25 00:18:48 +03:00
|
|
|
if ((error = zvol_check_volblocksize(fsname,
|
2013-08-28 15:45:09 +04:00
|
|
|
volblocksize)) != 0 ||
|
|
|
|
(error = zvol_check_volsize(volsize,
|
|
|
|
volblocksize)) != 0)
|
|
|
|
return (error);
|
|
|
|
} else if (type == DMU_OST_ZFS) {
|
|
|
|
int error;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
/*
|
|
|
|
* We have to have normalization and
|
|
|
|
* case-folding flags correct when we do the
|
|
|
|
* file system creation, so go figure them out
|
|
|
|
* now.
|
|
|
|
*/
|
|
|
|
VERIFY(nvlist_alloc(&zct.zct_zplprops,
|
|
|
|
NV_UNIQUE_NAME, KM_SLEEP) == 0);
|
|
|
|
error = zfs_fill_zplprops(fsname, nvprops,
|
|
|
|
zct.zct_zplprops, &is_insensitive);
|
|
|
|
if (error != 0) {
|
|
|
|
nvlist_free(zct.zct_zplprops);
|
|
|
|
return (error);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2017-08-14 20:36:48 +03:00
|
|
|
error = dsl_crypto_params_create_nvlist(DCP_CMD_NONE, nvprops,
|
|
|
|
hidden_args, &dcp);
|
|
|
|
if (error != 0) {
|
|
|
|
nvlist_free(zct.zct_zplprops);
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
error = dmu_objset_create(fsname, type,
|
2017-08-14 20:36:48 +03:00
|
|
|
is_insensitive ? DS_FLAG_CI_DATASET : 0, dcp, cbfunc, &zct);
|
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
nvlist_free(zct.zct_zplprops);
|
2017-08-14 20:36:48 +03:00
|
|
|
dsl_crypto_params_free(dcp, !!error);
|
2013-08-28 15:45:09 +04:00
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
/*
|
|
|
|
* It would be nice to do this atomically.
|
|
|
|
*/
|
|
|
|
if (error == 0) {
|
2013-08-28 15:45:09 +04:00
|
|
|
error = zfs_set_prop_nvlist(fsname, ZPROP_SRC_LOCAL,
|
|
|
|
nvprops, outnvl);
|
2016-06-07 19:16:52 +03:00
|
|
|
if (error != 0) {
|
|
|
|
spa_t *spa;
|
|
|
|
int error2;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Volumes will return EBUSY and cannot be destroyed
|
async zvol minor node creation interferes with receive
When we finish a zfs receive, dmu_recv_end_sync() calls
zvol_create_minors(async=TRUE). This kicks off some other threads that
create the minor device nodes (in /dev/zvol/poolname/...). These async
threads call zvol_prefetch_minors_impl() and zvol_create_minor(), which
both call dmu_objset_own(), which puts a "long hold" on the dataset.
Since the zvol minor node creation is asynchronous, this can happen
after the `ZFS_IOC_RECV[_NEW]` ioctl and `zfs receive` process have
completed.
After the first receive ioctl has completed, userland may attempt to do
another receive into the same dataset (e.g. the next incremental
stream). This second receive and the asynchronous minor node creation
can interfere with one another in several different ways, because they
both require exclusive access to the dataset:
1. When the second receive is finishing up, dmu_recv_end_check() does
dsl_dataset_handoff_check(), which can fail with EBUSY if the async
minor node creation already has a "long hold" on this dataset. This
causes the 2nd receive to fail.
2. The async udev rule can fail if zvol_id and/or systemd-udevd try to
open the device while the second receive's async attempt at minor
node creation owns the dataset (via zvol_prefetch_minors_impl). This
causes the minor node (/dev/zd*) to exist, but the udev-generated
/dev/zvol/... to not exist.
3. The async minor node creation can silently fail with EBUSY if the
first receive's zvol_create_minor() tries to own the dataset while the
second receive's zvol_prefetch_minors_impl already owns the dataset.
To address these problems, this change synchronously creates the minor
node. To avoid the lock ordering problems that the asynchrony was
introduced to fix (see #3681), we create the minor nodes from open
context, with no locks held, rather than from syncing context as was
originally done.
Implementation notes:
We generally do not need to traverse children or prefetch anything (e.g.
when running the recv, snapshot, create, or clone subcommands of zfs).
We only need recursion when importing/opening a pool and when loading
encryption keys. The existing recursive, asynchronous, prefetching code
is preserved for use in these cases.
Channel programs may need to create zvol minor nodes, when creating a
snapshot of a zvol with the snapdev property set. We figure out what
snapshots are created when running the LUA program in syncing context.
In this case we need to remember what snapshots were created, and then
try to create their minor nodes from open context, after the LUA code
has completed.
There are additional zvol use cases that asynchronously own the dataset,
which can cause similar problems. E.g. changing the volmode or snapdev
properties. These are less problematic because they are not recursive
and don't touch datasets that are not involved in the operation, but there
is still potential for interference with subsequent operations. In the
future, these cases should be similarly converted to create the zvol
minor node synchronously from open context.
The async tasks of removing and renaming minors do not own the objset,
so they do not have this problem. However, it may make sense to also
convert these operations to happen synchronously from open context, in
the future.
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-65948
Closes #7863
Closes #9885
2020-02-03 20:33:14 +03:00
|
|
|
* until all asynchronous minor handling (e.g. from
|
|
|
|
* setting the volmode property) has completed. Wait for
|
|
|
|
* the spa_zvol_taskq to drain then retry.
|
2016-06-07 19:16:52 +03:00
|
|
|
*/
|
|
|
|
error2 = dsl_destroy_head(fsname);
|
|
|
|
while ((error2 == EBUSY) && (type == DMU_OST_ZVOL)) {
|
|
|
|
error2 = spa_open(fsname, &spa, FTAG);
|
|
|
|
if (error2 == 0) {
|
|
|
|
taskq_wait(spa->spa_zvol_taskq);
|
|
|
|
spa_close(spa, FTAG);
|
|
|
|
}
|
|
|
|
error2 = dsl_destroy_head(fsname);
|
|
|
|
}
|
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
return (error);
|
|
|
|
}
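The cleanup path at the end of zfs_ioc_create() (destroy the new dataset if setting its properties failed, and for volumes keep retrying while the destroy reports EBUSY) can be illustrated with a small user-space sketch. Everything named `sk_*` here is hypothetical: the real code calls dsl_destroy_head() and waits on spa_zvol_taskq, whereas this sketch models the pending asynchronous work as a simple counter.

```c
#include <errno.h>

/* Hypothetical dataset model: busy while async tasks are still pending. */
typedef struct {
	int pending;	/* outstanding async minor-node tasks */
	int destroyed;	/* set once the destroy succeeds */
	int waits;	/* how many times we waited for a drain */
} sk_dataset_t;

/* Outcome of one simulated cleanup, returned by value for easy checking. */
typedef struct {
	int err;
	int destroyed;
	int waits;
} sk_result_t;

/* Stand-in for the destroy call: fails with EBUSY while tasks remain. */
int
sk_destroy(sk_dataset_t *ds)
{
	if (ds->pending > 0)
		return (EBUSY);
	ds->destroyed = 1;
	return (0);
}

/* Stand-in for draining the task queue: all pending work completes. */
void
sk_wait_for_tasks(sk_dataset_t *ds)
{
	ds->pending = 0;
	ds->waits++;
}

/*
 * Destroy, and for volumes only, wait for async work to drain and
 * retry for as long as the destroy keeps reporting EBUSY.
 */
int
sk_destroy_retry(sk_dataset_t *ds, int is_volume)
{
	int err = sk_destroy(ds);

	while (err == EBUSY && is_volume) {
		sk_wait_for_tasks(ds);
		err = sk_destroy(ds);
	}
	return (err);
}

/* Run one cleanup against a fresh dataset with `pending` async tasks. */
sk_result_t
sk_simulate(int pending, int is_volume)
{
	sk_dataset_t ds = { pending, 0, 0 };
	sk_result_t r;

	r.err = sk_destroy_retry(&ds, is_volume);
	r.destroyed = ds.destroyed;
	r.waits = ds.waits;
	return (r);
}
```

In this model a volume with pending work is destroyed after one wait, while a non-volume dataset that reports EBUSY is left in place and the error is returned to the caller, mirroring the `type == DMU_OST_ZVOL` condition on the loop above.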
|
|
|
|
|
|
|
|
/*
|
2013-08-28 15:45:09 +04:00
|
|
|
* innvl: {
|
|
|
|
* "origin" -> name of origin snapshot
|
|
|
|
* (optional) "props" -> { prop -> value }
|
2017-08-14 20:36:48 +03:00
|
|
|
* (optional) "hidden_args" -> { "wkeydata" -> value }
|
|
|
|
* raw uint8_t array of encryption wrapping key data (32 bytes)
|
2013-08-28 15:45:09 +04:00
|
|
|
* }
|
2008-11-20 23:01:55 +03:00
|
|
|
*
|
2010-05-29 00:45:14 +04:00
|
|
|
* outputs:
|
2013-08-28 15:45:09 +04:00
|
|
|
* outnvl: propname -> error code (int32)
|
2008-11-20 23:01:55 +03:00
|
|
|
*/
|
2018-09-02 22:14:01 +03:00
|
|
|
static const zfs_ioc_key_t zfs_keys_clone[] = {
|
|
|
|
{"origin", DATA_TYPE_STRING, 0},
|
|
|
|
{"props", DATA_TYPE_NVLIST, ZK_OPTIONAL},
|
|
|
|
{"hidden_args", DATA_TYPE_NVLIST, ZK_OPTIONAL},
|
|
|
|
};
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
static int
|
2013-08-28 15:45:09 +04:00
|
|
|
zfs_ioc_clone(const char *fsname, nvlist_t *innvl, nvlist_t *outnvl)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2013-08-28 15:45:09 +04:00
|
|
|
int error = 0;
|
2008-12-03 23:09:06 +03:00
|
|
|
nvlist_t *nvprops = NULL;
|
2020-10-03 03:44:10 +03:00
|
|
|
const char *origin_name;
|
2008-12-03 23:09:06 +03:00
|
|
|
|
2018-09-02 22:14:01 +03:00
|
|
|
origin_name = fnvlist_lookup_string(innvl, "origin");
|
2013-08-28 15:45:09 +04:00
|
|
|
(void) nvlist_lookup_nvlist(innvl, "props", &nvprops);
|
2008-12-03 23:09:06 +03:00
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
if (strchr(fsname, '@') ||
|
|
|
|
strchr(fsname, '%'))
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EINVAL));
|
2013-08-28 15:45:09 +04:00
|
|
|
|
|
|
|
if (dataset_namecheck(origin_name, NULL, NULL) != 0)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EINVAL));
|
2017-08-14 20:36:48 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
error = dmu_objset_clone(fsname, origin_name);
|
2008-12-03 23:09:06 +03:00
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
/*
|
|
|
|
* It would be nice to do this atomically.
|
|
|
|
*/
|
|
|
|
if (error == 0) {
|
|
|
|
error = zfs_set_prop_nvlist(fsname, ZPROP_SRC_LOCAL,
|
|
|
|
nvprops, outnvl);
|
|
|
|
if (error != 0)
|
2013-09-04 16:00:57 +04:00
|
|
|
(void) dsl_destroy_head(fsname);
|
2008-12-03 23:09:06 +03:00
|
|
|
}
|
2013-08-28 15:45:09 +04:00
|
|
|
return (error);
|
|
|
|
}
|
2009-07-03 02:44:48 +04:00
|
|
|
|
Add basic zfs ioc input nvpair validation
We want newer versions of libzfs_core to run against an existing
zfs kernel module (i.e. a deferred reboot or module reload after
an update).
Programmatically document, via a zfs_ioc_key_t, the valid arguments
for the ioc commands that rely on nvpair input arguments (i.e. non
legacy commands from libzfs_core). Automatically verify the expected
pairs before dispatching a command.
This initial phase focuses on the non-legacy ioctls. A follow-on
change can address the legacy ioctl input from the zfs_cmd_t.
The zfs_ioc_key_t for zfs_keys_channel_program looks like:
static const zfs_ioc_key_t zfs_keys_channel_program[] = {
{"program", DATA_TYPE_STRING, 0},
{"arg", DATA_TYPE_UNKNOWN, 0},
{"sync", DATA_TYPE_BOOLEAN_VALUE, ZK_OPTIONAL},
{"instrlimit", DATA_TYPE_UINT64, ZK_OPTIONAL},
{"memlimit", DATA_TYPE_UINT64, ZK_OPTIONAL},
};
Introduce four input errors to identify specific input failures
(in addition to generic argument value errors like EINVAL, ERANGE,
EBADF, and E2BIG).
ZFS_ERR_IOC_CMD_UNAVAIL the ioctl number is not supported by kernel
ZFS_ERR_IOC_ARG_UNAVAIL an input argument is not supported by kernel
ZFS_ERR_IOC_ARG_REQUIRED a required input argument is missing
ZFS_ERR_IOC_ARG_BADTYPE an input argument has an invalid type
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #7780
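The validation described in the commit message above can be sketched as a table-driven check. This is a minimal, self-contained illustration and not the kernel implementation: `demo_key_t`, `demo_pair_t`, `demo_check_input()`, the `DEMO_*` result codes, and the sample tables are all hypothetical stand-ins for `zfs_ioc_key_t`, nvpairs, and the `ZFS_ERR_IOC_ARG_*` errors.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for nvpair data types; not the libnvpair API. */
typedef enum { T_STRING, T_UINT64, T_BOOLEAN_VALUE, T_ANY } demo_type_t;

/* One input pair: a name and the type of its value. */
typedef struct { const char *name; demo_type_t type; } demo_pair_t;

#define	DK_OPTIONAL	0x1	/* hypothetical analogue of ZK_OPTIONAL */

/* One entry of the key table: expected name, type, and flags. */
typedef struct { const char *key; demo_type_t type; int flags; } demo_key_t;

enum { DEMO_OK = 0, DEMO_ARG_UNAVAIL, DEMO_ARG_REQUIRED, DEMO_ARG_BADTYPE };

static int
demo_check_input(const demo_pair_t *in, size_t nin,
    const demo_key_t *keys, size_t nkeys)
{
	/* Every supplied pair must match a known key of the right type. */
	for (size_t i = 0; i < nin; i++) {
		const demo_key_t *k = NULL;

		for (size_t j = 0; j < nkeys; j++) {
			if (strcmp(in[i].name, keys[j].key) == 0) {
				k = &keys[j];
				break;
			}
		}
		if (k == NULL)
			return (DEMO_ARG_UNAVAIL);
		if (k->type != T_ANY && k->type != in[i].type)
			return (DEMO_ARG_BADTYPE);
	}

	/* Every key not marked optional must be present in the input. */
	for (size_t j = 0; j < nkeys; j++) {
		int found = 0;

		if (keys[j].flags & DK_OPTIONAL)
			continue;
		for (size_t i = 0; i < nin; i++) {
			if (strcmp(in[i].name, keys[j].key) == 0)
				found = 1;
		}
		if (!found)
			return (DEMO_ARG_REQUIRED);
	}
	return (DEMO_OK);
}

/* Sample key table and inputs, used only for illustration. */
static const demo_key_t demo_keys[] = {
	{ "program", T_STRING, 0 },
	{ "sync", T_BOOLEAN_VALUE, DK_OPTIONAL },
	{ "instrlimit", T_UINT64, DK_OPTIONAL },
};
static const demo_pair_t demo_good[] = {
	{ "program", T_STRING }, { "sync", T_BOOLEAN_VALUE },
};
static const demo_pair_t demo_unknown[] = {
	{ "program", T_STRING }, { "bogus", T_UINT64 },
};
static const demo_pair_t demo_missing[] = { { "sync", T_BOOLEAN_VALUE } };
static const demo_pair_t demo_badtype[] = { { "program", T_UINT64 } };
```

Running the check before dispatch means each handler can assume its required keys exist, which is presumably why the handlers in this file can use the `fnvlist_lookup_*` variants for required arguments.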
2018-09-02 22:14:01 +03:00
|
|
|
static const zfs_ioc_key_t zfs_keys_remap[] = {
|
|
|
|
/* no nvl keys */
|
|
|
|
};
|
|
|
|
|
OpenZFS 7614, 9064 - zfs device evacuation/removal
OpenZFS 7614 - zfs device evacuation/removal
OpenZFS 9064 - remove_mirror should wait for device removal to complete
This project allows top-level vdevs to be removed from the storage pool
with "zpool remove", reducing the total amount of storage in the pool.
This operation copies all allocated regions of the device to be removed
onto other devices, recording the mapping from old to new location.
After the removal is complete, read and free operations to the removed
(now "indirect") vdev must be remapped and performed at the new location
on disk. The indirect mapping table is kept in memory whenever the pool
is loaded, so there is minimal performance overhead when doing operations
on the indirect vdev.
The size of the in-memory mapping table will be reduced when its entries
become "obsolete" because they are no longer used by any block pointers
in the pool. An entry becomes obsolete when all the blocks that use
it are freed. An entry can also become obsolete when all the snapshots
that reference it are deleted, and the block pointers that reference it
have been "remapped" in all filesystems/zvols (and clones). Whenever an
indirect block is written, all the block pointers in it will be "remapped"
to their new (concrete) locations if possible. This process can be
accelerated by using the "zfs remap" command to proactively rewrite all
indirect blocks that reference indirect (removed) vdevs.
Note that when a device is removed, we do not verify the checksum of
the data that is copied. This makes the process much faster, but if it
were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be
possible to copy the wrong data, when we have the correct data on e.g.
the other side of the mirror.
At the moment, only mirrors and simple top-level vdevs can be removed
and no removal is allowed if any of the top-level vdevs are raidz.
Porting Notes:
* Avoid zero-sized kmem_alloc() in vdev_compact_children().
The device evacuation code adds a dependency that
vdev_compact_children() be able to properly empty the vdev_child
array by setting it to NULL and zeroing vdev_children. Under Linux,
kmem_alloc() and related functions return a sentinel pointer rather
than NULL for zero-sized allocations.
* Remove comment regarding "mpt" driver where zfs_remove_max_segment
is initialized to SPA_MAXBLOCKSIZE.
Change zfs_condense_indirect_commit_entry_delay_ticks to
zfs_condense_indirect_commit_entry_delay_ms for consistency with
most other tunables in which delays are specified in ms.
* ZTS changes:
Use set_tunable rather than mdb
Use zpool sync as appropriate
Use sync_pool instead of sync
Kill jobs during test_removal_with_operation to allow unmount/export
Don't add non-disk names such as "mirror" or "raidz" to $DISKS
Use $TEST_BASE_DIR instead of /tmp
Increase HZ from 100 to 1000 which is more common on Linux
removal_multiple_indirection.ksh
Reduce iterations in order to not time out on the code
coverage builders.
removal_resume_export:
Functionally, the test case is correct but there exists a race
where the kernel thread hasn't been fully started yet and is
not visible. Wait for up to 1 second for the removal thread
to be started before giving up on it. Also, increase the
amount of data copied in order that the removal not finish
before the export has a chance to fail.
* MMP compatibility, the concept of concrete versus non-concrete devices
has slightly changed the semantics of vdev_writeable(). Update
mmp_random_leaf_impl() accordingly.
* Updated dbuf_remap() to handle the org.zfsonlinux:large_dnode pool
feature which is not supported by OpenZFS.
* Added support for new vdev removal tracepoints.
* Test cases removal_with_zdb and removal_condense_export have been
intentionally disabled. When run manually they pass as intended,
but when running in the automated test environment they produce
unreliable results on the latest Fedora release.
They may work better once the upstream pool import refactoring is
merged into ZoL at which point they will be re-enabled.
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Alex Reece <alex@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Tim Chase <tim@chase2k.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Garrett D'Amore <garrett@damore.org>
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
OpenZFS-issue: https://www.illumos.org/issues/7614
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f539f1eb
Closes #6900
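The old-to-new location mapping described in the commit message above can be illustrated with a small lookup table. This is only a sketch under stated assumptions: the entry layout, the sorted-array representation, and the names `demo_map_entry_t`, `demo_loc_t`, and `demo_remap()` are hypothetical and do not reflect the actual in-memory indirect mapping structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping entry: a copied run and where it now lives. */
typedef struct {
	uint64_t src_offset;	/* start of the run on the removed vdev */
	uint64_t size;		/* length of the run */
	uint64_t dst_vdev;	/* id of the vdev that received the copy */
	uint64_t dst_offset;	/* start of the run on that vdev */
} demo_map_entry_t;

/* Result of a lookup; found is 0 when the offset was never mapped. */
typedef struct {
	int found;
	uint64_t vdev;
	uint64_t offset;
} demo_loc_t;

/*
 * Binary search a table sorted by src_offset with non-overlapping runs,
 * translating an offset on the removed vdev to its new location.
 */
static demo_loc_t
demo_remap(const demo_map_entry_t *map, size_t n, uint64_t off)
{
	demo_loc_t loc = { 0, 0, 0 };
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		const demo_map_entry_t *e = &map[mid];

		if (off < e->src_offset) {
			hi = mid;
		} else if (off >= e->src_offset + e->size) {
			lo = mid + 1;
		} else {
			loc.found = 1;
			loc.vdev = e->dst_vdev;
			loc.offset = e->dst_offset + (off - e->src_offset);
			break;
		}
	}
	return (loc);
}

/* Sample table: two runs copied to two different vdevs. */
static const demo_map_entry_t demo_map[] = {
	{ 0x0000, 0x1000, 1, 0x8000 },
	{ 0x4000, 0x2000, 2, 0x0000 },
};
```

Keeping such a table in memory is what makes reads through the removed vdev cheap; dropping entries that no block pointer references any longer corresponds to the "obsolete" entries the message describes.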
2016-09-22 19:30:13 +03:00
|
|
|
static int
|
|
|
|
zfs_ioc_remap(const char *fsname, nvlist_t *innvl, nvlist_t *outnvl)
|
|
|
|
{
|
2019-06-25 02:44:01 +03:00
|
|
|
/* This IOCTL is no longer supported. */
|
2022-02-16 04:38:43 +03:00
|
|
|
(void) fsname, (void) innvl, (void) outnvl;
|
2019-06-25 02:44:01 +03:00
|
|
|
return (0);
|
2016-09-22 19:30:13 +03:00
|
|
|
}
|
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
/*
|
|
|
|
* innvl: {
|
|
|
|
* "snaps" -> { snapshot1, snapshot2 }
|
|
|
|
* (optional) "props" -> { prop -> value (string) }
|
|
|
|
* }
|
|
|
|
*
|
|
|
|
* outnvl: snapshot -> error code (int32)
|
|
|
|
*/
|
2018-09-02 22:14:01 +03:00
|
|
|
static const zfs_ioc_key_t zfs_keys_snapshot[] = {
|
|
|
|
{"snaps", DATA_TYPE_NVLIST, 0},
|
|
|
|
{"props", DATA_TYPE_NVLIST, ZK_OPTIONAL},
|
|
|
|
};
|
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
static int
|
|
|
|
zfs_ioc_snapshot(const char *poolname, nvlist_t *innvl, nvlist_t *outnvl)
|
|
|
|
{
|
|
|
|
nvlist_t *snaps;
|
|
|
|
nvlist_t *props = NULL;
|
|
|
|
int error, poollen;
|
2017-11-04 23:25:13 +03:00
|
|
|
nvpair_t *pair;
|
2009-07-03 02:44:48 +04:00
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
(void) nvlist_lookup_nvlist(innvl, "props", &props);
|
|
|
|
if (!nvlist_empty(props) &&
|
|
|
|
zfs_earlier_version(poolname, SPA_VERSION_SNAP_PROPS))
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(ENOTSUP));
|
2019-08-27 23:45:53 +03:00
|
|
|
if ((error = zfs_check_userprops(props)) != 0)
|
|
|
|
return (error);
|
2013-08-28 15:45:09 +04:00
|
|
|
|
2018-09-02 22:14:01 +03:00
|
|
|
snaps = fnvlist_lookup_nvlist(innvl, "snaps");
|
2013-08-28 15:45:09 +04:00
|
|
|
poollen = strlen(poolname);
|
|
|
|
for (pair = nvlist_next_nvpair(snaps, NULL); pair != NULL;
|
|
|
|
pair = nvlist_next_nvpair(snaps, pair)) {
|
|
|
|
const char *name = nvpair_name(pair);
|
2019-08-27 23:45:53 +03:00
|
|
|
char *cp = strchr(name, '@');
|
2013-08-28 15:45:09 +04:00
|
|
|
|
|
|
|
/*
|
|
|
|
* The snap name must contain an @, and the part after it must
|
|
|
|
* contain only valid characters.
|
|
|
|
*/
|
2013-12-12 02:33:41 +04:00
|
|
|
if (cp == NULL ||
|
|
|
|
zfs_component_namecheck(cp + 1, NULL, NULL) != 0)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EINVAL));
|
2013-08-28 15:45:09 +04:00
|
|
|
|
|
|
|
/*
|
|
|
|
* The snap must be in the specified pool.
|
|
|
|
*/
|
|
|
|
if (strncmp(name, poolname, poollen) != 0 ||
|
|
|
|
(name[poollen] != '/' && name[poollen] != '@'))
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EXDEV));
|
2013-08-28 15:45:09 +04:00
|
|
|
|
2019-08-27 23:45:53 +03:00
|
|
|
/*
|
|
|
|
* Check for permission to set the properties on the fs.
|
|
|
|
*/
|
|
|
|
if (!nvlist_empty(props)) {
|
|
|
|
*cp = '\0';
|
|
|
|
error = zfs_secpolicy_write_perms(name,
|
|
|
|
ZFS_DELEG_PERM_USERPROP, CRED());
|
|
|
|
*cp = '@';
|
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
/* This must be the only snap of this fs. */
|
2017-11-04 23:25:13 +03:00
|
|
|
for (nvpair_t *pair2 = nvlist_next_nvpair(snaps, pair);
|
2013-08-28 15:45:09 +04:00
|
|
|
pair2 != NULL; pair2 = nvlist_next_nvpair(snaps, pair2)) {
|
|
|
|
if (strncmp(name, nvpair_name(pair2), cp - name + 1)
|
|
|
|
== 0) {
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EXDEV));
|
2013-08-28 15:45:09 +04:00
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
error = dsl_dataset_snapshot(snaps, props, outnvl);
|
2013-12-07 02:20:22 +04:00
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
return (error);
|
|
|
|
}
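The two name checks performed in the loop of zfs_ioc_snapshot() above (pool membership, and at most one snapshot per filesystem) can be isolated as plain string helpers. This is a user-space sketch of the same comparisons; the helper names `demo_name_in_pool()` and `demo_same_fs()` are hypothetical and do not exist in the source.

```c
#include <stddef.h>
#include <string.h>

/*
 * Nonzero if name begins with poolname immediately followed by '/' or
 * '@', so that pool "tank" does not match a dataset in pool "tank2".
 */
static int
demo_name_in_pool(const char *name, const char *poolname)
{
	size_t poollen = strlen(poolname);

	return (strncmp(name, poolname, poollen) == 0 &&
	    (name[poollen] == '/' || name[poollen] == '@'));
}

/*
 * Nonzero if snapshot names a and b belong to the same filesystem.
 * The comparison runs up to and including the '@' of a, so "pool/fs@x"
 * matches "pool/fs@y" but not "pool/fs2@y".
 */
static int
demo_same_fs(const char *a, const char *b)
{
	const char *at = strchr(a, '@');

	if (at == NULL)
		return (0);
	return (strncmp(a, b, (size_t)(at - a) + 1) == 0);
}
```

Including the '@' in the compared prefix is the important detail: without it, a filesystem whose name is a prefix of another would be reported as a duplicate.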
|
|
|
|
|
|
|
|
/*
|
|
|
|
* innvl: "message" -> string
|
|
|
|
*/
|
2018-09-02 22:14:01 +03:00
|
|
|
static const zfs_ioc_key_t zfs_keys_log_history[] = {
|
|
|
|
{"message", DATA_TYPE_STRING, 0},
|
|
|
|
};
|
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
static int
|
|
|
|
zfs_ioc_log_history(const char *unused, nvlist_t *innvl, nvlist_t *outnvl)
|
|
|
|
{
|
2022-02-16 04:38:43 +03:00
|
|
|
(void) unused, (void) outnvl;
|
2020-10-03 03:44:10 +03:00
|
|
|
const char *message;
|
|
|
|
char *poolname;
|
2013-08-28 15:45:09 +04:00
|
|
|
spa_t *spa;
|
|
|
|
int error;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The poolname in the ioctl is not set, we get it from the TSD,
|
|
|
|
* which was set at the end of the last successful ioctl that allows
|
|
|
|
* logging. The secpolicy func already checked that it is set.
|
|
|
|
* Only one log ioctl is allowed after each successful ioctl, so
|
|
|
|
* we clear the TSD here.
|
|
|
|
*/
|
|
|
|
poolname = tsd_get(zfs_allow_log_key);
|
2016-07-27 09:58:17 +03:00
|
|
|
if (poolname == NULL)
|
|
|
|
return (SET_ERROR(EINVAL));
|
2013-08-28 15:45:09 +04:00
|
|
|
(void) tsd_set(zfs_allow_log_key, NULL);
|
|
|
|
error = spa_open(poolname, &spa, FTAG);
|
2019-10-10 19:47:06 +03:00
|
|
|
kmem_strfree(poolname);
|
2013-08-28 15:45:09 +04:00
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
|
|
|
|
2018-09-02 22:14:01 +03:00
|
|
|
message = fnvlist_lookup_string(innvl, "message");
|
2013-08-28 15:45:09 +04:00
|
|
|
|
|
|
|
if (spa_version(spa) < SPA_VERSION_ZPOOL_HISTORY) {
|
|
|
|
spa_close(spa, FTAG);
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(ENOTSUP));
|
2013-08-28 15:45:09 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
error = spa_history_log(spa, message);
|
|
|
|
spa_close(spa, FTAG);
|
2008-12-03 23:09:06 +03:00
|
|
|
return (error);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
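The "one log ioctl per successful ioctl" rule in zfs_ioc_log_history() above amounts to a take-and-clear on a per-thread slot. A minimal sketch, assuming a C11 `_Thread_local` variable in place of the kernel's `tsd_get()`/`tsd_set()`; the names `demo_allow_log_pool`, `demo_ioctl_succeeded()`, and `demo_take_log_pool()` are hypothetical, and string ownership (the `kmem_strfree()` call in the real function) is left out.

```c
#include <stddef.h>

/* Hypothetical per-thread slot standing in for the zfs_allow_log_key TSD. */
static _Thread_local const char *demo_allow_log_pool;

/* Sample pool name used only for illustration. */
static const char demo_pool[] = "tank";

/* Called at the end of a successful ioctl that permits logging. */
static void
demo_ioctl_succeeded(const char *poolname)
{
	demo_allow_log_pool = poolname;
}

/*
 * Called by the log ioctl: return the recorded pool name and clear the
 * slot, so that a second log request finds nothing and can be rejected.
 */
static const char *
demo_take_log_pool(void)
{
	const char *pool = demo_allow_log_pool;

	demo_allow_log_pool = NULL;
	return (pool);
}
```

Clearing the slot on read, rather than after the history entry is written, keeps the rule intact even when opening the pool or writing the entry fails.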
|
|
|
|
|
2020-05-07 19:36:33 +03:00
|
|
|
/*
|
|
|
|
* This ioctl is used to set the bootenv configuration on the current
|
|
|
|
* pool. This configuration is stored in the second padding area of the label,
|
2020-09-16 01:42:27 +03:00
|
|
|
* and it is used by the bootloader(s) to store the bootloader and/or system
|
|
|
|
* specific data.
|
|
|
|
* The data is stored as nvlist data stream, and is protected by
|
|
|
|
* an embedded checksum.
|
|
|
|
* The version can have two possible values:
|
|
|
|
* VB_RAW: nvlist should have key GRUB_ENVMAP, value DATA_TYPE_STRING.
|
|
|
|
* VB_NVLIST: nvlist with arbitrary <key, value> pairs.
|
2020-05-07 19:36:33 +03:00
|
|
|
*/
|
|
|
|
static const zfs_ioc_key_t zfs_keys_set_bootenv[] = {
|
2020-09-16 01:42:27 +03:00
|
|
|
{"version", DATA_TYPE_UINT64, 0},
|
|
|
|
{"<keys>", DATA_TYPE_ANY, ZK_OPTIONAL | ZK_WILDCARDLIST},
|
2020-05-07 19:36:33 +03:00
|
|
|
};
|
|
|
|
|
|
|
|
static int
|
|
|
|
zfs_ioc_set_bootenv(const char *name, nvlist_t *innvl, nvlist_t *outnvl)
|
|
|
|
{
|
|
|
|
int error;
|
|
|
|
spa_t *spa;
|
|
|
|
|
|
|
|
if ((error = spa_open(name, &spa, FTAG)) != 0)
|
|
|
|
return (error);
|
|
|
|
spa_vdev_state_enter(spa, SCL_ALL);
|
2020-09-16 01:42:27 +03:00
|
|
|
error = vdev_label_write_bootenv(spa->spa_root_vdev, innvl);
|
2020-05-07 19:36:33 +03:00
|
|
|
(void) spa_vdev_state_exit(spa, NULL, 0);
|
|
|
|
spa_close(spa, FTAG);
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
static const zfs_ioc_key_t zfs_keys_get_bootenv[] = {
|
|
|
|
/* no nvl keys */
|
|
|
|
};
|
|
|
|
|
|
|
|
static int
|
|
|
|
zfs_ioc_get_bootenv(const char *name, nvlist_t *innvl, nvlist_t *outnvl)
|
|
|
|
{
|
|
|
|
spa_t *spa;
|
|
|
|
int error;
|
|
|
|
|
|
|
|
if ((error = spa_open(name, &spa, FTAG)) != 0)
|
|
|
|
return (error);
|
|
|
|
spa_vdev_state_enter(spa, SCL_ALL);
|
|
|
|
error = vdev_label_read_bootenv(spa->spa_root_vdev, outnvl);
|
|
|
|
(void) spa_vdev_state_exit(spa, NULL, 0);
|
|
|
|
spa_close(spa, FTAG);
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2011-11-11 11:15:53 +04:00
|
|
|
/*
|
2013-09-04 16:00:57 +04:00
|
|
|
* The dp_config_rwlock must not be held when calling this, because the
|
|
|
|
* unmount may need to write out data.
|
|
|
|
*
|
|
|
|
* This function is best-effort. Callers must deal gracefully if it
|
|
|
|
* remains mounted (or is remounted after this call).
|
2013-06-11 21:13:43 +04:00
|
|
|
*
|
2015-04-25 02:21:13 +03:00
|
|
|
* Does nothing if the argument is not a snapshot. Otherwise attempts a
|
|
|
|
* forced unmount; no status is returned and any unmount error is ignored.
|
2011-11-11 11:15:53 +04:00
|
|
|
*/
|
2018-02-08 19:32:45 +03:00
|
|
|
void
|
2013-09-04 16:00:57 +04:00
|
|
|
zfs_unmount_snap(const char *snapname)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2015-04-25 02:21:13 +03:00
|
|
|
if (strchr(snapname, '@') == NULL)
|
2018-02-08 19:32:45 +03:00
|
|
|
return;
|
2011-11-11 11:15:53 +04:00
|
|
|
|
2020-10-03 03:44:10 +03:00
|
|
|
(void) zfsctl_snapshot_unmount(snapname, MNT_FORCE);
|
2013-09-04 16:00:57 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
zfs_unmount_snap_cb(const char *snapname, void *arg)
|
|
|
|
{
|
2022-02-16 04:38:43 +03:00
|
|
|
(void) arg;
|
2018-02-08 19:32:45 +03:00
|
|
|
zfs_unmount_snap(snapname);
|
|
|
|
return (0);
|
2013-09-04 16:00:57 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* When a clone is destroyed, its origin may also need to be destroyed,
|
|
|
|
* in which case it must be unmounted. This routine will do that unmount
|
|
|
|
* if necessary.
|
|
|
|
*/
|
|
|
|
void
|
|
|
|
zfs_destroy_unmount_origin(const char *fsname)
|
|
|
|
{
|
|
|
|
int error;
|
|
|
|
objset_t *os;
|
|
|
|
dsl_dataset_t *ds;
|
|
|
|
|
|
|
|
error = dmu_objset_hold(fsname, FTAG, &os);
|
|
|
|
if (error != 0)
|
|
|
|
return;
|
|
|
|
ds = dmu_objset_ds(os);
|
|
|
|
if (dsl_dir_is_clone(ds->ds_dir) && DS_IS_DEFER_DESTROY(ds->ds_prev)) {
|
2016-06-16 00:28:36 +03:00
|
|
|
char originname[ZFS_MAX_DATASET_NAME_LEN];
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_name(ds->ds_prev, originname);
|
|
|
|
dmu_objset_rele(os, FTAG);
|
2018-02-08 19:32:45 +03:00
|
|
|
zfs_unmount_snap(originname);
|
2013-09-04 16:00:57 +04:00
|
|
|
} else {
|
|
|
|
dmu_objset_rele(os, FTAG);
|
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2013-08-28 15:45:09 +04:00
|
|
|
* innvl: {
|
|
|
|
* "snaps" -> { snapshot1, snapshot2 }
|
|
|
|
* (optional boolean) "defer"
|
|
|
|
* }
|
2008-11-20 23:01:55 +03:00
|
|
|
*
|
2013-08-28 15:45:09 +04:00
|
|
|
* outnvl: snapshot -> error code (int32)
|
2008-11-20 23:01:55 +03:00
|
|
|
*/
|
2018-09-02 22:14:01 +03:00
|
|
|
static const zfs_ioc_key_t zfs_keys_destroy_snaps[] = {
|
|
|
|
{"snaps", DATA_TYPE_NVLIST, 0},
|
2019-09-27 20:46:28 +03:00
|
|
|
{"defer", DATA_TYPE_BOOLEAN, ZK_OPTIONAL},
|
2018-09-02 22:14:01 +03:00
|
|
|
};
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
static int
|
2013-08-28 15:45:09 +04:00
|
|
|
zfs_ioc_destroy_snaps(const char *poolname, nvlist_t *innvl, nvlist_t *outnvl)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2019-09-27 20:46:28 +03:00
|
|
|
int poollen;
|
2013-08-28 15:45:09 +04:00
|
|
|
nvlist_t *snaps;
|
2011-11-17 22:14:36 +04:00
|
|
|
nvpair_t *pair;
|
2013-08-28 15:45:09 +04:00
|
|
|
boolean_t defer;
|
2019-09-27 20:46:28 +03:00
|
|
|
spa_t *spa;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2018-09-02 22:14:01 +03:00
|
|
|
snaps = fnvlist_lookup_nvlist(innvl, "snaps");
|
2013-08-28 15:45:09 +04:00
|
|
|
defer = nvlist_exists(innvl, "defer");
|
2011-11-17 22:14:36 +04:00
|
|
|
|
2019-09-27 20:46:28 +03:00
|
|
|
poollen = strlen(poolname);
|
2013-08-28 15:45:09 +04:00
|
|
|
for (pair = nvlist_next_nvpair(snaps, NULL); pair != NULL;
|
|
|
|
pair = nvlist_next_nvpair(snaps, pair)) {
|
2019-09-27 20:46:28 +03:00
|
|
|
const char *name = nvpair_name(pair);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The snap must be in the specified pool to prevent the
|
|
|
|
* invalid removal of zvol minors below.
|
|
|
|
*/
|
|
|
|
if (strncmp(name, poolname, poollen) != 0 ||
|
|
|
|
(name[poollen] != '/' && name[poollen] != '@'))
|
|
|
|
return (SET_ERROR(EXDEV));
|
|
|
|
|
2018-02-08 19:32:45 +03:00
|
|
|
zfs_unmount_snap(nvpair_name(pair));
|
2019-09-27 20:46:28 +03:00
|
|
|
if (spa_open(name, &spa, FTAG) == 0) {
|
|
|
|
zvol_remove_minors(spa, name, B_TRUE);
|
|
|
|
spa_close(spa, FTAG);
|
|
|
|
}
|
2013-12-12 02:33:41 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
return (dsl_destroy_snapshots_nvl(snaps, defer, outnvl));
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2019-11-11 10:24:14 +03:00
|
|
|
* Create bookmarks. The bookmark names are of the form <fs>#<bmark>.
|
|
|
|
* All bookmarks and snapshots must be in the same pool.
|
|
|
|
* dsl_bookmark_create_nvl_validate describes the nvlist schema in more detail.
|
2013-12-12 02:33:41 +04:00
|
|
|
*
|
|
|
|
* innvl: {
|
2019-11-11 10:24:14 +03:00
|
|
|
* new_bookmark1 -> existing_snapshot,
|
|
|
|
* new_bookmark2 -> existing_bookmark,
|
2013-12-12 02:33:41 +04:00
|
|
|
* }
|
|
|
|
*
|
|
|
|
* outnvl: bookmark -> error code (int32)
|
|
|
|
*
|
|
|
|
*/
|
Add basic zfs ioc input nvpair validation
We want newer versions of libzfs_core to run against an existing
zfs kernel module (i.e. a deferred reboot or module reload after
an update).
Programmatically document, via a zfs_ioc_key_t, the valid arguments
for the ioc commands that rely on nvpair input arguments (i.e. non
legacy commands from libzfs_core). Automatically verify the expected
pairs before dispatching a command.
This initial phase focuses on the non-legacy ioctls. A follow-on
change can address the legacy ioctl input from the zfs_cmd_t.
The zfs_ioc_key_t for zfs_keys_channel_program looks like:
static const zfs_ioc_key_t zfs_keys_channel_program[] = {
{"program", DATA_TYPE_STRING, 0},
{"arg", DATA_TYPE_UNKNOWN, 0},
{"sync", DATA_TYPE_BOOLEAN_VALUE, ZK_OPTIONAL},
{"instrlimit", DATA_TYPE_UINT64, ZK_OPTIONAL},
{"memlimit", DATA_TYPE_UINT64, ZK_OPTIONAL},
};
Introduce four input errors to identify specific input failures
(in addition to generic argument value errors like EINVAL, ERANGE,
EBADF, and E2BIG).
ZFS_ERR_IOC_CMD_UNAVAIL the ioctl number is not supported by kernel
ZFS_ERR_IOC_ARG_UNAVAIL an input argument is not supported by kernel
ZFS_ERR_IOC_ARG_REQUIRED a required input argument is missing
ZFS_ERR_IOC_ARG_BADTYPE an input argument has an invalid type
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #7780
2018-09-02 22:14:01 +03:00
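The validation described in the commit message above can be illustrated with a small stand-alone model. This is a sketch under stated assumptions, not the kernel implementation: the `sk_*` types, flags, and error constants are hypothetical stand-ins for `zfs_ioc_key_t`, `ZK_OPTIONAL`, and the `ZFS_ERR_IOC_ARG_*` codes, the input is modeled as a plain array instead of an nvlist, and wildcard list keys are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; the real definitions live in the ZFS headers. */
typedef enum { T_ANY, T_STRING, T_UINT64, T_BOOLEAN_VALUE } sk_type_t;
enum { SK_OPTIONAL = 1 << 0 };
enum { SK_OK = 0, SK_ARG_UNAVAIL, SK_ARG_REQUIRED, SK_ARG_BADTYPE };

typedef struct { const char *name; sk_type_t type; int flags; } sk_key_t;
typedef struct { const char *name; sk_type_t type; } sk_pair_t;

static int
sk_check_input(const sk_pair_t *in, size_t nin,
    const sk_key_t *keys, size_t nkeys)
{
	/* Pass 1: every supplied pair must name a known key of the right type. */
	for (size_t i = 0; i < nin; i++) {
		const sk_key_t *k = NULL;

		for (size_t j = 0; j < nkeys; j++) {
			if (strcmp(in[i].name, keys[j].name) == 0) {
				k = &keys[j];
				break;
			}
		}
		if (k == NULL)
			return (SK_ARG_UNAVAIL);
		if (k->type != T_ANY && k->type != in[i].type)
			return (SK_ARG_BADTYPE);
	}

	/* Pass 2: every key not marked optional must be present. */
	for (size_t j = 0; j < nkeys; j++) {
		int found = 0;

		if (keys[j].flags & SK_OPTIONAL)
			continue;
		for (size_t i = 0; i < nin; i++) {
			if (strcmp(in[i].name, keys[j].name) == 0) {
				found = 1;
				break;
			}
		}
		if (!found)
			return (SK_ARG_REQUIRED);
	}
	return (SK_OK);
}

/* Mirrors the shape of the table quoted in the commit message. */
static const sk_key_t sk_keys_channel_program[] = {
	{"program", T_STRING, 0},
	{"arg", T_ANY, 0},
	{"sync", T_BOOLEAN_VALUE, SK_OPTIONAL},
	{"instrlimit", T_UINT64, SK_OPTIONAL},
	{"memlimit", T_UINT64, SK_OPTIONAL},
};
```

Running the check before dispatch means a handler can assume its required keys exist, which is presumably why the handlers in this file use the `fnvlist_lookup_*` variants for required arguments.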
|
|
|
static const zfs_ioc_key_t zfs_keys_bookmark[] = {
|
|
|
|
{"<bookmark>...", DATA_TYPE_STRING, ZK_WILDCARDLIST},
|
|
|
|
};
|
|
|
|
|
2013-12-12 02:33:41 +04:00
|
|
|
static int
|
|
|
|
zfs_ioc_bookmark(const char *poolname, nvlist_t *innvl, nvlist_t *outnvl)
|
|
|
|
{
|
2022-02-16 04:38:43 +03:00
|
|
|
(void) poolname;
|
2013-12-12 02:33:41 +04:00
|
|
|
return (dsl_bookmark_create(innvl, outnvl));
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* innvl: {
|
|
|
|
* property 1, property 2, ...
|
|
|
|
* }
|
|
|
|
*
|
|
|
|
* outnvl: {
|
|
|
|
* bookmark name 1 -> { property 1, property 2, ... },
|
|
|
|
* bookmark name 2 -> { property 1, property 2, ... }
|
|
|
|
* }
|
|
|
|
*
|
|
|
|
*/
|
2018-09-02 22:14:01 +03:00
|
|
|
static const zfs_ioc_key_t zfs_keys_get_bookmarks[] = {
|
|
|
|
{"<property>...", DATA_TYPE_BOOLEAN, ZK_WILDCARDLIST | ZK_OPTIONAL},
|
|
|
|
};
|
|
|
|
|
2013-12-12 02:33:41 +04:00
|
|
|
static int
|
|
|
|
zfs_ioc_get_bookmarks(const char *fsname, nvlist_t *innvl, nvlist_t *outnvl)
|
|
|
|
{
|
|
|
|
return (dsl_get_bookmarks(fsname, innvl, outnvl));
|
|
|
|
}
|
|
|
|
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
/*
|
|
|
|
* innvl is not used.
|
|
|
|
*
|
|
|
|
* outnvl: {
|
|
|
|
* property 1, property 2, ...
|
|
|
|
* }
|
|
|
|
*
|
|
|
|
*/
|
|
|
|
static const zfs_ioc_key_t zfs_keys_get_bookmark_props[] = {
|
|
|
|
/* no nvl keys */
|
|
|
|
};
|
|
|
|
|
|
|
|
static int
|
|
|
|
zfs_ioc_get_bookmark_props(const char *bookmark, nvlist_t *innvl,
|
|
|
|
nvlist_t *outnvl)
|
|
|
|
{
|
2022-02-16 04:38:43 +03:00
|
|
|
(void) innvl;
|
2019-06-19 19:48:13 +03:00
|
|
|
char fsname[ZFS_MAX_DATASET_NAME_LEN];
|
|
|
|
char *bmname;
|
|
|
|
|
|
|
|
bmname = strchr(bookmark, '#');
|
|
|
|
if (bmname == NULL)
|
|
|
|
return (SET_ERROR(EINVAL));
|
|
|
|
bmname++;
|
|
|
|
|
|
|
|
(void) strlcpy(fsname, bookmark, sizeof (fsname));
|
|
|
|
*(strchr(fsname, '#')) = '\0';
|
|
|
|
|
|
|
|
return (dsl_get_bookmark_props(fsname, bmname, outnvl));
|
|
|
|
}
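The name handling in the function above splits `<fs>#<bmark>` at the '#'. A minimal stand-alone sketch of that split follows. `split_bookmark` and `SK_MAX_NAME_LEN` are hypothetical names introduced for illustration. Unlike the function above, this variant rejects a filesystem part that does not fit the buffer rather than relying on the name having been length-checked earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define	SK_MAX_NAME_LEN	256	/* assumed stand-in for ZFS_MAX_DATASET_NAME_LEN */

/*
 * Hypothetical helper: split "<fs>#<bmark>" into a copy of the filesystem
 * name and a pointer to the bookmark component inside the original string.
 * Returns 0 on success, -1 if there is no '#' or the fs part does not fit.
 */
static int
split_bookmark(const char *bookmark, char *fsname, size_t fslen,
    const char **bmname)
{
	const char *hash = strchr(bookmark, '#');
	size_t n;

	if (hash == NULL)
		return (-1);
	n = (size_t)(hash - bookmark);
	if (n >= fslen)
		return (-1);	/* no room for the name plus its terminator */
	memcpy(fsname, bookmark, n);
	fsname[n] = '\0';
	*bmname = hash + 1;
	return (0);
}
```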
|
|
|
|
|
2013-12-12 02:33:41 +04:00
|
|
|
/*
|
|
|
|
* innvl: {
|
|
|
|
* bookmark name 1, bookmark name 2
|
|
|
|
* }
|
|
|
|
*
|
|
|
|
* outnvl: bookmark -> error code (int32)
|
|
|
|
*
|
|
|
|
*/
|
2018-09-02 22:14:01 +03:00
|
|
|
static const zfs_ioc_key_t zfs_keys_destroy_bookmarks[] = {
|
|
|
|
{"<bookmark>...", DATA_TYPE_BOOLEAN, ZK_WILDCARDLIST},
|
|
|
|
};
|
|
|
|
|
2013-12-12 02:33:41 +04:00
|
|
|
static int
|
|
|
|
zfs_ioc_destroy_bookmarks(const char *poolname, nvlist_t *innvl,
|
|
|
|
nvlist_t *outnvl)
|
|
|
|
{
|
|
|
|
int error, poollen;
|
|
|
|
|
|
|
|
poollen = strlen(poolname);
|
2017-11-04 23:25:13 +03:00
|
|
|
for (nvpair_t *pair = nvlist_next_nvpair(innvl, NULL);
|
2013-12-12 02:33:41 +04:00
|
|
|
pair != NULL; pair = nvlist_next_nvpair(innvl, pair)) {
|
2011-11-17 22:14:36 +04:00
|
|
|
const char *name = nvpair_name(pair);
|
2013-12-12 02:33:41 +04:00
|
|
|
const char *cp = strchr(name, '#');
|
2013-08-28 15:45:09 +04:00
|
|
|
|
2011-11-17 22:14:36 +04:00
|
|
|
/*
|
2013-12-12 02:33:41 +04:00
|
|
|
* The bookmark name must contain an #, and the part after it
|
|
|
|
* must contain only valid characters.
|
|
|
|
*/
|
|
|
|
if (cp == NULL ||
|
|
|
|
zfs_component_namecheck(cp + 1, NULL, NULL) != 0)
|
|
|
|
return (SET_ERROR(EINVAL));
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The bookmark must be in the specified pool.
|
2011-11-17 22:14:36 +04:00
|
|
|
*/
|
2013-08-28 15:45:09 +04:00
|
|
|
if (strncmp(name, poolname, poollen) != 0 ||
|
2013-12-12 02:33:41 +04:00
|
|
|
(name[poollen] != '/' && name[poollen] != '#'))
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EXDEV));
|
2011-11-17 22:14:36 +04:00
|
|
|
}
|
|
|
|
|
2013-12-12 02:33:41 +04:00
|
|
|
error = dsl_bookmark_destroy(innvl, outnvl);
|
|
|
|
return (error);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
2018-09-02 22:14:01 +03:00
|
|
|
static const zfs_ioc_key_t zfs_keys_channel_program[] = {
|
|
|
|
{"program", DATA_TYPE_STRING, 0},
|
|
|
|
{"arg", DATA_TYPE_ANY, 0},
|
|
|
|
{"sync", DATA_TYPE_BOOLEAN_VALUE, ZK_OPTIONAL},
|
|
|
|
{"instrlimit", DATA_TYPE_UINT64, ZK_OPTIONAL},
|
|
|
|
{"memlimit", DATA_TYPE_UINT64, ZK_OPTIONAL},
|
|
|
|
};
|
|
|
|
|
2018-02-08 19:16:23 +03:00
|
|
|
static int
|
|
|
|
zfs_ioc_channel_program(const char *poolname, nvlist_t *innvl,
|
|
|
|
nvlist_t *outnvl)
|
|
|
|
{
|
2023-03-11 21:39:24 +03:00
|
|
|
const char *program;
|
2018-02-08 19:16:23 +03:00
|
|
|
uint64_t instrlimit, memlimit;
|
2018-02-08 19:35:09 +03:00
|
|
|
boolean_t sync_flag;
|
2018-02-08 19:16:23 +03:00
|
|
|
nvpair_t *nvarg = NULL;
|
|
|
|
|
2018-09-02 22:14:01 +03:00
|
|
|
program = fnvlist_lookup_string(innvl, ZCP_ARG_PROGRAM);
|
2018-02-08 19:35:09 +03:00
|
|
|
if (0 != nvlist_lookup_boolean_value(innvl, ZCP_ARG_SYNC, &sync_flag)) {
|
|
|
|
sync_flag = B_TRUE;
|
|
|
|
}
|
2018-02-08 19:16:23 +03:00
|
|
|
if (0 != nvlist_lookup_uint64(innvl, ZCP_ARG_INSTRLIMIT, &instrlimit)) {
|
|
|
|
instrlimit = ZCP_DEFAULT_INSTRLIMIT;
|
|
|
|
}
|
|
|
|
if (0 != nvlist_lookup_uint64(innvl, ZCP_ARG_MEMLIMIT, &memlimit)) {
|
|
|
|
memlimit = ZCP_DEFAULT_MEMLIMIT;
|
|
|
|
}
|
2018-09-02 22:14:01 +03:00
|
|
|
nvarg = fnvlist_lookup_nvpair(innvl, ZCP_ARG_ARGLIST);
|
2018-02-08 19:16:23 +03:00
|
|
|
|
|
|
|
if (instrlimit == 0 || instrlimit > zfs_lua_max_instrlimit)
|
2019-09-27 20:46:28 +03:00
|
|
|
return (SET_ERROR(EINVAL));
|
2018-02-08 19:24:39 +03:00
|
|
|
if (memlimit == 0 || memlimit > zfs_lua_max_memlimit)
|
2019-09-27 20:46:28 +03:00
|
|
|
return (SET_ERROR(EINVAL));
|
2018-02-08 19:16:23 +03:00
|
|
|
|
2018-02-08 19:35:09 +03:00
|
|
|
return (zcp_eval(poolname, program, sync_flag, instrlimit, memlimit,
|
2018-02-08 19:16:23 +03:00
|
|
|
nvarg, outnvl));
|
|
|
|
}
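The handling of the optional limits in the function above follows one pattern: fall back to a default when the key is absent, then reject zero or anything above a cap. A minimal sketch of that pattern is shown here. `resolve_limit` is a hypothetical helper, and the two constants are illustrative values only, not the values of `ZCP_DEFAULT_INSTRLIMIT` or `zfs_lua_max_instrlimit`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative values only; the real default and cap are defined elsewhere. */
#define	SK_DEFAULT_LIMIT	(10ULL * 1000 * 1000)
#define	SK_MAX_LIMIT		(100ULL * 1000 * 1000)

/*
 * `supplied` points at the caller's value, or is NULL when the optional
 * key was absent. Stores the effective limit and returns 0, or returns -1
 * (EINVAL in the function above) when the value is zero or above the cap.
 */
static int
resolve_limit(const uint64_t *supplied, uint64_t dflt, uint64_t max,
    uint64_t *out)
{
	uint64_t v = (supplied != NULL) ? *supplied : dflt;

	if (v == 0 || v > max)
		return (-1);
	*out = v;
	return (0);
}
```

Because the range check runs after the default is applied, a misconfigured default that exceeds the cap would also be rejected, which matches the order of operations in the function above.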
|
|
|
|
|
2016-12-17 01:11:29 +03:00
|
|
|
/*
|
|
|
|
* innvl: unused
|
|
|
|
* outnvl: empty
|
|
|
|
*/
|
2018-09-02 22:14:01 +03:00
|
|
|
static const zfs_ioc_key_t zfs_keys_pool_checkpoint[] = {
|
|
|
|
/* no nvl keys */
|
|
|
|
};
|
|
|
|
|
2016-12-17 01:11:29 +03:00
|
|
|
static int
|
|
|
|
zfs_ioc_pool_checkpoint(const char *poolname, nvlist_t *innvl, nvlist_t *outnvl)
|
|
|
|
{
|
2022-02-16 04:38:43 +03:00
|
|
|
(void) innvl, (void) outnvl;
|
2016-12-17 01:11:29 +03:00
|
|
|
return (spa_checkpoint(poolname));
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* innvl: unused
|
|
|
|
* outnvl: empty
|
|
|
|
*/
|
2018-09-02 22:14:01 +03:00
|
|
|
static const zfs_ioc_key_t zfs_keys_pool_discard_checkpoint[] = {
|
|
|
|
/* no nvl keys */
|
|
|
|
};
|
|
|
|
|
2016-12-17 01:11:29 +03:00
|
|
|
static int
|
|
|
|
zfs_ioc_pool_discard_checkpoint(const char *poolname, nvlist_t *innvl,
|
|
|
|
nvlist_t *outnvl)
|
|
|
|
{
|
2022-02-16 04:38:43 +03:00
|
|
|
(void) innvl, (void) outnvl;
|
2016-12-17 01:11:29 +03:00
|
|
|
return (spa_checkpoint_discard(poolname));
|
|
|
|
}
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
/*
|
|
|
|
* inputs:
|
|
|
|
* zc_name name of dataset to destroy
|
2009-08-18 22:43:27 +04:00
|
|
|
* zc_defer_destroy mark for deferred destroy
|
2008-11-20 23:01:55 +03:00
|
|
|
*
|
|
|
|
* outputs: none
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
zfs_ioc_destroy(zfs_cmd_t *zc)
|
|
|
|
{
|
2018-06-28 00:37:54 +03:00
|
|
|
objset_t *os;
|
|
|
|
dmu_objset_type_t ost;
|
2010-05-29 00:45:14 +04:00
|
|
|
int err;
|
2013-06-11 21:13:43 +04:00
|
|
|
|
2018-06-28 00:37:54 +03:00
|
|
|
err = dmu_objset_hold(zc->zc_name, FTAG, &os);
|
|
|
|
if (err != 0)
|
|
|
|
return (err);
|
|
|
|
ost = dmu_objset_type(os);
|
|
|
|
dmu_objset_rele(os, FTAG);
|
|
|
|
|
|
|
|
if (ost == DMU_OST_ZFS)
|
2018-02-08 19:32:45 +03:00
|
|
|
zfs_unmount_snap(zc->zc_name);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2016-07-12 20:53:53 +03:00
|
|
|
if (strchr(zc->zc_name, '@')) {
|
2013-09-04 16:00:57 +04:00
|
|
|
err = dsl_destroy_snapshot(zc->zc_name, zc->zc_defer_destroy);
|
2016-07-12 20:53:53 +03:00
|
|
|
} else {
|
2013-09-04 16:00:57 +04:00
|
|
|
err = dsl_destroy_head(zc->zc_name);
|
2016-07-12 20:53:53 +03:00
|
|
|
if (err == EEXIST) {
|
|
|
|
/*
|
|
|
|
* It is possible that the given DS may have
|
|
|
|
* hidden child (%recv) datasets - "leftovers"
|
|
|
|
* resulting from a previously interrupted
|
|
|
|
* 'zfs receive'.
|
|
|
|
*
|
|
|
|
* 6 extra bytes for /%recv
|
|
|
|
*/
|
|
|
|
char namebuf[ZFS_MAX_DATASET_NAME_LEN + 6];
|
|
|
|
|
2017-06-28 20:05:16 +03:00
|
|
|
if (snprintf(namebuf, sizeof (namebuf), "%s/%s",
|
|
|
|
zc->zc_name, recv_clone_name) >=
|
|
|
|
sizeof (namebuf))
|
|
|
|
return (SET_ERROR(EINVAL));
|
2016-07-12 20:53:53 +03:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Try to remove the hidden child (%recv) and after
|
|
|
|
* that try to remove the target dataset.
|
|
|
|
* If the hidden child (%recv) does not exist
|
|
|
|
* the original error (EEXIST) will be returned.
|
|
|
|
*/
|
|
|
|
err = dsl_destroy_head(namebuf);
|
|
|
|
if (err == 0)
|
|
|
|
err = dsl_destroy_head(zc->zc_name);
|
|
|
|
else if (err == ENOENT)
|
2017-08-03 07:16:12 +03:00
|
|
|
err = SET_ERROR(EEXIST);
|
2016-07-12 20:53:53 +03:00
|
|
|
}
|
|
|
|
}
|
2014-03-22 13:07:14 +04:00
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
return (err);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
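The hidden-child name construction in the function above depends on snprintf() returning the length it wanted to write, so truncation can be detected by comparing that value with the buffer size. A minimal stand-alone sketch of that check follows. `recv_child_name` and `SK_MAX_NAME_LEN` are hypothetical names, and the "%recv" literal is taken from the comment in the function above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define	SK_MAX_NAME_LEN	256	/* assumed stand-in for ZFS_MAX_DATASET_NAME_LEN */

/*
 * Hypothetical helper: build "<dataset>/%recv" into buf.
 * Returns 0 on success, -1 if the result would not fit.
 */
static int
recv_child_name(const char *dataset, char *buf, size_t buflen)
{
	/* "%recv" is passed as an argument, so it is copied literally. */
	int n = snprintf(buf, buflen, "%s/%s", dataset, "%recv");

	/* n excludes the terminator; n >= buflen means truncation. */
	if (n < 0 || (size_t)n >= buflen)
		return (-1);
	return (0);
}
```

The six extra bytes in the source buffer correspond to the '/' plus the five characters of "%recv".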
|
|
|
|
|
OpenZFS 9102 - zfs should be able to initialize storage devices
PROBLEM
========
The first access to a block incurs a performance penalty on some platforms
(e.g. AWS's EBS, VMware VMDKs). Therefore we recommend that volumes are
"thick provisioned", where supported by the platform (VMware). This can
create a large delay in getting new virtual machines up and running (or
adding storage to an existing Engine). If the thick provision step is
omitted, write performance will be suboptimal until all blocks on the LUN
have been written.
SOLUTION
=========
This feature introduces a way to 'initialize' the disks at install or in the
background to make sure we don't incur this first read penalty.
When an entire LUN is added to ZFS, we make all space available immediately,
and allow ZFS to find unallocated space and zero it out. This works with
concurrent writes to arbitrary offsets, ensuring that we don't zero out
something that has been (or is in the middle of being) written. This scheme
can also be applied to existing pools (affecting only free regions on the
vdev). Detailed design:
- new subcommand: zpool initialize [-cs] <pool> [<vdev> ...]
- start, suspend, or cancel initialization
- Creates new open-context thread for each vdev
- Thread iterates through all metaslabs in this vdev
- Each metaslab:
- select a metaslab
- load the metaslab
- mark the metaslab as being zeroed
- walk all free ranges within that metaslab and translate
them to ranges on the leaf vdev
- issue a "zeroing" I/O on the leaf vdev that corresponds to
a free range on the metaslab we're working on
- continue until all free ranges for this metaslab have been
"zeroed"
- reset/unmark the metaslab being zeroed
- if more metaslabs exist, then repeat above tasks.
- if no more metaslabs, then we're done.
- progress for the initialization is stored on-disk in the vdev’s
leaf zap object. The following information is stored:
- the last offset that has been initialized
- the state of the initialization process (i.e. active,
suspended, or canceled)
- the start time for the initialization
- progress is reported via the zpool status command and shows
information for each of the vdevs that are initializing
Porting notes:
- Added zfs_initialize_value module parameter to set the pattern
written by "zpool initialize".
- Added zfs_vdev_{initializing,removal}_{min,max}_active module options.
Authored by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: loli10K <ezomori.nozomu@gmail.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Richard Lowe <richlowe@richlowe.net>
Signed-off-by: Tim Chase <tim@chase2k.com>
Ported-by: Tim Chase <tim@chase2k.com>
OpenZFS-issue: https://www.illumos.org/issues/9102
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c3963210eb
Closes #8230
2018-12-19 17:54:59 +03:00
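The per-metaslab walk described in the commit message above can be pictured with a toy model. This is a heavily simplified sketch under stated assumptions, not the vdev initialize code: the "vdev" is a byte array, the free ranges are given up front, there is no concurrency, and `sk_range_t` and `sk_initialize` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical free range: [start, start + len) in bytes. */
typedef struct { uint64_t start, len; } sk_range_t;

/*
 * Write `pattern` over every free range of a toy "vdev" and return the
 * end offset of the last range written. That offset plays the role of
 * the "last offset that has been initialized" progress marker.
 * Allocated regions (anything outside the ranges) are never touched.
 */
static uint64_t
sk_initialize(uint8_t *vdev, const sk_range_t *free_ranges, size_t n,
    uint8_t pattern)
{
	uint64_t last = 0;

	for (size_t i = 0; i < n; i++) {
		memset(vdev + free_ranges[i].start, pattern,
		    free_ranges[i].len);
		last = free_ranges[i].start + free_ranges[i].len;
	}
	return (last);
}
```

Persisting the returned offset after each range is what would let a suspended run resume without rewriting ranges it has already covered.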
|
|
|
/*
|
|
|
|
* innvl: {
|
2019-03-29 19:13:20 +03:00
|
|
|
* "initialize_command" -> POOL_INITIALIZE_{CANCEL|START|SUSPEND} (uint64)
|
2018-12-19 19:20:39 +03:00
|
|
|
* "initialize_vdevs": { -> guids to initialize (nvlist)
|
|
|
|
* "vdev_path_1": vdev_guid_1, (uint64),
|
|
|
|
* "vdev_path_2": vdev_guid_2, (uint64),
|
|
|
|
* ...
|
2018-12-19 17:54:59 +03:00
|
|
|
* },
|
|
|
|
* }
|
|
|
|
*
|
|
|
|
* outnvl: {
|
2018-12-19 19:20:39 +03:00
|
|
|
* "initialize_vdevs": { -> initialization errors (nvlist)
|
|
|
|
* "vdev_path_1": errno, see function body for possible errnos (uint64)
|
|
|
|
* "vdev_path_2": errno, ... (uint64)
|
OpenZFS 9102 - zfs should be able to initialize storage devices
PROBLEM
========
The first access to a block incurs a performance penalty on some platforms
(e.g. AWS's EBS, VMware VMDKs). Therefore we recommend that volumes are
"thick provisioned", where supported by the platform (VMware). This can
create a large delay in getting new virtual machines up and running (or
adding storage to an existing Engine). If the thick provision step is
omitted, write performance will be suboptimal until all blocks on the LUN
have been written.
SOLUTION
=========
This feature introduces a way to 'initialize' the disks at install or in the
background to make sure we don't incur this first read penalty.
When an entire LUN is added to ZFS, we make all space available immediately,
and allow ZFS to find unallocated space and zero it out. This works with
concurrent writes to arbitrary offsets, ensuring that we don't zero out
something that has been (or is in the middle of being) written. This scheme
can also be applied to existing pools (affecting only free regions on the
vdev). Detailed design:
- new subcommand: zpool initialize [-cs] <pool> [<vdev> ...]
- start, suspend, or cancel initialization
- Creates new open-context thread for each vdev
- Thread iterates through all metaslabs in this vdev
- Each metaslab:
- select a metaslab
- load the metaslab
- mark the metaslab as being zeroed
- walk all free ranges within that metaslab and translate
them to ranges on the leaf vdev
- issue a "zeroing" I/O on the leaf vdev that corresponds to
a free range on the metaslab we're working on
- continue until all free ranges for this metaslab have been
"zeroed"
- reset/unmark the metaslab being zeroed
- if more metaslabs exist, then repeat above tasks.
- if no more metaslabs, then we're done.
- progress for the initialization is stored on-disk in the vdev’s
leaf zap object. The following information is stored:
- the last offset that has been initialized
- the state of the initialization process (i.e. active,
suspended, or canceled)
- the start time for the initialization
- progress is reported via the zpool status command and shows
information for each of the vdevs that are initializing
Porting notes:
- Added zfs_initialize_value module parameter to set the pattern
written by "zpool initialize".
- Added zfs_vdev_{initializing,removal}_{min,max}_active module options.
Authored by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: loli10K <ezomori.nozomu@gmail.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Richard Lowe <richlowe@richlowe.net>
Signed-off-by: Tim Chase <tim@chase2k.com>
Ported-by: Tim Chase <tim@chase2k.com>
OpenZFS-issue: https://www.illumos.org/issues/9102
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c3963210eb
Closes #8230
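The per-vdev loop described under "Detailed design" in the commit message above can be sketched as follows. This is a minimal model under stated assumptions, not the ZFS implementation: free_range_t, metaslab_model_t, vdev_model_t and vdev_model_initialize() are hypothetical names, the LUN is modeled as an in-memory buffer, and the pattern argument stands in for the value configured through zfs_initialize_value. Concurrent writers, metaslab loading, and persisting progress to the leaf ZAP are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; these are not the real ZFS types. */
typedef struct free_range {
	uint64_t fr_start;	/* offset on the leaf vdev */
	uint64_t fr_size;	/* length in bytes */
} free_range_t;

typedef struct metaslab_model {
	const free_range_t *ms_free;	/* free ranges of this metaslab */
	size_t ms_nfree;
	int ms_initializing;		/* nonzero while being "zeroed" */
} metaslab_model_t;

typedef struct vdev_model {
	uint8_t *vd_data;		/* buffer standing in for the LUN */
	uint64_t vd_size;
	metaslab_model_t *vd_ms;
	size_t vd_ms_count;
	uint64_t vd_last_offset;	/* progress: end of last range written */
} vdev_model_t;

/* Write the pattern over every free range; return the bytes written. */
static uint64_t
vdev_model_initialize(vdev_model_t *vd, uint8_t pattern)
{
	uint64_t written = 0;

	for (size_t m = 0; m < vd->vd_ms_count; m++) {
		metaslab_model_t *ms = &vd->vd_ms[m];

		ms->ms_initializing = 1;	/* mark the metaslab */
		for (size_t r = 0; r < ms->ms_nfree; r++) {
			const free_range_t *fr = &ms->ms_free[r];

			/* A free range must lie inside the device. */
			assert(fr->fr_start + fr->fr_size <= vd->vd_size);
			memset(vd->vd_data + fr->fr_start, pattern,
			    fr->fr_size);
			written += fr->fr_size;
			vd->vd_last_offset = fr->fr_start + fr->fr_size;
		}
		ms->ms_initializing = 0;	/* reset/unmark */
	}
	return (written);
}
```

Only the free ranges are touched; allocated regions keep their contents, which mirrors the statement that the scheme affects only free regions on the vdev.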
2018-12-19 17:54:59 +03:00
|
|
|
* ...
|
2018-12-19 19:20:39 +03:00
|
|
|
* }
|
2018-12-19 17:54:59 +03:00
|
|
|
* }
|
|
|
|
*
|
2018-12-19 19:20:39 +03:00
|
|
|
* EINVAL is returned for an unknown command or if any of the provided vdev
|
|
|
|
* guids have been specified with a type other than uint64.
|
2018-12-19 17:54:59 +03:00
|
|
|
*/
|
|
|
|
static const zfs_ioc_key_t zfs_keys_pool_initialize[] = {
|
2018-12-19 19:20:39 +03:00
|
|
|
{ZPOOL_INITIALIZE_COMMAND, DATA_TYPE_UINT64, 0},
|
2018-12-19 17:54:59 +03:00
|
|
|
{ZPOOL_INITIALIZE_VDEVS, DATA_TYPE_NVLIST, 0}
|
|
|
|
};
|
|
|
|
|
|
|
|
static int
|
|
|
|
zfs_ioc_pool_initialize(const char *poolname, nvlist_t *innvl, nvlist_t *outnvl)
|
|
|
|
{
|
|
|
|
uint64_t cmd_type;
|
|
|
|
if (nvlist_lookup_uint64(innvl, ZPOOL_INITIALIZE_COMMAND,
|
|
|
|
&cmd_type) != 0) {
|
|
|
|
return (SET_ERROR(EINVAL));
|
|
|
|
}
|
2018-12-19 19:20:39 +03:00
|
|
|
|
2018-12-19 17:54:59 +03:00
|
|
|
if (!(cmd_type == POOL_INITIALIZE_CANCEL ||
|
2019-03-29 19:13:20 +03:00
|
|
|
cmd_type == POOL_INITIALIZE_START ||
|
2018-12-19 17:54:59 +03:00
|
|
|
cmd_type == POOL_INITIALIZE_SUSPEND)) {
|
|
|
|
return (SET_ERROR(EINVAL));
|
|
|
|
}
|
|
|
|
|
|
|
|
nvlist_t *vdev_guids;
|
|
|
|
if (nvlist_lookup_nvlist(innvl, ZPOOL_INITIALIZE_VDEVS,
|
|
|
|
&vdev_guids) != 0) {
|
|
|
|
return (SET_ERROR(EINVAL));
|
|
|
|
}
|
|
|
|
|
|
|
|
for (nvpair_t *pair = nvlist_next_nvpair(vdev_guids, NULL);
|
|
|
|
pair != NULL; pair = nvlist_next_nvpair(vdev_guids, pair)) {
|
2018-12-19 19:20:39 +03:00
|
|
|
uint64_t vdev_guid;
|
|
|
|
if (nvpair_value_uint64(pair, &vdev_guid) != 0) {
|
|
|
|
return (SET_ERROR(EINVAL));
|
2018-12-19 17:54:59 +03:00
|
|
|
}
|
|
|
|
}
|
2018-12-19 19:20:39 +03:00
|
|
|
|
|
|
|
spa_t *spa;
|
|
|
|
int error = spa_open(poolname, &spa, FTAG);
|
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
|
|
|
|
|
|
|
nvlist_t *vdev_errlist = fnvlist_alloc();
|
|
|
|
int total_errors = spa_vdev_initialize(spa, vdev_guids, cmd_type,
|
|
|
|
vdev_errlist);
|
|
|
|
|
2018-12-19 17:54:59 +03:00
|
|
|
if (fnvlist_size(vdev_errlist) > 0) {
|
|
|
|
fnvlist_add_nvlist(outnvl, ZPOOL_INITIALIZE_VDEVS,
|
|
|
|
vdev_errlist);
|
|
|
|
}
|
|
|
|
fnvlist_free(vdev_errlist);
|
|
|
|
|
|
|
|
spa_close(spa, FTAG);
|
2021-02-24 20:51:10 +03:00
|
|
|
return (total_errors > 0 ? SET_ERROR(EINVAL) : 0);
|
2018-12-19 17:54:59 +03:00
|
|
|
}
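The return contract of zfs_ioc_pool_initialize() above (reject unknown commands, apply the command to each vdev, record per-vdev errnos, and return EINVAL if any vdev failed) can be modeled in isolation. The sketch below is an assumption-laden illustration, not the kernel code: enum model_cmd, model_vdev_apply() and model_pool_initialize() are hypothetical names, nvlists are replaced by plain arrays, and treating guid 0 as a missing vdev that yields ENODEV is purely an example choice.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical codes; the real POOL_INITIALIZE_* values come from ZFS. */
enum model_cmd { MODEL_CANCEL, MODEL_START, MODEL_SUSPEND };

/* Hypothetical per-vdev operation: guid 0 stands for "no such vdev". */
static int
model_vdev_apply(uint64_t guid, int cmd)
{
	(void) cmd;
	return (guid == 0 ? ENODEV : 0);
}

/*
 * Validate the command, apply it to every guid, and store one errno per
 * guid in errs[] (0 on success). Returns EINVAL for an unknown command or
 * when at least one vdev failed, 0 otherwise.
 */
static int
model_pool_initialize(int cmd, const uint64_t *guids, size_t nguids,
    int *errs)
{
	assert(nguids == 0 || (guids != NULL && errs != NULL));

	if (cmd != MODEL_CANCEL && cmd != MODEL_START &&
	    cmd != MODEL_SUSPEND)
		return (EINVAL);

	int total_errors = 0;
	for (size_t i = 0; i < nguids; i++) {
		errs[i] = model_vdev_apply(guids[i], cmd);
		if (errs[i] != 0)
			total_errors++;
	}
	return (total_errors > 0 ? EINVAL : 0);
}
```

Unlike this model, the function above adds the error nvlist to outnvl only when it is non-empty, so a caller that sees EINVAL has to inspect the per-vdev entries to learn which device failed and why.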
|
|
|
|
|
2019-03-29 19:13:20 +03:00
|
|
|
/*
|
|
|
|
* innvl: {
|
|
|
|
* "trim_command" -> POOL_TRIM_{CANCEL|START|SUSPEND} (uint64)
|
|
|
|
* "trim_vdevs": { -> guids to TRIM (nvlist)
|
|
|
|
* "vdev_path_1": vdev_guid_1, (uint64),
|
|
|
|
* "vdev_path_2": vdev_guid_2, (uint64),
|
|
|
|
* ...
|
|
|
|
* },
|
|
|
|
* "trim_rate" -> Target TRIM rate in bytes/sec.
|
|
|
|
* "trim_secure" -> Set to request a secure TRIM.
|
|
|
|
* }
|
|
|
|
*
|
|
|
|
* outnvl: {
|
|
|
|
* "trim_vdevs": { -> TRIM errors (nvlist)
|
|
|
|
* "vdev_path_1": errno, see function body for possible errnos (uint64)
|
|
|
|
* "vdev_path_2": errno, ... (uint64)
|
|
|
|
* ...
|
|
|
|
* }
|
|
|
|
* }
|
|
|
|
*
|
|
|
|
* EINVAL is returned for an unknown command or if any of the provided vdev
|
|
|
|
* guids have been specified with a type other than uint64.
|
|
|
|
*/
|
|
|
|
static const zfs_ioc_key_t zfs_keys_pool_trim[] = {
|
|
|
|
{ZPOOL_TRIM_COMMAND, DATA_TYPE_UINT64, 0},
|
|
|
|
{ZPOOL_TRIM_VDEVS, DATA_TYPE_NVLIST, 0},
|
|
|
|
{ZPOOL_TRIM_RATE, DATA_TYPE_UINT64, ZK_OPTIONAL},
|
|
|
|
{ZPOOL_TRIM_SECURE, DATA_TYPE_BOOLEAN_VALUE, ZK_OPTIONAL},
|
|
|
|
};
|
|
|
|
|
|
|
|
static int
|
|
|
|
zfs_ioc_pool_trim(const char *poolname, nvlist_t *innvl, nvlist_t *outnvl)
|
|
|
|
{
|
|
|
|
uint64_t cmd_type;
|
|
|
|
if (nvlist_lookup_uint64(innvl, ZPOOL_TRIM_COMMAND, &cmd_type) != 0)
|
|
|
|
return (SET_ERROR(EINVAL));
|
|
|
|
|
|
|
|
if (!(cmd_type == POOL_TRIM_CANCEL ||
|
|
|
|
cmd_type == POOL_TRIM_START ||
|
|
|
|
cmd_type == POOL_TRIM_SUSPEND)) {
|
|
|
|
return (SET_ERROR(EINVAL));
|
|
|
|
}
|
|
|
|
|
|
|
|
nvlist_t *vdev_guids;
|
|
|
|
if (nvlist_lookup_nvlist(innvl, ZPOOL_TRIM_VDEVS, &vdev_guids) != 0)
|
|
|
|
return (SET_ERROR(EINVAL));
|
|
|
|
|
|
|
|
for (nvpair_t *pair = nvlist_next_nvpair(vdev_guids, NULL);
|
|
|
|
pair != NULL; pair = nvlist_next_nvpair(vdev_guids, pair)) {
|
|
|
|
uint64_t vdev_guid;
|
|
|
|
if (nvpair_value_uint64(pair, &vdev_guid) != 0) {
|
|
|
|
return (SET_ERROR(EINVAL));
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Optional, defaults to maximum rate when not provided */
|
|
|
|
uint64_t rate;
|
|
|
|
if (nvlist_lookup_uint64(innvl, ZPOOL_TRIM_RATE, &rate) != 0)
|
|
|
|
rate = 0;
|
|
|
|
|
|
|
|
/* Optional, defaults to standard TRIM when not provided */
|
|
|
|
boolean_t secure;
|
|
|
|
if (nvlist_lookup_boolean_value(innvl, ZPOOL_TRIM_SECURE,
|
|
|
|
&secure) != 0) {
|
|
|
|
secure = B_FALSE;
|
|
|
|
}
|
|
|
|
|
|
|
|
spa_t *spa;
|
|
|
|
int error = spa_open(poolname, &spa, FTAG);
|
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
|
|
|
|
|
|
|
nvlist_t *vdev_errlist = fnvlist_alloc();
|
|
|
|
int total_errors = spa_vdev_trim(spa, vdev_guids, cmd_type,
|
|
|
|
rate, !!zfs_trim_metaslab_skip, secure, vdev_errlist);
|
|
|
|
|
|
|
|
if (fnvlist_size(vdev_errlist) > 0)
|
|
|
|
fnvlist_add_nvlist(outnvl, ZPOOL_TRIM_VDEVS, vdev_errlist);
|
|
|
|
|
|
|
|
fnvlist_free(vdev_errlist);
|
|
|
|
|
|
|
|
spa_close(spa, FTAG);
|
2021-02-24 20:51:10 +03:00
|
|
|
return (total_errors > 0 ? SET_ERROR(EINVAL) : 0);
|
2019-03-29 19:13:20 +03:00
|
|
|
}
|
|
|
|
|
Add subcommand to wait for background zfs activity to complete
Currently the best way to wait for the completion of a long-running
operation in a pool, like a scrub or device removal, is to poll 'zpool
status' and parse its output, which is neither efficient nor convenient.
This change adds a 'wait' subcommand to the zpool command. When invoked,
'zpool wait' will block until a specified type of background activity
completes. Currently, this subcommand can wait for any of the following:
- Scrubs or resilvers to complete
- Devices to be initialized
- Devices to be replaced
- Devices to be removed
- Checkpoints to be discarded
- Background freeing to complete
For example, a scrub that is in progress could be waited for by running
zpool wait -t scrub <pool>
This also adds a -w flag to the attach, checkpoint, initialize, replace,
remove, and scrub subcommands. When used, this flag makes the operations
kicked off by these subcommands synchronous instead of asynchronous.
This functionality is implemented using a new ioctl. The type of
activity to wait for is provided as input to the ioctl, and the ioctl
blocks until all activity of that type has completed. An ioctl was used
over other methods of kernel-userspace communication primarily for the
sake of portability.
Porting Notes:
This is ported from Delphix OS change DLPX-44432. The following changes
were made while porting:
- Added ZoL-style ioctl input declaration.
- Reorganized error handling in zpool_initialize in libzfs to integrate
better with changes made for TRIM support.
- Fixed check for whether a checkpoint discard is in progress.
Previously it also waited if the pool had a checkpoint, instead of
just if a checkpoint was being discarded.
- Exposed zfs_initialize_chunk_size as a ZoL-style tunable.
- Updated more existing tests to make use of new 'zpool wait'
functionality, tests that don't exist in Delphix OS.
- Used existing ZoL tunable zfs_scan_suspend_progress, together with
zinject, in place of a new tunable zfs_scan_max_blks_per_txg.
- Added support for a non-integral interval argument to zpool wait.
Future work:
ZoL has support for trimming devices, which Delphix OS does not. In the
future, 'zpool wait' could be extended to add the ability to wait for
trim operations to complete.
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: John Gallagher <john.gallagher@delphix.com>
Closes #9162
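
The blocking behaviour described in this change can be pictured with a small
user-space model. This is only a sketch under stated assumptions: sk_tracker_t,
sk_init, sk_begin, sk_end and sk_wait are hypothetical names that do not exist
in ZFS, and a pthread mutex and condition variable stand in for whatever the
kernel side uses. Signal handling (the EINTR case) is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical activity types; the real set is defined by the ioctl. */
enum { SK_ACT_SCRUB, SK_ACT_INITIALIZE, SK_ACT_COUNT };

/* Hypothetical tracker: one in-progress counter per activity type. */
typedef struct {
	pthread_mutex_t lock;
	pthread_cond_t cv;
	int in_progress[SK_ACT_COUNT];
} sk_tracker_t;

void
sk_init(sk_tracker_t *t)
{
	pthread_mutex_init(&t->lock, NULL);
	pthread_cond_init(&t->cv, NULL);
	for (int i = 0; i < SK_ACT_COUNT; i++)
		t->in_progress[i] = 0;
}

/* Record that one instance of an activity has started. */
void
sk_begin(sk_tracker_t *t, int activity)
{
	assert(activity >= 0 && activity < SK_ACT_COUNT);
	pthread_mutex_lock(&t->lock);
	t->in_progress[activity]++;
	pthread_mutex_unlock(&t->lock);
}

/* Record that one instance has finished and wake every waiter. */
void
sk_end(sk_tracker_t *t, int activity)
{
	assert(activity >= 0 && activity < SK_ACT_COUNT);
	pthread_mutex_lock(&t->lock);
	assert(t->in_progress[activity] > 0);
	t->in_progress[activity]--;
	pthread_cond_broadcast(&t->cv);
	pthread_mutex_unlock(&t->lock);
}

/*
 * Block until no instance of the activity is in progress. *waited reports
 * whether the caller actually had to sleep, mirroring the "waited" output.
 */
int
sk_wait(sk_tracker_t *t, int activity, bool *waited)
{
	if (activity < 0 || activity >= SK_ACT_COUNT)
		return (EINVAL);

	*waited = false;
	pthread_mutex_lock(&t->lock);
	while (t->in_progress[activity] > 0) {
		*waited = true;
		pthread_cond_wait(&t->cv, &t->lock);
	}
	pthread_mutex_unlock(&t->lock);
	return (0);
}
```

In this model a synchronous wrapper (in the spirit of the -w flag) would start
the operation and then call sk_wait() while another thread calls sk_end() when
the work completes. Compared with polling, the waiter does no work until it is
woken.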
/*
 * This ioctl waits for activity of a particular type to complete. If there is
 * no activity of that type in progress, it returns immediately, and the
 * returned value "waited" is false. If there is activity in progress, and no
 * tag is passed in, the ioctl blocks until all activity of that type is
 * complete, and then returns with "waited" set to true.
 *
 * If a tag is provided, it identifies a particular instance of an activity to
 * wait for. Currently, this is only valid for use with 'initialize', because
 * that is the only activity for which there can be multiple instances running
 * concurrently. In the case of 'initialize', the tag corresponds to the guid of
 * the vdev on which to wait.
 *
 * If a thread waiting in the ioctl receives a signal, the call will return
 * immediately, and the return value will be EINTR.
 *
 * innvl: {
 *     "wait_activity" -> int32_t
 *     (optional) "wait_tag" -> uint64_t
 * }
 *
 * outnvl: "waited" -> boolean_t
 */
static const zfs_ioc_key_t zfs_keys_pool_wait[] = {
	{ZPOOL_WAIT_ACTIVITY,	DATA_TYPE_INT32,		0},
	{ZPOOL_WAIT_TAG,	DATA_TYPE_UINT64,		ZK_OPTIONAL},
};

static int
zfs_ioc_wait(const char *name, nvlist_t *innvl, nvlist_t *outnvl)
{
	int32_t activity;
	uint64_t tag;
	boolean_t waited;
	int error;

	if (nvlist_lookup_int32(innvl, ZPOOL_WAIT_ACTIVITY, &activity) != 0)
		return (EINVAL);

	if (nvlist_lookup_uint64(innvl, ZPOOL_WAIT_TAG, &tag) == 0)
		error = spa_wait_tag(name, activity, tag, &waited);
	else
		error = spa_wait(name, activity, &waited);

	if (error == 0)
		fnvlist_add_boolean_value(outnvl, ZPOOL_WAIT_WAITED, waited);

	return (error);
}

/*
 * This ioctl waits for activity of a particular type to complete. If there is
 * no activity of that type in progress, it returns immediately, and the
 * returned value "waited" is false. If there is activity in progress, and no
 * tag is passed in, the ioctl blocks until all activity of that type is
 * complete, and then returns with "waited" set to true.
 *
 * If a thread waiting in the ioctl receives a signal, the call will return
 * immediately, and the return value will be EINTR.
 *
 * innvl: {
 *     "wait_activity" -> int32_t
 * }
 *
 * outnvl: "waited" -> boolean_t
 */
static const zfs_ioc_key_t zfs_keys_fs_wait[] = {
	{ZFS_WAIT_ACTIVITY,	DATA_TYPE_INT32,		0},
};

static int
zfs_ioc_wait_fs(const char *name, nvlist_t *innvl, nvlist_t *outnvl)
{
	int32_t activity;
	boolean_t waited = B_FALSE;
	int error;
	dsl_pool_t *dp;
	dsl_dir_t *dd;
	dsl_dataset_t *ds;

	if (nvlist_lookup_int32(innvl, ZFS_WAIT_ACTIVITY, &activity) != 0)
		return (SET_ERROR(EINVAL));

	if (activity >= ZFS_WAIT_NUM_ACTIVITIES || activity < 0)
		return (SET_ERROR(EINVAL));

	if ((error = dsl_pool_hold(name, FTAG, &dp)) != 0)
		return (error);

	if ((error = dsl_dataset_hold(dp, name, FTAG, &ds)) != 0) {
		dsl_pool_rele(dp, FTAG);
		return (error);
	}

	dd = ds->ds_dir;
	mutex_enter(&dd->dd_activity_lock);
	dd->dd_activity_waiters++;

	/*
	 * We get a long-hold here so that the dsl_dataset_t and dsl_dir_t
	 * aren't evicted while we're waiting. Normally this is prevented by
	 * holding the pool, but we can't do that while we're waiting since
	 * that would prevent TXGs from syncing out. Some of the functionality
	 * of long-holds (e.g. preventing deletion) is unnecessary for this
	 * case, since we would cancel the waiters before proceeding with a
	 * deletion. An alternative mechanism for keeping the dataset around
	 * could be developed but this is simpler.
	 */
	dsl_dataset_long_hold(ds, FTAG);
	dsl_pool_rele(dp, FTAG);

	error = dsl_dir_wait(dd, ds, activity, &waited);

	dsl_dataset_long_rele(ds, FTAG);
	dd->dd_activity_waiters--;
	if (dd->dd_activity_waiters == 0)
		cv_signal(&dd->dd_activity_cv);
	mutex_exit(&dd->dd_activity_lock);

	dsl_dataset_rele(ds, FTAG);

	if (error == 0)
		fnvlist_add_boolean_value(outnvl, ZFS_WAIT_WAITED, waited);

	return (error);
}

/*
 * fsname is the name of the dataset to roll back (to the most recent snapshot)
 *
 * innvl may contain the name of the expected target snapshot
 *
 * outnvl: "target" -> name of most recent snapshot
 */
Add basic zfs ioc input nvpair validation
We want newer versions of libzfs_core to run against an existing
zfs kernel module (i.e. a deferred reboot or module reload after
an update).
Programmatically document, via a zfs_ioc_key_t, the valid arguments
for the ioc commands that rely on nvpair input arguments (i.e. non
legacy commands from libzfs_core). Automatically verify the expected
pairs before dispatching a command.
This initial phase focuses on the non-legacy ioctls. A follow-on
change can address the legacy ioctl input from the zfs_cmd_t.
The zfs_ioc_key_t for zfs_keys_channel_program looks like:
static const zfs_ioc_key_t zfs_keys_channel_program[] = {
{"program", DATA_TYPE_STRING, 0},
{"arg", DATA_TYPE_UNKNOWN, 0},
{"sync", DATA_TYPE_BOOLEAN_VALUE, ZK_OPTIONAL},
{"instrlimit", DATA_TYPE_UINT64, ZK_OPTIONAL},
{"memlimit", DATA_TYPE_UINT64, ZK_OPTIONAL},
};
Introduce four input errors to identify specific input failures
(in addition to generic argument value errors like EINVAL, ERANGE,
EBADF, and E2BIG).
ZFS_ERR_IOC_CMD_UNAVAIL the ioctl number is not supported by kernel
ZFS_ERR_IOC_ARG_UNAVAIL an input argument is not supported by kernel
ZFS_ERR_IOC_ARG_REQUIRED a required input argument is missing
ZFS_ERR_IOC_ARG_BADTYPE an input argument has an invalid type
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #7780
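
The key-table idea can be illustrated with a self-contained model. This is a
minimal sketch, not the kernel's validation routine: the sk_* types, the
SK_ERR_* codes and sk_check_pairs() are hypothetical names invented for the
example, input pairs are reduced to a name and a type, and the order in which
the checks run is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the nvpair data types used in a key table. */
typedef enum { SK_TYPE_ANY, SK_TYPE_STRING, SK_TYPE_UINT64, SK_TYPE_BOOLEAN }
    sk_type_t;

#define	SK_OPTIONAL	0x1	/* plays the role of ZK_OPTIONAL */

typedef struct {
	const char *name;
	sk_type_t type;
	int flags;
} sk_key_t;

/* A simplified input pair: only the name and type matter here. */
typedef struct {
	const char *name;
	sk_type_t type;
} sk_pair_t;

/* Hypothetical codes loosely modelled on the ZFS_ERR_IOC_ARG_* errors. */
enum { SK_OK, SK_ERR_ARG_UNAVAIL, SK_ERR_ARG_REQUIRED, SK_ERR_ARG_BADTYPE };

/* Mirrors the shape of the channel program table quoted above. */
const sk_key_t sk_keys_channel_program[] = {
	{"program",	SK_TYPE_STRING,		0},
	{"arg",		SK_TYPE_ANY,		0},
	{"sync",	SK_TYPE_BOOLEAN,	SK_OPTIONAL},
	{"instrlimit",	SK_TYPE_UINT64,		SK_OPTIONAL},
	{"memlimit",	SK_TYPE_UINT64,		SK_OPTIONAL},
};

int
sk_check_pairs(const sk_key_t *keys, size_t nkeys,
    const sk_pair_t *pairs, size_t npairs)
{
	/* Every supplied pair must name a known key of the right type. */
	for (size_t i = 0; i < npairs; i++) {
		const sk_key_t *k = NULL;
		for (size_t j = 0; j < nkeys; j++) {
			if (strcmp(keys[j].name, pairs[i].name) == 0) {
				k = &keys[j];
				break;
			}
		}
		if (k == NULL)
			return (SK_ERR_ARG_UNAVAIL);
		if (k->type != SK_TYPE_ANY && k->type != pairs[i].type)
			return (SK_ERR_ARG_BADTYPE);
	}

	/* Every key not marked optional must be present in the input. */
	for (size_t j = 0; j < nkeys; j++) {
		int found = 0;
		if (keys[j].flags & SK_OPTIONAL)
			continue;
		for (size_t i = 0; i < npairs; i++) {
			if (strcmp(keys[j].name, pairs[i].name) == 0)
				found = 1;
		}
		if (!found)
			return (SK_ERR_ARG_REQUIRED);
	}
	return (SK_OK);
}
```

Because the table is data rather than code, a dispatcher can run one generic
check before calling any handler, and a handler only sees input whose shape
has already been verified.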
static const zfs_ioc_key_t zfs_keys_rollback[] = {
	{"target",	DATA_TYPE_STRING,	ZK_OPTIONAL},
};

static int
zfs_ioc_rollback(const char *fsname, nvlist_t *innvl, nvlist_t *outnvl)
{
	zfsvfs_t *zfsvfs;
	zvol_state_handle_t *zv;
	const char *target = NULL;
	int error;

	(void) nvlist_lookup_string(innvl, "target", &target);
	if (target != NULL) {
		const char *cp = strchr(target, '@');

		/*
		 * The snap name must contain an @, and the part after it must
		 * contain only valid characters.
		 */
		if (cp == NULL ||
		    zfs_component_namecheck(cp + 1, NULL, NULL) != 0)
			return (SET_ERROR(EINVAL));
	}

	if (getzfsvfs(fsname, &zfsvfs) == 0) {
		dsl_dataset_t *ds;

		ds = dmu_objset_ds(zfsvfs->z_os);
		error = zfs_suspend_fs(zfsvfs);
		if (error == 0) {
			int resume_err;

			error = dsl_dataset_rollback(fsname, target, zfsvfs,
			    outnvl);
			resume_err = zfs_resume_fs(zfsvfs, ds);
			error = error ? error : resume_err;
		}
		zfs_vfs_rele(zfsvfs);
	} else if ((zv = zvol_suspend(fsname)) != NULL) {
		error = dsl_dataset_rollback(fsname, target, zvol_tag(zv),
		    outnvl);
		zvol_resume(zv);
	} else {
		error = dsl_dataset_rollback(fsname, target, NULL, outnvl);
	}
	return (error);
}

static int
recursive_unmount(const char *fsname, void *arg)
{
	const char *snapname = arg;
	char *fullname;

	fullname = kmem_asprintf("%s@%s", fsname, snapname);
	zfs_unmount_snap(fullname);
	kmem_strfree(fullname);

	return (0);
}

Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
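
The filtering step of a send to the redacted snapshot can be sketched in a few
lines. This is a deliberately simplified model under loud assumptions: blocks
are reduced to plain 64-bit ids, the redaction bookmark to a sorted array of
ids, and sk_cmp_u64() and sk_redacted_send_filter() are hypothetical helpers
that do not exist in ZFS. The real block lists and send stream are far richer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Three-way comparison of two uint64_t values for bsearch(). */
static int
sk_cmp_u64(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a;
	uint64_t y = *(const uint64_t *)b;

	return ((x > y) - (x < y));
}

/*
 * Copy into out[] every block id of the snapshot that is NOT listed in the
 * (sorted) redaction list, preserving order. Returns the number of ids
 * written; out[] must have room for nsnap entries.
 */
size_t
sk_redacted_send_filter(const uint64_t *snap_blocks, size_t nsnap,
    const uint64_t *redacted_sorted, size_t nredacted, uint64_t *out)
{
	size_t n = 0;

	assert(snap_blocks != NULL && redacted_sorted != NULL && out != NULL);
	for (size_t i = 0; i < nsnap; i++) {
		/* A redacted block is left out of the stream. */
		if (bsearch(&snap_blocks[i], redacted_sorted, nredacted,
		    sizeof (uint64_t), sk_cmp_u64) != NULL)
			continue;
		out[n++] = snap_blocks[i];
	}
	return (n);
}
```

With snapshot blocks {10, 11, 12, 13, 14} and a redaction list {11, 13}, only
blocks 10, 12 and 14 would be placed in the stream.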
/*
 *
 * snapname is the snapshot to redact.
 * innvl: {
 *     "bookname" -> (string)
 *         shortname of the redaction bookmark to generate
 *     "snapnv" -> (nvlist, values ignored)
 *         snapshots to redact snapname with respect to
 * }
 *
 * outnvl is unused
 */

static const zfs_ioc_key_t zfs_keys_redact[] = {
	{"bookname",		DATA_TYPE_STRING,	0},
	{"snapnv",		DATA_TYPE_NVLIST,	0},
};

static int
zfs_ioc_redact(const char *snapname, nvlist_t *innvl, nvlist_t *outnvl)
{
	(void) outnvl;
	nvlist_t *redactnvl = NULL;
	const char *redactbook = NULL;

	if (nvlist_lookup_nvlist(innvl, "snapnv", &redactnvl) != 0)
		return (SET_ERROR(EINVAL));
	if (fnvlist_num_pairs(redactnvl) == 0)
		return (SET_ERROR(ENXIO));
	if (nvlist_lookup_string(innvl, "bookname", &redactbook) != 0)
		return (SET_ERROR(EINVAL));

	return (dmu_redact_snap(snapname, redactnvl, redactbook));
}

/*
 * inputs:
 * zc_name	old name of dataset
 * zc_value	new name of dataset
 * zc_cookie	recursive flag (only valid for snapshots)
 *
 * outputs:	none
 */
static int
zfs_ioc_rename(zfs_cmd_t *zc)
{
	objset_t *os;
	dmu_objset_type_t ost;
	boolean_t recursive = zc->zc_cookie & 1;
	boolean_t nounmount = !!(zc->zc_cookie & 2);
	char *at;
	int err;

	/* "zfs rename" from and to ...%recv datasets should both fail */
	zc->zc_name[sizeof (zc->zc_name) - 1] = '\0';
	zc->zc_value[sizeof (zc->zc_value) - 1] = '\0';
	if (dataset_namecheck(zc->zc_name, NULL, NULL) != 0 ||
	    dataset_namecheck(zc->zc_value, NULL, NULL) != 0 ||
	    strchr(zc->zc_name, '%') || strchr(zc->zc_value, '%'))
		return (SET_ERROR(EINVAL));

	err = dmu_objset_hold(zc->zc_name, FTAG, &os);
	if (err != 0)
		return (err);
	ost = dmu_objset_type(os);
	dmu_objset_rele(os, FTAG);

	at = strchr(zc->zc_name, '@');
	if (at != NULL) {
		/* snaps must be in same fs */
		int error;

		if (strncmp(zc->zc_name, zc->zc_value, at - zc->zc_name + 1))
			return (SET_ERROR(EXDEV));
		*at = '\0';
		if (ost == DMU_OST_ZFS && !nounmount) {
			error = dmu_objset_find(zc->zc_name,
			    recursive_unmount, at + 1,
			    recursive ? DS_FIND_CHILDREN : 0);
			if (error != 0) {
				*at = '@';
				return (error);
			}
		}
		error = dsl_dataset_rename_snapshot(zc->zc_name,
		    at + 1, strchr(zc->zc_value, '@') + 1, recursive);
		*at = '@';

		return (error);
	} else {
		return (dsl_dir_rename(zc->zc_name, zc->zc_value));
	}
}

static int
zfs_check_settable(const char *dsname, nvpair_t *pair, cred_t *cr)
{
	const char *propname = nvpair_name(pair);
	boolean_t issnap = (strchr(dsname, '@') != NULL);
	zfs_prop_t prop = zfs_name_to_prop(propname);
Add zstd support to zfs
This PR adds two new compression types, based on ZStandard:
- zstd: A basic ZStandard compression algorithm. Available compression
levels for zstd are zstd-1 through zstd-19, where the compression
increases with every level, but speed decreases.
- zstd-fast: A faster version of the ZStandard compression algorithm.
zstd-fast is basically a "negative" level of zstd. The compression
decreases with every level, but speed increases.
Available compression levels for zstd-fast:
- zstd-fast-1 through zstd-fast-10
- zstd-fast-20 through zstd-fast-100 (in increments of 10)
- zstd-fast-500 and zstd-fast-1000
For more information check the man page.
Implementation details:
Rather than treat each level of zstd as a different algorithm (as was
done historically with gzip), the block pointer `enum zio_compress`
value is simply zstd for all levels, including zstd-fast, since they all
use the same decompression function.
The compress= property (a 64-bit unsigned integer) uses the lower 7 bits
to store the compression algorithm (matching the number of bits used in
a block pointer, as the 8th bit was borrowed for embedded block
pointers). The upper bits are used to store the compression level.
It is necessary to be able to determine what compression level was used
when later reading a block back, so the concept used in LZ4, where the
first 32 bits of the on-disk value are the size of the compressed data
(since the allocation is rounded up to the nearest ashift), was
extended, and we store the version of ZSTD and the level as well as the
compressed size. This value is returned when decompressing a block, so
that if the block needs to be recompressed (L2ARC, nop-write, etc.), the
same parameters are used and the result has a matching checksum.
All of the internal ZFS code (`arc_buf_hdr_t`, `objset_t`,
`zio_prop_t`, etc.) uses the separated _compress and _complevel
variables. Only the properties ZAP contains the combined/bit-shifted
value. The combined value is split when the compression_changed_cb()
callback is called, and sets both objset members (os_compress and
os_complevel).
The userspace tools all use the combined/bit-shifted value.
Additional notes:
zdb can now also decode the ZSTD compression header (flag -Z) and
inspect the size, version and compression level saved in that header.
For each record, if it is ZSTD compressed, the parameters of the decoded
compression header get printed.
ZSTD is included with all current tests and new tests are added
as-needed.
Per-dataset feature flags now get activated when the property is set.
If a compression algorithm requires a feature flag, zfs activates the
feature when the property is set, rather than waiting for the first
block to be born. This is currently only used by zstd but can be
extended as needed.
Portions-Sponsored-By: The FreeBSD Foundation
Co-authored-by: Allan Jude <allanjude@freebsd.org>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Co-authored-by: Michael Niewöhner <foss@mniewoehner.de>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes #6247
Closes #9024
Closes #10277
Closes #10278
2020-08-18 20:10:17 +03:00
|
|
|
uint64_t intval, compval;
|
2010-05-29 00:45:14 +04:00
|
|
|
int err;
|
|
|
|
|
2022-06-14 21:27:53 +03:00
|
|
|
if (prop == ZPROP_USERPROP) {
|
2010-05-29 00:45:14 +04:00
|
|
|
if (zfs_prop_user(propname)) {
|
2010-08-26 20:52:42 +04:00
|
|
|
if ((err = zfs_secpolicy_write_perms(dsname,
|
|
|
|
ZFS_DELEG_PERM_USERPROP, cr)))
|
2010-05-29 00:45:14 +04:00
|
|
|
return (err);
|
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!issnap && zfs_prop_userquota(propname)) {
|
|
|
|
const char *perm = NULL;
|
|
|
|
const char *uq_prefix =
|
|
|
|
zfs_userquota_prop_prefixes[ZFS_PROP_USERQUOTA];
|
|
|
|
const char *gq_prefix =
|
|
|
|
zfs_userquota_prop_prefixes[ZFS_PROP_GROUPQUOTA];
|
2016-10-04 21:46:10 +03:00
|
|
|
const char *uiq_prefix =
|
|
|
|
zfs_userquota_prop_prefixes[ZFS_PROP_USEROBJQUOTA];
|
|
|
|
const char *giq_prefix =
|
|
|
|
zfs_userquota_prop_prefixes[ZFS_PROP_GROUPOBJQUOTA];
|
2018-02-14 01:54:54 +03:00
|
|
|
const char *pq_prefix =
|
|
|
|
zfs_userquota_prop_prefixes[ZFS_PROP_PROJECTQUOTA];
|
|
|
|
const char *piq_prefix = zfs_userquota_prop_prefixes[\
|
|
|
|
ZFS_PROP_PROJECTOBJQUOTA];
|
2010-05-29 00:45:14 +04:00
|
|
|
|
|
|
|
if (strncmp(propname, uq_prefix,
|
|
|
|
strlen(uq_prefix)) == 0) {
|
|
|
|
perm = ZFS_DELEG_PERM_USERQUOTA;
|
2016-10-04 21:46:10 +03:00
|
|
|
} else if (strncmp(propname, uiq_prefix,
|
|
|
|
strlen(uiq_prefix)) == 0) {
|
|
|
|
perm = ZFS_DELEG_PERM_USEROBJQUOTA;
|
2010-05-29 00:45:14 +04:00
|
|
|
} else if (strncmp(propname, gq_prefix,
|
|
|
|
strlen(gq_prefix)) == 0) {
|
|
|
|
perm = ZFS_DELEG_PERM_GROUPQUOTA;
|
2016-10-04 21:46:10 +03:00
|
|
|
} else if (strncmp(propname, giq_prefix,
|
|
|
|
strlen(giq_prefix)) == 0) {
|
|
|
|
perm = ZFS_DELEG_PERM_GROUPOBJQUOTA;
|
2018-02-14 01:54:54 +03:00
|
|
|
} else if (strncmp(propname, pq_prefix,
|
|
|
|
strlen(pq_prefix)) == 0) {
|
|
|
|
perm = ZFS_DELEG_PERM_PROJECTQUOTA;
|
|
|
|
} else if (strncmp(propname, piq_prefix,
|
|
|
|
strlen(piq_prefix)) == 0) {
|
|
|
|
perm = ZFS_DELEG_PERM_PROJECTOBJQUOTA;
|
2010-05-29 00:45:14 +04:00
|
|
|
} else {
|
2018-02-14 01:54:54 +03:00
|
|
|
/* {USER|GROUP|PROJECT}USED are read-only */
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EINVAL));
|
2010-05-29 00:45:14 +04:00
|
|
|
}
|
|
|
|
|
2010-08-26 20:52:42 +04:00
|
|
|
if ((err = zfs_secpolicy_write_perms(dsname, perm, cr)))
|
2010-05-29 00:45:14 +04:00
|
|
|
return (err);
|
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EINVAL));
|
2010-05-29 00:45:14 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
if (issnap)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EINVAL));
|
2010-05-29 00:45:14 +04:00
|
|
|
|
|
|
|
if (nvpair_type(pair) == DATA_TYPE_NVLIST) {
|
|
|
|
/*
|
|
|
|
* dsl_prop_get_all_impl() returns properties in this
|
|
|
|
* format.
|
|
|
|
*/
|
|
|
|
nvlist_t *attrs;
|
|
|
|
VERIFY(nvpair_value_nvlist(pair, &attrs) == 0);
|
|
|
|
VERIFY(nvlist_lookup_nvpair(attrs, ZPROP_VALUE,
|
|
|
|
&pair) == 0);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Check that this value is valid for this pool version
|
|
|
|
*/
|
|
|
|
switch (prop) {
|
|
|
|
case ZFS_PROP_COMPRESSION:
|
|
|
|
/*
|
|
|
|
* If the user specified gzip compression, make sure
|
|
|
|
* the SPA supports it. We ignore any errors here since
|
|
|
|
* we'll catch them later.
|
|
|
|
*/
|
2014-11-03 23:15:08 +03:00
|
|
|
if (nvpair_value_uint64(pair, &intval) == 0) {
|
2020-08-18 20:10:17 +03:00
|
|
|
compval = ZIO_COMPRESS_ALGO(intval);
|
|
|
|
if (compval >= ZIO_COMPRESS_GZIP_1 &&
|
|
|
|
compval <= ZIO_COMPRESS_GZIP_9 &&
|
2010-05-29 00:45:14 +04:00
|
|
|
zfs_earlier_version(dsname,
|
|
|
|
SPA_VERSION_GZIP_COMPRESSION)) {
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(ENOTSUP));
|
2010-05-29 00:45:14 +04:00
|
|
|
}
|
|
|
|
|
2020-08-18 20:10:17 +03:00
|
|
|
if (compval == ZIO_COMPRESS_ZLE &&
|
2010-05-29 00:45:14 +04:00
|
|
|
zfs_earlier_version(dsname,
|
|
|
|
SPA_VERSION_ZLE_COMPRESSION))
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(ENOTSUP));
|
2010-05-29 00:45:14 +04:00
|
|
|
|
2020-08-18 20:10:17 +03:00
|
|
|
if (compval == ZIO_COMPRESS_LZ4) {
|
2013-01-23 13:54:30 +04:00
|
|
|
spa_t *spa;
|
|
|
|
|
|
|
|
if ((err = spa_open(dsname, &spa, FTAG)) != 0)
|
|
|
|
return (err);
|
|
|
|
|
2013-10-08 21:13:05 +04:00
|
|
|
if (!spa_feature_is_enabled(spa,
|
|
|
|
SPA_FEATURE_LZ4_COMPRESS)) {
|
2013-01-23 13:54:30 +04:00
|
|
|
spa_close(spa, FTAG);
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(ENOTSUP));
|
2013-01-23 13:54:30 +04:00
|
|
|
}
|
|
|
|
spa_close(spa, FTAG);
|
|
|
|
}
|
2020-08-18 20:10:17 +03:00
|
|
|
|
|
|
|
if (compval == ZIO_COMPRESS_ZSTD) {
|
|
|
|
spa_t *spa;
|
|
|
|
|
|
|
|
if ((err = spa_open(dsname, &spa, FTAG)) != 0)
|
|
|
|
return (err);
|
|
|
|
|
|
|
|
if (!spa_feature_is_enabled(spa,
|
|
|
|
SPA_FEATURE_ZSTD_COMPRESS)) {
|
|
|
|
spa_close(spa, FTAG);
|
|
|
|
return (SET_ERROR(ENOTSUP));
|
|
|
|
}
|
|
|
|
spa_close(spa, FTAG);
|
|
|
|
}
|
2010-05-29 00:45:14 +04:00
|
|
|
}
|
|
|
|
break;
|
|
|
|
|
|
|
|
case ZFS_PROP_COPIES:
|
|
|
|
if (zfs_earlier_version(dsname, SPA_VERSION_DITTO_BLOCKS))
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(ENOTSUP));
|
2010-05-29 00:45:14 +04:00
|
|
|
break;
|
|
|
|
|
2015-08-25 00:18:48 +03:00
|
|
|
case ZFS_PROP_VOLBLOCKSIZE:
|
2014-11-03 23:15:08 +03:00
|
|
|
case ZFS_PROP_RECORDSIZE:
|
|
|
|
/* Record sizes above 128k need the feature to be enabled */
|
|
|
|
if (nvpair_value_uint64(pair, &intval) == 0 &&
|
|
|
|
intval > SPA_OLD_MAXBLOCKSIZE) {
|
|
|
|
spa_t *spa;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* We don't allow setting the property above 1MB,
|
|
|
|
* unless the tunable has been changed.
|
|
|
|
*/
|
|
|
|
if (intval > zfs_max_recordsize ||
|
|
|
|
intval > SPA_MAXBLOCKSIZE)
|
2016-05-06 02:19:12 +03:00
|
|
|
return (SET_ERROR(ERANGE));
|
2014-11-03 23:15:08 +03:00
|
|
|
|
|
|
|
if ((err = spa_open(dsname, &spa, FTAG)) != 0)
|
|
|
|
return (err);
|
|
|
|
|
|
|
|
if (!spa_feature_is_enabled(spa,
|
|
|
|
SPA_FEATURE_LARGE_BLOCKS)) {
|
|
|
|
spa_close(spa, FTAG);
|
|
|
|
return (SET_ERROR(ENOTSUP));
|
|
|
|
}
|
|
|
|
spa_close(spa, FTAG);
|
|
|
|
}
|
|
|
|
break;
|
|
|
|
|
Implement large_dnode pool feature
Justification
-------------
This feature adds support for variable length dnodes. Our motivation is
to eliminate the overhead associated with using spill blocks. Spill
blocks are used to store system attribute data (i.e. file metadata) that
does not fit in the dnode's bonus buffer. By allowing a larger bonus
buffer area the use of a spill block can be avoided. Spill blocks
potentially incur an additional read I/O for every dnode in a dnode
block. As a worst case example, reading 32 dnodes from a 16k dnode block
and all of the spill blocks could issue 33 separate reads. Now suppose
those dnodes have size 1024 and therefore don't need spill blocks. Then
the worst case number of blocks read is reduced from 33 to two--one
per dnode block. In practice spill blocks may tend to be co-located on
disk with the dnode blocks so the reduction in I/O would not be this
drastic. In a badly fragmented pool, however, the improvement could be
significant.
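The worst-case arithmetic above can be reproduced with a small sketch. It assumes a 16384-byte dnode block, dnodes that do not span blocks, and at most one spill block per dnode; the `EX_`/`ex_` names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants for illustration only. */
#define	EX_DNODE_BLOCK_SIZE	16384u
#define	EX_DNODE_SLOT_SIZE	512u

/*
 * Worst-case number of reads needed to fetch ndnodes dnodes of size
 * dnsize, plus one spill block per dnode when needs_spill is set.
 */
static uint32_t
ex_worst_case_reads(uint32_t ndnodes, uint32_t dnsize, bool needs_spill)
{
	assert(dnsize >= EX_DNODE_SLOT_SIZE && dnsize <= EX_DNODE_BLOCK_SIZE);
	assert(dnsize % EX_DNODE_SLOT_SIZE == 0);

	/* Dnodes never span blocks, so count whole dnodes per block. */
	uint32_t per_block = EX_DNODE_BLOCK_SIZE / dnsize;
	/* Round up to the number of dnode blocks that must be read. */
	uint32_t blocks = (ndnodes + per_block - 1) / per_block;

	return (blocks + (needs_spill ? ndnodes : 0));
}
```

With 32 dnodes, the 512-byte case with spill blocks gives 33 reads and the 1024-byte case without spill blocks gives 2, matching the figures quoted in the text.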
ZFS-on-Linux systems that make heavy use of extended attributes would
benefit from this feature. In particular, ZFS-on-Linux supports the
xattr=sa dataset property which allows file extended attribute data
to be stored in the dnode bonus buffer as an alternative to the
traditional directory-based format. Workloads such as SELinux and the
Lustre distributed filesystem often store enough xattr data to force
spill blocks when xattr=sa is in effect. Large dnodes may therefore
provide a performance benefit to such systems.
Other use cases that may benefit from this feature include files with
large ACLs and symbolic links with long target names. Furthermore,
this feature may be desirable on other platforms in case future
applications or features are developed that could make use of a
larger bonus buffer area.
Implementation
--------------
The size of a dnode may be a multiple of 512 bytes up to the size of
a dnode block (currently 16384 bytes). A dn_extra_slots field was
added to the current on-disk dnode_phys_t structure to describe the
size of the physical dnode on disk. The 8 bits for this field were
taken from the zero filled dn_pad2 field. The field represents how
many "extra" dnode_phys_t slots a dnode consumes in its dnode block.
This convention results in a value of 0 for 512 byte dnodes which
preserves on-disk format compatibility with older software.
Similarly, the in-memory dnode_t structure has a new dn_num_slots field
to represent the total number of dnode_phys_t slots consumed on disk.
Thus dn->dn_num_slots is 1 greater than the corresponding
dnp->dn_extra_slots. This difference in convention was adopted
because, unlike on-disk structures, backward compatibility is not a
concern for in-memory objects, so we used a more natural way to
represent size for a dnode_t.
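The relationship between the two slot conventions can be sketched as below. This is an illustration under the assumptions stated in the text (512-byte slots, dnodes up to one 16384-byte dnode block); the helper names are hypothetical and are not functions from the ZFS source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants for illustration only. */
#define	EX_SLOT_SIZE		512u
#define	EX_MAX_DNODE_SIZE	16384u

/* In-memory convention: total slots consumed (1 for a 512-byte dnode). */
static uint32_t
ex_num_slots(uint32_t dnsize)
{
	assert(dnsize >= EX_SLOT_SIZE && dnsize <= EX_MAX_DNODE_SIZE);
	assert(dnsize % EX_SLOT_SIZE == 0);
	return (dnsize / EX_SLOT_SIZE);
}

/*
 * On-disk convention: slots beyond the first, so a 512-byte dnode
 * stores 0 and stays compatible with the zero-filled padding that
 * older software wrote. The maximum value is 31, which fits in 8 bits.
 */
static uint8_t
ex_extra_slots(uint32_t dnsize)
{
	return ((uint8_t)(ex_num_slots(dnsize) - 1));
}
```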
The default size for newly created dnodes is determined by the value of
a new "dnodesize" dataset property. By default the property is set to
"legacy" which is compatible with older software. Setting the property
to "auto" will allow the filesystem to choose the most suitable dnode
size. Currently this just sets the default dnode size to 1k, but future
code improvements could dynamically choose a size based on observed
workload patterns. Dnodes of varying sizes can coexist within the same
dataset and even within the same dnode block. For example, to enable
automatically-sized dnodes, run
# zfs set dnodesize=auto tank/fish
The user can also specify literal values for the dnodesize property.
These are currently limited to powers of two from 1k to 16k. The
power-of-2 limitation is only for simplicity of the user interface.
Internally the implementation can handle any multiple of 512 up to 16k,
and consumers of the DMU API can specify any legal dnode value.
The size of a new dnode is determined at object allocation time and
stored as a new field in the znode in-memory structure. New DMU
interfaces are added to allow the consumer to specify the dnode size
that a newly allocated object should use. Existing interfaces are
unchanged to avoid having to update every call site and to preserve
compatibility with external consumers such as Lustre. The new
interface names are given below. The versions of these functions that
don't take a dnodesize parameter now just call the _dnsize() versions
with a dnodesize of 0, which means use the legacy dnode size.
New DMU interfaces:
dmu_object_alloc_dnsize()
dmu_object_claim_dnsize()
dmu_object_reclaim_dnsize()
New ZAP interfaces:
zap_create_dnsize()
zap_create_norm_dnsize()
zap_create_flags_dnsize()
zap_create_claim_norm_dnsize()
zap_create_link_dnsize()
The constant DN_MAX_BONUSLEN is renamed to DN_OLD_MAX_BONUSLEN. The
spa_maxdnodesize() function should be used to determine the maximum
bonus length for a pool.
These are a few noteworthy changes to key functions:
* The prototype for dnode_hold_impl() now takes a "slots" parameter.
When the DNODE_MUST_BE_FREE flag is set, this parameter is used to
ensure the hole at the specified object offset is large enough to
hold the dnode being created. The slots parameter is also used
to ensure a dnode does not span multiple dnode blocks. In both of
these cases, if a failure occurs, ENOSPC is returned. Keep in mind,
these failure cases are only possible when using DNODE_MUST_BE_FREE.
If the DNODE_MUST_BE_ALLOCATED flag is set, "slots" must be 0.
dnode_hold_impl() will check if the requested dnode is already
consumed as an extra dnode slot by a large dnode, in which case
it returns ENOENT.
* The function dmu_object_alloc() advances to the next dnode block
if dnode_hold_impl() returns an error for a requested object.
This is because the beginning of the next dnode block is the only
location it can safely assume to either be a hole or a valid
starting point for a dnode.
* dnode_next_offset_level() and other functions that iterate
through dnode blocks may no longer use a simple array indexing
scheme. These now use the current dnode's dn_num_slots field to
advance to the next dnode in the block. This is to ensure we
properly skip the current dnode's bonus area and don't interpret it
as a valid dnode.
zdb
---
The zdb command was updated to display a dnode's size under the
"dnsize" column when the object is dumped.
For ZIL create log records, zdb will now display the slot count for
the object.
ztest
-----
Ztest chooses a random dnodesize for every newly created object. The
random distribution is more heavily weighted toward small dnodes to
better simulate real-world datasets.
Unused bonus buffer space is filled with non-zero values computed from
the object number, dataset id, offset, and generation number. This
helps ensure that the dnode traversal code properly skips the interior
regions of large dnodes, and that these interior regions are not
overwritten by data belonging to other dnodes. A new test visits each
object in a dataset. It verifies that the actual dnode size matches what
was stored in the ztest block tag when it was created. It also verifies
that the unused bonus buffer space is filled with the expected data
patterns.
ZFS Test Suite
--------------
Added six new large dnode-specific tests, and integrated the dnodesize
property into existing tests for zfs allow and send/recv.
Send/Receive
------------
ZFS send streams for datasets containing large dnodes cannot be received
on pools that don't support the large_dnode feature. A send stream with
large dnodes sets a DMU_BACKUP_FEATURE_LARGE_DNODE flag which will be
unrecognized by an incompatible receiving pool so that the zfs receive
will fail gracefully.
While not implemented here, it may be possible to generate a
backward-compatible send stream from a dataset containing large
dnodes. The implementation may be tricky, however, because the send
object record for a large dnode would need to be resized to a 512
byte dnode, possibly kicking in a spill block in the process. This
means we would need to construct a new SA layout and possibly
register it in the SA layout object. The SA layout is normally just
sent as an ordinary object record. But if we are constructing new
layouts while generating the send stream we'd have to build the SA
layout object dynamically and send it at the end of the stream.
For sending and receiving between pools that do support large dnodes,
the drr_object send record type is extended with a new field to store
the dnode slot count. This field was repurposed from unused padding
in the structure.
ZIL Replay
----------
The dnode slot count is stored in the uppermost 8 bits of the lr_foid
field. The bits were unused as the object id is currently capped at
48 bits.
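One way to picture this packing is the sketch below. It assumes the object id occupies the low 48 bits and the slot field the top 8 bits, as the paragraph states. The text does not say whether the field holds the slot count itself or some encoding of it, so the sketch simply stores the raw 8-bit value; all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants for illustration only. */
#define	EX_FOID_OBJ_BITS	48
#define	EX_FOID_OBJ_MASK	((UINT64_C(1) << EX_FOID_OBJ_BITS) - 1)
#define	EX_FOID_SLOT_SHIFT	56

/* Pack an object id and an 8-bit slot field into one 64-bit word. */
static uint64_t
ex_foid_pack(uint64_t obj, uint8_t slots)
{
	/* The object id must fit in the low 48 bits. */
	assert(obj <= EX_FOID_OBJ_MASK);
	return (obj | ((uint64_t)slots << EX_FOID_SLOT_SHIFT));
}

/* Recover the object id from the low 48 bits. */
static uint64_t
ex_foid_obj(uint64_t foid)
{
	return (foid & EX_FOID_OBJ_MASK);
}

/* Recover the slot field from the uppermost 8 bits. */
static uint8_t
ex_foid_slots(uint64_t foid)
{
	return ((uint8_t)(foid >> EX_FOID_SLOT_SHIFT));
}
```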
Resizing Dnodes
---------------
It should be possible to resize a dnode when it is dirtied if the
current dnodesize dataset property differs from the dnode's size, but
this functionality is not currently implemented. Clearly a dnode can
only grow if there are sufficient contiguous unused slots in the
dnode block, but it should always be possible to shrink a dnode.
Growing dnodes may be useful to reduce fragmentation in a pool with
many spill blocks in use. Shrinking dnodes may be useful to allow
sending a dataset to a pool that doesn't support the large_dnode
feature.
Feature Reference Counting
--------------------------
The reference count for the large_dnode pool feature tracks the
number of datasets that have ever contained a dnode of size larger
than 512 bytes. The first time a large dnode is created in a dataset
the dataset is converted to an extensible dataset. This is a one-way
operation and the only way to decrement the feature count is to
destroy the dataset, even if the dataset no longer contains any large
dnodes. The complexity of reference counting on a per-dnode basis was
too high, so we chose to track it on a per-dataset basis similarly to
the large_block feature.
Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3542
2016-03-17 04:25:34 +03:00
|
|
|
case ZFS_PROP_DNODESIZE:
|
|
|
|
/* Dnode sizes above 512 need the feature to be enabled */
|
|
|
|
if (nvpair_value_uint64(pair, &intval) == 0 &&
|
|
|
|
intval != ZFS_DNSIZE_LEGACY) {
|
|
|
|
spa_t *spa;
|
|
|
|
|
|
|
|
if ((err = spa_open(dsname, &spa, FTAG)) != 0)
|
|
|
|
return (err);
|
|
|
|
|
|
|
|
if (!spa_feature_is_enabled(spa,
|
|
|
|
SPA_FEATURE_LARGE_DNODE)) {
|
|
|
|
spa_close(spa, FTAG);
|
|
|
|
return (SET_ERROR(ENOTSUP));
|
|
|
|
}
|
|
|
|
spa_close(spa, FTAG);
|
|
|
|
}
|
|
|
|
break;
|
|
|
|
|
2018-09-06 04:33:36 +03:00
|
|
|
case ZFS_PROP_SPECIAL_SMALL_BLOCKS:
|
|
|
|
/*
|
|
|
|
* This property could require the allocation classes
|
|
|
|
* feature to be active for setting, however we allow
|
|
|
|
* it so that tests of settable properties succeed.
|
|
|
|
* The CLI will issue a warning in this case.
|
|
|
|
*/
|
|
|
|
break;
|
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
case ZFS_PROP_SHARESMB:
|
|
|
|
if (zpl_earlier_version(dsname, ZPL_VERSION_FUID))
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(ENOTSUP));
|
2010-05-29 00:45:14 +04:00
|
|
|
break;
|
|
|
|
|
|
|
|
case ZFS_PROP_ACLINHERIT:
|
|
|
|
if (nvpair_type(pair) == DATA_TYPE_UINT64 &&
|
|
|
|
nvpair_value_uint64(pair, &intval) == 0) {
|
|
|
|
if (intval == ZFS_ACL_PASSTHROUGH_X &&
|
|
|
|
zfs_earlier_version(dsname,
|
|
|
|
SPA_VERSION_PASSTHROUGH_X))
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(ENOTSUP));
|
2010-05-29 00:45:14 +04:00
|
|
|
}
|
|
|
|
break;
|
2016-06-16 01:47:05 +03:00
|
|
|
case ZFS_PROP_CHECKSUM:
|
|
|
|
case ZFS_PROP_DEDUP:
|
|
|
|
{
|
|
|
|
spa_feature_t feature;
|
|
|
|
spa_t *spa;
|
|
|
|
int err;
|
|
|
|
|
|
|
|
/* dedup feature version checks */
|
|
|
|
if (prop == ZFS_PROP_DEDUP &&
|
|
|
|
zfs_earlier_version(dsname, SPA_VERSION_DEDUP))
|
|
|
|
return (SET_ERROR(ENOTSUP));
|
|
|
|
|
2018-08-04 00:56:25 +03:00
|
|
|
if (nvpair_type(pair) == DATA_TYPE_UINT64 &&
|
|
|
|
nvpair_value_uint64(pair, &intval) == 0) {
|
|
|
|
/* check prop value is enabled in features */
|
|
|
|
feature = zio_checksum_to_feature(
|
|
|
|
intval & ZIO_CHECKSUM_MASK);
|
|
|
|
if (feature == SPA_FEATURE_NONE)
|
|
|
|
break;
|
2016-06-16 01:47:05 +03:00
|
|
|
|
2018-08-04 00:56:25 +03:00
|
|
|
if ((err = spa_open(dsname, &spa, FTAG)) != 0)
|
|
|
|
return (err);
|
2016-06-01 19:18:10 +03:00
|
|
|
|
2018-08-04 00:56:25 +03:00
|
|
|
if (!spa_feature_is_enabled(spa, feature)) {
|
|
|
|
spa_close(spa, FTAG);
|
|
|
|
return (SET_ERROR(ENOTSUP));
|
|
|
|
}
|
2016-06-16 01:47:05 +03:00
|
|
|
spa_close(spa, FTAG);
|
|
|
|
}
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
2010-08-26 20:52:41 +04:00
|
|
|
default:
|
|
|
|
break;
|
2010-05-29 00:45:14 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
return (zfs_secpolicy_setprop(dsname, prop, pair, CRED()));
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Removes properties from the given props list that fail permission checks
|
|
|
|
* needed to clear them and to restore them in case of a receive error. For each
|
|
|
|
* property, make sure we have both set and inherit permissions.
|
|
|
|
*
|
|
|
|
* Returns the first error encountered if any permission checks fail. If the
|
|
|
|
* caller provides a non-NULL errlist, it also gives the complete list of names
|
|
|
|
* of all the properties that failed a permission check along with the
|
|
|
|
* corresponding error numbers. The caller is responsible for freeing the
|
|
|
|
* returned errlist.
|
|
|
|
*
|
|
|
|
* If every property checks out successfully, zero is returned and the list
|
|
|
|
* pointed at by errlist is NULL.
|
|
|
|
*/
|
|
|
|
static int
|
2020-10-03 03:44:10 +03:00
|
|
|
zfs_check_clearable(const char *dataset, nvlist_t *props, nvlist_t **errlist)
|
2008-12-03 23:09:06 +03:00
|
|
|
{
|
|
|
|
zfs_cmd_t *zc;
|
2010-05-29 00:45:14 +04:00
|
|
|
nvpair_t *pair, *next_pair;
|
|
|
|
nvlist_t *errors;
|
|
|
|
int err, rv = 0;
|
2008-12-03 23:09:06 +03:00
|
|
|
|
|
|
|
if (props == NULL)
|
2010-05-29 00:45:14 +04:00
|
|
|
return (0);
|
|
|
|
|
|
|
|
VERIFY(nvlist_alloc(&errors, NV_UNIQUE_NAME, KM_SLEEP) == 0);
|
|
|
|
|
2014-12-03 22:56:32 +03:00
|
|
|
zc = kmem_alloc(sizeof (zfs_cmd_t), KM_SLEEP);
|
2016-09-26 01:08:28 +03:00
|
|
|
(void) strlcpy(zc->zc_name, dataset, sizeof (zc->zc_name));
|
2010-05-29 00:45:14 +04:00
|
|
|
pair = nvlist_next_nvpair(props, NULL);
|
|
|
|
while (pair != NULL) {
|
|
|
|
next_pair = nvlist_next_nvpair(props, pair);
|
|
|
|
|
2016-09-26 01:08:28 +03:00
|
|
|
(void) strlcpy(zc->zc_value, nvpair_name(pair),
|
|
|
|
sizeof (zc->zc_value));
|
2010-05-29 00:45:14 +04:00
|
|
|
if ((err = zfs_check_settable(dataset, pair, CRED())) != 0 ||
|
2013-08-28 15:45:09 +04:00
|
|
|
(err = zfs_secpolicy_inherit_prop(zc, NULL, CRED())) != 0) {
|
2010-05-29 00:45:14 +04:00
|
|
|
VERIFY(nvlist_remove_nvpair(props, pair) == 0);
|
|
|
|
VERIFY(nvlist_add_int32(errors,
|
|
|
|
zc->zc_value, err) == 0);
|
|
|
|
}
|
|
|
|
pair = next_pair;
|
2008-12-03 23:09:06 +03:00
|
|
|
}
|
|
|
|
kmem_free(zc, sizeof (zfs_cmd_t));
|
2010-05-29 00:45:14 +04:00
|
|
|
|
|
|
|
if ((pair = nvlist_next_nvpair(errors, NULL)) == NULL) {
|
|
|
|
nvlist_free(errors);
|
|
|
|
errors = NULL;
|
|
|
|
} else {
|
|
|
|
VERIFY(nvpair_value_int32(pair, &rv) == 0);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (errlist == NULL)
|
|
|
|
nvlist_free(errors);
|
|
|
|
else
|
|
|
|
*errlist = errors;
|
|
|
|
|
|
|
|
return (rv);
|
|
|
|
}
|
|
|
|
|
|
|
|
static boolean_t
|
|
|
|
propval_equals(nvpair_t *p1, nvpair_t *p2)
|
|
|
|
{
|
|
|
|
if (nvpair_type(p1) == DATA_TYPE_NVLIST) {
|
|
|
|
/* dsl_prop_get_all_impl() format */
|
|
|
|
nvlist_t *attrs;
|
|
|
|
VERIFY(nvpair_value_nvlist(p1, &attrs) == 0);
|
|
|
|
VERIFY(nvlist_lookup_nvpair(attrs, ZPROP_VALUE,
|
|
|
|
&p1) == 0);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (nvpair_type(p2) == DATA_TYPE_NVLIST) {
|
|
|
|
nvlist_t *attrs;
|
|
|
|
VERIFY(nvpair_value_nvlist(p2, &attrs) == 0);
|
|
|
|
VERIFY(nvlist_lookup_nvpair(attrs, ZPROP_VALUE,
|
|
|
|
&p2) == 0);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (nvpair_type(p1) != nvpair_type(p2))
|
|
|
|
return (B_FALSE);
|
|
|
|
|
|
|
|
if (nvpair_type(p1) == DATA_TYPE_STRING) {
|
2023-03-11 21:39:24 +03:00
|
|
|
const char *valstr1, *valstr2;
|
2010-05-29 00:45:14 +04:00
|
|
|
|
2023-03-11 21:39:24 +03:00
|
|
|
VERIFY(nvpair_value_string(p1, &valstr1) == 0);
|
|
|
|
VERIFY(nvpair_value_string(p2, &valstr2) == 0);
|
2010-05-29 00:45:14 +04:00
|
|
|
return (strcmp(valstr1, valstr2) == 0);
|
|
|
|
} else {
|
|
|
|
uint64_t intval1, intval2;
|
|
|
|
|
|
|
|
VERIFY(nvpair_value_uint64(p1, &intval1) == 0);
|
|
|
|
VERIFY(nvpair_value_uint64(p2, &intval2) == 0);
|
|
|
|
return (intval1 == intval2);
|
|
|
|
}
|
2008-12-03 23:09:06 +03:00
|
|
|
}
|
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
/*
|
|
|
|
* Remove properties from props if they are not going to change (as determined
|
|
|
|
* by comparison with origprops). Remove them from origprops as well, since we
|
|
|
|
* do not need to clear or restore properties that won't change.
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
props_reduce(nvlist_t *props, nvlist_t *origprops)
|
|
|
|
{
|
|
|
|
nvpair_t *pair, *next_pair;
|
|
|
|
|
|
|
|
if (origprops == NULL)
|
|
|
|
return; /* all props need to be received */
|
|
|
|
|
|
|
|
pair = nvlist_next_nvpair(props, NULL);
|
|
|
|
while (pair != NULL) {
|
|
|
|
const char *propname = nvpair_name(pair);
|
|
|
|
nvpair_t *match;
|
|
|
|
|
|
|
|
next_pair = nvlist_next_nvpair(props, pair);
|
|
|
|
|
|
|
|
if ((nvlist_lookup_nvpair(origprops, propname,
|
|
|
|
&match) != 0) || !propval_equals(pair, match))
|
|
|
|
goto next; /* need to set received value */
|
|
|
|
|
|
|
|
/* don't clear the existing received value */
|
|
|
|
(void) nvlist_remove_nvpair(origprops, match);
|
|
|
|
/* don't bother receiving the property */
|
|
|
|
(void) nvlist_remove_nvpair(props, pair);
|
|
|
|
next:
|
|
|
|
pair = next_pair;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2016-06-09 22:24:29 +03:00
|
|
|
/*
|
|
|
|
* Extract properties that cannot be set PRIOR to the receipt of a dataset.
|
|
|
|
* For example, refquota cannot be set until after the receipt of a dataset,
|
|
|
|
* because in replication streams, an older/earlier snapshot may exceed the
|
|
|
|
* refquota. We want to receive the older/earlier snapshot, but setting
|
|
|
|
* refquota pre-receipt will set the dsl's ACTUAL quota, which will prevent
|
|
|
|
* the older/earlier snapshot from being received (with EDQUOT).
|
|
|
|
*
|
|
|
|
* The ZFS test "zfs_receive_011_pos" demonstrates such a scenario.
|
|
|
|
*
|
|
|
|
* libzfs will need to be judicious in handling errors encountered by props
|
|
|
|
* extracted by this function.
|
|
|
|
*/
|
|
|
|
static nvlist_t *
|
|
|
|
extract_delay_props(nvlist_t *props)
|
|
|
|
{
|
|
|
|
nvlist_t *delayprops;
|
|
|
|
nvpair_t *nvp, *tmp;
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decrypt, and authenticate protected datasets.
Each object set maintains a Merkle tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 20:36:48 +03:00
|
|
|
static const zfs_prop_t delayable[] = {
|
|
|
|
ZFS_PROP_REFQUOTA,
|
|
|
|
ZFS_PROP_KEYLOCATION,
|
2022-09-21 01:19:05 +03:00
|
|
|
/*
|
|
|
|
* Setting ZFS_PROP_SHARESMB requires the objset type to be
|
|
|
|
* known, which is not possible prior to receipt of raw sends.
|
|
|
|
*/
|
|
|
|
ZFS_PROP_SHARESMB,
|
2017-08-14 20:36:48 +03:00
|
|
|
0
|
|
|
|
};
|
2016-06-09 22:24:29 +03:00
|
|
|
int i;
|
|
|
|
|
|
|
|
VERIFY(nvlist_alloc(&delayprops, NV_UNIQUE_NAME, KM_SLEEP) == 0);
|
|
|
|
|
|
|
|
for (nvp = nvlist_next_nvpair(props, NULL); nvp != NULL;
|
|
|
|
nvp = nvlist_next_nvpair(props, nvp)) {
|
|
|
|
/*
|
|
|
|
* strcmp() is safe because zfs_prop_to_name() always returns
|
|
|
|
* a bounded string.
|
|
|
|
*/
|
|
|
|
for (i = 0; delayable[i] != 0; i++) {
|
|
|
|
if (strcmp(zfs_prop_to_name(delayable[i]),
|
|
|
|
nvpair_name(nvp)) == 0) {
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
if (delayable[i] != 0) {
|
|
|
|
tmp = nvlist_prev_nvpair(props, nvp);
|
|
|
|
VERIFY(nvlist_add_nvpair(delayprops, nvp) == 0);
|
|
|
|
VERIFY(nvlist_remove_nvpair(props, nvp) == 0);
|
|
|
|
nvp = tmp;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
if (nvlist_empty(delayprops)) {
|
|
|
|
nvlist_free(delayprops);
|
|
|
|
delayprops = NULL;
|
|
|
|
}
|
|
|
|
return (delayprops);
|
|
|
|
}
|
|
|
|
|
2019-09-27 20:46:28 +03:00
|
|
|
static void
|
|
|
|
zfs_allow_log_destroy(void *arg)
|
|
|
|
{
|
|
|
|
char *poolname = arg;
|
|
|
|
|
|
|
|
if (poolname != NULL)
|
2019-10-10 19:47:06 +03:00
|
|
|
kmem_strfree(poolname);
|
2019-09-27 20:46:28 +03:00
|
|
|
}
|
|
|
|
|
2020-07-26 06:07:44 +03:00
|
|
|
#ifdef ZFS_DEBUG
|
2010-05-29 00:45:14 +04:00
|
|
|
static boolean_t zfs_ioc_recv_inject_err;
|
|
|
|
#endif
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
/*
|
2016-07-09 19:37:11 +03:00
|
|
|
* nvlist 'errors' is always allocated. It will contain descriptions of
|
|
|
|
* encountered errors, if any. It's the caller's responsibility to free.
|
2008-11-20 23:01:55 +03:00
|
|
|
*/
|
|
|
|
static int
|
2023-03-11 21:39:24 +03:00
|
|
|
zfs_ioc_recv_impl(char *tofs, char *tosnap, const char *origin,
|
|
|
|
nvlist_t *recvprops, nvlist_t *localprops, nvlist_t *hidden_args,
|
|
|
|
boolean_t force, boolean_t heal, boolean_t resumable, int input_fd,
|
2020-04-23 20:06:57 +03:00
|
|
|
dmu_replay_record_t *begin_record, uint64_t *read_bytes,
|
|
|
|
uint64_t *errflags, nvlist_t **errors)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
|
|
|
dmu_recv_cookie_t drc;
|
2010-05-29 00:45:14 +04:00
|
|
|
int error = 0;
|
|
|
|
int props_error = 0;
|
2019-11-21 20:32:57 +03:00
|
|
|
offset_t off, noff;
|
2017-10-13 20:09:04 +03:00
|
|
|
nvlist_t *local_delayprops = NULL;
|
|
|
|
nvlist_t *recv_delayprops = NULL;
|
2022-09-21 01:19:05 +03:00
|
|
|
nvlist_t *inherited_delayprops = NULL;
|
2016-06-10 03:04:12 +03:00
|
|
|
nvlist_t *origprops = NULL; /* existing properties */
|
2017-05-10 02:21:09 +03:00
|
|
|
nvlist_t *origrecvd = NULL; /* existing received properties */
|
2010-05-29 00:45:14 +04:00
|
|
|
boolean_t first_recvd_props = B_FALSE;
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
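The filtering step in the third stage can be sketched as below. This is a simplified model that assumes the redaction bookmark can be represented as a sorted array of block ids; the real on-disk format is not shown here, and the names `block_is_redacted` and `filter_send_blocks` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Binary search of a sorted list of redacted block ids. */
static bool
block_is_redacted(const uint64_t *redacted, size_t n, uint64_t blkid)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (redacted[mid] == blkid)
			return (true);
		if (redacted[mid] < blkid)
			lo = mid + 1;
		else
			hi = mid;
	}
	return (false);
}

/*
 * Copy every block id that is not in the redaction list into 'out'
 * (which must hold at least 'nblocks' entries) and return the number
 * of blocks that would be placed in the send stream.
 */
static size_t
filter_send_blocks(const uint64_t *blocks, size_t nblocks,
    const uint64_t *redacted, size_t nredacted, uint64_t *out)
{
	size_t kept = 0;

	assert(out != NULL);
	for (size_t i = 0; i < nblocks; i++) {
		if (!block_is_redacted(redacted, nredacted, blocks[i]))
			out[kept++] = blocks[i];
	}
	return (kept);
}
```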
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
boolean_t tofs_was_redacted;
|
2019-11-21 20:32:57 +03:00
|
|
|
zfs_file_t *input_fp;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2016-07-09 19:37:11 +03:00
|
|
|
*read_bytes = 0;
|
|
|
|
*errflags = 0;
|
|
|
|
*errors = fnvlist_alloc();
|
2019-11-21 20:32:57 +03:00
|
|
|
off = 0;
|
2016-07-09 19:37:11 +03:00
|
|
|
|
2021-07-11 04:00:37 +03:00
|
|
|
if ((input_fp = zfs_file_get(input_fd)) == NULL)
|
|
|
|
return (SET_ERROR(EBADF));
|
2013-09-04 16:00:57 +04:00
|
|
|
|
2019-11-21 20:32:57 +03:00
|
|
|
noff = off = zfs_file_off(input_fp);
|
2022-07-29 01:52:46 +03:00
|
|
|
error = dmu_recv_begin(tofs, tosnap, begin_record, force, heal,
|
2019-11-21 20:32:57 +03:00
|
|
|
resumable, localprops, hidden_args, origin, &drc, input_fp,
|
2019-06-19 19:48:13 +03:00
|
|
|
&off);
|
2013-09-04 16:00:57 +04:00
|
|
|
if (error != 0)
|
|
|
|
goto out;
|
2019-06-19 19:48:13 +03:00
|
|
|
tofs_was_redacted = dsl_get_redacted(drc.drc_ds);
|
2013-09-04 16:00:57 +04:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Set properties before we receive the stream so that they are applied
|
|
|
|
* to the new data. Note that we must call dmu_recv_stream() if
|
|
|
|
* dmu_recv_begin() succeeds.
|
|
|
|
*/
|
2017-05-10 02:21:09 +03:00
|
|
|
if (recvprops != NULL && !drc.drc_newfs) {
|
2013-09-04 16:00:57 +04:00
|
|
|
if (spa_version(dsl_dataset_get_spa(drc.drc_ds)) >=
|
|
|
|
SPA_VERSION_RECVD_PROPS &&
|
|
|
|
!dsl_prop_get_hasrecvd(tofs))
|
2010-05-29 00:45:14 +04:00
|
|
|
first_recvd_props = B_TRUE;
|
|
|
|
|
2008-12-03 23:09:06 +03:00
|
|
|
/*
|
2010-05-29 00:45:14 +04:00
|
|
|
* If new received properties are supplied, they are to
|
2017-10-13 20:09:04 +03:00
|
|
|
* completely replace the existing received properties,
|
|
|
|
* so stash away the existing ones.
|
2008-12-03 23:09:06 +03:00
|
|
|
*/
|
2017-05-10 02:21:09 +03:00
|
|
|
if (dsl_prop_get_received(tofs, &origrecvd) == 0) {
|
2010-05-29 00:45:14 +04:00
|
|
|
nvlist_t *errlist = NULL;
|
|
|
|
/*
|
|
|
|
* Don't bother writing a property if its value won't
|
|
|
|
* change (and avoid the unnecessary security checks).
|
|
|
|
*
|
|
|
|
* The first receive after SPA_VERSION_RECVD_PROPS is a
|
|
|
|
* special case where we blow away all local properties
|
|
|
|
* regardless.
|
|
|
|
*/
|
|
|
|
if (!first_recvd_props)
|
2017-05-10 02:21:09 +03:00
|
|
|
props_reduce(recvprops, origrecvd);
|
|
|
|
if (zfs_check_clearable(tofs, origrecvd, &errlist) != 0)
|
2016-06-10 03:04:12 +03:00
|
|
|
(void) nvlist_merge(*errors, errlist, 0);
|
2010-05-29 00:45:14 +04:00
|
|
|
nvlist_free(errlist);
|
2008-12-03 23:09:06 +03:00
|
|
|
|
2017-05-10 02:21:09 +03:00
|
|
|
if (clear_received_props(tofs, origrecvd,
|
|
|
|
first_recvd_props ? NULL : recvprops) != 0)
|
|
|
|
*errflags |= ZPROP_ERR_NOCLEAR;
|
|
|
|
} else {
|
|
|
|
*errflags |= ZPROP_ERR_NOCLEAR;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Stash away existing properties so we can restore them on error unless
|
|
|
|
* we're doing the first receive after SPA_VERSION_RECVD_PROPS, in which
|
|
|
|
* case "origrecvd" will take care of that.
|
|
|
|
*/
|
|
|
|
if (localprops != NULL && !drc.drc_newfs && !first_recvd_props) {
|
|
|
|
objset_t *os;
|
|
|
|
if (dmu_objset_hold(tofs, FTAG, &os) == 0) {
|
|
|
|
if (dsl_prop_get_all(os, &origprops) != 0) {
|
2016-06-10 03:04:12 +03:00
|
|
|
*errflags |= ZPROP_ERR_NOCLEAR;
|
2017-05-10 02:21:09 +03:00
|
|
|
}
|
|
|
|
dmu_objset_rele(os, FTAG);
|
2013-09-04 16:00:57 +04:00
|
|
|
} else {
|
2016-06-10 03:04:12 +03:00
|
|
|
*errflags |= ZPROP_ERR_NOCLEAR;
|
2010-05-29 00:45:14 +04:00
|
|
|
}
|
2013-09-04 16:00:57 +04:00
|
|
|
}
|
|
|
|
|
2017-05-10 02:21:09 +03:00
|
|
|
if (recvprops != NULL) {
|
2013-09-04 16:00:57 +04:00
|
|
|
props_error = dsl_prop_set_hasrecvd(tofs);
|
2010-05-29 00:45:14 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
if (props_error == 0) {
|
2017-10-13 20:09:04 +03:00
|
|
|
recv_delayprops = extract_delay_props(recvprops);
|
2013-09-04 16:00:57 +04:00
|
|
|
(void) zfs_set_prop_nvlist(tofs, ZPROP_SRC_RECEIVED,
|
2017-05-10 02:21:09 +03:00
|
|
|
recvprops, *errors);
|
2013-09-04 16:00:57 +04:00
|
|
|
}
|
2010-05-29 00:45:14 +04:00
|
|
|
}
|
|
|
|
|
2017-05-10 02:21:09 +03:00
|
|
|
if (localprops != NULL) {
|
|
|
|
nvlist_t *oprops = fnvlist_alloc();
|
|
|
|
nvlist_t *xprops = fnvlist_alloc();
|
|
|
|
nvpair_t *nvp = NULL;
|
|
|
|
|
|
|
|
while ((nvp = nvlist_next_nvpair(localprops, nvp)) != NULL) {
|
|
|
|
if (nvpair_type(nvp) == DATA_TYPE_BOOLEAN) {
|
|
|
|
/* -x property */
|
|
|
|
const char *name = nvpair_name(nvp);
|
|
|
|
zfs_prop_t prop = zfs_name_to_prop(name);
|
2022-06-14 21:27:53 +03:00
|
|
|
if (prop != ZPROP_USERPROP) {
|
2017-05-10 02:21:09 +03:00
|
|
|
if (!zfs_prop_inheritable(prop))
|
|
|
|
continue;
|
|
|
|
} else if (!zfs_prop_user(name))
|
|
|
|
continue;
|
|
|
|
fnvlist_add_boolean(xprops, name);
|
|
|
|
} else {
|
|
|
|
/* -o property=value */
|
|
|
|
fnvlist_add_nvpair(oprops, nvp);
|
|
|
|
}
|
|
|
|
}
|
2017-10-13 20:09:04 +03:00
|
|
|
|
|
|
|
local_delayprops = extract_delay_props(oprops);
|
2017-05-10 02:21:09 +03:00
|
|
|
(void) zfs_set_prop_nvlist(tofs, ZPROP_SRC_LOCAL,
|
|
|
|
oprops, *errors);
|
2022-09-21 01:19:05 +03:00
|
|
|
inherited_delayprops = extract_delay_props(xprops);
|
2017-05-10 02:21:09 +03:00
|
|
|
(void) zfs_set_prop_nvlist(tofs, ZPROP_SRC_INHERITED,
|
|
|
|
xprops, *errors);
|
|
|
|
|
|
|
|
nvlist_free(oprops);
|
|
|
|
nvlist_free(xprops);
|
|
|
|
}
|
|
|
|
|
2020-04-23 20:06:57 +03:00
|
|
|
error = dmu_recv_stream(&drc, &off);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2009-08-18 22:43:27 +04:00
|
|
|
if (error == 0) {
|
2017-03-08 03:21:37 +03:00
|
|
|
zfsvfs_t *zfsvfs = NULL;
|
2019-09-25 19:20:30 +03:00
|
|
|
zvol_state_handle_t *zv = NULL;
|
2008-12-03 23:09:06 +03:00
|
|
|
|
2017-03-09 01:56:19 +03:00
|
|
|
if (getzfsvfs(tofs, &zfsvfs) == 0) {
|
2009-08-18 22:43:27 +04:00
|
|
|
/* online recv */
|
2017-01-23 21:53:46 +03:00
|
|
|
dsl_dataset_t *ds;
|
2009-08-18 22:43:27 +04:00
|
|
|
int end_err;
|
2019-06-19 19:48:13 +03:00
|
|
|
boolean_t stream_is_redacted = DMU_GET_FEATUREFLAGS(
|
|
|
|
begin_record->drr_u.drr_begin.
|
|
|
|
drr_versioninfo) & DMU_BACKUP_FEATURE_REDACTED;
|
2008-12-03 23:09:06 +03:00
|
|
|
|
2017-03-08 03:21:37 +03:00
|
|
|
ds = dmu_objset_ds(zfsvfs->z_os);
|
|
|
|
error = zfs_suspend_fs(zfsvfs);
|
2009-08-18 22:43:27 +04:00
|
|
|
/*
|
|
|
|
* If the suspend fails, then the recv_end will
|
|
|
|
* likely also fail, and clean up after itself.
|
|
|
|
*/
|
2017-03-08 03:21:37 +03:00
|
|
|
end_err = dmu_recv_end(&drc, zfsvfs);
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
/*
|
|
|
|
* If the dataset was not redacted, but we received a
|
|
|
|
* redacted stream onto it, we need to unmount the
|
|
|
|
* dataset. Otherwise, resume the filesystem.
|
|
|
|
*/
|
|
|
|
if (error == 0 && !drc.drc_newfs &&
|
|
|
|
stream_is_redacted && !tofs_was_redacted) {
|
|
|
|
error = zfs_end_fs(zfsvfs, ds);
|
|
|
|
} else if (error == 0) {
|
2017-03-08 03:21:37 +03:00
|
|
|
error = zfs_resume_fs(zfsvfs, ds);
|
2019-06-19 19:48:13 +03:00
|
|
|
}
|
2009-08-18 22:43:27 +04:00
|
|
|
error = error ? error : end_err;
|
2019-12-10 20:21:07 +03:00
|
|
|
zfs_vfs_rele(zfsvfs);
|
2017-01-20 00:56:36 +03:00
|
|
|
} else if ((zv = zvol_suspend(tofs)) != NULL) {
|
|
|
|
error = dmu_recv_end(&drc, zvol_tag(zv));
|
|
|
|
zvol_resume(zv);
|
2008-12-03 23:09:06 +03:00
|
|
|
} else {
|
2013-07-27 21:50:07 +04:00
|
|
|
error = dmu_recv_end(&drc, NULL);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
2016-06-09 22:24:29 +03:00
|
|
|
|
|
|
|
/* Set delayed properties now, after we're done receiving. */
|
2017-10-13 20:09:04 +03:00
|
|
|
if (recv_delayprops != NULL && error == 0) {
|
2016-06-09 22:24:29 +03:00
|
|
|
(void) zfs_set_prop_nvlist(tofs, ZPROP_SRC_RECEIVED,
|
2017-10-13 20:09:04 +03:00
|
|
|
recv_delayprops, *errors);
|
|
|
|
}
|
|
|
|
if (local_delayprops != NULL && error == 0) {
|
|
|
|
(void) zfs_set_prop_nvlist(tofs, ZPROP_SRC_LOCAL,
|
|
|
|
local_delayprops, *errors);
|
2016-06-09 22:24:29 +03:00
|
|
|
}
|
2022-09-21 01:19:05 +03:00
|
|
|
if (inherited_delayprops != NULL && error == 0) {
|
|
|
|
(void) zfs_set_prop_nvlist(tofs, ZPROP_SRC_INHERITED,
|
|
|
|
inherited_delayprops, *errors);
|
|
|
|
}
|
2016-06-09 22:24:29 +03:00
|
|
|
}
|
|
|
|
|
2017-10-13 20:09:04 +03:00
|
|
|
/*
|
|
|
|
* Merge delayed props back in with initial props, in case
|
|
|
|
* we're DEBUG and zfs_ioc_recv_inject_err is set (which means
|
|
|
|
* we have to make sure clear_received_props() includes
|
|
|
|
* the delayed properties).
|
|
|
|
*
|
|
|
|
* Since zfs_ioc_recv_inject_err is only in DEBUG kernels,
|
|
|
|
* using ASSERT() will be just like a VERIFY.
|
|
|
|
*/
|
|
|
|
if (recv_delayprops != NULL) {
|
|
|
|
ASSERT(nvlist_merge(recvprops, recv_delayprops, 0) == 0);
|
|
|
|
nvlist_free(recv_delayprops);
|
|
|
|
}
|
|
|
|
if (local_delayprops != NULL) {
|
|
|
|
ASSERT(nvlist_merge(localprops, local_delayprops, 0) == 0);
|
|
|
|
nvlist_free(local_delayprops);
|
2016-06-09 22:24:29 +03:00
|
|
|
}
|
2022-09-21 01:19:05 +03:00
|
|
|
if (inherited_delayprops != NULL) {
|
|
|
|
ASSERT(nvlist_merge(localprops, inherited_delayprops, 0) == 0);
|
|
|
|
nvlist_free(inherited_delayprops);
|
|
|
|
}
|
2019-11-21 20:32:57 +03:00
|
|
|
*read_bytes = off - noff;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2020-07-26 06:07:44 +03:00
|
|
|
#ifdef ZFS_DEBUG
|
2010-05-29 00:45:14 +04:00
|
|
|
if (zfs_ioc_recv_inject_err) {
|
|
|
|
zfs_ioc_recv_inject_err = B_FALSE;
|
|
|
|
error = 1;
|
|
|
|
}
|
|
|
|
#endif
|
2013-12-07 02:20:22 +04:00
|
|
|
|
2008-12-03 23:09:06 +03:00
|
|
|
/*
|
|
|
|
* On error, restore the original props.
|
|
|
|
*/
|
2017-05-10 02:21:09 +03:00
|
|
|
if (error != 0 && recvprops != NULL && !drc.drc_newfs) {
|
|
|
|
if (clear_received_props(tofs, recvprops, NULL) != 0) {
|
2013-09-04 16:00:57 +04:00
|
|
|
/*
|
|
|
|
* We failed to clear the received properties.
|
|
|
|
* Since we may have left a $recvd value on the
|
|
|
|
* system, we can't clear the $hasrecvd flag.
|
|
|
|
*/
|
2016-06-10 03:04:12 +03:00
|
|
|
*errflags |= ZPROP_ERR_NORESTORE;
|
2013-09-04 16:00:57 +04:00
|
|
|
} else if (first_recvd_props) {
|
|
|
|
dsl_prop_unset_hasrecvd(tofs);
|
2010-05-29 00:45:14 +04:00
|
|
|
}
|
|
|
|
|
2017-05-10 02:21:09 +03:00
|
|
|
if (origrecvd == NULL && !drc.drc_newfs) {
|
2010-05-29 00:45:14 +04:00
|
|
|
/* We failed to stash the original properties. */
|
2016-06-10 03:04:12 +03:00
|
|
|
*errflags |= ZPROP_ERR_NORESTORE;
|
2010-05-29 00:45:14 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* dsl_props_set() will not convert RECEIVED to LOCAL on or
|
|
|
|
* after SPA_VERSION_RECVD_PROPS, so we need to specify LOCAL
|
2017-01-03 20:31:18 +03:00
|
|
|
* explicitly if we're restoring local properties cleared in the
|
2010-05-29 00:45:14 +04:00
|
|
|
* first new-style receive.
|
|
|
|
*/
|
2017-05-10 02:21:09 +03:00
|
|
|
if (origrecvd != NULL &&
|
2010-05-29 00:45:14 +04:00
|
|
|
zfs_set_prop_nvlist(tofs, (first_recvd_props ?
|
|
|
|
ZPROP_SRC_LOCAL : ZPROP_SRC_RECEIVED),
|
2017-05-10 02:21:09 +03:00
|
|
|
origrecvd, NULL) != 0) {
|
2010-05-29 00:45:14 +04:00
|
|
|
/*
|
|
|
|
* We stashed the original properties but failed to
|
|
|
|
* restore them.
|
|
|
|
*/
|
2016-06-10 03:04:12 +03:00
|
|
|
*errflags |= ZPROP_ERR_NORESTORE;
|
2010-05-29 00:45:14 +04:00
|
|
|
}
|
2008-12-03 23:09:06 +03:00
|
|
|
}
|
2017-05-10 02:21:09 +03:00
|
|
|
if (error != 0 && localprops != NULL && !drc.drc_newfs &&
|
|
|
|
!first_recvd_props) {
|
|
|
|
nvlist_t *setprops;
|
|
|
|
nvlist_t *inheritprops;
|
|
|
|
nvpair_t *nvp;
|
|
|
|
|
|
|
|
if (origprops == NULL) {
|
|
|
|
/* We failed to stash the original properties. */
|
|
|
|
*errflags |= ZPROP_ERR_NORESTORE;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Restore original props */
|
|
|
|
setprops = fnvlist_alloc();
|
|
|
|
inheritprops = fnvlist_alloc();
|
|
|
|
nvp = NULL;
|
|
|
|
while ((nvp = nvlist_next_nvpair(localprops, nvp)) != NULL) {
|
|
|
|
const char *name = nvpair_name(nvp);
|
|
|
|
const char *source;
|
|
|
|
nvlist_t *attrs;
|
|
|
|
|
|
|
|
if (!nvlist_exists(origprops, name)) {
|
|
|
|
/*
|
|
|
|
* Property was not present or was explicitly
|
|
|
|
* inherited before the receive, restore this.
|
|
|
|
*/
|
|
|
|
fnvlist_add_boolean(inheritprops, name);
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
attrs = fnvlist_lookup_nvlist(origprops, name);
|
|
|
|
source = fnvlist_lookup_string(attrs, ZPROP_SOURCE);
|
|
|
|
|
|
|
|
/* Skip received properties */
|
|
|
|
if (strcmp(source, ZPROP_SOURCE_VAL_RECVD) == 0)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
if (strcmp(source, tofs) == 0) {
|
|
|
|
/* Property was locally set */
|
|
|
|
fnvlist_add_nvlist(setprops, name, attrs);
|
|
|
|
} else {
|
|
|
|
/* Property was implicitly inherited */
|
|
|
|
fnvlist_add_boolean(inheritprops, name);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
if (zfs_set_prop_nvlist(tofs, ZPROP_SRC_LOCAL, setprops,
|
|
|
|
NULL) != 0)
|
|
|
|
*errflags |= ZPROP_ERR_NORESTORE;
|
|
|
|
if (zfs_set_prop_nvlist(tofs, ZPROP_SRC_INHERITED, inheritprops,
|
|
|
|
NULL) != 0)
|
|
|
|
*errflags |= ZPROP_ERR_NORESTORE;
|
|
|
|
|
|
|
|
nvlist_free(setprops);
|
|
|
|
nvlist_free(inheritprops);
|
|
|
|
}
|
2008-12-03 23:09:06 +03:00
|
|
|
out:
|
2021-07-11 04:00:37 +03:00
|
|
|
zfs_file_put(input_fp);
|
2017-05-10 02:21:09 +03:00
|
|
|
nvlist_free(origrecvd);
|
2008-12-03 23:09:06 +03:00
|
|
|
nvlist_free(origprops);
|
2010-05-29 00:45:14 +04:00
|
|
|
|
|
|
|
if (error == 0)
|
|
|
|
error = props_error;
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
return (error);
|
|
|
|
}
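The error path above sorts each local property into one of three outcomes when restoring the pre-receive state. A minimal sketch of that classification, under stated assumptions: `classify_restore`, `restore_action_t` and `SKETCH_SOURCE_RECVD` are hypothetical names introduced here, a NULL source stands in for "not present in origprops", and the "$recvd" value is assumed to match ZPROP_SOURCE_VAL_RECVD.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for ZPROP_SOURCE_VAL_RECVD; the value is assumed. */
#define	SKETCH_SOURCE_RECVD	"$recvd"

typedef enum restore_action {
	RESTORE_INHERIT,	/* would be added to the "inheritprops" list */
	RESTORE_SKIP,		/* received property, left alone */
	RESTORE_SET_LOCAL	/* would be added to the "setprops" list */
} restore_action_t;

/*
 * orig_source is the source string recorded before the receive, or NULL
 * if the property was absent from the stashed original properties.
 * tofs is the name of the dataset being received into.
 */
static restore_action_t
classify_restore(const char *orig_source, const char *tofs)
{
	assert(tofs != NULL);

	/* Not present (or explicitly inherited) before the receive. */
	if (orig_source == NULL)
		return (RESTORE_INHERIT);

	/* Received properties are skipped by this pass. */
	if (strcmp(orig_source, SKETCH_SOURCE_RECVD) == 0)
		return (RESTORE_SKIP);

	/* A source equal to the dataset name means "set locally". */
	if (strcmp(orig_source, tofs) == 0)
		return (RESTORE_SET_LOCAL);

	/* Any other source means the value was implicitly inherited. */
	return (RESTORE_INHERIT);
}
```

The sketch only models the decision; the real code builds two nvlists from it and applies them with ZPROP_SRC_LOCAL and ZPROP_SRC_INHERITED respectively.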
|
|
|
|
|
2016-06-10 03:04:12 +03:00
|
|
|
/*
|
|
|
|
* inputs:
|
|
|
|
* zc_name name of containing filesystem (unused)
|
|
|
|
* zc_nvlist_src{_size} nvlist of properties to apply
|
2017-05-10 02:21:09 +03:00
|
|
|
* zc_nvlist_conf{_size} nvlist of properties to exclude
|
|
|
|
* (DATA_TYPE_BOOLEAN) and override (everything else)
|
2016-06-10 03:04:12 +03:00
|
|
|
* zc_value name of snapshot to create
|
|
|
|
* zc_string name of clone origin (if DRR_FLAG_CLONE)
|
|
|
|
* zc_cookie file descriptor to recv from
|
|
|
|
* zc_begin_record the BEGIN record of the stream (not byteswapped)
|
|
|
|
* zc_guid force flag
|
|
|
|
*
|
|
|
|
* outputs:
|
|
|
|
* zc_cookie number of bytes read
|
|
|
|
* zc_obj zprop_errflags_t
|
|
|
|
* zc_nvlist_dst{_size} error for each unapplied received property
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
zfs_ioc_recv(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
dmu_replay_record_t begin_record;
|
|
|
|
nvlist_t *errors = NULL;
|
2017-05-10 02:21:09 +03:00
|
|
|
nvlist_t *recvdprops = NULL;
|
|
|
|
nvlist_t *localprops = NULL;
|
2023-03-11 21:39:24 +03:00
|
|
|
const char *origin = NULL;
|
2016-06-10 03:04:12 +03:00
|
|
|
char *tosnap;
|
2016-06-16 00:28:36 +03:00
|
|
|
char tofs[ZFS_MAX_DATASET_NAME_LEN];
|
2016-06-10 03:04:12 +03:00
|
|
|
int error = 0;
|
|
|
|
|
|
|
|
if (dataset_namecheck(zc->zc_value, NULL, NULL) != 0 ||
|
|
|
|
strchr(zc->zc_value, '@') == NULL ||
|
|
|
|
strchr(zc->zc_value, '%'))
|
|
|
|
return (SET_ERROR(EINVAL));
|
|
|
|
|
2016-09-18 01:08:54 +03:00
|
|
|
(void) strlcpy(tofs, zc->zc_value, sizeof (tofs));
|
2016-06-10 03:04:12 +03:00
|
|
|
tosnap = strchr(tofs, '@');
|
|
|
|
*tosnap++ = '\0';
|
|
|
|
|
|
|
|
if (zc->zc_nvlist_src != 0 &&
|
|
|
|
(error = get_nvlist(zc->zc_nvlist_src, zc->zc_nvlist_src_size,
|
2017-05-10 02:21:09 +03:00
|
|
|
zc->zc_iflags, &recvdprops)) != 0)
|
|
|
|
return (error);
|
|
|
|
|
|
|
|
if (zc->zc_nvlist_conf != 0 &&
|
|
|
|
(error = get_nvlist(zc->zc_nvlist_conf, zc->zc_nvlist_conf_size,
|
|
|
|
zc->zc_iflags, &localprops)) != 0)
|
2016-06-10 03:04:12 +03:00
|
|
|
return (error);
|
|
|
|
|
|
|
|
if (zc->zc_string[0])
|
|
|
|
origin = zc->zc_string;
|
|
|
|
|
|
|
|
begin_record.drr_type = DRR_BEGIN;
|
|
|
|
begin_record.drr_payloadlen = 0;
|
|
|
|
begin_record.drr_u.drr_begin = zc->zc_begin_record;
|
|
|
|
|
2017-05-10 02:21:09 +03:00
|
|
|
error = zfs_ioc_recv_impl(tofs, tosnap, origin, recvdprops, localprops,
|
2022-07-29 01:52:46 +03:00
|
|
|
NULL, zc->zc_guid, B_FALSE, B_FALSE, zc->zc_cookie, &begin_record,
|
2020-04-23 20:06:57 +03:00
|
|
|
&zc->zc_cookie, &zc->zc_obj, &errors);
|
2017-05-10 02:21:09 +03:00
|
|
|
nvlist_free(recvdprops);
|
|
|
|
nvlist_free(localprops);
|
2016-06-10 03:04:12 +03:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Now that all props, initial and delayed, are set, report the prop
|
|
|
|
* errors to the caller.
|
|
|
|
*/
|
|
|
|
if (zc->zc_nvlist_dst_size != 0 && errors != NULL &&
|
|
|
|
(nvlist_smush(errors, zc->zc_nvlist_dst_size) != 0 ||
|
|
|
|
put_nvlist(zc, errors) != 0)) {
|
|
|
|
/*
|
|
|
|
* Caller made zc->zc_nvlist_dst less than the minimum expected
|
|
|
|
* size or supplied an invalid address.
|
|
|
|
*/
|
|
|
|
error = SET_ERROR(EINVAL);
|
|
|
|
}
|
|
|
|
|
|
|
|
nvlist_free(errors);
|
|
|
|
|
|
|
|
return (error);
|
|
|
|
}
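Both receive entry points split the "filesystem@snapshot" argument in place before calling the shared implementation. A minimal standalone sketch of that step, assuming a hypothetical helper `split_snapname`. It leaves out dataset_namecheck(), and because strlcpy() is not available in every C library it rejects over-long names instead of truncating them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy "fs@snap" into tofs, cut it at the '@', and point *tosnap at the
 * snapshot part inside tofs. Returns 0 on success, -1 if the name has no
 * '@', contains '%', or does not fit in the buffer.
 */
static int
split_snapname(const char *full, char *tofs, size_t tofs_len, char **tosnap)
{
	assert(full != NULL && tofs != NULL && tosnap != NULL);

	if (strchr(full, '@') == NULL || strchr(full, '%') != NULL)
		return (-1);

	size_t len = strlen(full);
	if (len >= tofs_len)
		return (-1);
	memcpy(tofs, full, len + 1);

	char *at = strchr(tofs, '@');
	*at = '\0';		/* tofs now holds only the filesystem name */
	*tosnap = at + 1;	/* snapshot name follows the terminator */
	return (0);
}
```

As in the code above, the snapshot pointer aliases the copied buffer, so it stays valid only as long as tofs does.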
|
|
|
|
|
|
|
|
/*
|
|
|
|
* innvl: {
|
|
|
|
* "snapname" -> full name of the snapshot to create
|
2017-05-10 02:21:09 +03:00
|
|
|
* (optional) "props" -> received properties to set (nvlist)
|
|
|
|
* (optional) "localprops" -> override and exclude properties (nvlist)
|
2016-06-10 03:04:12 +03:00
|
|
|
* (optional) "origin" -> name of clone origin (DRR_FLAG_CLONE)
|
|
|
|
* "begin_record" -> non-byteswapped dmu_replay_record_t
|
|
|
|
* "input_fd" -> file descriptor to read stream from (int32)
|
|
|
|
* (optional) "force" -> force flag (value ignored)
|
2022-07-29 01:52:46 +03:00
|
|
|
* (optional) "heal" -> use send stream to heal data corruption
|
2016-06-10 03:04:12 +03:00
|
|
|
* (optional) "resumable" -> resumable flag (value ignored)
|
2020-04-23 20:06:57 +03:00
|
|
|
* (optional) "cleanup_fd" -> unused
|
|
|
|
* (optional) "action_handle" -> unused
|
Add basic zfs ioc input nvpair validation
We want newer versions of libzfs_core to run against an existing
zfs kernel module (i.e. a deferred reboot or module reload after
an update).
Programmatically document, via a zfs_ioc_key_t, the valid arguments
for the ioc commands that rely on nvpair input arguments (i.e. non
legacy commands from libzfs_core). Automatically verify the expected
pairs before dispatching a command.
This initial phase focuses on the non-legacy ioctls. A follow-on
change can address the legacy ioctl input from the zfs_cmd_t.
The zfs_ioc_key_t for zfs_keys_channel_program looks like:
static const zfs_ioc_key_t zfs_keys_channel_program[] = {
{"program", DATA_TYPE_STRING, 0},
{"arg", DATA_TYPE_UNKNOWN, 0},
{"sync", DATA_TYPE_BOOLEAN_VALUE, ZK_OPTIONAL},
{"instrlimit", DATA_TYPE_UINT64, ZK_OPTIONAL},
{"memlimit", DATA_TYPE_UINT64, ZK_OPTIONAL},
};
Introduce four input errors to identify specific input failures
(in addition to generic argument value errors like EINVAL, ERANGE,
EBADF, and E2BIG).
ZFS_ERR_IOC_CMD_UNAVAIL the ioctl number is not supported by kernel
ZFS_ERR_IOC_ARG_UNAVAIL an input argument is not supported by kernel
ZFS_ERR_IOC_ARG_REQUIRED a required input argument is missing
ZFS_ERR_IOC_ARG_BADTYPE an input argument has an invalid type
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #7780
2018-09-02 22:14:01 +03:00
|
|
|
* (optional) "hidden_args" -> { "wkeydata" -> value }
|
2016-06-10 03:04:12 +03:00
|
|
|
* }
|
|
|
|
*
|
|
|
|
* outnvl: {
|
|
|
|
* "read_bytes" -> number of bytes read
|
|
|
|
* "error_flags" -> zprop_errflags_t
|
|
|
|
* "errors" -> error for each unapplied received property (nvlist)
|
|
|
|
* }
|
|
|
|
*/
|
2018-09-02 22:14:01 +03:00
|
|
|
static const zfs_ioc_key_t zfs_keys_recv_new[] = {
|
|
|
|
{"snapname", DATA_TYPE_STRING, 0},
|
|
|
|
{"props", DATA_TYPE_NVLIST, ZK_OPTIONAL},
|
|
|
|
{"localprops", DATA_TYPE_NVLIST, ZK_OPTIONAL},
|
|
|
|
{"origin", DATA_TYPE_STRING, ZK_OPTIONAL},
|
|
|
|
{"begin_record", DATA_TYPE_BYTE_ARRAY, 0},
|
|
|
|
{"input_fd", DATA_TYPE_INT32, 0},
|
|
|
|
{"force", DATA_TYPE_BOOLEAN, ZK_OPTIONAL},
|
2022-07-29 01:52:46 +03:00
|
|
|
{"heal", DATA_TYPE_BOOLEAN, ZK_OPTIONAL},
|
2018-09-02 22:14:01 +03:00
|
|
|
{"resumable", DATA_TYPE_BOOLEAN, ZK_OPTIONAL},
|
|
|
|
{"cleanup_fd", DATA_TYPE_INT32, ZK_OPTIONAL},
|
|
|
|
{"action_handle", DATA_TYPE_UINT64, ZK_OPTIONAL},
|
|
|
|
{"hidden_args", DATA_TYPE_NVLIST, ZK_OPTIONAL},
|
|
|
|
};
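A key table like the one above lets the input be checked before the handler runs. The following is a simplified sketch of such a check, not the kernel's validation routine: every `sk_`/`SK_` name is hypothetical, the input is modeled as a flat array of (name, type) pairs rather than an nvlist, and only the "optional" flag is handled. The three result codes loosely mirror ZFS_ERR_IOC_ARG_UNAVAIL, ZFS_ERR_IOC_ARG_REQUIRED and ZFS_ERR_IOC_ARG_BADTYPE.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum sk_type {
	SK_STRING, SK_NVLIST, SK_BYTE_ARRAY, SK_INT32, SK_BOOLEAN, SK_UINT64
} sk_type_t;

#define	SK_OPTIONAL	0x1

typedef struct sk_key {
	const char *name;
	sk_type_t type;
	int flags;
} sk_key_t;

typedef struct sk_pair {
	const char *name;
	sk_type_t type;
} sk_pair_t;

enum { SK_OK = 0, SK_ARG_UNAVAIL, SK_ARG_REQUIRED, SK_ARG_BADTYPE };

static const sk_key_t *
sk_find_key(const sk_key_t *keys, size_t nkeys, const char *name)
{
	for (size_t i = 0; i < nkeys; i++) {
		if (strcmp(keys[i].name, name) == 0)
			return (&keys[i]);
	}
	return (NULL);
}

static int
sk_check_input(const sk_key_t *keys, size_t nkeys,
    const sk_pair_t *pairs, size_t npairs)
{
	assert(keys != NULL);
	assert(pairs != NULL || npairs == 0);

	/* Every supplied pair must be a known key with the expected type. */
	for (size_t i = 0; i < npairs; i++) {
		const sk_key_t *k = sk_find_key(keys, nkeys, pairs[i].name);
		if (k == NULL)
			return (SK_ARG_UNAVAIL);
		if (k->type != pairs[i].type)
			return (SK_ARG_BADTYPE);
	}

	/* Every key not marked optional must have been supplied. */
	for (size_t k = 0; k < nkeys; k++) {
		int found = 0;
		if (keys[k].flags & SK_OPTIONAL)
			continue;
		for (size_t i = 0; i < npairs; i++) {
			if (strcmp(pairs[i].name, keys[k].name) == 0)
				found = 1;
		}
		if (!found)
			return (SK_ARG_REQUIRED);
	}
	return (SK_OK);
}
```

Checking unknown and mistyped pairs first means a caller built against a newer library gets a specific "argument not supported" result rather than a generic failure.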
|
|
|
|
|
2016-06-10 03:04:12 +03:00
|
|
|
static int
|
|
|
|
zfs_ioc_recv_new(const char *fsname, nvlist_t *innvl, nvlist_t *outnvl)
|
|
|
|
{
|
|
|
|
dmu_replay_record_t *begin_record;
|
|
|
|
uint_t begin_record_size;
|
|
|
|
nvlist_t *errors = NULL;
|
2017-05-10 02:21:09 +03:00
|
|
|
nvlist_t *recvprops = NULL;
|
|
|
|
nvlist_t *localprops = NULL;
|
2017-10-13 20:09:04 +03:00
|
|
|
nvlist_t *hidden_args = NULL;
|
2023-03-11 21:39:24 +03:00
|
|
|
const char *snapname;
|
|
|
|
const char *origin = NULL;
|
2016-06-10 03:04:12 +03:00
|
|
|
char *tosnap;
|
2016-06-16 00:28:36 +03:00
|
|
|
char tofs[ZFS_MAX_DATASET_NAME_LEN];
|
2016-06-10 03:04:12 +03:00
|
|
|
boolean_t force;
|
2022-07-29 01:52:46 +03:00
|
|
|
boolean_t heal;
|
2016-06-10 03:04:12 +03:00
|
|
|
boolean_t resumable;
|
|
|
|
uint64_t read_bytes = 0;
|
|
|
|
uint64_t errflags = 0;
|
|
|
|
int input_fd = -1;
|
|
|
|
int error;
|
|
|
|
|
2018-09-02 22:14:01 +03:00
|
|
|
snapname = fnvlist_lookup_string(innvl, "snapname");
|
2016-06-10 03:04:12 +03:00
|
|
|
|
|
|
|
if (dataset_namecheck(snapname, NULL, NULL) != 0 ||
|
|
|
|
strchr(snapname, '@') == NULL ||
|
|
|
|
strchr(snapname, '%'))
|
|
|
|
return (SET_ERROR(EINVAL));
|
|
|
|
|
2020-06-07 21:42:12 +03:00
|
|
|
(void) strlcpy(tofs, snapname, sizeof (tofs));
|
2016-06-10 03:04:12 +03:00
|
|
|
tosnap = strchr(tofs, '@');
|
|
|
|
*tosnap++ = '\0';
|
|
|
|
|
|
|
|
error = nvlist_lookup_string(innvl, "origin", &origin);
|
|
|
|
if (error && error != ENOENT)
|
|
|
|
return (error);
|
|
|
|
|
|
|
|
error = nvlist_lookup_byte_array(innvl, "begin_record",
|
2016-12-12 21:46:26 +03:00
|
|
|
(uchar_t **)&begin_record, &begin_record_size);
|
2016-06-10 03:04:12 +03:00
|
|
|
if (error != 0 || begin_record_size != sizeof (*begin_record))
|
|
|
|
return (SET_ERROR(EINVAL));
|
|
|
|
|
2018-09-02 22:14:01 +03:00
|
|
|
input_fd = fnvlist_lookup_int32(innvl, "input_fd");
|
2016-06-10 03:04:12 +03:00
|
|
|
|
|
|
|
force = nvlist_exists(innvl, "force");
|
2022-07-29 01:52:46 +03:00
|
|
|
heal = nvlist_exists(innvl, "heal");
|
2016-06-10 03:04:12 +03:00
|
|
|
resumable = nvlist_exists(innvl, "resumable");
|
|
|
|
|
2017-05-10 02:21:09 +03:00
|
|
|
/* we still use "props" here for backwards compatibility */
|
|
|
|
error = nvlist_lookup_nvlist(innvl, "props", &recvprops);
|
2016-06-10 03:04:12 +03:00
|
|
|
if (error && error != ENOENT)
|
|
|
|
return (error);
|
|
|
|
|
2017-05-10 02:21:09 +03:00
|
|
|
error = nvlist_lookup_nvlist(innvl, "localprops", &localprops);
|
|
|
|
if (error && error != ENOENT)
|
|
|
|
return (error);
|
|
|
|
|
2017-10-13 20:09:04 +03:00
|
|
|
error = nvlist_lookup_nvlist(innvl, ZPOOL_HIDDEN_ARGS, &hidden_args);
|
|
|
|
if (error && error != ENOENT)
|
|
|
|
return (error);
|
|
|
|
|
2017-05-10 02:21:09 +03:00
|
|
|
error = zfs_ioc_recv_impl(tofs, tosnap, origin, recvprops, localprops,
|
2022-07-29 01:52:46 +03:00
|
|
|
hidden_args, force, heal, resumable, input_fd, begin_record,
|
2020-04-23 20:06:57 +03:00
|
|
|
&read_bytes, &errflags, &errors);
|
2016-06-10 03:04:12 +03:00
|
|
|
|
|
|
|
fnvlist_add_uint64(outnvl, "read_bytes", read_bytes);
|
|
|
|
fnvlist_add_uint64(outnvl, "error_flags", errflags);
|
|
|
|
fnvlist_add_nvlist(outnvl, "errors", errors);
|
|
|
|
|
|
|
|
nvlist_free(errors);
|
2017-05-10 02:21:09 +03:00
|
|
|
nvlist_free(recvprops);
|
|
|
|
nvlist_free(localprops);
|
2016-06-10 03:04:12 +03:00
|
|
|
|
|
|
|
return (error);
|
|
|
|
}
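The function above reads each optional argument with the same pattern: a lookup that reports ENOENT is treated as "not supplied", while any other error is returned to the caller. A minimal sketch of that pattern against a mock lookup, assuming hypothetical names throughout (`sk_strpair_t`, `sk_lookup_string`, `sk_get_optional_string`). The mock stands in for the nvlist lookup and does not reproduce its full error behavior.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

typedef struct sk_strpair {
	const char *name;
	const char *value;
} sk_strpair_t;

/* Mock lookup: 0 on a hit, ENOENT if absent, EINVAL on bad arguments. */
static int
sk_lookup_string(const sk_strpair_t *pairs, size_t n, const char *name,
    const char **out)
{
	assert(pairs != NULL || n == 0);

	if (name == NULL || out == NULL)
		return (EINVAL);
	for (size_t i = 0; i < n; i++) {
		if (strcmp(pairs[i].name, name) == 0) {
			*out = pairs[i].value;
			return (0);
		}
	}
	return (ENOENT);
}

/*
 * Optional argument: absence is not an error, and *out keeps whatever
 * default the caller stored in it. Any other failure is passed through.
 */
static int
sk_get_optional_string(const sk_strpair_t *pairs, size_t n, const char *name,
    const char **out)
{
	int error = sk_lookup_string(pairs, n, name, out);

	if (error != 0 && error != ENOENT)
		return (error);
	return (0);
}
```

This is why the optional outputs in the function above are initialized to NULL before the lookups: a missing key leaves the default in place.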
|
|
|
|
|
2019-06-19 19:48:13 +03:00
|
|
|
typedef struct dump_bytes_io {
|
2019-11-21 20:32:57 +03:00
|
|
|
zfs_file_t *dbi_fp;
|
|
|
|
caddr_t dbi_buf;
|
	int		dbi_len;
	int		dbi_err;
} dump_bytes_io_t;

static void
dump_bytes_cb(void *arg)
{
	dump_bytes_io_t *dbi = (dump_bytes_io_t *)arg;
	zfs_file_t *fp;
	caddr_t buf;

	fp = dbi->dbi_fp;
	buf = dbi->dbi_buf;

	dbi->dbi_err = zfs_file_write(fp, buf, dbi->dbi_len, NULL);
}

static int
dump_bytes(objset_t *os, void *buf, int len, void *arg)
{
	dump_bytes_io_t dbi;

	dbi.dbi_fp = arg;
	dbi.dbi_buf = buf;
	dbi.dbi_len = len;

#if defined(HAVE_LARGE_STACKS)
	dump_bytes_cb(&dbi);
#else
	/*
	 * The zfs_file_write() call is performed in a taskq to ensure that
	 * there is always enough stack space to write safely to the target
	 * filesystem.  The ZIO_TYPE_FREE threads are used because there can
	 * be a lot of them and they are used in vdev_file.c for a similar
	 * purpose.
	 */
	spa_taskq_dispatch_sync(dmu_objset_spa(os), ZIO_TYPE_FREE,
	    ZIO_TASKQ_ISSUE, dump_bytes_cb, &dbi, TQ_SLEEP);
#endif /* HAVE_LARGE_STACKS */

	return (dbi.dbi_err);
}

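/*
 * Illustrative sketch only, not part of the driver: a minimal user-space
 * version of the pattern dump_bytes() uses above.  The I/O parameters and
 * an error slot are bundled into one struct, handed to a void * callback,
 * and the error is read back once the synchronous dispatch returns.  All
 * sketch_* names are hypothetical.  fwrite() stands in for zfs_file_write(),
 * and sketch_dispatch_sync() simply calls the function inline, whereas
 * spa_taskq_dispatch_sync() runs it on a taskq thread.
 */

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for dump_bytes_io_t: inputs plus an error slot. */
typedef struct sketch_io {
	FILE		*sio_fp;
	const void	*sio_buf;
	size_t		sio_len;
	int		sio_err;
} sketch_io_t;

/* Callback taking a void * argument, as a taskq-style dispatcher expects. */
static void
sketch_write_cb(void *arg)
{
	sketch_io_t *sio = arg;
	size_t n;

	assert(sio->sio_fp != NULL);
	n = fwrite(sio->sio_buf, 1, sio->sio_len, sio->sio_fp);
	/* Report a short write as EIO; 0 means every byte was accepted. */
	sio->sio_err = (n == sio->sio_len) ? 0 : EIO;
}

/* Hypothetical synchronous dispatcher: runs func(arg) before returning. */
static void
sketch_dispatch_sync(void (*func)(void *), void *arg)
{
	func(arg);
}

/* Package the arguments, dispatch the write, and return its error code. */
static int
sketch_dump_bytes(FILE *fp, const void *buf, size_t len)
{
	sketch_io_t sio = { .sio_fp = fp, .sio_buf = buf, .sio_len = len };

	sketch_dispatch_sync(sketch_write_cb, &sio);
	return (sio.sio_err);
}
```

/*
 * Because the dispatch is synchronous, the context struct can live on the
 * caller's stack; an asynchronous dispatcher would need it to outlive the
 * call.
 */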
/*
 * inputs:
 * zc_name	name of snapshot to send
 * zc_cookie	file descriptor to send stream to
 * zc_obj	fromorigin flag (mutually exclusive with zc_fromobj)
 * zc_sendobj	objsetid of snapshot to send
 * zc_fromobj	objsetid of incremental fromsnap (may be zero)
 * zc_guid	if set, estimate size of stream only.  zc_cookie is ignored.
 *		output size in zc_objset_type.
 * zc_flags	lzc_send_flags
 *
 * outputs:
 * zc_objset_type	estimated size, if zc_guid is set
 *
 * NOTE: This is no longer the preferred interface, any new functionality
 *	  should be added to zfs_ioc_send_new() instead.
 */
static int
zfs_ioc_send(zfs_cmd_t *zc)
{
	int error;
	offset_t off;
	boolean_t estimate = (zc->zc_guid != 0);
	boolean_t embedok = (zc->zc_flags & 0x1);
	boolean_t large_block_ok = (zc->zc_flags & 0x2);
	boolean_t compressok = (zc->zc_flags & 0x4);
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decrypt, and authenticate protected datasets.
Each object set maintains a Merkle tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
	boolean_t rawok = (zc->zc_flags & 0x8);
	boolean_t savedok = (zc->zc_flags & 0x10);

	if (zc->zc_obj != 0) {
		dsl_pool_t *dp;
		dsl_dataset_t *tosnap;

		error = dsl_pool_hold(zc->zc_name, FTAG, &dp);
		if (error != 0)
			return (error);

		error = dsl_dataset_hold_obj(dp, zc->zc_sendobj, FTAG, &tosnap);
		if (error != 0) {
			dsl_pool_rele(dp, FTAG);
			return (error);
		}

		if (dsl_dir_is_clone(tosnap->ds_dir))
			zc->zc_fromobj =
			    dsl_dir_phys(tosnap->ds_dir)->dd_origin_obj;
		dsl_dataset_rele(tosnap, FTAG);
		dsl_pool_rele(dp, FTAG);
	}

	if (estimate) {
		dsl_pool_t *dp;
		dsl_dataset_t *tosnap;
		dsl_dataset_t *fromsnap = NULL;

		error = dsl_pool_hold(zc->zc_name, FTAG, &dp);
		if (error != 0)
			return (error);

		error = dsl_dataset_hold_obj(dp, zc->zc_sendobj,
		    FTAG, &tosnap);
		if (error != 0) {
			dsl_pool_rele(dp, FTAG);
			return (error);
		}

		if (zc->zc_fromobj != 0) {
			error = dsl_dataset_hold_obj(dp, zc->zc_fromobj,
			    FTAG, &fromsnap);
			if (error != 0) {
				dsl_dataset_rele(tosnap, FTAG);
				dsl_pool_rele(dp, FTAG);
				return (error);
			}
		}

		error = dmu_send_estimate_fast(tosnap, fromsnap, NULL,
		    compressok || rawok, savedok, &zc->zc_objset_type);

		if (fromsnap != NULL)
			dsl_dataset_rele(fromsnap, FTAG);
		dsl_dataset_rele(tosnap, FTAG);
		dsl_pool_rele(dp, FTAG);
	} else {
		zfs_file_t *fp;
		dmu_send_outparams_t out = {0};

		if ((fp = zfs_file_get(zc->zc_cookie)) == NULL)
			return (SET_ERROR(EBADF));

		off = zfs_file_off(fp);
		out.dso_outfunc = dump_bytes;
		out.dso_arg = fp;
		out.dso_dryrun = B_FALSE;
		error = dmu_send_obj(zc->zc_name, zc->zc_sendobj,
		    zc->zc_fromobj, embedok, large_block_ok, compressok,
		    rawok, savedok, zc->zc_cookie, &off, &out);

		zfs_file_put(fp);
	}
	return (error);
}

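/*
 * Illustrative sketch only, not part of the driver: zfs_ioc_send() above
 * tests zc_flags against the literal masks 0x1, 0x2, 0x4, 0x8 and 0x10.
 * The stand-alone decoder below gives those masks names and normalizes
 * each test to a strict true/false value.  The sketch_* and SKETCH_*
 * identifiers are hypothetical and chosen for this example; only the mask
 * values and the option names come from the function above.
 */

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the literal masks tested in zfs_ioc_send(). */
enum sketch_send_flag {
	SKETCH_SEND_EMBED	= 0x1,
	SKETCH_SEND_LARGE_BLOCK	= 0x2,
	SKETCH_SEND_COMPRESS	= 0x4,
	SKETCH_SEND_RAW		= 0x8,
	SKETCH_SEND_SAVED	= 0x10
};

typedef struct sketch_send_opts {
	bool	embedok;
	bool	large_block_ok;
	bool	compressok;
	bool	rawok;
	bool	savedok;
} sketch_send_opts_t;

/* Decode a flags word into one boolean per option. */
static sketch_send_opts_t
sketch_decode_send_flags(uint64_t flags)
{
	sketch_send_opts_t o = {
		.embedok = (flags & SKETCH_SEND_EMBED) != 0,
		.large_block_ok = (flags & SKETCH_SEND_LARGE_BLOCK) != 0,
		.compressok = (flags & SKETCH_SEND_COMPRESS) != 0,
		.rawok = (flags & SKETCH_SEND_RAW) != 0,
		.savedok = (flags & SKETCH_SEND_SAVED) != 0
	};

	return (o);
}
```

/*
 * The "!= 0" comparisons make each field exactly 0 or 1, whereas the
 * boolean_t assignments in zfs_ioc_send() keep the raw masked value (for
 * example 0x8 for rawok), which is still truthy where it is tested.
 */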
/*
 * inputs:
 * zc_name	name of snapshot on which to report progress
 * zc_cookie	file descriptor of send stream
 *
 * outputs:
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
* zc_cookie number of bytes written in send stream thus far
|
|
|
|
* zc_objset_type logical size of data traversed by send thus far
|
2012-05-10 02:05:14 +04:00
|
|
|
*/
|
|
|
|
static int
|
|
|
|
zfs_ioc_send_progress(zfs_cmd_t *zc)
|
|
|
|
{
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_pool_t *dp;
|
2012-05-10 02:05:14 +04:00
|
|
|
dsl_dataset_t *ds;
|
2019-06-19 19:48:13 +03:00
|
|
|
dmu_sendstatus_t *dsp = NULL;
|
2012-05-10 02:05:14 +04:00
|
|
|
int error;
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
error = dsl_pool_hold(zc->zc_name, FTAG, &dp);
|
|
|
|
if (error != 0)
|
2012-05-10 02:05:14 +04:00
|
|
|
return (error);
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
error = dsl_dataset_hold(dp, zc->zc_name, FTAG, &ds);
|
|
|
|
if (error != 0) {
|
|
|
|
dsl_pool_rele(dp, FTAG);
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2012-05-10 02:05:14 +04:00
|
|
|
mutex_enter(&ds->ds_sendstream_lock);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Iterate over all the send streams currently active on this dataset.
|
|
|
|
* If there's one which matches the specified file descriptor _and_ the
|
|
|
|
* stream was started by the current process, return the progress of
|
|
|
|
* that stream.
|
|
|
|
*/
|
|
|
|
|
|
|
|
for (dsp = list_head(&ds->ds_sendstreams); dsp != NULL;
|
|
|
|
dsp = list_next(&ds->ds_sendstreams, dsp)) {
|
2019-06-19 19:48:13 +03:00
|
|
|
if (dsp->dss_outfd == zc->zc_cookie &&
|
2020-04-20 20:12:48 +03:00
|
|
|
zfs_proc_is_caller(dsp->dss_proc))
|
2012-05-10 02:05:14 +04:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
2019-06-19 19:48:13 +03:00
|
|
|
if (dsp != NULL) {
|
|
|
|
zc->zc_cookie = atomic_cas_64((volatile uint64_t *)dsp->dss_off,
|
|
|
|
0, 0);
|
|
|
|
/* This is the closest thing we have to atomic_read_64. */
|
|
|
|
zc->zc_objset_type = atomic_cas_64(&dsp->dss_blocks, 0, 0);
|
|
|
|
} else {
|
2013-03-08 22:41:28 +04:00
|
|
|
error = SET_ERROR(ENOENT);
|
2019-06-19 19:48:13 +03:00
|
|
|
}
|
2012-05-10 02:05:14 +04:00
|
|
|
|
|
|
|
mutex_exit(&ds->ds_sendstream_lock);
|
|
|
|
dsl_dataset_rele(ds, FTAG);
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_pool_rele(dp, FTAG);
|
2012-05-10 02:05:14 +04:00
|
|
|
return (error);
|
|
|
|
}
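The two atomic_cas_64(..., 0, 0) calls in zfs_ioc_send_progress() use a compare-and-swap as an atomic read: swapping 0 for 0 never changes the stored value, and the call returns whatever was there. A minimal user-space sketch of the same trick, assuming C11 <stdatomic.h> in place of the kernel atomics (read_via_cas is a hypothetical helper name, not part of ZFS):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical helper: read a 64-bit counter using only compare-and-swap.
 * If the counter is 0, the CAS succeeds and stores 0 again (no change).
 * Otherwise the CAS fails and copies the current value into 'expected'.
 * Either way 'expected' ends up holding the value that was observed.
 */
static uint64_t
read_via_cas(_Atomic uint64_t *p)
{
	uint64_t expected = 0;

	(void) atomic_compare_exchange_strong(p, &expected, 0);
	return (expected);
}
```

With C11 atomics a plain atomic_load() would be the usual choice; the CAS form is shown only because it mirrors what the function above does.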
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
static int
|
|
|
|
zfs_ioc_inject_fault(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
int id, error;
|
|
|
|
|
|
|
|
error = zio_inject_fault(zc->zc_name, (int)zc->zc_guid, &id,
|
|
|
|
&zc->zc_inject_record);
|
|
|
|
|
|
|
|
if (error == 0)
|
|
|
|
zc->zc_guid = (uint64_t)id;
|
|
|
|
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
zfs_ioc_clear_fault(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
return (zio_clear_fault((int)zc->zc_guid));
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
zfs_ioc_inject_list_next(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
int id = (int)zc->zc_guid;
|
|
|
|
int error;
|
|
|
|
|
|
|
|
error = zio_inject_list_next(&id, zc->zc_name, sizeof (zc->zc_name),
|
|
|
|
&zc->zc_inject_record);
|
|
|
|
|
|
|
|
zc->zc_guid = id;
|
|
|
|
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
zfs_ioc_error_log(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
spa_t *spa;
|
|
|
|
int error;
|
|
|
|
|
|
|
|
if ((error = spa_open(zc->zc_name, &spa, FTAG)) != 0)
|
|
|
|
return (error);
|
|
|
|
|
|
|
|
error = spa_get_errlog(spa, (void *)(uintptr_t)zc->zc_nvlist_dst,
|
2022-12-22 22:48:49 +03:00
|
|
|
&zc->zc_nvlist_dst_size);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
spa_close(spa, FTAG);
|
|
|
|
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
zfs_ioc_clear(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
spa_t *spa;
|
|
|
|
vdev_t *vd;
|
|
|
|
int error;
|
|
|
|
|
|
|
|
/*
|
2008-12-03 23:09:06 +03:00
|
|
|
* On zpool clear we also fix up missing slogs
|
2008-11-20 23:01:55 +03:00
|
|
|
*/
|
2008-12-03 23:09:06 +03:00
|
|
|
mutex_enter(&spa_namespace_lock);
|
|
|
|
spa = spa_lookup(zc->zc_name);
|
|
|
|
if (spa == NULL) {
|
|
|
|
mutex_exit(&spa_namespace_lock);
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EIO));
|
2008-12-03 23:09:06 +03:00
|
|
|
}
|
2010-05-29 00:45:14 +04:00
|
|
|
if (spa_get_log_state(spa) == SPA_LOG_MISSING) {
|
2008-12-03 23:09:06 +03:00
|
|
|
/* we need to let spa_open/spa_load clear the chains */
|
2010-05-29 00:45:14 +04:00
|
|
|
spa_set_log_state(spa, SPA_LOG_CLEAR);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
2010-05-29 00:45:14 +04:00
|
|
|
spa->spa_last_open_failed = 0;
|
2008-12-03 23:09:06 +03:00
|
|
|
mutex_exit(&spa_namespace_lock);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
if (zc->zc_cookie & ZPOOL_NO_REWIND) {
|
|
|
|
error = spa_open(zc->zc_name, &spa, FTAG);
|
|
|
|
} else {
|
|
|
|
nvlist_t *policy;
|
|
|
|
nvlist_t *config = NULL;
|
|
|
|
|
2010-08-26 20:52:39 +04:00
|
|
|
if (zc->zc_nvlist_src == 0)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EINVAL));
|
2010-05-29 00:45:14 +04:00
|
|
|
|
|
|
|
if ((error = get_nvlist(zc->zc_nvlist_src,
|
|
|
|
zc->zc_nvlist_src_size, zc->zc_iflags, &policy)) == 0) {
|
|
|
|
error = spa_open_rewind(zc->zc_name, &spa, FTAG,
|
|
|
|
policy, &config);
|
|
|
|
if (config != NULL) {
|
2010-08-27 01:24:34 +04:00
|
|
|
int err;
|
|
|
|
|
|
|
|
if ((err = put_nvlist(zc, config)) != 0)
|
|
|
|
error = err;
|
2010-05-29 00:45:14 +04:00
|
|
|
nvlist_free(config);
|
|
|
|
}
|
|
|
|
nvlist_free(policy);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
if (error != 0)
|
2008-12-03 23:09:06 +03:00
|
|
|
return (error);
|
|
|
|
|
2019-03-01 04:56:19 +03:00
|
|
|
/*
|
|
|
|
* If multihost is enabled, resuming I/O is unsafe as another
|
|
|
|
* host may have imported the pool.
|
|
|
|
*/
|
|
|
|
if (spa_multihost(spa) && spa_suspended(spa))
|
|
|
|
return (SET_ERROR(EINVAL));
|
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
spa_vdev_state_enter(spa, SCL_NONE);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
if (zc->zc_guid == 0) {
|
|
|
|
vd = NULL;
|
2008-12-03 23:09:06 +03:00
|
|
|
} else {
|
|
|
|
vd = spa_lookup_by_guid(spa, zc->zc_guid, B_TRUE);
|
2008-11-20 23:01:55 +03:00
|
|
|
if (vd == NULL) {
|
2020-02-27 03:09:17 +03:00
|
|
|
error = SET_ERROR(ENODEV);
|
|
|
|
(void) spa_vdev_state_exit(spa, NULL, error);
|
2008-11-20 23:01:55 +03:00
|
|
|
spa_close(spa, FTAG);
|
2020-02-27 03:09:17 +03:00
|
|
|
return (error);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2008-12-03 23:09:06 +03:00
|
|
|
vdev_clear(spa, vd);
|
|
|
|
|
2017-07-25 22:20:52 +03:00
|
|
|
(void) spa_vdev_state_exit(spa, spa_suspended(spa) ?
|
|
|
|
NULL : spa->spa_root_vdev, 0);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2008-12-03 23:09:06 +03:00
|
|
|
/*
|
|
|
|
* Resume any suspended I/Os.
|
|
|
|
*/
|
2009-07-03 02:44:48 +04:00
|
|
|
if (zio_resume(spa) != 0)
|
2013-03-08 22:41:28 +04:00
|
|
|
error = SET_ERROR(EIO);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
spa_close(spa, FTAG);
|
|
|
|
|
2009-07-03 02:44:48 +04:00
|
|
|
return (error);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
2017-10-26 22:26:09 +03:00
|
|
|
/*
|
|
|
|
* Reopen all the vdevs associated with the pool.
|
|
|
|
*
|
|
|
|
* innvl: {
|
|
|
|
* "scrub_restart" -> when true and a scrub is running, allow the scrub
|
|
|
|
* to be restarted as a side effect of the reopen (boolean).
|
|
|
|
* }
|
|
|
|
*
|
|
|
|
* outnvl is unused
|
|
|
|
*/
|
Add basic zfs ioc input nvpair validation
We want newer versions of libzfs_core to run against an existing
zfs kernel module (i.e. a deferred reboot or module reload after
an update).
Programmatically document, via a zfs_ioc_key_t, the valid arguments
for the ioc commands that rely on nvpair input arguments (i.e. non
legacy commands from libzfs_core). Automatically verify the expected
pairs before dispatching a command.
This initial phase focuses on the non-legacy ioctls. A follow-on
change can address the legacy ioctl input from the zfs_cmd_t.
The zfs_ioc_key_t for zfs_keys_channel_program looks like:
static const zfs_ioc_key_t zfs_keys_channel_program[] = {
{"program", DATA_TYPE_STRING, 0},
{"arg", DATA_TYPE_UNKNOWN, 0},
{"sync", DATA_TYPE_BOOLEAN_VALUE, ZK_OPTIONAL},
{"instrlimit", DATA_TYPE_UINT64, ZK_OPTIONAL},
{"memlimit", DATA_TYPE_UINT64, ZK_OPTIONAL},
};
Introduce four input errors to identify specific input failures
(in addition to generic argument value errors like EINVAL, ERANGE,
EBADF, and E2BIG).
ZFS_ERR_IOC_CMD_UNAVAIL the ioctl number is not supported by kernel
ZFS_ERR_IOC_ARG_UNAVAIL an input argument is not supported by kernel
ZFS_ERR_IOC_ARG_REQUIRED a required input argument is missing
ZFS_ERR_IOC_ARG_BADTYPE an input argument has an invalid type
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #7780
2018-09-02 22:14:01 +03:00
|
|
|
static const zfs_ioc_key_t zfs_keys_pool_reopen[] = {
|
2019-09-27 20:46:28 +03:00
|
|
|
{"scrub_restart", DATA_TYPE_BOOLEAN_VALUE, ZK_OPTIONAL},
|
2018-09-02 22:14:01 +03:00
|
|
|
};
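The commit message above describes checking the input nvpairs against a key table before an ioctl is dispatched. The sketch below illustrates that idea on its own terms: every type, constant, and function name in it (sketch_key, SK_OPTIONAL, sketch_check_input, and so on) is a hypothetical stand-in, not the real zfs_ioc_key_t, nvpair, or ZFS_ERR_* definitions, and the order of the checks is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the nvpair data types and error codes. */
enum sketch_type { SK_T_ANY, SK_T_STRING, SK_T_BOOLEAN, SK_T_UINT64 };
enum sketch_err { SK_OK = 0, SK_ARG_UNAVAIL, SK_ARG_REQUIRED, SK_ARG_BADTYPE };
#define	SK_OPTIONAL	0x1

struct sketch_key { const char *name; enum sketch_type type; int flags; };
struct sketch_pair { const char *name; enum sketch_type type; };

static enum sketch_err
sketch_check_input(const struct sketch_key *keys, size_t nkeys,
    const struct sketch_pair *in, size_t nin)
{
	/* Every supplied pair must name a known key and match its type. */
	for (size_t i = 0; i < nin; i++) {
		const struct sketch_key *k = NULL;

		for (size_t j = 0; j < nkeys; j++) {
			if (strcmp(in[i].name, keys[j].name) == 0) {
				k = &keys[j];
				break;
			}
		}
		if (k == NULL)
			return (SK_ARG_UNAVAIL);
		if (k->type != SK_T_ANY && k->type != in[i].type)
			return (SK_ARG_BADTYPE);
	}

	/* Every key not marked optional must have been supplied. */
	for (size_t j = 0; j < nkeys; j++) {
		int found = 0;

		if (keys[j].flags & SK_OPTIONAL)
			continue;
		for (size_t i = 0; i < nin; i++) {
			if (strcmp(in[i].name, keys[j].name) == 0)
				found = 1;
		}
		if (!found)
			return (SK_ARG_REQUIRED);
	}
	return (SK_OK);
}
```

Under this scheme a table such as zfs_keys_pool_reopen, whose only key is optional, would accept an empty input list.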
|
|
|
|
|
2012-01-24 06:43:32 +04:00
|
|
|
static int
|
2017-10-26 22:26:09 +03:00
|
|
|
zfs_ioc_pool_reopen(const char *pool, nvlist_t *innvl, nvlist_t *outnvl)
|
2012-01-24 06:43:32 +04:00
|
|
|
{
|
2022-02-16 04:38:43 +03:00
|
|
|
(void) outnvl;
|
2012-01-24 06:43:32 +04:00
|
|
|
spa_t *spa;
|
|
|
|
int error;
|
2019-09-27 20:46:28 +03:00
|
|
|
boolean_t rc, scrub_restart = B_TRUE;
|
2012-01-24 06:43:32 +04:00
|
|
|
|
2017-10-26 22:26:09 +03:00
|
|
|
if (innvl) {
|
2019-09-27 20:46:28 +03:00
|
|
|
error = nvlist_lookup_boolean_value(innvl,
|
|
|
|
"scrub_restart", &rc);
|
|
|
|
if (error == 0)
|
|
|
|
scrub_restart = rc;
|
2017-10-26 22:26:09 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
error = spa_open(pool, &spa, FTAG);
|
2013-09-04 16:00:57 +04:00
|
|
|
if (error != 0)
|
2012-01-24 06:43:32 +04:00
|
|
|
return (error);
|
|
|
|
|
|
|
|
spa_vdev_state_enter(spa, SCL_NONE);
|
2012-09-02 00:44:00 +04:00
|
|
|
|
|
|
|
/*
|
2017-10-26 22:26:09 +03:00
|
|
|
* If the scrub_restart flag is B_FALSE and a scrub is already
|
|
|
|
* in progress then set spa_scrub_reopen flag to B_TRUE so that
|
|
|
|
* we don't restart the scrub as a side effect of the reopen.
|
|
|
|
* Otherwise, let vdev_open() decide if a resilver is required.
|
2012-09-02 00:44:00 +04:00
|
|
|
*/
|
2017-10-26 22:26:09 +03:00
|
|
|
|
|
|
|
spa->spa_scrub_reopen = (!scrub_restart &&
|
|
|
|
dsl_scan_scrubbing(spa->spa_dsl_pool));
|
2012-01-24 06:43:32 +04:00
|
|
|
vdev_reopen(spa->spa_root_vdev);
|
2012-09-02 00:44:00 +04:00
|
|
|
spa->spa_scrub_reopen = B_FALSE;
|
|
|
|
|
2012-01-24 06:43:32 +04:00
|
|
|
(void) spa_vdev_state_exit(spa, NULL, 0);
|
|
|
|
spa_close(spa, FTAG);
|
|
|
|
return (0);
|
|
|
|
}
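The handling of the optional "scrub_restart" argument in zfs_ioc_pool_reopen() reduces to a small truth table: the flag defaults to true when the caller supplies nothing, and restart suppression is requested only when the caller explicitly passes false while a scrub is running. A minimal sketch of that decision, with the nvlist lookup and dsl_scan_scrubbing() replaced by plain booleans (the function and parameter names are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the spa_scrub_reopen computation.
 *   has_flag   - whether the caller supplied "scrub_restart" at all
 *   flag_value - the supplied value (ignored when has_flag is false)
 *   scrubbing  - whether a scrub is currently in progress
 * Returns true when the reopen should NOT restart the scrub.
 */
static bool
suppress_scrub_restart(bool has_flag, bool flag_value, bool scrubbing)
{
	bool scrub_restart = true;	/* default when nothing is passed */

	if (has_flag)
		scrub_restart = flag_value;
	return (!scrub_restart && scrubbing);
}
```

Defaulting to true means callers that omit the argument get no suppression, since the function only sets spa_scrub_reopen on an explicit false.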
|
2017-10-26 22:26:09 +03:00
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
/*
|
|
|
|
* inputs:
|
|
|
|
* zc_name name of filesystem
|
|
|
|
*
|
2010-05-29 00:45:14 +04:00
|
|
|
* outputs:
|
|
|
|
* zc_string name of conflicting snapshot, if there is one
|
2008-11-20 23:01:55 +03:00
|
|
|
*/
|
|
|
|
static int
|
|
|
|
zfs_ioc_promote(zfs_cmd_t *zc)
|
|
|
|
{
|
2017-06-27 02:56:09 +03:00
|
|
|
dsl_pool_t *dp;
|
|
|
|
dsl_dataset_t *ds, *ods;
|
|
|
|
char origin[ZFS_MAX_DATASET_NAME_LEN];
|
2008-11-20 23:01:55 +03:00
|
|
|
char *cp;
|
2017-06-27 02:56:09 +03:00
|
|
|
int error;
|
|
|
|
|
2017-07-29 00:12:34 +03:00
|
|
|
zc->zc_name[sizeof (zc->zc_name) - 1] = '\0';
|
|
|
|
if (dataset_namecheck(zc->zc_name, NULL, NULL) != 0 ||
|
|
|
|
strchr(zc->zc_name, '%'))
|
|
|
|
return (SET_ERROR(EINVAL));
|
|
|
|
|
2017-06-27 02:56:09 +03:00
|
|
|
error = dsl_pool_hold(zc->zc_name, FTAG, &dp);
|
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
|
|
|
|
|
|
|
error = dsl_dataset_hold(dp, zc->zc_name, FTAG, &ds);
|
|
|
|
if (error != 0) {
|
|
|
|
dsl_pool_rele(dp, FTAG);
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!dsl_dir_is_clone(ds->ds_dir)) {
|
|
|
|
dsl_dataset_rele(ds, FTAG);
|
|
|
|
dsl_pool_rele(dp, FTAG);
|
|
|
|
return (SET_ERROR(EINVAL));
|
|
|
|
}
|
|
|
|
|
|
|
|
error = dsl_dataset_hold_obj(dp,
|
|
|
|
dsl_dir_phys(ds->ds_dir)->dd_origin_obj, FTAG, &ods);
|
|
|
|
if (error != 0) {
|
|
|
|
dsl_dataset_rele(ds, FTAG);
|
|
|
|
dsl_pool_rele(dp, FTAG);
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
dsl_dataset_name(ods, origin);
|
|
|
|
dsl_dataset_rele(ods, FTAG);
|
|
|
|
dsl_dataset_rele(ds, FTAG);
|
|
|
|
dsl_pool_rele(dp, FTAG);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
/*
|
|
|
|
* We don't need to unmount *all* the origin fs's snapshots, but
|
|
|
|
* it's easier.
|
|
|
|
*/
|
2017-06-27 02:56:09 +03:00
|
|
|
cp = strchr(origin, '@');
|
2008-11-20 23:01:55 +03:00
|
|
|
if (cp)
|
|
|
|
*cp = '\0';
|
2017-06-27 02:56:09 +03:00
|
|
|
(void) dmu_objset_find(origin,
|
2013-09-04 16:00:57 +04:00
|
|
|
zfs_unmount_snap_cb, NULL, DS_FIND_SNAPSHOTS);
|
2010-05-29 00:45:14 +04:00
|
|
|
return (dsl_dataset_promote(zc->zc_name, zc->zc_string));
|
2008-11-20 23:01:55 +03:00
|
|
|
}
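Before unmounting snapshots, zfs_ioc_promote() turns the origin snapshot name into the name of its filesystem by cutting the string at the '@' separator. A minimal standalone sketch of that step (strip_snapshot_suffix is a hypothetical name; the dataset names in the usage note are made up for illustration):

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: truncate "pool/fs@snap" to "pool/fs" in place.
 * A name without '@' is left unchanged.
 */
static void
strip_snapshot_suffix(char *name)
{
	char *cp = strchr(name, '@');

	if (cp != NULL)
		*cp = '\0';
}
```

For example, a buffer holding "tank/origin@base" would hold "tank/origin" after the call, which is the kind of name the function then passes to dmu_objset_find() with DS_FIND_SNAPSHOTS.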
|
|
|
|
|
2009-07-03 02:44:48 +04:00
|
|
|
/*
|
2018-02-14 01:54:54 +03:00
|
|
|
* Retrieve a single {user|group|project}{used|quota}@... property.
|
2009-07-03 02:44:48 +04:00
|
|
|
*
|
|
|
|
* inputs:
|
|
|
|
* zc_name name of filesystem
|
|
|
|
* zc_objset_type zfs_userquota_prop_t
|
|
|
|
* zc_value domain name (eg. "S-1-234-567-89")
|
|
|
|
* zc_guid RID/UID/GID
|
|
|
|
*
|
|
|
|
* outputs:
|
|
|
|
* zc_cookie property value
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
zfs_ioc_userspace_one(zfs_cmd_t *zc)
|
|
|
|
{
|
2017-03-08 03:21:37 +03:00
|
|
|
zfsvfs_t *zfsvfs;
|
2009-07-03 02:44:48 +04:00
|
|
|
int error;
|
|
|
|
|
|
|
|
if (zc->zc_objset_type >= ZFS_NUM_USERQUOTA_PROPS)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EINVAL));
|
2009-07-03 02:44:48 +04:00
|
|
|
|
2017-03-09 01:56:19 +03:00
|
|
|
error = zfsvfs_hold(zc->zc_name, FTAG, &zfsvfs, B_FALSE);
|
2013-09-04 16:00:57 +04:00
|
|
|
if (error != 0)
|
2009-07-03 02:44:48 +04:00
|
|
|
return (error);
|
|
|
|
|
2017-03-08 03:21:37 +03:00
|
|
|
error = zfs_userspace_one(zfsvfs,
|
2009-07-03 02:44:48 +04:00
|
|
|
zc->zc_objset_type, zc->zc_value, zc->zc_guid, &zc->zc_cookie);
|
2017-03-09 01:56:19 +03:00
|
|
|
zfsvfs_rele(zfsvfs, FTAG);
|
2009-07-03 02:44:48 +04:00
|
|
|
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* inputs:
|
|
|
|
* zc_name name of filesystem
|
|
|
|
* zc_cookie zap cursor
|
|
|
|
* zc_objset_type zfs_userquota_prop_t
|
|
|
|
* zc_nvlist_dst[_size] buffer to fill (not really an nvlist)
|
|
|
|
*
|
|
|
|
* outputs:
|
|
|
|
* zc_nvlist_dst[_size] data buffer (array of zfs_useracct_t)
|
|
|
|
* zc_cookie zap cursor
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
zfs_ioc_userspace_many(zfs_cmd_t *zc)
|
|
|
|
{
|
2017-03-08 03:21:37 +03:00
|
|
|
zfsvfs_t *zfsvfs;
|
2010-05-29 00:45:14 +04:00
|
|
|
int bufsize = zc->zc_nvlist_dst_size;
|
|
|
|
|
|
|
|
if (bufsize <= 0)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(ENOMEM));
|
2009-07-03 02:44:48 +04:00
|
|
|
|
2017-11-04 23:25:13 +03:00
|
|
|
int error = zfsvfs_hold(zc->zc_name, FTAG, &zfsvfs, B_FALSE);
|
2013-09-04 16:00:57 +04:00
|
|
|
if (error != 0)
|
2009-07-03 02:44:48 +04:00
|
|
|
return (error);
|
|
|
|
|
2017-11-04 23:25:13 +03:00
|
|
|
void *buf = vmem_alloc(bufsize, KM_SLEEP);
|
2009-07-03 02:44:48 +04:00
|
|
|
|
2017-03-08 03:21:37 +03:00
|
|
|
error = zfs_userspace_many(zfsvfs, zc->zc_objset_type, &zc->zc_cookie,
|
2009-07-03 02:44:48 +04:00
|
|
|
buf, &zc->zc_nvlist_dst_size);
|
|
|
|
|
|
|
|
if (error == 0) {
|
|
|
|
error = xcopyout(buf,
|
|
|
|
(void *)(uintptr_t)zc->zc_nvlist_dst,
|
|
|
|
zc->zc_nvlist_dst_size);
|
|
|
|
}
|
2011-05-21 01:23:18 +04:00
|
|
|
vmem_free(buf, bufsize);
|
2017-03-09 01:56:19 +03:00
|
|
|
zfsvfs_rele(zfsvfs, FTAG);
|
2009-07-03 02:44:48 +04:00
|
|
|
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* inputs:
|
|
|
|
* zc_name name of filesystem
|
|
|
|
*
|
|
|
|
* outputs:
|
|
|
|
* none
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
zfs_ioc_userspace_upgrade(zfs_cmd_t *zc)
|
|
|
|
{
|
2010-05-29 00:45:14 +04:00
|
|
|
int error = 0;
|
2017-03-08 03:21:37 +03:00
|
|
|
zfsvfs_t *zfsvfs;
|
2009-07-03 02:44:48 +04:00
|
|
|
|
2017-03-09 01:56:19 +03:00
|
|
|
if (getzfsvfs(zc->zc_name, &zfsvfs) == 0) {
|
2017-03-08 03:21:37 +03:00
|
|
|
if (!dmu_objset_userused_enabled(zfsvfs->z_os)) {
|
2009-07-03 02:44:48 +04:00
|
|
|
/*
|
|
|
|
* If userused is not enabled, it may be because the
|
|
|
|
* objset needs to be closed & reopened (to grow the
|
|
|
|
* objset_phys_t). Suspending and resuming the fs will do that.
|
|
|
|
*/
|
2018-02-21 15:55:55 +03:00
|
|
|
dsl_dataset_t *ds, *newds;
|
2017-01-23 21:53:46 +03:00
|
|
|
|
2017-03-08 03:21:37 +03:00
|
|
|
ds = dmu_objset_ds(zfsvfs->z_os);
|
|
|
|
error = zfs_suspend_fs(zfsvfs);
|
2013-07-27 21:50:07 +04:00
|
|
|
if (error == 0) {
|
2018-02-21 15:55:55 +03:00
|
|
|
dmu_objset_refresh_ownership(ds, &newds,
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decrypt, and authenticate protected datasets.
Each object set maintains a Merkle tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 20:36:48 +03:00
|
|
|
B_TRUE, zfsvfs);
|
2018-02-21 15:55:55 +03:00
|
|
|
error = zfs_resume_fs(zfsvfs, newds);
|
2013-07-27 21:50:07 +04:00
|
|
|
}
|
2009-07-03 02:44:48 +04:00
|
|
|
}
|
2020-11-16 20:10:29 +03:00
|
|
|
if (error == 0) {
|
|
|
|
mutex_enter(&zfsvfs->z_os->os_upgrade_lock);
|
|
|
|
if (zfsvfs->z_os->os_upgrade_id == 0) {
|
|
|
|
/* clear potential error code and retry */
|
|
|
|
zfsvfs->z_os->os_upgrade_status = 0;
|
|
|
|
mutex_exit(&zfsvfs->z_os->os_upgrade_lock);
|
|
|
|
|
|
|
|
dsl_pool_config_enter(
|
|
|
|
dmu_objset_pool(zfsvfs->z_os), FTAG);
|
|
|
|
dmu_objset_userspace_upgrade(zfsvfs->z_os);
|
|
|
|
dsl_pool_config_exit(
|
|
|
|
dmu_objset_pool(zfsvfs->z_os), FTAG);
|
|
|
|
} else {
|
|
|
|
mutex_exit(&zfsvfs->z_os->os_upgrade_lock);
|
|
|
|
}
|
|
|
|
|
|
|
|
taskq_wait_id(zfsvfs->z_os->os_spa->spa_upgrade_taskq,
|
|
|
|
zfsvfs->z_os->os_upgrade_id);
|
|
|
|
error = zfsvfs->z_os->os_upgrade_status;
|
|
|
|
}
|
2019-12-10 20:21:07 +03:00
|
|
|
zfs_vfs_rele(zfsvfs);
|
2009-07-03 02:44:48 +04:00
|
|
|
} else {
|
2020-11-16 20:10:29 +03:00
|
|
|
objset_t *os;
|
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
/* XXX kind of reading contents without owning */
|
2017-08-14 20:36:48 +03:00
|
|
|
error = dmu_objset_hold_flags(zc->zc_name, B_TRUE, FTAG, &os);
|
2013-09-04 16:00:57 +04:00
|
|
|
if (error != 0)
|
2009-07-03 02:44:48 +04:00
|
|
|
return (error);
|
|
|
|
|
2020-11-16 20:10:29 +03:00
|
|
|
mutex_enter(&os->os_upgrade_lock);
|
|
|
|
if (os->os_upgrade_id == 0) {
|
|
|
|
/* clear potential error code and retry */
|
|
|
|
os->os_upgrade_status = 0;
|
|
|
|
mutex_exit(&os->os_upgrade_lock);
|
|
|
|
|
|
|
|
dmu_objset_userspace_upgrade(os);
|
|
|
|
} else {
|
|
|
|
mutex_exit(&os->os_upgrade_lock);
|
|
|
|
}
|
2009-07-03 02:44:48 +04:00
|
|
|
|
2020-11-16 20:10:29 +03:00
|
|
|
dsl_pool_rele(dmu_objset_pool(os), FTAG);
|
|
|
|
|
|
|
|
taskq_wait_id(os->os_spa->spa_upgrade_taskq, os->os_upgrade_id);
|
|
|
|
error = os->os_upgrade_status;
|
|
|
|
|
|
|
|
dsl_dataset_rele_flags(dmu_objset_ds(os), DS_HOLD_FLAG_DECRYPT,
|
|
|
|
FTAG);
|
|
|
|
}
|
2009-07-03 02:44:48 +04:00
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2016-10-04 21:46:10 +03:00
|
|
|
/*
|
|
|
|
* inputs:
|
|
|
|
* zc_name name of filesystem
|
|
|
|
*
|
|
|
|
* outputs:
|
|
|
|
* none
|
|
|
|
*/
|
|
|
|
static int
|
2018-02-14 01:54:54 +03:00
|
|
|
zfs_ioc_id_quota_upgrade(zfs_cmd_t *zc)
|
2016-10-04 21:46:10 +03:00
|
|
|
{
|
|
|
|
objset_t *os;
|
|
|
|
int error;
|
|
|
|
|
2017-08-14 20:36:48 +03:00
|
|
|
error = dmu_objset_hold_flags(zc->zc_name, B_TRUE, FTAG, &os);
|
2016-10-04 21:46:10 +03:00
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
|
|
|
|
2018-02-14 01:54:54 +03:00
|
|
|
if (dmu_objset_userobjspace_upgradable(os) ||
|
|
|
|
dmu_objset_projectquota_upgradable(os)) {
|
2016-10-04 21:46:10 +03:00
|
|
|
mutex_enter(&os->os_upgrade_lock);
|
|
|
|
if (os->os_upgrade_id == 0) {
|
|
|
|
/* clear potential error code and retry */
|
|
|
|
os->os_upgrade_status = 0;
|
|
|
|
mutex_exit(&os->os_upgrade_lock);
|
|
|
|
|
2018-02-14 01:54:54 +03:00
|
|
|
dmu_objset_id_quota_upgrade(os);
|
2016-10-04 21:46:10 +03:00
|
|
|
} else {
|
|
|
|
mutex_exit(&os->os_upgrade_lock);
|
|
|
|
}
|
|
|
|
|
2017-11-11 00:37:10 +03:00
|
|
|
dsl_pool_rele(dmu_objset_pool(os), FTAG);
|
|
|
|
|
2016-10-04 21:46:10 +03:00
|
|
|
taskq_wait_id(os->os_spa->spa_upgrade_taskq, os->os_upgrade_id);
|
|
|
|
error = os->os_upgrade_status;
|
2017-11-11 00:37:10 +03:00
|
|
|
} else {
|
|
|
|
dsl_pool_rele(dmu_objset_pool(os), FTAG);
|
2016-10-04 21:46:10 +03:00
|
|
|
}
|
|
|
|
|
2017-08-14 20:36:48 +03:00
|
|
|
dsl_dataset_rele_flags(dmu_objset_ds(os), DS_HOLD_FLAG_DECRYPT, FTAG);
|
2016-10-04 21:46:10 +03:00
|
|
|
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
static int
|
|
|
|
zfs_ioc_share(zfs_cmd_t *zc)
|
|
|
|
{
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(ENOSYS));
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
2010-08-27 01:24:34 +04:00
|
|
|
/*
|
|
|
|
* inputs:
|
|
|
|
* zc_name name of containing filesystem
|
|
|
|
* zc_obj object # beyond which we want next in-use object #
|
|
|
|
*
|
|
|
|
* outputs:
|
|
|
|
* zc_obj next in-use object #
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
zfs_ioc_next_obj(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
objset_t *os = NULL;
|
|
|
|
int error;
|
|
|
|
|
|
|
|
error = dmu_objset_hold(zc->zc_name, FTAG, &os);
|
2013-09-04 16:00:57 +04:00
|
|
|
if (error != 0)
|
2010-08-27 01:24:34 +04:00
|
|
|
return (error);
|
|
|
|
|
2015-05-13 17:16:42 +03:00
|
|
|
error = dmu_object_next(os, &zc->zc_obj, B_FALSE, 0);
|
2010-08-27 01:24:34 +04:00
|
|
|
|
|
|
|
dmu_objset_rele(os, FTAG);
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* inputs:
|
|
|
|
* zc_name name of filesystem
|
|
|
|
* zc_value prefix name for snapshot
|
|
|
|
* zc_cleanup_fd cleanup-on-exit file descriptor for calling process
|
|
|
|
*
|
|
|
|
* outputs:
|
2013-08-28 15:45:09 +04:00
|
|
|
* zc_value short name of new snapshot
|
2010-08-27 01:24:34 +04:00
|
|
|
*/
|
|
|
|
static int
|
|
|
|
zfs_ioc_tmp_snapshot(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
char *snap_name;
|
2013-09-04 16:00:57 +04:00
|
|
|
char *hold_name;
|
|
|
|
minor_t minor;
|
2010-08-27 01:24:34 +04:00
|
|
|
|
2021-07-11 04:00:37 +03:00
|
|
|
zfs_file_t *fp = zfs_onexit_fd_hold(zc->zc_cleanup_fd, &minor);
|
|
|
|
if (fp == NULL)
|
|
|
|
return (SET_ERROR(EBADF));
|
2010-08-27 01:24:34 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
snap_name = kmem_asprintf("%s-%016llx", zc->zc_value,
|
|
|
|
(u_longlong_t)ddi_get_lbolt64());
|
|
|
|
hold_name = kmem_asprintf("%%%s", zc->zc_value);
|
|
|
|
|
2021-07-11 04:00:37 +03:00
|
|
|
int error = dsl_dataset_snapshot_tmp(zc->zc_name, snap_name, minor,
|
2013-09-04 16:00:57 +04:00
|
|
|
hold_name);
|
|
|
|
if (error == 0)
|
2016-09-26 01:08:28 +03:00
|
|
|
(void) strlcpy(zc->zc_value, snap_name,
|
|
|
|
sizeof (zc->zc_value));
|
2019-10-10 19:47:06 +03:00
|
|
|
kmem_strfree(snap_name);
|
|
|
|
kmem_strfree(hold_name);
|
2021-07-11 04:00:37 +03:00
|
|
|
zfs_onexit_fd_rele(fp);
|
2013-09-04 16:00:57 +04:00
|
|
|
return (error);
|
2010-08-27 01:24:34 +04:00
|
|
|
}
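The two format strings used by zfs_ioc_tmp_snapshot() above can be tried in userspace. This is a minimal sketch, assuming plain snprintf() in place of the kernel's kmem_asprintf(); the helper names sketch_tmp_snap_name() and sketch_tmp_hold_name() are hypothetical and exist only for illustration.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical userspace stand-in for the snapshot-name format:
 * "<prefix>-<16 hex digits>", where the digits come from a tick
 * counter (ddi_get_lbolt64() in the kernel code).
 */
static int
sketch_tmp_snap_name(char *buf, size_t len, const char *prefix,
    unsigned long long ticks)
{
	return (snprintf(buf, len, "%s-%016llx", prefix, ticks));
}

/*
 * Hypothetical stand-in for the hold-name format: "%%" emits a
 * literal '%', so the hold tag is the prefix with '%' prepended.
 */
static int
sketch_tmp_hold_name(char *buf, size_t len, const char *prefix)
{
	return (snprintf(buf, len, "%%%s", prefix));
}
```

With a prefix of "zfs-diff" and a tick value of 0x1234, the first helper yields "zfs-diff-0000000000001234" and the second yields "%zfs-diff".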
|
|
|
|
|
|
|
|
/*
|
|
|
|
* inputs:
|
|
|
|
* zc_name name of "to" snapshot
|
|
|
|
* zc_value name of "from" snapshot
|
|
|
|
* zc_cookie file descriptor to write diff data on
|
|
|
|
*
|
|
|
|
* outputs:
|
|
|
|
* dmu_diff_record_t's to the file descriptor
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
zfs_ioc_diff(zfs_cmd_t *zc)
|
|
|
|
{
|
2019-11-21 20:32:57 +03:00
|
|
|
zfs_file_t *fp;
|
2010-08-27 01:24:34 +04:00
|
|
|
offset_t off;
|
|
|
|
int error;
|
|
|
|
|
2021-07-11 04:00:37 +03:00
|
|
|
if ((fp = zfs_file_get(zc->zc_cookie)) == NULL)
|
|
|
|
return (SET_ERROR(EBADF));
|
2010-08-27 01:24:34 +04:00
|
|
|
|
2019-11-21 20:32:57 +03:00
|
|
|
off = zfs_file_off(fp);
|
|
|
|
error = dmu_diff(zc->zc_name, zc->zc_value, fp, &off);
|
2010-08-27 01:24:34 +04:00
|
|
|
|
2021-07-11 04:00:37 +03:00
|
|
|
zfs_file_put(fp);
|
2010-08-27 01:24:34 +04:00
|
|
|
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2009-07-03 02:44:48 +04:00
|
|
|
static int
|
|
|
|
zfs_ioc_smb_acl(zfs_cmd_t *zc)
|
|
|
|
{
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(ENOTSUP));
|
2009-07-03 02:44:48 +04:00
|
|
|
}
|
|
|
|
|
2009-08-18 22:43:27 +04:00
|
|
|
/*
|
2013-09-04 16:00:57 +04:00
|
|
|
* innvl: {
|
|
|
|
* "holds" -> { snapname -> holdname (string), ... }
|
|
|
|
* (optional) "cleanup_fd" -> fd (int32)
|
|
|
|
* }
|
2009-08-18 22:43:27 +04:00
|
|
|
*
|
2013-09-04 16:00:57 +04:00
|
|
|
* outnvl: {
|
|
|
|
* snapname -> error value (int32)
|
|
|
|
* ...
|
|
|
|
* }
|
2009-08-18 22:43:27 +04:00
|
|
|
*/
|
Add basic zfs ioc input nvpair validation
We want newer versions of libzfs_core to run against an existing
zfs kernel module (i.e. a deferred reboot or module reload after
an update).
Programmatically document, via a zfs_ioc_key_t, the valid arguments
for the ioc commands that rely on nvpair input arguments (i.e. non
legacy commands from libzfs_core). Automatically verify the expected
pairs before dispatching a command.
This initial phase focuses on the non-legacy ioctls. A follow-on
change can address the legacy ioctl input from the zfs_cmd_t.
The zfs_ioc_key_t for zfs_keys_channel_program looks like:
static const zfs_ioc_key_t zfs_keys_channel_program[] = {
	{"program", DATA_TYPE_STRING, 0},
	{"arg", DATA_TYPE_UNKNOWN, 0},
	{"sync", DATA_TYPE_BOOLEAN_VALUE, ZK_OPTIONAL},
	{"instrlimit", DATA_TYPE_UINT64, ZK_OPTIONAL},
	{"memlimit", DATA_TYPE_UINT64, ZK_OPTIONAL},
};
Introduce four input errors to identify specific input failures
(in addition to generic argument value errors like EINVAL, ERANGE,
EBADF, and E2BIG).
ZFS_ERR_IOC_CMD_UNAVAIL the ioctl number is not supported by kernel
ZFS_ERR_IOC_ARG_UNAVAIL an input argument is not supported by kernel
ZFS_ERR_IOC_ARG_REQUIRED a required input argument is missing
ZFS_ERR_IOC_ARG_BADTYPE an input argument has an invalid type
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #7780
2018-09-02 22:14:01 +03:00
|
|
|
static const zfs_ioc_key_t zfs_keys_hold[] = {
|
|
|
|
{"holds", DATA_TYPE_NVLIST, 0},
|
|
|
|
{"cleanup_fd", DATA_TYPE_INT32, ZK_OPTIONAL},
|
|
|
|
};
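A key table such as zfs_keys_hold could be checked against the caller's input roughly as follows. This is a simplified userspace sketch, not the kernel's validation routine: the sketch_* types, the SK_* constants, and sketch_check_input() are hypothetical stand-ins for nvpairs, zfs_ioc_key_t, and the ZFS_ERR_IOC_ARG_* codes, and wildcard keys are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; not the kernel's nvpair or key types. */
typedef enum { T_STRING, T_INT32, T_NVLIST, T_ANY } sketch_type_t;

#define	SK_OPTIONAL	0x1

typedef struct {
	const char *name;
	sketch_type_t type;
	int flags;
} sketch_key_t;

typedef struct {
	const char *name;
	sketch_type_t type;
} sketch_pair_t;

enum { SK_OK = 0, SK_ARG_UNAVAIL, SK_ARG_REQUIRED, SK_ARG_BADTYPE };

static int
sketch_check_input(const sketch_pair_t *in, size_t nin,
    const sketch_key_t *keys, size_t nkeys)
{
	assert(keys != NULL || nkeys == 0);

	/* Every supplied pair must name a known key of the right type. */
	for (size_t i = 0; i < nin; i++) {
		const sketch_key_t *k = NULL;

		for (size_t j = 0; j < nkeys; j++) {
			if (strcmp(in[i].name, keys[j].name) == 0) {
				k = &keys[j];
				break;
			}
		}
		if (k == NULL)
			return (SK_ARG_UNAVAIL);
		if (k->type != T_ANY && k->type != in[i].type)
			return (SK_ARG_BADTYPE);
	}

	/* Every key not marked optional must be present. */
	for (size_t j = 0; j < nkeys; j++) {
		int found = 0;

		if (keys[j].flags & SK_OPTIONAL)
			continue;
		for (size_t i = 0; i < nin; i++) {
			if (strcmp(in[i].name, keys[j].name) == 0)
				found = 1;
		}
		if (!found)
			return (SK_ARG_REQUIRED);
	}
	return (SK_OK);
}
```

Checking the input before dispatch lets a handler such as zfs_ioc_hold() look up its required "holds" list without handling a missing or mistyped key itself.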
|
|
|
|
|
2009-08-18 22:43:27 +04:00
|
|
|
static int
|
2013-09-04 16:00:57 +04:00
|
|
|
zfs_ioc_hold(const char *pool, nvlist_t *args, nvlist_t *errlist)
|
2009-08-18 22:43:27 +04:00
|
|
|
{
|
2022-02-16 04:38:43 +03:00
|
|
|
(void) pool;
|
2016-01-09 20:37:15 +03:00
|
|
|
nvpair_t *pair;
|
2013-09-04 16:00:57 +04:00
|
|
|
nvlist_t *holds;
|
|
|
|
int cleanup_fd = -1;
|
2010-08-27 01:24:34 +04:00
|
|
|
int error;
|
|
|
|
minor_t minor = 0;
|
2021-07-11 04:00:37 +03:00
|
|
|
zfs_file_t *fp = NULL;
|
2009-08-18 22:43:27 +04:00
|
|
|
|
2018-09-02 22:14:01 +03:00
|
|
|
holds = fnvlist_lookup_nvlist(args, "holds");
|
2010-08-27 01:24:34 +04:00
|
|
|
|
2016-01-09 20:37:15 +03:00
|
|
|
/* make sure the user didn't pass us any invalid (empty) tags */
|
|
|
|
for (pair = nvlist_next_nvpair(holds, NULL); pair != NULL;
|
|
|
|
pair = nvlist_next_nvpair(holds, pair)) {
|
2023-03-11 21:39:24 +03:00
|
|
|
const char *htag;
|
2016-01-09 20:37:15 +03:00
|
|
|
|
|
|
|
error = nvpair_value_string(pair, &htag);
|
|
|
|
if (error != 0)
|
|
|
|
return (SET_ERROR(error));
|
|
|
|
|
|
|
|
if (strlen(htag) == 0)
|
|
|
|
return (SET_ERROR(EINVAL));
|
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
if (nvlist_lookup_int32(args, "cleanup_fd", &cleanup_fd) == 0) {
|
2021-07-11 04:00:37 +03:00
|
|
|
fp = zfs_onexit_fd_hold(cleanup_fd, &minor);
|
|
|
|
if (fp == NULL)
|
|
|
|
return (SET_ERROR(EBADF));
|
2010-08-27 01:24:34 +04:00
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
error = dsl_dataset_user_hold(holds, minor, errlist);
|
2021-07-11 04:00:37 +03:00
|
|
|
if (fp != NULL) {
|
|
|
|
ASSERT3U(minor, !=, 0);
|
|
|
|
zfs_onexit_fd_rele(fp);
|
|
|
|
}
|
2019-09-27 20:46:28 +03:00
|
|
|
return (SET_ERROR(error));
|
2009-08-18 22:43:27 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2013-09-04 16:00:57 +04:00
|
|
|
* innvl is not used.
|
2009-08-18 22:43:27 +04:00
|
|
|
*
|
2013-09-04 16:00:57 +04:00
|
|
|
* outnvl: {
|
|
|
|
* holdname -> time added (uint64 seconds since epoch)
|
|
|
|
* ...
|
|
|
|
* }
|
2009-08-18 22:43:27 +04:00
|
|
|
*/
|
2018-09-02 22:14:01 +03:00
|
|
|
static const zfs_ioc_key_t zfs_keys_get_holds[] = {
|
|
|
|
/* no nvl keys */
|
|
|
|
};
|
|
|
|
|
2009-08-18 22:43:27 +04:00
|
|
|
static int
|
2013-09-04 16:00:57 +04:00
|
|
|
zfs_ioc_get_holds(const char *snapname, nvlist_t *args, nvlist_t *outnvl)
|
2009-08-18 22:43:27 +04:00
|
|
|
{
|
2022-02-16 04:38:43 +03:00
|
|
|
(void) args;
|
2013-09-04 16:00:57 +04:00
|
|
|
return (dsl_dataset_get_holds(snapname, outnvl));
|
2009-08-18 22:43:27 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2013-09-04 16:00:57 +04:00
|
|
|
* innvl: {
|
|
|
|
* snapname -> { holdname, ... }
|
|
|
|
* ...
|
|
|
|
* }
|
2009-08-18 22:43:27 +04:00
|
|
|
*
|
2013-09-04 16:00:57 +04:00
|
|
|
* outnvl: {
|
|
|
|
* snapname -> error value (int32)
|
|
|
|
* ...
|
|
|
|
* }
|
2009-08-18 22:43:27 +04:00
|
|
|
*/
|
2018-09-02 22:14:01 +03:00
|
|
|
static const zfs_ioc_key_t zfs_keys_release[] = {
|
|
|
|
{"<snapname>...", DATA_TYPE_NVLIST, ZK_WILDCARDLIST},
|
|
|
|
};
|
|
|
|
|
2009-08-18 22:43:27 +04:00
|
|
|
static int
|
2013-09-04 16:00:57 +04:00
|
|
|
zfs_ioc_release(const char *pool, nvlist_t *holds, nvlist_t *errlist)
|
2009-08-18 22:43:27 +04:00
|
|
|
{
|
2022-02-16 04:38:43 +03:00
|
|
|
(void) pool;
|
2013-09-04 16:00:57 +04:00
|
|
|
return (dsl_dataset_user_release(holds, errlist));
|
2009-08-18 22:43:27 +04:00
|
|
|
}
|
|
|
|
|
2010-08-26 22:42:43 +04:00
|
|
|
/*
|
|
|
|
* inputs:
|
|
|
|
* zc_guid flags (ZEVENT_NONBLOCK)
|
2013-11-23 04:00:39 +04:00
|
|
|
* zc_cleanup_fd zevent file descriptor
|
2010-08-26 22:42:43 +04:00
|
|
|
*
|
|
|
|
* outputs:
|
|
|
|
* zc_nvlist_dst next nvlist event
|
|
|
|
* zc_cookie dropped events since last get
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
zfs_ioc_events_next(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
zfs_zevent_t *ze;
|
|
|
|
nvlist_t *event = NULL;
|
|
|
|
minor_t minor;
|
|
|
|
uint64_t dropped = 0;
|
|
|
|
int error;
|
|
|
|
|
2021-07-11 04:00:37 +03:00
|
|
|
zfs_file_t *fp = zfs_zevent_fd_hold(zc->zc_cleanup_fd, &minor, &ze);
|
|
|
|
if (fp == NULL)
|
|
|
|
return (SET_ERROR(EBADF));
|
2010-08-26 22:42:43 +04:00
|
|
|
|
|
|
|
do {
|
2010-10-05 03:21:04 +04:00
|
|
|
error = zfs_zevent_next(ze, &event,
|
2016-12-12 21:46:26 +03:00
|
|
|
&zc->zc_nvlist_dst_size, &dropped);
|
2010-08-26 22:42:43 +04:00
|
|
|
if (event != NULL) {
|
|
|
|
zc->zc_cookie = dropped;
|
|
|
|
error = put_nvlist(zc, event);
|
2010-10-05 03:21:04 +04:00
|
|
|
nvlist_free(event);
|
2010-08-26 22:42:43 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
if (zc->zc_guid & ZEVENT_NONBLOCK)
|
|
|
|
break;
|
|
|
|
|
|
|
|
if ((error == 0) || (error != ENOENT))
|
|
|
|
break;
|
|
|
|
|
|
|
|
error = zfs_zevent_wait(ze);
|
2013-09-04 16:00:57 +04:00
|
|
|
if (error != 0)
|
2010-08-26 22:42:43 +04:00
|
|
|
break;
|
|
|
|
} while (1);
|
|
|
|
|
2021-07-11 04:00:37 +03:00
|
|
|
zfs_zevent_fd_rele(fp);
|
2010-08-26 22:42:43 +04:00
|
|
|
|
|
|
|
return (error);
|
|
|
|
}
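The exit conditions of the do/while loop in zfs_ioc_events_next() above can be reduced to a small predicate. This is a sketch for illustration only: sketch_events_loop_done() is a hypothetical name, and it models just the two checks made before zfs_zevent_wait() is called (a failed wait also ends the loop in the real code).

```c
#include <errno.h>

/*
 * Returns 1 when the loop would stop before waiting, 0 when it
 * would block in the wait call and try again.
 *
 * nonblock mirrors (zc_guid & ZEVENT_NONBLOCK). The second test is
 * written as in the source; it is equivalent to (error != ENOENT),
 * so only "no event queued yet" leads to a wait.
 */
static int
sketch_events_loop_done(int error, int nonblock)
{
	if (nonblock)
		return (1);
	if ((error == 0) || (error != ENOENT))
		return (1);
	return (0);
}
```

Read this way, a blocking caller waits only while the queue is empty; success and every other error return immediately.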
|
|
|
|
|
|
|
|
/*
|
|
|
|
* outputs:
|
|
|
|
* zc_cookie cleared events count
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
zfs_ioc_events_clear(zfs_cmd_t *zc)
|
|
|
|
{
|
Cleanup: Specify unsignedness on things that should not be signed
In #13871, zfs_vdev_aggregation_limit_non_rotating and
zfs_vdev_aggregation_limit being signed was pointed out as a possible
reason not to eliminate an unnecessary MAX(unsigned, 0) since the
unsigned value was assigned from them.
There is no reason for these module parameters to be signed and upon
inspection, it was found that there are a number of other module
parameters that are signed, but should not be, so we make them unsigned.
Making them unsigned made it clear that some other variables in the code
should also be unsigned, so we also make those unsigned. This prevents
users from setting negative values that could potentially cause bad
behaviors. It also makes the code slightly easier to understand.
Mostly module parameters that deal with timeouts, limits, bitshifts and
percentages are made unsigned by this. Any that are boolean are left
signed, since whether booleans should be considered signed or unsigned
does not matter.
Making zfs_arc_lotsfree_percent unsigned caused a
`zfs_arc_lotsfree_percent >= 0` check to become redundant, so it was
removed. Removing the check was also necessary to prevent a compiler
error from -Werror=type-limits.
Several end of line comments had to be moved to their own lines because
replacing int with uint_t caused us to exceed the 80 character limit
enforced by cstyle.pl.
The following were kept signed because they are passed to
taskq_create(), which expects signed values and modifying the
OpenSolaris/Illumos DDI is out of scope of this patch:
* metaslab_load_pct
* zfs_sync_taskq_batch_pct
* zfs_zil_clean_taskq_nthr_pct
* zfs_zil_clean_taskq_minalloc
* zfs_zil_clean_taskq_maxalloc
* zfs_arc_prune_task_threads
Also, negative values in those parameters were found to be harmless.
The following were left signed because either negative values make
sense, or more analysis was needed to determine whether negative values
should be disallowed:
* zfs_metaslab_switch_threshold
* zfs_pd_bytes_max
* zfs_livelist_min_percent_shared
zfs_multihost_history was made static to be consistent with other
parameters.
A number of module parameters were marked as signed, but in reality
referenced unsigned variables. upgrade_errlog_limit is one of the
numerous examples. In the case of zfs_vdev_async_read_max_active, it was
already uint32_t, but zdb had an extern int declaration for it.
Interestingly, the documentation in zfs.4 was right for
upgrade_errlog_limit despite the module parameter being wrongly marked,
while the documentation for zfs_vdev_async_read_max_active (and friends)
was wrong. It was also wrong for zstd_abort_size, which was unsigned,
but was documented as signed.
Also, the documentation in zfs.4 incorrectly described the following
parameters as ulong when they were int:
* zfs_arc_meta_adjust_restarts
* zfs_override_estimate_recordsize
They are now uint_t as of this patch and thus the man page has been
updated to describe them as uint.
dbuf_state_index was left alone since it does nothing and perhaps should
be removed in another patch.
If any module parameters were missed, they were not found by `grep -r
'ZFS_MODULE_PARAM' | grep ', INT'`. I did find a few that grep missed,
but only because they were in files that had hits.
This patch intentionally did not attempt to address whether some of
these module parameters should be elevated to 64-bit parameters, because
the length of a long on 32-bit is 32-bit.
Lastly, it was pointed out during review that uint_t is a better match
for these variables than uint32_t because FreeBSD kernel parameter
definitions are designed for uint_t, whose bit width can change in
future memory models. As a result, we change the existing parameters
that are uint32_t to use uint_t.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13875
2022-09-28 02:42:41 +03:00
|
|
|
uint_t count;
|
2010-08-26 22:42:43 +04:00
|
|
|
|
|
|
|
zfs_zevent_drain_all(&count);
|
|
|
|
zc->zc_cookie = count;
|
|
|
|
|
2013-11-01 23:26:11 +04:00
|
|
|
return (0);
|
2010-08-26 22:42:43 +04:00
|
|
|
}
|
|
|
|
|
2013-11-23 02:52:16 +04:00
|
|
|
/*
|
|
|
|
* inputs:
|
|
|
|
* zc_guid eid | ZEVENT_SEEK_START | ZEVENT_SEEK_END
|
|
|
|
* zc_cleanup_fd zevent file descriptor
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
zfs_ioc_events_seek(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
zfs_zevent_t *ze;
|
|
|
|
minor_t minor;
|
|
|
|
int error;
|
|
|
|
|
2021-07-11 04:00:37 +03:00
|
|
|
zfs_file_t *fp = zfs_zevent_fd_hold(zc->zc_cleanup_fd, &minor, &ze);
|
|
|
|
if (fp == NULL)
|
|
|
|
return (SET_ERROR(EBADF));
|
2013-11-23 02:52:16 +04:00
|
|
|
|
|
|
|
error = zfs_zevent_seek(ze, zc->zc_guid);
|
2021-07-11 04:00:37 +03:00
|
|
|
zfs_zevent_fd_rele(fp);
|
2013-11-23 02:52:16 +04:00
|
|
|
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2011-11-17 22:14:36 +04:00
|
|
|
/*
|
|
|
|
* inputs:
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
* zc_name name of later filesystem or snapshot
|
|
|
|
* zc_value full name of old snapshot or bookmark
|
2011-11-17 22:14:36 +04:00
|
|
|
*
|
|
|
|
* outputs:
|
|
|
|
* zc_cookie space in bytes
|
|
|
|
* zc_objset_type compressed space in bytes
|
|
|
|
* zc_perm_action uncompressed space in bytes
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
zfs_ioc_space_written(zfs_cmd_t *zc)
|
|
|
|
{
|
|
|
|
int error;
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_pool_t *dp;
|
Implement Redacted Send/Receive

Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.

Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.

The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.

The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
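The filtering step described in the commit message above can be pictured with a toy model. This is only a sketch under stated assumptions, not the OpenZFS implementation: `toy_block_t`, `toy_is_redacted`, and `toy_redacted_send` are hypothetical names, blocks are identified by a made-up (object, blkid) pair, and the redaction list is assumed to be kept sorted so it can be binary-searched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block identifier; not an OpenZFS type. */
typedef struct {
	uint64_t object;
	uint64_t blkid;
} toy_block_t;

/* Order blocks by object first, then by block id. */
static int
toy_block_cmp(const toy_block_t *a, const toy_block_t *b)
{
	if (a->object != b->object)
		return (a->object < b->object ? -1 : 1);
	if (a->blkid != b->blkid)
		return (a->blkid < b->blkid ? -1 : 1);
	return (0);
}

/* Binary search of a sorted redaction list (sorting is an assumption). */
int
toy_is_redacted(const toy_block_t *list, size_t n, toy_block_t b)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int c = toy_block_cmp(&list[mid], &b);

		if (c == 0)
			return (1);
		if (c < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return (0);
}

/*
 * Copy every block of the snapshot that is NOT on the redaction list
 * into out (which must have room for nsnap entries) and return how
 * many blocks would be placed in the send stream.
 */
size_t
toy_redacted_send(const toy_block_t *snap, size_t nsnap,
    const toy_block_t *redact, size_t nredact, toy_block_t *out)
{
	size_t nout = 0;

	for (size_t i = 0; i < nsnap; i++) {
		if (!toy_is_redacted(redact, nredact, snap[i]))
			out[nout++] = snap[i];
	}
	return (nout);
}

/* Sample data: four blocks in the snapshot, two of them redacted. */
const toy_block_t toy_snap[] = { {1, 0}, {1, 1}, {2, 0}, {3, 5} };
const toy_block_t toy_redact[] = { {1, 1}, {3, 5} };
```

With the sample data, only blocks (1, 0) and (2, 0) survive the filter; the two blocks named in `toy_redact` are left out of the output, which is the behaviour the message describes for a send to the redacted snapshot.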
2019-06-19 19:48:13 +03:00
|
|
|
dsl_dataset_t *new;
|
2011-11-17 22:14:36 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
error = dsl_pool_hold(zc->zc_name, FTAG, &dp);
|
2011-11-17 22:14:36 +04:00
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
2013-09-04 16:00:57 +04:00
|
|
|
error = dsl_dataset_hold(dp, zc->zc_name, FTAG, &new);
|
|
|
|
if (error != 0) {
|
|
|
|
dsl_pool_rele(dp, FTAG);
|
|
|
|
return (error);
|
|
|
|
}
|
2019-06-19 19:48:13 +03:00
|
|
|
if (strchr(zc->zc_value, '#') != NULL) {
|
|
|
|
zfs_bookmark_phys_t bmp;
|
|
|
|
error = dsl_bookmark_lookup(dp, zc->zc_value,
|
|
|
|
new, &bmp);
|
|
|
|
if (error == 0) {
|
|
|
|
error = dsl_dataset_space_written_bookmark(&bmp, new,
|
|
|
|
&zc->zc_cookie,
|
|
|
|
&zc->zc_objset_type, &zc->zc_perm_action);
|
|
|
|
}
|
|
|
|
} else {
|
|
|
|
dsl_dataset_t *old;
|
|
|
|
error = dsl_dataset_hold(dp, zc->zc_value, FTAG, &old);
|
2011-11-17 22:14:36 +04:00
|
|
|
|
2019-06-19 19:48:13 +03:00
|
|
|
if (error == 0) {
|
|
|
|
error = dsl_dataset_space_written(old, new,
|
|
|
|
&zc->zc_cookie,
|
|
|
|
&zc->zc_objset_type, &zc->zc_perm_action);
|
|
|
|
dsl_dataset_rele(old, FTAG);
|
|
|
|
}
|
|
|
|
}
|
2011-11-17 22:14:36 +04:00
|
|
|
dsl_dataset_rele(new, FTAG);
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_pool_rele(dp, FTAG);
|
2011-11-17 22:14:36 +04:00
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2013-08-28 15:45:09 +04:00
|
|
|
* innvl: {
|
|
|
|
* "firstsnap" -> snapshot name
|
|
|
|
* }
|
2011-11-17 22:14:36 +04:00
|
|
|
*
|
2013-08-28 15:45:09 +04:00
|
|
|
* outnvl: {
|
|
|
|
* "used" -> space in bytes
|
|
|
|
* "compressed" -> compressed space in bytes
|
|
|
|
* "uncompressed" -> uncompressed space in bytes
|
|
|
|
* }
|
2011-11-17 22:14:36 +04:00
|
|
|
*/
|
Add basic zfs ioc input nvpair validation

We want newer versions of libzfs_core to run against an existing
zfs kernel module (i.e. a deferred reboot or module reload after
an update).

Programmatically document, via a zfs_ioc_key_t, the valid arguments
for the ioc commands that rely on nvpair input arguments (i.e. non
legacy commands from libzfs_core). Automatically verify the expected
pairs before dispatching a command.

This initial phase focuses on the non-legacy ioctls. A follow-on
change can address the legacy ioctl input from the zfs_cmd_t.

The zfs_ioc_key_t for zfs_keys_channel_program looks like:

    static const zfs_ioc_key_t zfs_keys_channel_program[] = {
        {"program",    DATA_TYPE_STRING,        0},
        {"arg",        DATA_TYPE_UNKNOWN,       0},
        {"sync",       DATA_TYPE_BOOLEAN_VALUE, ZK_OPTIONAL},
        {"instrlimit", DATA_TYPE_UINT64,        ZK_OPTIONAL},
        {"memlimit",   DATA_TYPE_UINT64,        ZK_OPTIONAL},
    };

Introduce four input errors to identify specific input failures
(in addition to generic argument value errors like EINVAL, ERANGE,
EBADF, and E2BIG).

    ZFS_ERR_IOC_CMD_UNAVAIL   the ioctl number is not supported by kernel
    ZFS_ERR_IOC_ARG_UNAVAIL   an input argument is not supported by kernel
    ZFS_ERR_IOC_ARG_REQUIRED  a required input argument is missing
    ZFS_ERR_IOC_ARG_BADTYPE   an input argument has an invalid type
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #7780
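The table-driven check described in this commit message can be sketched in plain C. This is an illustration under stated assumptions, not the kernel code: every `toy_` name is hypothetical, input pairs are modelled as a simple (name, type) array instead of an nvlist, and treating `T_UNKNOWN` as "accept any type" is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the nvpair data types. */
typedef enum { T_UNKNOWN, T_STRING, T_UINT64, T_BOOLEAN_VALUE } toy_type_t;

#define	TOY_OPTIONAL	0x1	/* plays the role of ZK_OPTIONAL */

typedef struct {
	const char *name;
	toy_type_t type;
	int flags;
} toy_key_t;

typedef struct {
	const char *name;
	toy_type_t type;
} toy_pair_t;

typedef enum {
	TOY_OK = 0,
	TOY_ARG_UNAVAIL,	/* role of ZFS_ERR_IOC_ARG_UNAVAIL */
	TOY_ARG_REQUIRED,	/* role of ZFS_ERR_IOC_ARG_REQUIRED */
	TOY_ARG_BADTYPE		/* role of ZFS_ERR_IOC_ARG_BADTYPE */
} toy_err_t;

/* Key table modelled on the zfs_keys_channel_program example above. */
const toy_key_t toy_keys[] = {
	{"program", T_STRING, 0},
	{"arg", T_UNKNOWN, 0},
	{"sync", T_BOOLEAN_VALUE, TOY_OPTIONAL},
	{"instrlimit", T_UINT64, TOY_OPTIONAL},
	{"memlimit", T_UINT64, TOY_OPTIONAL},
};
const size_t toy_nkeys = sizeof (toy_keys) / sizeof (toy_keys[0]);

toy_err_t
toy_check_input(const toy_key_t *keys, size_t nkeys,
    const toy_pair_t *in, size_t nin)
{
	/* Pass 1: each supplied pair must be a known key of the right type. */
	for (size_t i = 0; i < nin; i++) {
		const toy_key_t *k = NULL;

		for (size_t j = 0; j < nkeys; j++) {
			if (strcmp(keys[j].name, in[i].name) == 0) {
				k = &keys[j];
				break;
			}
		}
		if (k == NULL)
			return (TOY_ARG_UNAVAIL);
		/* Assumption: T_UNKNOWN in the table accepts any type. */
		if (k->type != T_UNKNOWN && k->type != in[i].type)
			return (TOY_ARG_BADTYPE);
	}

	/* Pass 2: each key without TOY_OPTIONAL must have been supplied. */
	for (size_t j = 0; j < nkeys; j++) {
		int found = 0;

		if (keys[j].flags & TOY_OPTIONAL)
			continue;
		for (size_t i = 0; i < nin; i++) {
			if (strcmp(keys[j].name, in[i].name) == 0) {
				found = 1;
				break;
			}
		}
		if (!found)
			return (TOY_ARG_REQUIRED);
	}
	return (TOY_OK);
}
```

Keeping the expected arguments in a data table means the dispatcher can reject unknown, missing, or mistyped pairs before the handler runs, which is the compatibility goal the message states for newer libzfs_core against an older module.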
2018-09-02 22:14:01 +03:00
|
|
|
static const zfs_ioc_key_t zfs_keys_space_snaps[] = {
|
|
|
|
{"firstsnap", DATA_TYPE_STRING, 0},
|
|
|
|
};
|
|
|
|
|
2011-11-17 22:14:36 +04:00
|
|
|
static int
|
2013-08-28 15:45:09 +04:00
|
|
|
zfs_ioc_space_snaps(const char *lastsnap, nvlist_t *innvl, nvlist_t *outnvl)
|
2011-11-17 22:14:36 +04:00
|
|
|
{
|
|
|
|
int error;
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_pool_t *dp;
|
2011-11-17 22:14:36 +04:00
|
|
|
dsl_dataset_t *new, *old;
|
2023-03-11 21:39:24 +03:00
|
|
|
const char *firstsnap;
|
2013-08-28 15:45:09 +04:00
|
|
|
uint64_t used, comp, uncomp;
|
2011-11-17 22:14:36 +04:00
|
|
|
|
2018-09-02 22:14:01 +03:00
|
|
|
firstsnap = fnvlist_lookup_string(innvl, "firstsnap");
|
2013-08-28 15:45:09 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
error = dsl_pool_hold(lastsnap, FTAG, &dp);
|
2011-11-17 22:14:36 +04:00
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
2013-09-04 16:00:57 +04:00
|
|
|
|
|
|
|
error = dsl_dataset_hold(dp, lastsnap, FTAG, &new);
|
2015-07-02 16:04:35 +03:00
|
|
|
if (error == 0 && !new->ds_is_snapshot) {
|
|
|
|
dsl_dataset_rele(new, FTAG);
|
|
|
|
error = SET_ERROR(EINVAL);
|
|
|
|
}
|
2013-09-04 16:00:57 +04:00
|
|
|
if (error != 0) {
|
|
|
|
dsl_pool_rele(dp, FTAG);
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
error = dsl_dataset_hold(dp, firstsnap, FTAG, &old);
|
2015-07-02 16:04:35 +03:00
|
|
|
if (error == 0 && !old->ds_is_snapshot) {
|
|
|
|
dsl_dataset_rele(old, FTAG);
|
|
|
|
error = SET_ERROR(EINVAL);
|
|
|
|
}
|
2011-11-17 22:14:36 +04:00
|
|
|
if (error != 0) {
|
|
|
|
dsl_dataset_rele(new, FTAG);
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_pool_rele(dp, FTAG);
|
2011-11-17 22:14:36 +04:00
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
error = dsl_dataset_space_wouldfree(old, new, &used, &comp, &uncomp);
|
2011-11-17 22:14:36 +04:00
|
|
|
dsl_dataset_rele(old, FTAG);
|
|
|
|
dsl_dataset_rele(new, FTAG);
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_pool_rele(dp, FTAG);
|
2013-08-28 15:45:09 +04:00
|
|
|
fnvlist_add_uint64(outnvl, "used", used);
|
|
|
|
fnvlist_add_uint64(outnvl, "compressed", comp);
|
|
|
|
fnvlist_add_uint64(outnvl, "uncompressed", uncomp);
|
2011-11-17 22:14:36 +04:00
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
/*
|
2013-08-28 15:45:09 +04:00
|
|
|
* innvl: {
|
|
|
|
* "fd" -> file descriptor to write stream to (int32)
|
|
|
|
* (optional) "fromsnap" -> full snap name to send an incremental from
|
2014-11-03 23:15:08 +03:00
|
|
|
* (optional) "largeblockok" -> (value ignored)
|
|
|
|
* indicates that blocks > 128KB are permitted
|
2014-06-06 01:19:08 +04:00
|
|
|
* (optional) "embedok" -> (value ignored)
|
|
|
|
* presence indicates DRR_WRITE_EMBEDDED records are permitted
|
2016-07-11 20:45:52 +03:00
|
|
|
* (optional) "compressok" -> (value ignored)
|
|
|
|
* presence indicates compressed DRR_WRITE records are permitted
|
Native Encryption for ZFS on Linux

This change incorporates three major pieces:

The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.

The second piece of this patch provides the ability to
encrypt, decrypt, and authenticate protected datasets.
Each object set maintains a Merkle tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.

The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 20:36:48 +03:00
|
|
|
* (optional) "rawok" -> (value ignored)
|
|
|
|
* presence indicates raw encrypted records should be used.
|
2020-01-10 21:16:58 +03:00
|
|
|
* (optional) "savedok" -> (value ignored)
|
|
|
|
* presence indicates we should send a partially received snapshot
|
2016-01-07 00:22:48 +03:00
|
|
|
* (optional) "resume_object" and "resume_offset" -> (uint64)
|
|
|
|
* if present, resume send stream from specified object and offset.
|
2019-06-19 19:48:13 +03:00
|
|
|
* (optional) "redactbook" -> (string)
|
|
|
|
* if present, use this bookmark's redaction list to generate a redacted
|
|
|
|
* send stream
|
2013-08-28 15:45:09 +04:00
|
|
|
* }
|
|
|
|
*
|
|
|
|
* outnvl is unused
|
2008-11-20 23:01:55 +03:00
|
|
|
*/
|
2018-09-02 22:14:01 +03:00
|
|
|
static const zfs_ioc_key_t zfs_keys_send_new[] = {
|
|
|
|
{"fd", DATA_TYPE_INT32, 0},
|
|
|
|
{"fromsnap", DATA_TYPE_STRING, ZK_OPTIONAL},
|
|
|
|
{"largeblockok", DATA_TYPE_BOOLEAN, ZK_OPTIONAL},
|
|
|
|
{"embedok", DATA_TYPE_BOOLEAN, ZK_OPTIONAL},
|
|
|
|
{"compressok", DATA_TYPE_BOOLEAN, ZK_OPTIONAL},
|
|
|
|
{"rawok", DATA_TYPE_BOOLEAN, ZK_OPTIONAL},
|
2020-01-10 21:16:58 +03:00
|
|
|
{"savedok", DATA_TYPE_BOOLEAN, ZK_OPTIONAL},
|
2018-09-02 22:14:01 +03:00
|
|
|
{"resume_object", DATA_TYPE_UINT64, ZK_OPTIONAL},
|
|
|
|
{"resume_offset", DATA_TYPE_UINT64, ZK_OPTIONAL},
|
2019-06-19 19:48:13 +03:00
|
|
|
{"redactbook", DATA_TYPE_STRING, ZK_OPTIONAL},
|
2018-09-02 22:14:01 +03:00
|
|
|
};
|
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
static int
|
|
|
|
zfs_ioc_send_new(const char *snapname, nvlist_t *innvl, nvlist_t *outnvl)
|
|
|
|
{
|
2022-02-16 04:38:43 +03:00
|
|
|
(void) outnvl;
|
2013-08-28 15:45:09 +04:00
|
|
|
int error;
|
|
|
|
offset_t off;
|
2023-03-11 21:39:24 +03:00
|
|
|
const char *fromname = NULL;
|
2013-08-28 15:45:09 +04:00
|
|
|
int fd;
|
2019-11-21 20:32:57 +03:00
|
|
|
zfs_file_t *fp;
|
2014-11-03 23:15:08 +03:00
|
|
|
boolean_t largeblockok;
|
2014-06-06 01:19:08 +04:00
|
|
|
boolean_t embedok;
|
2016-07-11 20:45:52 +03:00
|
|
|
boolean_t compressok;
|
2017-08-14 20:36:48 +03:00
|
|
|
boolean_t rawok;
|
2020-01-10 21:16:58 +03:00
|
|
|
boolean_t savedok;
|
2016-01-07 00:22:48 +03:00
|
|
|
uint64_t resumeobj = 0;
|
|
|
|
uint64_t resumeoff = 0;
|
2023-03-11 21:39:24 +03:00
|
|
|
const char *redactbook = NULL;
|
2013-08-28 15:45:09 +04:00
|
|
|
|
2018-09-02 22:14:01 +03:00
|
|
|
fd = fnvlist_lookup_int32(innvl, "fd");
|
2013-08-28 15:45:09 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
(void) nvlist_lookup_string(innvl, "fromsnap", &fromname);
|
2013-08-28 15:45:09 +04:00
|
|
|
|
2014-11-03 23:15:08 +03:00
|
|
|
largeblockok = nvlist_exists(innvl, "largeblockok");
|
2014-06-06 01:19:08 +04:00
|
|
|
embedok = nvlist_exists(innvl, "embedok");
|
2016-07-11 20:45:52 +03:00
|
|
|
compressok = nvlist_exists(innvl, "compressok");
|
2017-08-14 20:36:48 +03:00
|
|
|
rawok = nvlist_exists(innvl, "rawok");
|
2020-01-10 21:16:58 +03:00
|
|
|
savedok = nvlist_exists(innvl, "savedok");
|
2014-06-06 01:19:08 +04:00
|
|
|
|
2016-01-07 00:22:48 +03:00
|
|
|
(void) nvlist_lookup_uint64(innvl, "resume_object", &resumeobj);
|
|
|
|
(void) nvlist_lookup_uint64(innvl, "resume_offset", &resumeoff);
|
|
|
|
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
(void) nvlist_lookup_string(innvl, "redactbook", &redactbook);
|
|
|
|
|
2021-07-11 04:00:37 +03:00
|
|
|
if ((fp = zfs_file_get(fd)) == NULL)
|
|
|
|
return (SET_ERROR(EBADF));
|
2019-11-21 20:32:57 +03:00
|
|
|
|
|
|
|
off = zfs_file_off(fp);
|
2013-08-28 15:45:09 +04:00
|
|
|
|
2019-06-19 19:48:13 +03:00
|
|
|
dmu_send_outparams_t out = {0};
|
|
|
|
out.dso_outfunc = dump_bytes;
|
2019-11-21 20:32:57 +03:00
|
|
|
out.dso_arg = fp;
|
2019-06-19 19:48:13 +03:00
|
|
|
out.dso_dryrun = B_FALSE;
|
2020-01-10 21:16:58 +03:00
|
|
|
error = dmu_send(snapname, fromname, embedok, largeblockok,
|
|
|
|
compressok, rawok, savedok, resumeobj, resumeoff,
|
|
|
|
redactbook, fd, &off, &out);
|
2013-08-28 15:45:09 +04:00
|
|
|
|
2021-07-11 04:00:37 +03:00
|
|
|
zfs_file_put(fp);
|
2013-08-28 15:45:09 +04:00
|
|
|
return (error);
|
|
|
|
}
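The handler above reads its optional inputs in two ways: presence-only flags such as "rawok" are tested with nvlist_exists(), while "resume_object" and "resume_offset" are fetched with lookups whose return value is discarded, so the zero initializers act as defaults. The following is a minimal userland sketch of that pattern. The pair_t type and the pair_exists() and pair_lookup() helpers are hypothetical stand-ins invented for illustration; they are not the nvlist API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical miniature of a name/value list; not the nvlist API. */
typedef struct {
	const char *name;
	uint64_t value;
} pair_t;

/* Presence test: the stored value is deliberately ignored. */
bool
pair_exists(const pair_t *p, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(p[i].name, name) == 0)
			return (true);
	return (false);
}

/*
 * Optional lookup: writes *out only on success, so a caller-supplied
 * default survives when the name is absent. Returns 0 or -1.
 */
int
pair_lookup(const pair_t *p, size_t n, const char *name, uint64_t *out)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(p[i].name, name) == 0) {
			*out = p[i].value;
			return (0);
		}
	}
	return (-1);
}
```

A caller would initialize each optional value to its default, call pair_lookup() and ignore the result, and use pair_exists() for flags whose value carries no meaning.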
|
|
|
|
|
2020-06-15 21:30:37 +03:00
|
|
|
static int
|
2019-06-19 19:48:13 +03:00
|
|
|
send_space_sum(objset_t *os, void *buf, int len, void *arg)
|
|
|
|
{
|
2022-02-16 04:38:43 +03:00
|
|
|
(void) os, (void) buf;
|
2019-06-19 19:48:13 +03:00
|
|
|
uint64_t *size = arg;
|
2022-02-16 04:38:43 +03:00
|
|
|
|
2019-06-19 19:48:13 +03:00
|
|
|
*size += len;
|
|
|
|
return (0);
|
|
|
|
}
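send_space_sum() above is an output callback that never looks at the buffer; it only adds each record's length to a running total. A minimal userland sketch of that idea follows. The out_func_t type and the emit_records() producer are assumptions made up for this illustration (the kernel's callback also takes an objset_t pointer, which is omitted here), so treat it as a sketch of the pattern and not as the actual send path.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical output-callback type, modeled loosely on dso_outfunc. */
typedef int (*out_func_t)(const void *buf, int len, void *arg);

/* Counting sink: ignores the payload and adds its length to *arg. */
int
count_bytes(const void *buf, int len, void *arg)
{
	(void) buf;
	uint64_t *size = arg;

	*size += (uint64_t)len;
	return (0);
}

/*
 * Hypothetical producer: hands each record length to the callback and
 * stops at the first nonzero return value.
 */
int
emit_records(const int *lens, size_t n, out_func_t out, void *arg)
{
	for (size_t i = 0; i < n; i++) {
		int err = out(NULL, lens[i], arg);
		if (err != 0)
			return (err);
	}
	return (0);
}
```

Because the producer only depends on the callback signature, swapping a writing sink for the counting sink yields a size estimate without moving any payload bytes.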
|
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
/*
|
|
|
|
* Determine approximately how large a zfs send stream will be -- the number
|
|
|
|
* of bytes that will be written to the fd supplied to zfs_ioc_send_new().
|
|
|
|
*
|
|
|
|
* innvl: {
|
2015-04-08 21:37:13 +03:00
|
|
|
* (optional) "from" -> full snap or bookmark name to send an incremental
|
|
|
|
* from
|
2016-07-11 20:45:52 +03:00
|
|
|
* (optional) "largeblockok" -> (value ignored)
|
|
|
|
* presence indicates that blocks > 128KB are permitted
|
|
|
|
* (optional) "embedok" -> (value ignored)
|
|
|
|
* presence indicates DRR_WRITE_EMBEDDED records are permitted
|
|
|
|
* (optional) "compressok" -> (value ignored)
|
|
|
|
* presence indicates compressed DRR_WRITE records are permitted
|
2020-10-03 03:40:46 +03:00
|
|
|
* (optional) "rawok" -> (value ignored)
|
2017-08-31 19:00:35 +03:00
|
|
|
* presence indicates raw encrypted records should be used.
|
2020-10-03 03:40:46 +03:00
|
|
|
* (optional) "resume_object" and "resume_offset" -> (uint64)
|
|
|
|
* if present, resume send stream from specified object and offset.
|
2019-06-19 19:48:13 +03:00
|
|
|
* (optional) "fd" -> file descriptor to use as a cookie for progress
|
|
|
|
* tracking (int32)
|
2013-08-28 15:45:09 +04:00
|
|
|
* }
|
|
|
|
*
|
|
|
|
* outnvl: {
|
|
|
|
* "space" -> bytes of space (uint64)
|
|
|
|
* }
|
|
|
|
*/
|
2018-09-02 22:14:01 +03:00
|
|
|
static const zfs_ioc_key_t zfs_keys_send_space[] = {
|
|
|
|
{"from", DATA_TYPE_STRING, ZK_OPTIONAL},
|
|
|
|
{"fromsnap", DATA_TYPE_STRING, ZK_OPTIONAL},
|
|
|
|
{"largeblockok", DATA_TYPE_BOOLEAN, ZK_OPTIONAL},
|
|
|
|
{"embedok", DATA_TYPE_BOOLEAN, ZK_OPTIONAL},
|
|
|
|
{"compressok", DATA_TYPE_BOOLEAN, ZK_OPTIONAL},
|
|
|
|
{"rawok", DATA_TYPE_BOOLEAN, ZK_OPTIONAL},
|
2019-06-19 19:48:13 +03:00
|
|
|
{"fd", DATA_TYPE_INT32, ZK_OPTIONAL},
|
|
|
|
{"redactbook", DATA_TYPE_STRING, ZK_OPTIONAL},
|
2020-10-03 03:40:46 +03:00
|
|
|
{"resume_object", DATA_TYPE_UINT64, ZK_OPTIONAL},
|
|
|
|
{"resume_offset", DATA_TYPE_UINT64, ZK_OPTIONAL},
|
|
|
|
{"bytes", DATA_TYPE_UINT64, ZK_OPTIONAL},
|
2018-09-02 22:14:01 +03:00
|
|
|
};
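A key table like the one above lists each accepted name with its type and an optional flag. The sketch below shows, in simplified userland form, how such a table could be used to check a set of inputs before dispatch. Every name in it (key_spec_t, arg_t, KEY_OPTIONAL, check_args(), and the ARG_* result codes) is hypothetical and chosen for illustration; the kernel's real validator, its handling of wildcard types, and the order in which it reports errors may differ.

```c
#include <stddef.h>
#include <string.h>

/* All names below are hypothetical stand-ins, not the kernel's types. */
typedef enum { T_STRING, T_UINT64, T_INT32, T_FLAG } val_type_t;

#define	KEY_OPTIONAL	0x1

typedef struct {
	const char *name;
	val_type_t type;
	int flags;
} key_spec_t;

typedef struct {
	const char *name;
	val_type_t type;
} arg_t;

enum { ARGS_OK = 0, ARG_UNAVAIL, ARG_REQUIRED, ARG_BADTYPE };

/* Return the table entry for name, or NULL if the name is unknown. */
static const key_spec_t *
find_key(const key_spec_t *keys, size_t nkeys, const char *name)
{
	for (size_t i = 0; i < nkeys; i++)
		if (strcmp(keys[i].name, name) == 0)
			return (&keys[i]);
	return (NULL);
}

int
check_args(const key_spec_t *keys, size_t nkeys,
    const arg_t *args, size_t nargs)
{
	/* Every supplied argument must be known and correctly typed. */
	for (size_t i = 0; i < nargs; i++) {
		const key_spec_t *k = find_key(keys, nkeys, args[i].name);
		if (k == NULL)
			return (ARG_UNAVAIL);
		if (k->type != args[i].type)
			return (ARG_BADTYPE);
	}
	/* Every key not marked optional must be present. */
	for (size_t i = 0; i < nkeys; i++) {
		size_t j;

		if (keys[i].flags & KEY_OPTIONAL)
			continue;
		for (j = 0; j < nargs; j++)
			if (strcmp(args[j].name, keys[i].name) == 0)
				break;
		if (j == nargs)
			return (ARG_REQUIRED);
	}
	return (ARGS_OK);
}
```

Keeping the accepted names in a data table means a new optional argument is a one-line addition, and an older validator can reject names it does not know instead of silently ignoring them.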
|
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
static int
|
|
|
|
zfs_ioc_send_space(const char *snapname, nvlist_t *innvl, nvlist_t *outnvl)
|
|
|
|
{
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_pool_t *dp;
|
|
|
|
dsl_dataset_t *tosnap;
|
2019-06-19 19:48:13 +03:00
|
|
|
dsl_dataset_t *fromsnap = NULL;
|
2013-08-28 15:45:09 +04:00
|
|
|
int error;
|
2023-03-11 21:39:24 +03:00
|
|
|
const char *fromname = NULL;
|
|
|
|
const char *redactlist_book = NULL;
|
2019-06-19 19:48:13 +03:00
|
|
|
boolean_t largeblockok;
|
|
|
|
boolean_t embedok;
|
2016-07-11 20:45:52 +03:00
|
|
|
boolean_t compressok;
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decrypt, and authenticate protected datasets.
Each object set maintains a Merkle tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 20:36:48 +03:00
|
|
|
boolean_t rawok;
|
2020-01-10 21:16:58 +03:00
|
|
|
boolean_t savedok;
|
2019-06-19 19:48:13 +03:00
|
|
|
uint64_t space = 0;
|
|
|
|
boolean_t full_estimate = B_FALSE;
|
|
|
|
uint64_t resumeobj = 0;
|
|
|
|
uint64_t resumeoff = 0;
|
|
|
|
uint64_t resume_bytes = 0;
|
|
|
|
int32_t fd = -1;
|
|
|
|
zfs_bookmark_phys_t zbm = {0};
|
2013-08-28 15:45:09 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
error = dsl_pool_hold(snapname, FTAG, &dp);
|
|
|
|
if (error != 0)
|
2013-08-28 15:45:09 +04:00
|
|
|
return (error);
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
error = dsl_dataset_hold(dp, snapname, FTAG, &tosnap);
|
|
|
|
if (error != 0) {
|
|
|
|
dsl_pool_rele(dp, FTAG);
|
|
|
|
return (error);
|
|
|
|
}
|
2019-06-19 19:48:13 +03:00
|
|
|
(void) nvlist_lookup_int32(innvl, "fd", &fd);
|
2013-09-04 16:00:57 +04:00
|
|
|
|
2019-06-19 19:48:13 +03:00
|
|
|
largeblockok = nvlist_exists(innvl, "largeblockok");
|
|
|
|
embedok = nvlist_exists(innvl, "embedok");
|
2016-07-11 20:45:52 +03:00
|
|
|
compressok = nvlist_exists(innvl, "compressok");
|
2017-08-14 20:36:48 +03:00
|
|
|
rawok = nvlist_exists(innvl, "rawok");
|
2020-01-10 21:16:58 +03:00
|
|
|
savedok = nvlist_exists(innvl, "savedok");
|
2019-06-19 19:48:13 +03:00
|
|
|
boolean_t from = (nvlist_lookup_string(innvl, "from", &fromname) == 0);
|
|
|
|
boolean_t altbook = (nvlist_lookup_string(innvl, "redactbook",
|
|
|
|
&redactlist_book) == 0);
|
|
|
|
|
|
|
|
(void) nvlist_lookup_uint64(innvl, "resume_object", &resumeobj);
|
|
|
|
(void) nvlist_lookup_uint64(innvl, "resume_offset", &resumeoff);
|
|
|
|
(void) nvlist_lookup_uint64(innvl, "bytes", &resume_bytes);
|
|
|
|
|
|
|
|
if (altbook) {
|
|
|
|
full_estimate = B_TRUE;
|
|
|
|
} else if (from) {
|
|
|
|
if (strchr(fromname, '#')) {
|
|
|
|
error = dsl_bookmark_lookup(dp, fromname, tosnap, &zbm);
|
2016-07-11 20:45:52 +03:00
|
|
|
|
2015-04-08 21:37:13 +03:00
|
|
|
/*
|
2019-06-19 19:48:13 +03:00
|
|
|
* dsl_bookmark_lookup() will fail with EXDEV if
|
|
|
|
* the from-bookmark and tosnap are at the same txg.
|
|
|
|
* However, it's valid to do a send (and therefore,
|
|
|
|
* a send estimate) from and to the same time point,
|
|
|
|
* if the bookmark is redacted (the incremental send
|
|
|
|
* can change what's redacted on the target). In
|
|
|
|
* this case, dsl_bookmark_lookup() fills in zbm
|
|
|
|
* but returns EXDEV. Ignore this error.
|
2015-04-08 21:37:13 +03:00
|
|
|
*/
|
2019-06-19 19:48:13 +03:00
|
|
|
if (error == EXDEV && zbm.zbm_redaction_obj != 0 &&
|
|
|
|
zbm.zbm_guid ==
|
|
|
|
dsl_dataset_phys(tosnap)->ds_guid)
|
|
|
|
error = 0;
|
|
|
|
|
|
|
|
if (error != 0) {
|
|
|
|
dsl_dataset_rele(tosnap, FTAG);
|
|
|
|
dsl_pool_rele(dp, FTAG);
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
if (zbm.zbm_redaction_obj != 0 || !(zbm.zbm_flags &
|
|
|
|
ZBM_FLAG_HAS_FBN)) {
|
|
|
|
full_estimate = B_TRUE;
|
|
|
|
}
|
|
|
|
} else if (strchr(fromname, '@')) {
|
2015-04-08 21:37:13 +03:00
|
|
|
error = dsl_dataset_hold(dp, fromname, FTAG, &fromsnap);
|
2019-06-19 19:48:13 +03:00
|
|
|
if (error != 0) {
|
|
|
|
dsl_dataset_rele(tosnap, FTAG);
|
|
|
|
dsl_pool_rele(dp, FTAG);
|
|
|
|
return (error);
|
|
|
|
}
|
2015-04-08 21:37:13 +03:00
|
|
|
|
2019-06-19 19:48:13 +03:00
|
|
|
if (!dsl_dataset_is_before(tosnap, fromsnap, 0)) {
|
|
|
|
full_estimate = B_TRUE;
|
|
|
|
dsl_dataset_rele(fromsnap, FTAG);
|
|
|
|
}
|
2015-04-08 21:37:13 +03:00
|
|
|
} else {
|
|
|
|
/*
|
|
|
|
* from is not properly formatted as a snapshot or
|
|
|
|
* bookmark
|
|
|
|
*/
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
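The filtering step described in this commit message can be illustrated with a minimal sketch. It assumes the redaction list is a flat, sorted array of block numbers, which is a simplification made for the example; `is_redacted` and `filter_blocks` are hypothetical names, not functions introduced by this change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Binary search over a sorted redaction list (assumed layout: one
 * block number per entry). Returns 1 if blkid is listed, else 0.
 */
static int
is_redacted(const uint64_t *list, size_t n, uint64_t blkid)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (list[mid] == blkid)
			return (1);
		if (list[mid] < blkid)
			lo = mid + 1;
		else
			hi = mid;
	}
	return (0);
}

/*
 * Copy every candidate block that is not on the redaction list into
 * out (which must hold at least ncand entries). Returns the number of
 * blocks that would be placed in the send stream.
 */
static size_t
filter_blocks(const uint64_t *cand, size_t ncand,
    const uint64_t *redact, size_t nredact, uint64_t *out)
{
	size_t nout = 0;

	assert(out != NULL || ncand == 0);
	for (size_t i = 0; i < ncand; i++) {
		if (!is_redacted(redact, nredact, cand[i]))
			out[nout++] = cand[i];
	}
	return (nout);
}
```

Calling `filter_blocks` with candidates 0 through 5 and a redaction list of {1, 4} leaves blocks 0, 2, 3, and 5 to send.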
2019-06-19 19:48:13 +03:00
			dsl_dataset_rele(tosnap, FTAG);
			dsl_pool_rele(dp, FTAG);
			return (SET_ERROR(EINVAL));
2013-08-28 15:45:09 +04:00
		}
2019-06-19 19:48:13 +03:00
	}

	if (full_estimate) {
		dmu_send_outparams_t out = {0};
		offset_t off = 0;
		out.dso_outfunc = send_space_sum;
		out.dso_arg = &space;
		out.dso_dryrun = B_TRUE;
2017-11-08 20:09:45 +03:00
		/*
2019-06-19 19:48:13 +03:00
		 * We have to release these holds so dmu_send can take them. It
		 * will do all the error checking we need.
2017-11-08 20:09:45 +03:00
		 */
2019-06-19 19:48:13 +03:00
		dsl_dataset_rele(tosnap, FTAG);
		dsl_pool_rele(dp, FTAG);
		error = dmu_send(snapname, fromname, embedok, largeblockok,
2020-01-10 21:16:58 +03:00
		    compressok, rawok, savedok, resumeobj, resumeoff,
		    redactlist_book, fd, &off, &out);
2019-06-19 19:48:13 +03:00
	} else {
		error = dmu_send_estimate_fast(tosnap, fromsnap,
		    (from && strchr(fromname, '#') != NULL ? &zbm : NULL),
2020-01-10 21:16:58 +03:00
		    compressok || rawok, savedok, &space);
2019-06-19 19:48:13 +03:00
		space -= resume_bytes;
		if (fromsnap != NULL)
			dsl_dataset_rele(fromsnap, FTAG);
		dsl_dataset_rele(tosnap, FTAG);
		dsl_pool_rele(dp, FTAG);
2013-08-28 15:45:09 +04:00
	}

	fnvlist_add_uint64(outnvl, "space", space);

	return (error);
}
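The full-estimate branch above installs an output callback that only counts bytes and marks the send as a dry run. A minimal standalone sketch of that pattern follows. `out_params_t`, `space_sum`, `toy_send`, and `toy_estimate` are hypothetical stand-ins rather than the real `dmu_send` interfaces, and the 16-byte record header is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	TOY_HDR_SIZE	16	/* assumed per-record header size */

typedef int (*out_func_t)(const void *buf, size_t len, void *arg);

/* Hypothetical analogue of an output-parameters structure. */
typedef struct out_params {
	out_func_t	op_outfunc;	/* called once per emitted buffer */
	void		*op_arg;	/* opaque state for op_outfunc */
	int		op_dryrun;	/* nonzero: payloads are never read */
} out_params_t;

/* Output callback that sums lengths and never dereferences buf. */
static int
space_sum(const void *buf, size_t len, void *arg)
{
	(void) buf;
	*(uint64_t *)arg += len;
	return (0);
}

/*
 * Toy "send": one header plus one payload per block. In a dry run the
 * payload pointer is NULL, so no data block is touched, but the length
 * passed to the callback is the same as for a real send.
 */
static int
toy_send(const uint8_t *data, size_t nblocks, size_t blksz,
    const out_params_t *op)
{
	static const uint8_t hdr[TOY_HDR_SIZE];
	int err;

	assert(op->op_dryrun || data != NULL || nblocks == 0);
	for (size_t i = 0; i < nblocks; i++) {
		const uint8_t *payload;

		err = op->op_outfunc(hdr, sizeof (hdr), op->op_arg);
		if (err != 0)
			return (err);
		payload = op->op_dryrun ? NULL : data + i * blksz;
		err = op->op_outfunc(payload, blksz, op->op_arg);
		if (err != 0)
			return (err);
	}
	return (0);
}

/* Estimate the stream size without reading any data blocks. */
static uint64_t
toy_estimate(size_t nblocks, size_t blksz)
{
	uint64_t space = 0;
	out_params_t out = { space_sum, &space, 1 };

	(void) toy_send(NULL, nblocks, blksz, &out);
	return (space);
}
```

Because the same code path produces the records in both modes, the dry-run total matches what a real send would write, which is the property the handler relies on for its estimate.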

2017-05-19 22:33:11 +03:00
/*
 * Sync the currently open TXG to disk for the specified pool.
 * This is somewhat similar to 'zfs_sync()'.
 * For cases that do not result in error this ioctl will wait for
 * the currently open TXG to commit before returning back to the caller.
 *
 * innvl: {
 *  "force" -> when true, force uberblock update even if there is no dirty data.
 *             In addition this will cause the vdev configuration to be written
 *             out including updating the zpool cache file. (boolean_t)
 * }
 *
 * onvl is unused
 */
Add basic zfs ioc input nvpair validation
We want newer versions of libzfs_core to run against an existing
zfs kernel module (i.e. a deferred reboot or module reload after
an update).
Programmatically document, via a zfs_ioc_key_t, the valid arguments
for the ioc commands that rely on nvpair input arguments (i.e. non
legacy commands from libzfs_core). Automatically verify the expected
pairs before dispatching a command.
This initial phase focuses on the non-legacy ioctls. A follow-on
change can address the legacy ioctl input from the zfs_cmd_t.
The zfs_ioc_key_t for zfs_keys_channel_program looks like:
    static const zfs_ioc_key_t zfs_keys_channel_program[] = {
        {"program", DATA_TYPE_STRING, 0},
        {"arg", DATA_TYPE_UNKNOWN, 0},
        {"sync", DATA_TYPE_BOOLEAN_VALUE, ZK_OPTIONAL},
        {"instrlimit", DATA_TYPE_UINT64, ZK_OPTIONAL},
        {"memlimit", DATA_TYPE_UINT64, ZK_OPTIONAL},
    };
Introduce four input errors to identify specific input failures
(in addition to generic argument value errors like EINVAL, ERANGE,
EBADF, and E2BIG).
    ZFS_ERR_IOC_CMD_UNAVAIL   the ioctl number is not supported by kernel
    ZFS_ERR_IOC_ARG_UNAVAIL   an input argument is not supported by kernel
    ZFS_ERR_IOC_ARG_REQUIRED  a required input argument is missing
    ZFS_ERR_IOC_ARG_BADTYPE   an input argument has an invalid type
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #7780
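The table-driven check this commit message describes can be sketched in standalone C. This is an illustration under assumptions, not the kernel implementation: `arg_type_t`, `ioc_key_t`, `arg_t`, `check_args`, and the `CHK_*` codes are hypothetical stand-ins for the nvpair types and the `ZFS_ERR_IOC_ARG_*` errors, and the order in which the checks run is a choice made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for nvpair data types; T_ANY accepts any type. */
typedef enum { T_ANY, T_STRING, T_UINT64, T_BOOL } arg_type_t;

#define	KEY_OPTIONAL	0x1
#define	NKEYS(a)	(sizeof (a) / sizeof ((a)[0]))

typedef struct ioc_key {
	const char	*k_name;
	arg_type_t	k_type;
	int		k_flags;
} ioc_key_t;

typedef struct arg {
	const char	*a_name;
	arg_type_t	a_type;
} arg_t;

/* Stand-ins for the three argument errors named in the message. */
enum { CHK_OK = 0, CHK_ARG_UNAVAIL, CHK_ARG_REQUIRED, CHK_ARG_BADTYPE };

static int
check_args(const ioc_key_t *keys, size_t nkeys,
    const arg_t *args, size_t nargs)
{
	/* Every supplied argument must be known and correctly typed. */
	for (size_t i = 0; i < nargs; i++) {
		const ioc_key_t *k = NULL;

		for (size_t j = 0; j < nkeys; j++) {
			if (strcmp(args[i].a_name, keys[j].k_name) == 0) {
				k = &keys[j];
				break;
			}
		}
		if (k == NULL)
			return (CHK_ARG_UNAVAIL);
		if (k->k_type != T_ANY && k->k_type != args[i].a_type)
			return (CHK_ARG_BADTYPE);
	}

	/* Every key not marked optional must have been supplied. */
	for (size_t j = 0; j < nkeys; j++) {
		int found = 0;

		if (keys[j].k_flags & KEY_OPTIONAL)
			continue;
		for (size_t i = 0; i < nargs; i++) {
			if (strcmp(args[i].a_name, keys[j].k_name) == 0)
				found = 1;
		}
		if (!found)
			return (CHK_ARG_REQUIRED);
	}
	return (CHK_OK);
}

/* Mirrors the shape of the channel-program table quoted above. */
static const ioc_key_t program_keys[] = {
	{"program",	T_STRING,	0},
	{"arg",		T_ANY,		0},
	{"sync",	T_BOOL,		KEY_OPTIONAL},
	{"instrlimit",	T_UINT64,	KEY_OPTIONAL},
	{"memlimit",	T_UINT64,	KEY_OPTIONAL},
};
```

Keeping the expected arguments in a table means the dispatcher can reject malformed input once, before any handler runs, instead of each handler repeating its own checks.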
2018-09-02 22:14:01 +03:00
static const zfs_ioc_key_t zfs_keys_pool_sync[] = {
	{"force", DATA_TYPE_BOOLEAN_VALUE, 0},
};

2017-05-19 22:33:11 +03:00
static int
zfs_ioc_pool_sync(const char *pool, nvlist_t *innvl, nvlist_t *onvl)
{
2022-02-16 04:38:43 +03:00
	(void) onvl;
2017-05-19 22:33:11 +03:00
	int err;
2020-12-10 01:52:45 +03:00
	boolean_t rc, force = B_FALSE;
2017-05-19 22:33:11 +03:00
	spa_t *spa;

	if ((err = spa_open(pool, &spa, FTAG)) != 0)
		return (err);

2020-12-10 01:52:45 +03:00
	if (innvl) {
		err = nvlist_lookup_boolean_value(innvl, "force", &rc);
		if (err == 0)
			force = rc;
	}
2017-08-21 23:11:11 +03:00

2017-05-19 22:33:11 +03:00
	if (force) {
		spa_config_enter(spa, SCL_CONFIG, FTAG, RW_WRITER);
		vdev_config_dirty(spa->spa_root_vdev);
		spa_config_exit(spa, SCL_CONFIG, FTAG);
	}
	txg_wait_synced(spa_get_dsl(spa), 0);
2018-09-02 22:14:01 +03:00

2017-05-19 22:33:11 +03:00
	spa_close(spa, FTAG);

2020-12-10 01:52:45 +03:00
	return (0);
2017-05-19 22:33:11 +03:00
}
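The handler above treats a missing input list, or a missing "force" entry, as `force = B_FALSE`. A minimal sketch of that defaulting pattern, using a plain array in place of an nvlist, might look like the following. `bool_pair_t`, `lookup_bool`, and `get_force` are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a name/boolean pair in an nvlist. */
typedef struct bool_pair {
	const char	*bp_name;
	bool		bp_value;
} bool_pair_t;

/* Returns 0 and stores the value in *out, or ENOENT if name is absent. */
static int
lookup_bool(const bool_pair_t *pairs, size_t n, const char *name, bool *out)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(pairs[i].bp_name, name) == 0) {
			*out = pairs[i].bp_value;
			return (0);
		}
	}
	return (ENOENT);
}

/*
 * Read the optional "force" flag. The default is only overwritten when
 * the lookup succeeds, so a failed lookup never leaks an uninitialized
 * value into the result.
 */
static bool
get_force(const bool_pair_t *pairs, size_t n)
{
	bool rc, force = false;

	if (pairs != NULL) {
		if (lookup_bool(pairs, n, "force", &rc) == 0)
			force = rc;
	}
	return (force);
}
```

Using a separate `rc` variable for the lookup result, as the handler does, keeps the default intact on every failure path.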
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decrypt, and authenticate protected datasets.
Each object set maintains a Merkle tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
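The key indirection described in the first part of this commit message can be illustrated with a toy model. The sketch below uses XOR as a placeholder for both key wrapping and data encryption, which is NOT secure and is not what the change implements. All names (`toy_dataset_t`, `toy_init`, `toy_change_key`, `toy_read`) and sizes are hypothetical. It only shows why changing the user key does not require rewriting the data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	KEYLEN	32
#define	DATALEN	64

/* TOY placeholder for a real cipher: XOR with a repeating key. */
static void
xor_buf(uint8_t *dst, const uint8_t *src, const uint8_t *key, size_t len)
{
	for (size_t i = 0; i < len; i++)
		dst[i] = src[i] ^ key[i % KEYLEN];
}

typedef struct toy_dataset {
	uint8_t	td_wrapped_master[KEYLEN];	/* master key, wrapped */
	uint8_t	td_data[DATALEN];		/* data under master key */
} toy_dataset_t;

/* Encrypt data with the master key, then wrap it with the user key. */
static void
toy_init(toy_dataset_t *ds, const uint8_t *master, const uint8_t *userkey,
    const uint8_t *plain)
{
	xor_buf(ds->td_data, plain, master, DATALEN);
	xor_buf(ds->td_wrapped_master, master, userkey, KEYLEN);
}

/* Re-wrap the master key under a new user key; td_data is not touched. */
static void
toy_change_key(toy_dataset_t *ds, const uint8_t *oldkey,
    const uint8_t *newkey)
{
	uint8_t master[KEYLEN];

	xor_buf(master, ds->td_wrapped_master, oldkey, KEYLEN);
	xor_buf(ds->td_wrapped_master, master, newkey, KEYLEN);
}

/* Unwrap the master key with the user key, then decrypt the data. */
static void
toy_read(const toy_dataset_t *ds, const uint8_t *userkey, uint8_t *out)
{
	uint8_t master[KEYLEN];

	xor_buf(master, ds->td_wrapped_master, userkey, KEYLEN);
	xor_buf(out, ds->td_data, master, DATALEN);
}
```

After `toy_change_key`, the stored data bytes are identical to what they were before, and only the small wrapped-key field has changed.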
2017-08-14 20:36:48 +03:00
/*
 * Load a user's wrapping key into the kernel.
 * innvl: {
 *     "hidden_args" -> { "wkeydata" -> value }
 *         raw uint8_t array of encryption wrapping key data (32 bytes)
 *     (optional) "noop" -> (value ignored)
 *         presence indicates the key should only be verified, not loaded
 * }
 */
2018-09-02 22:14:01 +03:00
static const zfs_ioc_key_t zfs_keys_load_key[] = {
	{"hidden_args", DATA_TYPE_NVLIST, 0},
	{"noop", DATA_TYPE_BOOLEAN, ZK_OPTIONAL},
};

2017-08-14 20:36:48 +03:00
static int
zfs_ioc_load_key(const char *dsname, nvlist_t *innvl, nvlist_t *outnvl)
{
2022-02-16 04:38:43 +03:00
	(void) outnvl;
2017-08-14 20:36:48 +03:00
	int ret;
	dsl_crypto_params_t *dcp = NULL;
	nvlist_t *hidden_args;
	boolean_t noop = nvlist_exists(innvl, "noop");

	if (strchr(dsname, '@') != NULL || strchr(dsname, '%') != NULL) {
		ret = SET_ERROR(EINVAL);
		goto error;
	}

2018-09-02 22:14:01 +03:00
	hidden_args = fnvlist_lookup_nvlist(innvl, ZPOOL_HIDDEN_ARGS);
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decyrpt, and authenticate protected datasets.
Each object set maintains a Merkel tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
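The key indirection described in the message above can be illustrated with a minimal sketch. Everything in it is hypothetical: the `toy_*` names are not ZFS identifiers, and the XOR "wrap" is a placeholder that is not real cryptography. It only shows why changing the user's wrapping key needs to rewrap one small master key rather than re-encrypt the dataset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	TOY_KEY_LEN	32

/* Hypothetical stand-in for a dataset's stored key material. */
typedef struct toy_dataset {
	uint8_t td_wrapped_master[TOY_KEY_LEN];	/* master key, kept wrapped */
} toy_dataset_t;

/*
 * Toy "wrap": XOR with the wrapping key. XOR is its own inverse, so the
 * same call also unwraps. This is NOT real cryptography.
 */
static void
toy_wrap(const uint8_t *wkey, const uint8_t *in, uint8_t *out)
{
	for (size_t i = 0; i < TOY_KEY_LEN; i++)
		out[i] = in[i] ^ wkey[i];
}

/*
 * Change the user's wrapping key: unwrap the master key with the old key
 * and rewrap it with the new one. Data encrypted under the master key is
 * never touched.
 */
static void
toy_change_key(toy_dataset_t *ds, const uint8_t *old_wkey,
    const uint8_t *new_wkey)
{
	uint8_t master[TOY_KEY_LEN];

	assert(ds != NULL);
	toy_wrap(old_wkey, ds->td_wrapped_master, master);
	toy_wrap(new_wkey, master, ds->td_wrapped_master);
	memset(master, 0, sizeof (master));	/* best-effort scrub */
}
```

After `toy_change_key()` returns, unwrapping with the new key yields the original master key, while the old key no longer does.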
2017-08-14 20:36:48 +03:00

	ret = dsl_crypto_params_create_nvlist(DCP_CMD_NONE, NULL,
	    hidden_args, &dcp);
	if (ret != 0)
		goto error;

	ret = spa_keystore_load_wkey(dsname, dcp, noop);
	if (ret != 0)
		goto error;

	dsl_crypto_params_free(dcp, noop);

	return (0);

error:
	dsl_crypto_params_free(dcp, B_TRUE);
	return (ret);
}

/*
 * Unload a user's wrapping key from the kernel.
 * Both innvl and outnvl are unused.
 */
Add basic zfs ioc input nvpair validation
We want newer versions of libzfs_core to run against an existing
zfs kernel module (i.e. a deferred reboot or module reload after
an update).
Programmatically document, via a zfs_ioc_key_t, the valid arguments
for the ioc commands that rely on nvpair input arguments (i.e. non
legacy commands from libzfs_core). Automatically verify the expected
pairs before dispatching a command.
This initial phase focuses on the non-legacy ioctls. A follow-on
change can address the legacy ioctl input from the zfs_cmd_t.
The zfs_ioc_key_t for zfs_keys_channel_program looks like:
static const zfs_ioc_key_t zfs_keys_channel_program[] = {
{"program", DATA_TYPE_STRING, 0},
{"arg", DATA_TYPE_UNKNOWN, 0},
{"sync", DATA_TYPE_BOOLEAN_VALUE, ZK_OPTIONAL},
{"instrlimit", DATA_TYPE_UINT64, ZK_OPTIONAL},
{"memlimit", DATA_TYPE_UINT64, ZK_OPTIONAL},
};
Introduce four input errors to identify specific input failures
(in addition to generic argument value errors like EINVAL, ERANGE,
EBADF, and E2BIG).
ZFS_ERR_IOC_CMD_UNAVAIL the ioctl number is not supported by kernel
ZFS_ERR_IOC_ARG_UNAVAIL an input argument is not supported by kernel
ZFS_ERR_IOC_ARG_REQUIRED a required input argument is missing
ZFS_ERR_IOC_ARG_BADTYPE an input argument has an invalid type
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #7780
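A minimal, self-contained sketch of the table-driven check described above follows. All `toy_*` and `TOY_*` names are hypothetical stand-ins rather than the kernel's types, the input is modeled as a flat array instead of an nvlist, and the sketch assumes that the "unknown" type in the key table acts as a wildcard. The kernel's actual check may differ in ordering and detail.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; names and values are illustrative only. */
typedef enum {
	TOY_TYPE_ANY,		/* wildcard: any value type is accepted */
	TOY_TYPE_STRING,
	TOY_TYPE_UINT64,
	TOY_TYPE_BOOLEAN
} toy_type_t;

typedef enum {
	TOY_OK,
	TOY_ERR_ARG_UNAVAIL,	/* input names a key the table lacks */
	TOY_ERR_ARG_REQUIRED,	/* a non-optional key is missing */
	TOY_ERR_ARG_BADTYPE	/* key is known but the type differs */
} toy_err_t;

#define	TOY_OPTIONAL	0x1

typedef struct {
	const char *tk_name;
	toy_type_t tk_type;
	int tk_flags;
} toy_key_t;

typedef struct {
	const char *tp_name;
	toy_type_t tp_type;
} toy_pair_t;

/* Mirrors the shape of the channel_program table in the message. */
static const toy_key_t toy_keys_channel_program[] = {
	{"program",	TOY_TYPE_STRING,	0},
	{"arg",		TOY_TYPE_ANY,		0},
	{"sync",	TOY_TYPE_BOOLEAN,	TOY_OPTIONAL},
	{"instrlimit",	TOY_TYPE_UINT64,	TOY_OPTIONAL},
	{"memlimit",	TOY_TYPE_UINT64,	TOY_OPTIONAL},
};

static toy_err_t
toy_check_input(const toy_pair_t *in, size_t nin,
    const toy_key_t *keys, size_t nkeys)
{
	assert(keys != NULL || nkeys == 0);

	/* Pass 1: every supplied pair must be a known key of the right type. */
	for (size_t i = 0; i < nin; i++) {
		const toy_key_t *k = NULL;

		for (size_t j = 0; j < nkeys; j++) {
			if (strcmp(in[i].tp_name, keys[j].tk_name) == 0) {
				k = &keys[j];
				break;
			}
		}
		if (k == NULL)
			return (TOY_ERR_ARG_UNAVAIL);
		if (k->tk_type != TOY_TYPE_ANY && k->tk_type != in[i].tp_type)
			return (TOY_ERR_ARG_BADTYPE);
	}

	/* Pass 2: every key not marked optional must have been supplied. */
	for (size_t j = 0; j < nkeys; j++) {
		int found = 0;

		if (keys[j].tk_flags & TOY_OPTIONAL)
			continue;
		for (size_t i = 0; i < nin; i++) {
			if (strcmp(in[i].tp_name, keys[j].tk_name) == 0) {
				found = 1;
				break;
			}
		}
		if (!found)
			return (TOY_ERR_ARG_REQUIRED);
	}

	return (TOY_OK);
}
```

Keeping the expected keys in a static table means the check runs once, before dispatch, and each handler can assume its required inputs are present.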
2018-09-02 22:14:01 +03:00
static const zfs_ioc_key_t zfs_keys_unload_key[] = {
	/* no nvl keys */
};

2017-08-14 20:36:48 +03:00
static int
zfs_ioc_unload_key(const char *dsname, nvlist_t *innvl, nvlist_t *outnvl)
{
2022-02-16 04:38:43 +03:00
	(void) innvl, (void) outnvl;
2017-08-14 20:36:48 +03:00
	int ret = 0;

	if (strchr(dsname, '@') != NULL || strchr(dsname, '%') != NULL) {
		ret = (SET_ERROR(EINVAL));
		goto out;
	}

	ret = spa_keystore_unload_wkey(dsname);
	if (ret != 0)
		goto out;

out:
	return (ret);
}

/*
 * Changes a user's wrapping key used to decrypt a dataset. The keyformat,
2022-02-16 04:38:43 +03:00
 * keylocation, pbkdf2salt, and pbkdf2iters properties can also be specified
2017-08-14 20:36:48 +03:00
 * here to change how the key is derived in userspace.
 *
 * innvl: {
 *    "hidden_args" (optional) -> { "wkeydata" -> value }
 *         raw uint8_t array of new encryption wrapping key data (32 bytes)
 *    "props" (optional) -> { prop -> value }
 * }
 *
 * outnvl is unused
 */
2018-09-02 22:14:01 +03:00
static const zfs_ioc_key_t zfs_keys_change_key[] = {
	{"crypt_cmd",	DATA_TYPE_UINT64,	ZK_OPTIONAL},
	{"hidden_args",	DATA_TYPE_NVLIST,	ZK_OPTIONAL},
	{"props",	DATA_TYPE_NVLIST,	ZK_OPTIONAL},
};

2017-08-14 20:36:48 +03:00
static int
zfs_ioc_change_key(const char *dsname, nvlist_t *innvl, nvlist_t *outnvl)
{
2022-02-16 04:38:43 +03:00
	(void) outnvl;
2017-08-14 20:36:48 +03:00
	int ret;
	uint64_t cmd = DCP_CMD_NONE;
	dsl_crypto_params_t *dcp = NULL;
	nvlist_t *args = NULL, *hidden_args = NULL;

	if (strchr(dsname, '@') != NULL || strchr(dsname, '%') != NULL) {
		ret = (SET_ERROR(EINVAL));
		goto error;
	}

	(void) nvlist_lookup_uint64(innvl, "crypt_cmd", &cmd);
	(void) nvlist_lookup_nvlist(innvl, "props", &args);
	(void) nvlist_lookup_nvlist(innvl, ZPOOL_HIDDEN_ARGS, &hidden_args);

	ret = dsl_crypto_params_create_nvlist(cmd, args, hidden_args, &dcp);
	if (ret != 0)
		goto error;

	ret = spa_keystore_change_key(dsname, dcp);
	if (ret != 0)
		goto error;

	dsl_crypto_params_free(dcp, B_FALSE);

	return (0);

error:
	dsl_crypto_params_free(dcp, B_TRUE);
	return (ret);
}

2013-08-28 15:45:09 +04:00
static zfs_ioc_vec_t zfs_ioc_vec[ZFS_IOC_LAST - ZFS_IOC_FIRST];

static void
zfs_ioctl_register_legacy(zfs_ioc_t ioc, zfs_ioc_legacy_func_t *func,
    zfs_secpolicy_func_t *secpolicy, zfs_ioc_namecheck_t namecheck,
    boolean_t log_history, zfs_ioc_poolcheck_t pool_check)
{
	zfs_ioc_vec_t *vec = &zfs_ioc_vec[ioc - ZFS_IOC_FIRST];

	ASSERT3U(ioc, >=, ZFS_IOC_FIRST);
	ASSERT3U(ioc, <, ZFS_IOC_LAST);
	ASSERT3P(vec->zvec_legacy_func, ==, NULL);
	ASSERT3P(vec->zvec_func, ==, NULL);

	vec->zvec_legacy_func = func;
	vec->zvec_secpolicy = secpolicy;
	vec->zvec_namecheck = namecheck;
	vec->zvec_allow_log = log_history;
	vec->zvec_pool_check = pool_check;
}

/*
 * See the block comment at the beginning of this file for details on
 * each argument to this function.
 */
2019-09-27 20:46:28 +03:00
void
2013-08-28 15:45:09 +04:00
zfs_ioctl_register(const char *name, zfs_ioc_t ioc, zfs_ioc_func_t *func,
    zfs_secpolicy_func_t *secpolicy, zfs_ioc_namecheck_t namecheck,
    zfs_ioc_poolcheck_t pool_check, boolean_t smush_outnvlist,
2018-09-02 22:14:01 +03:00
    boolean_t allow_log, const zfs_ioc_key_t *nvl_keys, size_t num_keys)
2013-08-28 15:45:09 +04:00
{
	zfs_ioc_vec_t *vec = &zfs_ioc_vec[ioc - ZFS_IOC_FIRST];

	ASSERT3U(ioc, >=, ZFS_IOC_FIRST);
	ASSERT3U(ioc, <, ZFS_IOC_LAST);
	ASSERT3P(vec->zvec_legacy_func, ==, NULL);
	ASSERT3P(vec->zvec_func, ==, NULL);

	/* if we are logging, the name must be valid */
	ASSERT(!allow_log || namecheck != NO_NAME);

	vec->zvec_name = name;
	vec->zvec_func = func;
	vec->zvec_secpolicy = secpolicy;
	vec->zvec_namecheck = namecheck;
	vec->zvec_pool_check = pool_check;
	vec->zvec_smush_outnvlist = smush_outnvlist;
	vec->zvec_allow_log = allow_log;
2018-09-02 22:14:01 +03:00
	vec->zvec_nvl_keys = nvl_keys;
	vec->zvec_nvl_key_count = num_keys;
2013-08-28 15:45:09 +04:00
}

static void
zfs_ioctl_register_pool(zfs_ioc_t ioc, zfs_ioc_legacy_func_t *func,
    zfs_secpolicy_func_t *secpolicy, boolean_t log_history,
    zfs_ioc_poolcheck_t pool_check)
{
	zfs_ioctl_register_legacy(ioc, func, secpolicy,
	    POOL_NAME, log_history, pool_check);
}

2019-09-27 20:46:28 +03:00
void
2013-08-28 15:45:09 +04:00
zfs_ioctl_register_dataset_nolog(zfs_ioc_t ioc, zfs_ioc_legacy_func_t *func,
    zfs_secpolicy_func_t *secpolicy, zfs_ioc_poolcheck_t pool_check)
{
	zfs_ioctl_register_legacy(ioc, func, secpolicy,
	    DATASET_NAME, B_FALSE, pool_check);
}

static void
zfs_ioctl_register_pool_modify(zfs_ioc_t ioc, zfs_ioc_legacy_func_t *func)
{
	zfs_ioctl_register_legacy(ioc, func, zfs_secpolicy_config,
	    POOL_NAME, B_TRUE, POOL_CHECK_SUSPENDED | POOL_CHECK_READONLY);
}

static void
zfs_ioctl_register_pool_meta(zfs_ioc_t ioc, zfs_ioc_legacy_func_t *func,
    zfs_secpolicy_func_t *secpolicy)
{
	zfs_ioctl_register_legacy(ioc, func, secpolicy,
	    NO_NAME, B_FALSE, POOL_CHECK_NONE);
}

static void
zfs_ioctl_register_dataset_read_secpolicy(zfs_ioc_t ioc,
    zfs_ioc_legacy_func_t *func, zfs_secpolicy_func_t *secpolicy)
{
	zfs_ioctl_register_legacy(ioc, func, secpolicy,
	    DATASET_NAME, B_FALSE, POOL_CHECK_SUSPENDED);
}

static void
zfs_ioctl_register_dataset_read(zfs_ioc_t ioc, zfs_ioc_legacy_func_t *func)
{
	zfs_ioctl_register_dataset_read_secpolicy(ioc, func,
	    zfs_secpolicy_read);
}

static void
zfs_ioctl_register_dataset_modify(zfs_ioc_t ioc, zfs_ioc_legacy_func_t *func,
2017-01-12 20:42:11 +03:00
    zfs_secpolicy_func_t *secpolicy)
2013-08-28 15:45:09 +04:00
{
	zfs_ioctl_register_legacy(ioc, func, secpolicy,
	    DATASET_NAME, B_TRUE, POOL_CHECK_SUSPENDED | POOL_CHECK_READONLY);
}

2020-06-15 21:30:37 +03:00
static void
2013-08-28 15:45:09 +04:00
zfs_ioctl_init(void)
{
	zfs_ioctl_register("snapshot", ZFS_IOC_SNAPSHOT,
	    zfs_ioc_snapshot, zfs_secpolicy_snapshot, POOL_NAME,
2018-09-02 22:14:01 +03:00
	    POOL_CHECK_SUSPENDED | POOL_CHECK_READONLY, B_TRUE, B_TRUE,
	    zfs_keys_snapshot, ARRAY_SIZE(zfs_keys_snapshot));
2013-08-28 15:45:09 +04:00

	zfs_ioctl_register("log_history", ZFS_IOC_LOG_HISTORY,
	    zfs_ioc_log_history, zfs_secpolicy_log_history, NO_NAME,
2018-09-02 22:14:01 +03:00
	    POOL_CHECK_SUSPENDED | POOL_CHECK_READONLY, B_FALSE, B_FALSE,
	    zfs_keys_log_history, ARRAY_SIZE(zfs_keys_log_history));
2013-08-28 15:45:09 +04:00

	zfs_ioctl_register("space_snaps", ZFS_IOC_SPACE_SNAPS,
	    zfs_ioc_space_snaps, zfs_secpolicy_read, DATASET_NAME,
2018-09-02 22:14:01 +03:00
	    POOL_CHECK_SUSPENDED, B_FALSE, B_FALSE,
	    zfs_keys_space_snaps, ARRAY_SIZE(zfs_keys_space_snaps));
2013-08-28 15:45:09 +04:00

	zfs_ioctl_register("send", ZFS_IOC_SEND_NEW,
	    zfs_ioc_send_new, zfs_secpolicy_send_new, DATASET_NAME,
2018-09-02 22:14:01 +03:00
	    POOL_CHECK_SUSPENDED, B_FALSE, B_FALSE,
	    zfs_keys_send_new, ARRAY_SIZE(zfs_keys_send_new));
2013-08-28 15:45:09 +04:00

	zfs_ioctl_register("send_space", ZFS_IOC_SEND_SPACE,
	    zfs_ioc_send_space, zfs_secpolicy_read, DATASET_NAME,
2018-09-02 22:14:01 +03:00
	    POOL_CHECK_SUSPENDED, B_FALSE, B_FALSE,
	    zfs_keys_send_space, ARRAY_SIZE(zfs_keys_send_space));
2013-08-28 15:45:09 +04:00

	zfs_ioctl_register("create", ZFS_IOC_CREATE,
	    zfs_ioc_create, zfs_secpolicy_create_clone, DATASET_NAME,
2018-09-02 22:14:01 +03:00
|
|
|
POOL_CHECK_SUSPENDED | POOL_CHECK_READONLY, B_TRUE, B_TRUE,
|
|
|
|
zfs_keys_create, ARRAY_SIZE(zfs_keys_create));
|
2013-08-28 15:45:09 +04:00
|
|
|
|
|
|
|
zfs_ioctl_register("clone", ZFS_IOC_CLONE,
|
|
|
|
zfs_ioc_clone, zfs_secpolicy_create_clone, DATASET_NAME,
|
2018-09-02 22:14:01 +03:00
|
|
|
POOL_CHECK_SUSPENDED | POOL_CHECK_READONLY, B_TRUE, B_TRUE,
|
|
|
|
zfs_keys_clone, ARRAY_SIZE(zfs_keys_clone));
|
2013-08-28 15:45:09 +04:00
|
|
|
|
OpenZFS 7614, 9064 - zfs device evacuation/removal
OpenZFS 7614 - zfs device evacuation/removal
OpenZFS 9064 - remove_mirror should wait for device removal to complete
This project allows top-level vdevs to be removed from the storage pool
with "zpool remove", reducing the total amount of storage in the pool.
This operation copies all allocated regions of the device to be removed
onto other devices, recording the mapping from old to new location.
After the removal is complete, read and free operations to the removed
(now "indirect") vdev must be remapped and performed at the new location
on disk. The indirect mapping table is kept in memory whenever the pool
is loaded, so there is minimal performance overhead when doing operations
on the indirect vdev.
The size of the in-memory mapping table will be reduced when its entries
become "obsolete" because they are no longer used by any block pointers
in the pool. An entry becomes obsolete when all the blocks that use
it are freed. An entry can also become obsolete when all the snapshots
that reference it are deleted, and the block pointers that reference it
have been "remapped" in all filesystems/zvols (and clones). Whenever an
indirect block is written, all the block pointers in it will be "remapped"
to their new (concrete) locations if possible. This process can be
accelerated by using the "zfs remap" command to proactively rewrite all
indirect blocks that reference indirect (removed) vdevs.
Note that when a device is removed, we do not verify the checksum of
the data that is copied. This makes the process much faster, but if it
were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be
possible to copy the wrong data, when we have the correct data on e.g.
the other side of the mirror.
At the moment, only mirrors and simple top-level vdevs can be removed
and no removal is allowed if any of the top-level vdevs are raidz.
Porting Notes:
* Avoid zero-sized kmem_alloc() in vdev_compact_children().
The device evacuation code adds a dependency that
vdev_compact_children() be able to properly empty the vdev_child
array by setting it to NULL and zeroing vdev_children. Under Linux,
kmem_alloc() and related functions return a sentinel pointer rather
than NULL for zero-sized allocations.
* Remove comment regarding "mpt" driver where zfs_remove_max_segment
is initialized to SPA_MAXBLOCKSIZE.
Change zfs_condense_indirect_commit_entry_delay_ticks to
zfs_condense_indirect_commit_entry_delay_ms for consistency with
most other tunables in which delays are specified in ms.
* ZTS changes:
Use set_tunable rather than mdb
Use zpool sync as appropriate
Use sync_pool instead of sync
Kill jobs during test_removal_with_operation to allow unmount/export
Don't add non-disk names such as "mirror" or "raidz" to $DISKS
Use $TEST_BASE_DIR instead of /tmp
Increase HZ from 100 to 1000 which is more common on Linux
removal_multiple_indirection.ksh
Reduce iterations in order to not time out on the code
coverage builders.
removal_resume_export:
Functionally, the test case is correct but there exists a race
where the kernel thread hasn't been fully started yet and is
not visible. Wait for up to 1 second for the removal thread
to be started before giving up on it. Also, increase the
amount of data copied in order that the removal not finish
before the export has a chance to fail.
* MMP compatibility: the concept of concrete versus non-concrete devices
has slightly changed the semantics of vdev_writeable(). Update
mmp_random_leaf_impl() accordingly.
* Updated dbuf_remap() to handle the org.zfsonlinux:large_dnode pool
feature which is not supported by OpenZFS.
* Added support for new vdev removal tracepoints.
* Test cases removal_with_zdb and removal_condense_export have been
intentionally disabled. When run manually they pass as intended,
but when running in the automated test environment they produce
unreliable results on the latest Fedora release.
They may work better once the upstream pool import refactoring is
merged into ZoL at which point they will be re-enabled.
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Alex Reece <alex@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Tim Chase <tim@chase2k.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Garrett D'Amore <garrett@damore.org>
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
OpenZFS-issue: https://www.illumos.org/issues/7614
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f539f1eb
Closes #6900
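The remapping of reads and frees through the in-memory indirect mapping table, as described in this commit message, might look roughly like the lookup below. This is a sketch under stated assumptions, not the actual vdev_indirect code: `map_entry_t` and `remap_offset()` are hypothetical names, and the table is assumed to be sorted by source offset with non-overlapping ranges so that a binary search applies.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping entry: a copied range and where it now lives. */
typedef struct {
	uint64_t src_offset;	/* start of the range on the removed vdev */
	uint64_t size;		/* length of the range in bytes */
	uint64_t dst_vdev;	/* id of the vdev the range was copied to */
	uint64_t dst_offset;	/* start of the range on the destination */
} map_entry_t;

/*
 * Translate an offset on the removed (indirect) vdev to its new location.
 * Returns 0 and fills in *vdev and *dst on success, or -1 if no entry
 * covers the offset. Assumes tbl is sorted by src_offset and that the
 * ranges do not overlap.
 */
static int
remap_offset(const map_entry_t *tbl, size_t n, uint64_t off,
    uint64_t *vdev, uint64_t *dst)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		const map_entry_t *e = &tbl[mid];

		if (off < e->src_offset) {
			hi = mid;
		} else if (off >= e->src_offset + e->size) {
			lo = mid + 1;
		} else {
			*vdev = e->dst_vdev;
			*dst = e->dst_offset + (off - e->src_offset);
			return (0);
		}
	}
	return (-1);
}
```

Because the table is held in memory and searched in logarithmic time, a lookup of this shape is consistent with the message's claim of minimal overhead for operations on the indirect vdev.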
2016-09-22 19:30:13 +03:00
|
|
|
zfs_ioctl_register("remap", ZFS_IOC_REMAP,
|
2019-06-25 02:44:01 +03:00
|
|
|
zfs_ioc_remap, zfs_secpolicy_none, DATASET_NAME,
|
2018-09-02 22:14:01 +03:00
|
|
|
POOL_CHECK_SUSPENDED | POOL_CHECK_READONLY, B_FALSE, B_TRUE,
|
|
|
|
zfs_keys_remap, ARRAY_SIZE(zfs_keys_remap));
|
2016-09-22 19:30:13 +03:00
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
zfs_ioctl_register("destroy_snaps", ZFS_IOC_DESTROY_SNAPS,
|
|
|
|
zfs_ioc_destroy_snaps, zfs_secpolicy_destroy_snaps, POOL_NAME,
|
2018-09-02 22:14:01 +03:00
|
|
|
POOL_CHECK_SUSPENDED | POOL_CHECK_READONLY, B_TRUE, B_TRUE,
|
|
|
|
zfs_keys_destroy_snaps, ARRAY_SIZE(zfs_keys_destroy_snaps));
|
2013-08-28 15:45:09 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
zfs_ioctl_register("hold", ZFS_IOC_HOLD,
|
|
|
|
zfs_ioc_hold, zfs_secpolicy_hold, POOL_NAME,
|
2018-09-02 22:14:01 +03:00
|
|
|
POOL_CHECK_SUSPENDED | POOL_CHECK_READONLY, B_TRUE, B_TRUE,
|
|
|
|
zfs_keys_hold, ARRAY_SIZE(zfs_keys_hold));
|
2013-09-04 16:00:57 +04:00
|
|
|
zfs_ioctl_register("release", ZFS_IOC_RELEASE,
|
|
|
|
zfs_ioc_release, zfs_secpolicy_release, POOL_NAME,
|
2018-09-02 22:14:01 +03:00
|
|
|
POOL_CHECK_SUSPENDED | POOL_CHECK_READONLY, B_TRUE, B_TRUE,
|
|
|
|
zfs_keys_release, ARRAY_SIZE(zfs_keys_release));
|
2013-09-04 16:00:57 +04:00
|
|
|
|
|
|
|
zfs_ioctl_register("get_holds", ZFS_IOC_GET_HOLDS,
|
|
|
|
zfs_ioc_get_holds, zfs_secpolicy_read, DATASET_NAME,
|
2018-09-02 22:14:01 +03:00
|
|
|
POOL_CHECK_SUSPENDED, B_FALSE, B_FALSE,
|
|
|
|
zfs_keys_get_holds, ARRAY_SIZE(zfs_keys_get_holds));
|
2013-09-04 16:00:57 +04:00
|
|
|
|
2013-08-14 23:42:31 +04:00
|
|
|
zfs_ioctl_register("rollback", ZFS_IOC_ROLLBACK,
|
|
|
|
zfs_ioc_rollback, zfs_secpolicy_rollback, DATASET_NAME,
|
2018-09-02 22:14:01 +03:00
|
|
|
POOL_CHECK_SUSPENDED | POOL_CHECK_READONLY, B_FALSE, B_TRUE,
|
|
|
|
zfs_keys_rollback, ARRAY_SIZE(zfs_keys_rollback));
|
2013-08-14 23:42:31 +04:00
|
|
|
|
2013-12-12 02:33:41 +04:00
|
|
|
zfs_ioctl_register("bookmark", ZFS_IOC_BOOKMARK,
|
|
|
|
zfs_ioc_bookmark, zfs_secpolicy_bookmark, POOL_NAME,
|
2018-09-02 22:14:01 +03:00
|
|
|
POOL_CHECK_SUSPENDED | POOL_CHECK_READONLY, B_TRUE, B_TRUE,
|
|
|
|
zfs_keys_bookmark, ARRAY_SIZE(zfs_keys_bookmark));
|
2013-12-12 02:33:41 +04:00
|
|
|
|
|
|
|
zfs_ioctl_register("get_bookmarks", ZFS_IOC_GET_BOOKMARKS,
|
|
|
|
zfs_ioc_get_bookmarks, zfs_secpolicy_read, DATASET_NAME,
|
2018-09-02 22:14:01 +03:00
|
|
|
POOL_CHECK_SUSPENDED, B_FALSE, B_FALSE,
|
|
|
|
zfs_keys_get_bookmarks, ARRAY_SIZE(zfs_keys_get_bookmarks));
|
2013-12-12 02:33:41 +04:00
|
|
|
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
zfs_ioctl_register("get_bookmark_props", ZFS_IOC_GET_BOOKMARK_PROPS,
|
|
|
|
zfs_ioc_get_bookmark_props, zfs_secpolicy_read, ENTITY_NAME,
|
|
|
|
POOL_CHECK_SUSPENDED, B_FALSE, B_FALSE, zfs_keys_get_bookmark_props,
|
|
|
|
ARRAY_SIZE(zfs_keys_get_bookmark_props));
|
|
|
|
|
2013-12-12 02:33:41 +04:00
|
|
|
zfs_ioctl_register("destroy_bookmarks", ZFS_IOC_DESTROY_BOOKMARKS,
|
|
|
|
zfs_ioc_destroy_bookmarks, zfs_secpolicy_destroy_bookmarks,
|
|
|
|
POOL_NAME,
|
2018-09-02 22:14:01 +03:00
|
|
|
POOL_CHECK_SUSPENDED | POOL_CHECK_READONLY, B_TRUE, B_TRUE,
|
|
|
|
zfs_keys_destroy_bookmarks,
|
|
|
|
ARRAY_SIZE(zfs_keys_destroy_bookmarks));
|
2013-12-12 02:33:41 +04:00
|
|
|
|
2016-06-10 03:04:12 +03:00
|
|
|
zfs_ioctl_register("receive", ZFS_IOC_RECV_NEW,
|
2022-02-16 04:38:43 +03:00
|
|
|
zfs_ioc_recv_new, zfs_secpolicy_recv, DATASET_NAME,
|
Add basic zfs ioc input nvpair validation
We want newer versions of libzfs_core to run against an existing
zfs kernel module (i.e. a deferred reboot or module reload after
an update).
Programmatically document, via a zfs_ioc_key_t, the valid arguments
for the ioc commands that rely on nvpair input arguments (i.e. non-legacy
commands from libzfs_core). Automatically verify the expected
pairs before dispatching a command.
This initial phase focuses on the non-legacy ioctls. A follow-on
change can address the legacy ioctl input from the zfs_cmd_t.
The zfs_ioc_key_t for zfs_keys_channel_program looks like:
static const zfs_ioc_key_t zfs_keys_channel_program[] = {
{"program", DATA_TYPE_STRING, 0},
{"arg", DATA_TYPE_UNKNOWN, 0},
{"sync", DATA_TYPE_BOOLEAN_VALUE, ZK_OPTIONAL},
{"instrlimit", DATA_TYPE_UINT64, ZK_OPTIONAL},
{"memlimit", DATA_TYPE_UINT64, ZK_OPTIONAL},
};
Introduce four input errors to identify specific input failures
(in addition to generic argument value errors like EINVAL, ERANGE,
EBADF, and E2BIG).
ZFS_ERR_IOC_CMD_UNAVAIL the ioctl number is not supported by kernel
ZFS_ERR_IOC_ARG_UNAVAIL an input argument is not supported by kernel
ZFS_ERR_IOC_ARG_REQUIRED a required input argument is missing
ZFS_ERR_IOC_ARG_BADTYPE an input argument has an invalid type
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #7780
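The validation pass described in this commit message can be sketched as follows. This is a minimal illustration and not the kernel implementation: `arg_t`, `ioc_key_t`, `check_args()`, the `TYPE_*` enum and the `ERR_*` codes are hypothetical stand-ins for nvlist_t, zfs_ioc_key_t, the DATA_TYPE_* values and the ZFS_ERR_IOC_ARG_* errors. Treating the "unknown" type as "accept any type" is an assumption made for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the nvpair data types. */
typedef enum {
	TYPE_UNKNOWN,		/* assumed here to mean "any type accepted" */
	TYPE_STRING,
	TYPE_UINT64,
	TYPE_BOOLEAN_VALUE
} arg_type_t;

#define	KEY_OPTIONAL	0x1	/* plays the role of ZK_OPTIONAL */
#define	NKEYS(a)	(sizeof (a) / sizeof ((a)[0]))

/* One declared argument of an ioctl (stand-in for zfs_ioc_key_t). */
typedef struct {
	const char *name;
	arg_type_t type;
	int flags;
} ioc_key_t;

/* One supplied name/type pair (stand-in for an nvpair). */
typedef struct {
	const char *name;
	arg_type_t type;
} arg_t;

/* Stand-ins for the ZFS_ERR_IOC_ARG_* codes. */
enum { CHECK_OK = 0, ERR_ARG_UNAVAIL, ERR_ARG_REQUIRED, ERR_ARG_BADTYPE };

static const ioc_key_t keys_channel_program[] = {
	{"program",	TYPE_STRING,		0},
	{"arg",		TYPE_UNKNOWN,		0},
	{"sync",	TYPE_BOOLEAN_VALUE,	KEY_OPTIONAL},
	{"instrlimit",	TYPE_UINT64,		KEY_OPTIONAL},
	{"memlimit",	TYPE_UINT64,		KEY_OPTIONAL},
};

static int
check_args(const arg_t *args, size_t nargs,
    const ioc_key_t *keys, size_t nkeys)
{
	/* Every supplied pair must match a declared key and its type. */
	for (size_t i = 0; i < nargs; i++) {
		const ioc_key_t *k = NULL;

		for (size_t j = 0; j < nkeys; j++) {
			if (strcmp(args[i].name, keys[j].name) == 0) {
				k = &keys[j];
				break;
			}
		}
		if (k == NULL)
			return (ERR_ARG_UNAVAIL);
		if (k->type != TYPE_UNKNOWN && k->type != args[i].type)
			return (ERR_ARG_BADTYPE);
	}

	/* Every key not marked optional must have been supplied. */
	for (size_t j = 0; j < nkeys; j++) {
		int found = 0;

		if (keys[j].flags & KEY_OPTIONAL)
			continue;
		for (size_t i = 0; i < nargs; i++) {
			if (strcmp(args[i].name, keys[j].name) == 0)
				found = 1;
		}
		if (!found)
			return (ERR_ARG_REQUIRED);
	}
	return (CHECK_OK);
}
```

Running the check before the handler is dispatched lets an older module reject an argument it does not know about with a specific error, rather than silently ignoring it.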
2018-09-02 22:14:01 +03:00
	    POOL_CHECK_SUSPENDED | POOL_CHECK_READONLY, B_TRUE, B_TRUE,
	    zfs_keys_recv_new, ARRAY_SIZE(zfs_keys_recv_new));
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decrypt, and authenticate protected datasets.
Each object set maintains a Merkle tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
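The level of indirection described in the keystore paragraph can be illustrated with a toy model. This is a sketch under loud assumptions: XOR stands in for a real key-wrapping cipher and provides no security, and `toy_dataset_t`, `toy_create()`, `toy_unwrap()` and `toy_change_key()` are hypothetical names that do not exist in ZFS. The point is only that changing the user's key rewrites the small wrapped master key, while data encrypted under the master key is left alone.

```c
#include <stddef.h>
#include <stdint.h>

#define	KEY_LEN	8

/* Hypothetical per-dataset record holding the wrapped master key. */
typedef struct {
	uint8_t wrapped_master[KEY_LEN];
} toy_dataset_t;

/* Toy "wrap" primitive: XOR, NOT real cryptography. */
static void
xor_bytes(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
	for (size_t i = 0; i < n; i++)
		dst[i] = a[i] ^ b[i];
}

/* Store the master key wrapped with the user's key. */
static void
toy_create(toy_dataset_t *ds, const uint8_t *master, const uint8_t *user)
{
	xor_bytes(ds->wrapped_master, master, user, KEY_LEN);
}

/* Recover the master key from the record using the user's key. */
static void
toy_unwrap(const toy_dataset_t *ds, const uint8_t *user, uint8_t *master_out)
{
	xor_bytes(master_out, ds->wrapped_master, user, KEY_LEN);
}

/* Re-wrap the same master key under a new user key; data is untouched. */
static void
toy_change_key(toy_dataset_t *ds, const uint8_t *old_user,
    const uint8_t *new_user)
{
	uint8_t master[KEY_LEN];

	toy_unwrap(ds, old_user, master);
	xor_bytes(ds->wrapped_master, master, new_user, KEY_LEN);
}
```

After `toy_change_key()`, unwrapping with the new user key yields the original master key, so nothing encrypted under that master key needs to be rewritten.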
2017-08-14 20:36:48 +03:00
	zfs_ioctl_register("load-key", ZFS_IOC_LOAD_KEY,
	    zfs_ioc_load_key, zfs_secpolicy_load_key,
2018-09-02 22:14:01 +03:00
	    DATASET_NAME, POOL_CHECK_SUSPENDED, B_TRUE, B_TRUE,
	    zfs_keys_load_key, ARRAY_SIZE(zfs_keys_load_key));
2017-08-14 20:36:48 +03:00
	zfs_ioctl_register("unload-key", ZFS_IOC_UNLOAD_KEY,
	    zfs_ioc_unload_key, zfs_secpolicy_load_key,
2018-09-02 22:14:01 +03:00
	    DATASET_NAME, POOL_CHECK_SUSPENDED, B_TRUE, B_TRUE,
	    zfs_keys_unload_key, ARRAY_SIZE(zfs_keys_unload_key));
2017-08-14 20:36:48 +03:00
	zfs_ioctl_register("change-key", ZFS_IOC_CHANGE_KEY,
	    zfs_ioc_change_key, zfs_secpolicy_change_key,
	    DATASET_NAME, POOL_CHECK_SUSPENDED | POOL_CHECK_READONLY,
2018-09-02 22:14:01 +03:00
	    B_TRUE, B_TRUE, zfs_keys_change_key,
	    ARRAY_SIZE(zfs_keys_change_key));
2016-06-10 03:04:12 +03:00
2017-05-19 22:33:11 +03:00
	zfs_ioctl_register("sync", ZFS_IOC_POOL_SYNC,
	    zfs_ioc_pool_sync, zfs_secpolicy_none, POOL_NAME,
2018-09-02 22:14:01 +03:00
	    POOL_CHECK_SUSPENDED | POOL_CHECK_READONLY, B_FALSE, B_FALSE,
	    zfs_keys_pool_sync, ARRAY_SIZE(zfs_keys_pool_sync));
2017-10-26 22:26:09 +03:00
	zfs_ioctl_register("reopen", ZFS_IOC_POOL_REOPEN, zfs_ioc_pool_reopen,
	    zfs_secpolicy_config, POOL_NAME, POOL_CHECK_SUSPENDED, B_TRUE,
2018-09-02 22:14:01 +03:00
	    B_TRUE, zfs_keys_pool_reopen, ARRAY_SIZE(zfs_keys_pool_reopen));
2017-05-19 22:33:11 +03:00
2018-02-08 19:16:23 +03:00
	zfs_ioctl_register("channel_program", ZFS_IOC_CHANNEL_PROGRAM,
	    zfs_ioc_channel_program, zfs_secpolicy_config,
	    POOL_NAME, POOL_CHECK_SUSPENDED | POOL_CHECK_READONLY, B_TRUE,
2018-09-02 22:14:01 +03:00
	    B_TRUE, zfs_keys_channel_program,
	    ARRAY_SIZE(zfs_keys_channel_program));
2018-02-08 19:16:23 +03:00
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
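The filtering step in the third stage can be sketched as follows. This is a simplified illustration, not the send code: blocks are modeled as plain 64-bit block numbers, the redaction bookmark as a sorted array of such numbers, and `filter_redacted()` is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Three-way comparison of two uint64_t values for bsearch(). */
static int
cmp_u64(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a;
	uint64_t y = *(const uint64_t *)b;

	return ((x > y) - (x < y));
}

/*
 * Copy into out[] every candidate block that is NOT listed in the sorted
 * redaction list, and return how many blocks were kept.  The out[] array
 * must have room for nblocks entries.
 */
static size_t
filter_redacted(const uint64_t *blocks, size_t nblocks,
    const uint64_t *redacted, size_t nredacted, uint64_t *out)
{
	size_t kept = 0;

	for (size_t i = 0; i < nblocks; i++) {
		if (nredacted == 0 ||
		    bsearch(&blocks[i], redacted, nredacted,
		    sizeof (uint64_t), cmp_u64) == NULL)
			out[kept++] = blocks[i];
	}
	return (kept);
}
```

With candidate blocks 1 through 5 and a redaction list of {2, 4}, only blocks 1, 3 and 5 would reach the stream in this model.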
2019-06-19 19:48:13 +03:00
	zfs_ioctl_register("redact", ZFS_IOC_REDACT,
	    zfs_ioc_redact, zfs_secpolicy_config, DATASET_NAME,
	    POOL_CHECK_SUSPENDED | POOL_CHECK_READONLY, B_TRUE, B_TRUE,
	    zfs_keys_redact, ARRAY_SIZE(zfs_keys_redact));
2016-12-17 01:11:29 +03:00
	zfs_ioctl_register("zpool_checkpoint", ZFS_IOC_POOL_CHECKPOINT,
	    zfs_ioc_pool_checkpoint, zfs_secpolicy_config, POOL_NAME,
2018-09-02 22:14:01 +03:00
	    POOL_CHECK_SUSPENDED | POOL_CHECK_READONLY, B_TRUE, B_TRUE,
	    zfs_keys_pool_checkpoint, ARRAY_SIZE(zfs_keys_pool_checkpoint));
2016-12-17 01:11:29 +03:00
	zfs_ioctl_register("zpool_discard_checkpoint",
	    ZFS_IOC_POOL_DISCARD_CHECKPOINT, zfs_ioc_pool_discard_checkpoint,
	    zfs_secpolicy_config, POOL_NAME,
2018-09-02 22:14:01 +03:00
	    POOL_CHECK_SUSPENDED | POOL_CHECK_READONLY, B_TRUE, B_TRUE,
	    zfs_keys_pool_discard_checkpoint,
	    ARRAY_SIZE(zfs_keys_pool_discard_checkpoint));
2016-12-17 01:11:29 +03:00
OpenZFS 9102 - zfs should be able to initialize storage devices
PROBLEM
========
The first access to a block incurs a performance penalty on some platforms
(e.g. AWS's EBS, VMware VMDKs). Therefore we recommend that volumes are
"thick provisioned", where supported by the platform (VMware). This can
create a large delay in getting new virtual machines up and running (or
adding storage to an existing Engine). If the thick provision step is
omitted, write performance will be suboptimal until all blocks on the LUN
have been written.
SOLUTION
=========
This feature introduces a way to 'initialize' the disks at install or in the
background to make sure we don't incur this first read penalty.
When an entire LUN is added to ZFS, we make all space available immediately,
and allow ZFS to find unallocated space and zero it out. This works with
concurrent writes to arbitrary offsets, ensuring that we don't zero out
something that has been (or is in the middle of being) written. This scheme
can also be applied to existing pools (affecting only free regions on the
vdev). Detailed design:
- new subcommand: zpool initialize [-cs] <pool> [<vdev> ...]
- start, suspend, or cancel initialization
- Creates new open-context thread for each vdev
- Thread iterates through all metaslabs in this vdev
- Each metaslab:
- select a metaslab
- load the metaslab
- mark the metaslab as being zeroed
- walk all free ranges within that metaslab and translate
them to ranges on the leaf vdev
- issue a "zeroing" I/O on the leaf vdev that corresponds to
a free range on the metaslab we're working on
- continue until all free ranges for this metaslab have been
"zeroed"
- reset/unmark the metaslab being zeroed
- if more metaslabs exist, then repeat above tasks.
- if no more metaslabs, then we're done.
- progress for the initialization is stored on-disk in the vdev’s
leaf zap object. The following information is stored:
- the last offset that has been initialized
- the state of the initialization process (i.e. active,
suspended, or canceled)
- the start time for the initialization
- progress is reported via the zpool status command and shows
information for each of the vdevs that are initializing
Porting notes:
- Added zfs_initialize_value module parameter to set the pattern
written by "zpool initialize".
- Added zfs_vdev_{initializing,removal}_{min,max}_active module options.
Authored by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: loli10K <ezomori.nozomu@gmail.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Richard Lowe <richlowe@richlowe.net>
Signed-off-by: Tim Chase <tim@chase2k.com>
Ported-by: Tim Chase <tim@chase2k.com>
OpenZFS-issue: https://www.illumos.org/issues/9102
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c3963210eb
Closes #8230
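The per-metaslab loop in the detailed design can be sketched as follows. This is a toy model with loud assumptions: the "disk" is a byte array, each metaslab carries a precomputed list of free ranges in disk offsets, and `toy_metaslab_t`, `range_t` and `toy_initialize()` are hypothetical names. It omits loading metaslabs, concurrency with writers, and on-disk progress tracking.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A free range on the leaf device, in bytes. */
typedef struct {
	size_t start;
	size_t len;
} range_t;

/* Hypothetical, heavily simplified metaslab. */
typedef struct {
	const range_t *free_ranges;
	size_t nfree;
	int initializing;	/* set while this metaslab is being written */
} toy_metaslab_t;

/*
 * Walk every metaslab, write the pattern over each of its free ranges,
 * and return the total number of bytes written.  Allocated regions are
 * never touched because they do not appear in any free range.
 */
static size_t
toy_initialize(uint8_t *disk, toy_metaslab_t *ms, size_t nms, uint8_t pattern)
{
	size_t done = 0;

	for (size_t i = 0; i < nms; i++) {
		ms[i].initializing = 1;		/* mark the metaslab */
		for (size_t r = 0; r < ms[i].nfree; r++) {
			const range_t *fr = &ms[i].free_ranges[r];

			memset(disk + fr->start, pattern, fr->len);
			done += fr->len;
		}
		ms[i].initializing = 0;		/* unmark when finished */
	}
	return (done);
}
```

The returned byte count plays the role of the progress value that the real feature persists, so that a suspended initialization could resume where it left off.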
2018-12-19 17:54:59 +03:00
	zfs_ioctl_register("initialize", ZFS_IOC_POOL_INITIALIZE,
	    zfs_ioc_pool_initialize, zfs_secpolicy_config, POOL_NAME,
	    POOL_CHECK_SUSPENDED | POOL_CHECK_READONLY, B_TRUE, B_TRUE,
	    zfs_keys_pool_initialize, ARRAY_SIZE(zfs_keys_pool_initialize));
2019-03-29 19:13:20 +03:00
	zfs_ioctl_register("trim", ZFS_IOC_POOL_TRIM,
	    zfs_ioc_pool_trim, zfs_secpolicy_config, POOL_NAME,
	    POOL_CHECK_SUSPENDED | POOL_CHECK_READONLY, B_TRUE, B_TRUE,
	    zfs_keys_pool_trim, ARRAY_SIZE(zfs_keys_pool_trim));
Add subcommand to wait for background zfs activity to complete
Currently the best way to wait for the completion of a long-running
operation in a pool, like a scrub or device removal, is to poll 'zpool
status' and parse its output, which is neither efficient nor convenient.
This change adds a 'wait' subcommand to the zpool command. When invoked,
'zpool wait' will block until a specified type of background activity
completes. Currently, this subcommand can wait for any of the following:
- Scrubs or resilvers to complete
- Devices to be initialized
- Devices to be replaced
- Devices to be removed
- Checkpoints to be discarded
- Background freeing to complete
For example, a scrub that is in progress could be waited for by running
zpool wait -t scrub <pool>
This also adds a -w flag to the attach, checkpoint, initialize, replace,
remove, and scrub subcommands. When used, this flag makes the operations
kicked off by these subcommands synchronous instead of asynchronous.
This functionality is implemented using a new ioctl. The type of
activity to wait for is provided as input to the ioctl, and the ioctl
blocks until all activity of that type has completed. An ioctl was used
over other methods of kernel-userspace communication primarily for the
sake of portability.
Porting Notes:
This is ported from Delphix OS change DLPX-44432. The following changes
were made while porting:
- Added ZoL-style ioctl input declaration.
- Reorganized error handling in zpool_initialize in libzfs to integrate
better with changes made for TRIM support.
- Fixed check for whether a checkpoint discard is in progress.
Previously it also waited if the pool had a checkpoint, instead of
just if a checkpoint was being discarded.
- Exposed zfs_initialize_chunk_size as a ZoL-style tunable.
- Updated more existing tests to make use of new 'zpool wait'
functionality, tests that don't exist in Delphix OS.
- Used existing ZoL tunable zfs_scan_suspend_progress, together with
zinject, in place of a new tunable zfs_scan_max_blks_per_txg.
- Added support for a non-integral interval argument to zpool wait.
Future work:
ZoL has support for trimming devices, which Delphix OS does not. In the
future, 'zpool wait' could be extended to add the ability to wait for
trim operations to complete.
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: John Gallagher <john.gallagher@delphix.com>
Closes #9162
2019-09-14 04:09:06 +03:00
	zfs_ioctl_register("wait", ZFS_IOC_WAIT,
	    zfs_ioc_wait, zfs_secpolicy_none, POOL_NAME,
	    POOL_CHECK_SUSPENDED | POOL_CHECK_READONLY, B_FALSE, B_FALSE,
	    zfs_keys_pool_wait, ARRAY_SIZE(zfs_keys_pool_wait));
2020-04-01 20:02:06 +03:00
	zfs_ioctl_register("wait_fs", ZFS_IOC_WAIT_FS,
	    zfs_ioc_wait_fs, zfs_secpolicy_none, DATASET_NAME,
	    POOL_CHECK_SUSPENDED | POOL_CHECK_READONLY, B_FALSE, B_FALSE,
	    zfs_keys_fs_wait, ARRAY_SIZE(zfs_keys_fs_wait));
2020-05-07 19:36:33 +03:00
	zfs_ioctl_register("set_bootenv", ZFS_IOC_SET_BOOTENV,
	    zfs_ioc_set_bootenv, zfs_secpolicy_config, POOL_NAME,
	    POOL_CHECK_SUSPENDED | POOL_CHECK_READONLY, B_FALSE, B_TRUE,
	    zfs_keys_set_bootenv, ARRAY_SIZE(zfs_keys_set_bootenv));

	zfs_ioctl_register("get_bootenv", ZFS_IOC_GET_BOOTENV,
	    zfs_ioc_get_bootenv, zfs_secpolicy_none, POOL_NAME,
	    POOL_CHECK_SUSPENDED, B_FALSE, B_TRUE,
	    zfs_keys_get_bootenv, ARRAY_SIZE(zfs_keys_get_bootenv));
2021-11-30 17:46:25 +03:00
	zfs_ioctl_register("zpool_vdev_get_props", ZFS_IOC_VDEV_GET_PROPS,
	    zfs_ioc_vdev_get_props, zfs_secpolicy_read, POOL_NAME,
	    POOL_CHECK_NONE, B_FALSE, B_FALSE, zfs_keys_vdev_get_props,
	    ARRAY_SIZE(zfs_keys_vdev_get_props));

	zfs_ioctl_register("zpool_vdev_set_props", ZFS_IOC_VDEV_SET_PROPS,
	    zfs_ioc_vdev_set_props, zfs_secpolicy_config, POOL_NAME,
	    POOL_CHECK_SUSPENDED | POOL_CHECK_READONLY, B_FALSE, B_FALSE,
	    zfs_keys_vdev_set_props, ARRAY_SIZE(zfs_keys_vdev_set_props));
2013-08-28 15:45:09 +04:00
|
|
|
/* IOCTLS that use the legacy function signature */
|
|
|
|
|
|
|
|
zfs_ioctl_register_legacy(ZFS_IOC_POOL_FREEZE, zfs_ioc_pool_freeze,
|
|
|
|
zfs_secpolicy_config, NO_NAME, B_FALSE, POOL_CHECK_READONLY);
|
|
|
|
|
|
|
|
zfs_ioctl_register_pool(ZFS_IOC_POOL_CREATE, zfs_ioc_pool_create,
|
|
|
|
zfs_secpolicy_config, B_TRUE, POOL_CHECK_NONE);
|
|
|
|
zfs_ioctl_register_pool_modify(ZFS_IOC_POOL_SCAN,
|
|
|
|
zfs_ioc_pool_scan);
|
|
|
|
zfs_ioctl_register_pool_modify(ZFS_IOC_POOL_UPGRADE,
|
|
|
|
zfs_ioc_pool_upgrade);
|
|
|
|
zfs_ioctl_register_pool_modify(ZFS_IOC_VDEV_ADD,
|
|
|
|
zfs_ioc_vdev_add);
|
|
|
|
zfs_ioctl_register_pool_modify(ZFS_IOC_VDEV_REMOVE,
|
|
|
|
zfs_ioc_vdev_remove);
|
|
|
|
zfs_ioctl_register_pool_modify(ZFS_IOC_VDEV_SET_STATE,
|
|
|
|
zfs_ioc_vdev_set_state);
|
|
|
|
zfs_ioctl_register_pool_modify(ZFS_IOC_VDEV_ATTACH,
|
|
|
|
zfs_ioc_vdev_attach);
|
|
|
|
zfs_ioctl_register_pool_modify(ZFS_IOC_VDEV_DETACH,
|
|
|
|
zfs_ioc_vdev_detach);
|
|
|
|
zfs_ioctl_register_pool_modify(ZFS_IOC_VDEV_SETPATH,
|
|
|
|
zfs_ioc_vdev_setpath);
|
|
|
|
zfs_ioctl_register_pool_modify(ZFS_IOC_VDEV_SETFRU,
|
|
|
|
zfs_ioc_vdev_setfru);
|
|
|
|
zfs_ioctl_register_pool_modify(ZFS_IOC_POOL_SET_PROPS,
|
|
|
|
zfs_ioc_pool_set_props);
|
|
|
|
zfs_ioctl_register_pool_modify(ZFS_IOC_VDEV_SPLIT,
|
|
|
|
zfs_ioc_vdev_split);
|
|
|
|
zfs_ioctl_register_pool_modify(ZFS_IOC_POOL_REGUID,
|
|
|
|
zfs_ioc_pool_reguid);
|
|
|
|
|
|
|
|
zfs_ioctl_register_pool_meta(ZFS_IOC_POOL_CONFIGS,
|
|
|
|
zfs_ioc_pool_configs, zfs_secpolicy_none);
|
|
|
|
zfs_ioctl_register_pool_meta(ZFS_IOC_POOL_TRYIMPORT,
|
|
|
|
zfs_ioc_pool_tryimport, zfs_secpolicy_config);
|
|
|
|
zfs_ioctl_register_pool_meta(ZFS_IOC_INJECT_FAULT,
|
|
|
|
zfs_ioc_inject_fault, zfs_secpolicy_inject);
|
|
|
|
zfs_ioctl_register_pool_meta(ZFS_IOC_CLEAR_FAULT,
|
|
|
|
zfs_ioc_clear_fault, zfs_secpolicy_inject);
|
|
|
|
zfs_ioctl_register_pool_meta(ZFS_IOC_INJECT_LIST_NEXT,
|
|
|
|
zfs_ioc_inject_list_next, zfs_secpolicy_inject);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Pool destroy and export don't log the history as part of
|
|
|
|
* zfsdev_ioctl, but rather zfs_ioc_pool_export
|
|
|
|
* does the logging of those commands.
|
|
|
|
*/
|
|
|
|
zfs_ioctl_register_pool(ZFS_IOC_POOL_DESTROY, zfs_ioc_pool_destroy,
|
2015-02-28 01:35:56 +03:00
|
|
|
zfs_secpolicy_config, B_FALSE, POOL_CHECK_SUSPENDED);
|
2013-08-28 15:45:09 +04:00
|
|
|
zfs_ioctl_register_pool(ZFS_IOC_POOL_EXPORT, zfs_ioc_pool_export,
|
2015-02-28 01:35:56 +03:00
|
|
|
zfs_secpolicy_config, B_FALSE, POOL_CHECK_SUSPENDED);
|
2013-08-28 15:45:09 +04:00
|
|
|
|
|
|
|
zfs_ioctl_register_pool(ZFS_IOC_POOL_STATS, zfs_ioc_pool_stats,
|
|
|
|
zfs_secpolicy_read, B_FALSE, POOL_CHECK_NONE);
|
|
|
|
zfs_ioctl_register_pool(ZFS_IOC_POOL_GET_PROPS, zfs_ioc_pool_get_props,
|
|
|
|
zfs_secpolicy_read, B_FALSE, POOL_CHECK_NONE);
|
|
|
|
|
|
|
|
zfs_ioctl_register_pool(ZFS_IOC_ERROR_LOG, zfs_ioc_error_log,
|
|
|
|
zfs_secpolicy_inject, B_FALSE, POOL_CHECK_SUSPENDED);
|
|
|
|
zfs_ioctl_register_pool(ZFS_IOC_DSOBJ_TO_DSNAME,
|
|
|
|
zfs_ioc_dsobj_to_dsname,
|
|
|
|
zfs_secpolicy_diff, B_FALSE, POOL_CHECK_SUSPENDED);
|
|
|
|
zfs_ioctl_register_pool(ZFS_IOC_POOL_GET_HISTORY,
|
|
|
|
zfs_ioc_pool_get_history,
|
|
|
|
zfs_secpolicy_config, B_FALSE, POOL_CHECK_SUSPENDED);
|
|
|
|
|
|
|
|
zfs_ioctl_register_pool(ZFS_IOC_POOL_IMPORT, zfs_ioc_pool_import,
|
|
|
|
zfs_secpolicy_config, B_TRUE, POOL_CHECK_NONE);
|
|
|
|
|
|
|
|
zfs_ioctl_register_pool(ZFS_IOC_CLEAR, zfs_ioc_clear,
|
2017-07-07 20:39:53 +03:00
|
|
|
zfs_secpolicy_config, B_TRUE, POOL_CHECK_READONLY);
|
2013-08-28 15:45:09 +04:00
|
|
|
|
|
|
|
zfs_ioctl_register_dataset_read(ZFS_IOC_SPACE_WRITTEN,
|
|
|
|
zfs_ioc_space_written);
|
|
|
|
zfs_ioctl_register_dataset_read(ZFS_IOC_OBJSET_RECVD_PROPS,
|
|
|
|
zfs_ioc_objset_recvd_props);
|
|
|
|
zfs_ioctl_register_dataset_read(ZFS_IOC_NEXT_OBJ,
|
|
|
|
zfs_ioc_next_obj);
|
|
|
|
zfs_ioctl_register_dataset_read(ZFS_IOC_GET_FSACL,
|
|
|
|
zfs_ioc_get_fsacl);
|
|
|
|
zfs_ioctl_register_dataset_read(ZFS_IOC_OBJSET_STATS,
|
|
|
|
zfs_ioc_objset_stats);
|
|
|
|
zfs_ioctl_register_dataset_read(ZFS_IOC_OBJSET_ZPLPROPS,
|
|
|
|
zfs_ioc_objset_zplprops);
|
|
|
|
zfs_ioctl_register_dataset_read(ZFS_IOC_DATASET_LIST_NEXT,
|
|
|
|
zfs_ioc_dataset_list_next);
|
|
|
|
zfs_ioctl_register_dataset_read(ZFS_IOC_SNAPSHOT_LIST_NEXT,
|
|
|
|
zfs_ioc_snapshot_list_next);
|
|
|
|
zfs_ioctl_register_dataset_read(ZFS_IOC_SEND_PROGRESS,
|
|
|
|
zfs_ioc_send_progress);
|
|
|
|
|
|
|
|
zfs_ioctl_register_dataset_read_secpolicy(ZFS_IOC_DIFF,
|
|
|
|
zfs_ioc_diff, zfs_secpolicy_diff);
|
|
|
|
zfs_ioctl_register_dataset_read_secpolicy(ZFS_IOC_OBJ_TO_STATS,
|
|
|
|
zfs_ioc_obj_to_stats, zfs_secpolicy_diff);
|
|
|
|
zfs_ioctl_register_dataset_read_secpolicy(ZFS_IOC_OBJ_TO_PATH,
|
|
|
|
zfs_ioc_obj_to_path, zfs_secpolicy_diff);
|
|
|
|
zfs_ioctl_register_dataset_read_secpolicy(ZFS_IOC_USERSPACE_ONE,
|
|
|
|
zfs_ioc_userspace_one, zfs_secpolicy_userspace_one);
|
|
|
|
zfs_ioctl_register_dataset_read_secpolicy(ZFS_IOC_USERSPACE_MANY,
|
|
|
|
zfs_ioc_userspace_many, zfs_secpolicy_userspace_many);
|
|
|
|
zfs_ioctl_register_dataset_read_secpolicy(ZFS_IOC_SEND,
|
|
|
|
zfs_ioc_send, zfs_secpolicy_send);
|
|
|
|
|
|
|
|
zfs_ioctl_register_dataset_modify(ZFS_IOC_SET_PROP, zfs_ioc_set_prop,
|
|
|
|
zfs_secpolicy_none);
|
|
|
|
zfs_ioctl_register_dataset_modify(ZFS_IOC_DESTROY, zfs_ioc_destroy,
|
|
|
|
zfs_secpolicy_destroy);
|
|
|
|
zfs_ioctl_register_dataset_modify(ZFS_IOC_RENAME, zfs_ioc_rename,
|
|
|
|
zfs_secpolicy_rename);
|
|
|
|
zfs_ioctl_register_dataset_modify(ZFS_IOC_RECV, zfs_ioc_recv,
|
|
|
|
zfs_secpolicy_recv);
|
|
|
|
zfs_ioctl_register_dataset_modify(ZFS_IOC_PROMOTE, zfs_ioc_promote,
|
|
|
|
zfs_secpolicy_promote);
|
|
|
|
zfs_ioctl_register_dataset_modify(ZFS_IOC_INHERIT_PROP,
|
|
|
|
zfs_ioc_inherit_prop, zfs_secpolicy_inherit_prop);
|
|
|
|
zfs_ioctl_register_dataset_modify(ZFS_IOC_SET_FSACL, zfs_ioc_set_fsacl,
|
|
|
|
zfs_secpolicy_set_fsacl);
|
|
|
|
|
|
|
|
zfs_ioctl_register_dataset_nolog(ZFS_IOC_SHARE, zfs_ioc_share,
|
|
|
|
zfs_secpolicy_share, POOL_CHECK_NONE);
|
|
|
|
zfs_ioctl_register_dataset_nolog(ZFS_IOC_SMB_ACL, zfs_ioc_smb_acl,
|
|
|
|
zfs_secpolicy_smb_acl, POOL_CHECK_NONE);
|
|
|
|
zfs_ioctl_register_dataset_nolog(ZFS_IOC_USERSPACE_UPGRADE,
|
|
|
|
zfs_ioc_userspace_upgrade, zfs_secpolicy_userspace_upgrade,
|
|
|
|
POOL_CHECK_SUSPENDED | POOL_CHECK_READONLY);
|
|
|
|
zfs_ioctl_register_dataset_nolog(ZFS_IOC_TMP_SNAPSHOT,
|
|
|
|
zfs_ioc_tmp_snapshot, zfs_secpolicy_tmp_snapshot,
|
|
|
|
POOL_CHECK_SUSPENDED | POOL_CHECK_READONLY);
|
|
|
|
|
|
|
|
zfs_ioctl_register_legacy(ZFS_IOC_EVENTS_NEXT, zfs_ioc_events_next,
|
|
|
|
zfs_secpolicy_config, NO_NAME, B_FALSE, POOL_CHECK_NONE);
|
|
|
|
zfs_ioctl_register_legacy(ZFS_IOC_EVENTS_CLEAR, zfs_ioc_events_clear,
|
|
|
|
zfs_secpolicy_config, NO_NAME, B_FALSE, POOL_CHECK_NONE);
|
2013-11-23 02:52:16 +04:00
|
|
|
zfs_ioctl_register_legacy(ZFS_IOC_EVENTS_SEEK, zfs_ioc_events_seek,
|
|
|
|
zfs_secpolicy_config, NO_NAME, B_FALSE, POOL_CHECK_NONE);
|
2019-09-27 20:46:28 +03:00
|
|
|
|
|
|
|
zfs_ioctl_init_os();
|
2013-08-28 15:45:09 +04:00
|
|
|
}
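The registration calls above fill a table indexed by ioctl number, and the dispatcher later rejects numbers that are out of range or have no handler. The following is a minimal, standalone sketch of that pattern. Every name in it (sketch_vec_t, sketch_register, sketch_lookup, sketch_wait) is hypothetical and greatly simplified; it is not the real zfs_ioc_vec_t or zfs_ioctl_register() interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handler signature; the real handlers take more arguments. */
typedef int (*sketch_handler_t)(const char *name);

typedef struct {
	const char *name;	/* human-readable ioctl name */
	sketch_handler_t func;	/* NULL means "slot not registered" */
} sketch_vec_t;

#define	SKETCH_NVEC 8	/* assumed table size, for illustration only */
static sketch_vec_t sketch_vecs[SKETCH_NVEC];

/* Register a handler in its slot; each slot may be filled only once. */
static void
sketch_register(unsigned num, const char *name, sketch_handler_t func)
{
	assert(num < SKETCH_NVEC);
	assert(sketch_vecs[num].func == NULL);
	sketch_vecs[num].name = name;
	sketch_vecs[num].func = func;
}

/*
 * Look up a slot. The table may be sparse, so both the bounds and the
 * handler pointer are checked before the entry is handed back.
 */
static const sketch_vec_t *
sketch_lookup(unsigned num)
{
	if (num >= sizeof (sketch_vecs) / sizeof (sketch_vecs[0]))
		return (NULL);
	if (sketch_vecs[num].func == NULL)
		return (NULL);
	return (&sketch_vecs[num]);
}

/* Example handler used only to exercise the table. */
static int
sketch_wait(const char *name)
{
	return (name != NULL ? 0 : -1);
}
```

A caller would register once at init time, for example sketch_register(3, "wait", sketch_wait), and treat a NULL result from sketch_lookup() as "command unavailable".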
|
2008-11-20 23:01:55 +03:00
|
|
|
|
Add basic zfs ioc input nvpair validation
We want newer versions of libzfs_core to run against an existing
zfs kernel module (i.e. a deferred reboot or module reload after
an update).
Programmatically document, via a zfs_ioc_key_t, the valid arguments
for the ioc commands that rely on nvpair input arguments (i.e. non
legacy commands from libzfs_core). Automatically verify the expected
pairs before dispatching a command.
This initial phase focuses on the non-legacy ioctls. A follow-on
change can address the legacy ioctl input from the zfs_cmd_t.
The zfs_ioc_key_t for zfs_keys_channel_program looks like:
    static const zfs_ioc_key_t zfs_keys_channel_program[] = {
        {"program", DATA_TYPE_STRING, 0},
        {"arg", DATA_TYPE_UNKNOWN, 0},
        {"sync", DATA_TYPE_BOOLEAN_VALUE, ZK_OPTIONAL},
        {"instrlimit", DATA_TYPE_UINT64, ZK_OPTIONAL},
        {"memlimit", DATA_TYPE_UINT64, ZK_OPTIONAL},
    };
Introduce four input errors to identify specific input failures
(in addition to generic argument value errors like EINVAL, ERANGE,
EBADF, and E2BIG).
    ZFS_ERR_IOC_CMD_UNAVAIL   the ioctl number is not supported by kernel
    ZFS_ERR_IOC_ARG_UNAVAIL   an input argument is not supported by kernel
    ZFS_ERR_IOC_ARG_REQUIRED  a required input argument is missing
    ZFS_ERR_IOC_ARG_BADTYPE   an input argument has an invalid type
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #7780
2018-09-02 22:14:01 +03:00
|
|
|
/*
|
|
|
|
* Verify that for non-legacy ioctls the input nvlist
|
|
|
|
* pairs match against the expected input.
|
|
|
|
*
|
|
|
|
* Possible errors are:
|
|
|
|
* ZFS_ERR_IOC_ARG_UNAVAIL An unrecognized nvpair was encountered
|
|
|
|
* ZFS_ERR_IOC_ARG_REQUIRED A required nvpair is missing
|
|
|
|
* ZFS_ERR_IOC_ARG_BADTYPE Invalid type for nvpair
|
|
|
|
*/
|
|
|
|
static int
|
|
|
|
zfs_check_input_nvpairs(nvlist_t *innvl, const zfs_ioc_vec_t *vec)
|
|
|
|
{
|
|
|
|
const zfs_ioc_key_t *nvl_keys = vec->zvec_nvl_keys;
|
|
|
|
boolean_t required_keys_found = B_FALSE;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* examine each input pair
|
|
|
|
*/
|
|
|
|
for (nvpair_t *pair = nvlist_next_nvpair(innvl, NULL);
|
|
|
|
pair != NULL; pair = nvlist_next_nvpair(innvl, pair)) {
|
2023-03-11 21:39:24 +03:00
|
|
|
const char *name = nvpair_name(pair);
|
2018-09-02 22:14:01 +03:00
|
|
|
data_type_t type = nvpair_type(pair);
|
|
|
|
boolean_t identified = B_FALSE;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* check pair against the documented names and type
|
|
|
|
*/
|
|
|
|
for (int k = 0; k < vec->zvec_nvl_key_count; k++) {
|
|
|
|
/* if not a wildcard name, check for an exact match */
|
|
|
|
if ((nvl_keys[k].zkey_flags & ZK_WILDCARDLIST) == 0 &&
|
|
|
|
strcmp(nvl_keys[k].zkey_name, name) != 0)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
identified = B_TRUE;
|
|
|
|
|
|
|
|
if (nvl_keys[k].zkey_type != DATA_TYPE_ANY &&
|
|
|
|
nvl_keys[k].zkey_type != type) {
|
|
|
|
return (SET_ERROR(ZFS_ERR_IOC_ARG_BADTYPE));
|
|
|
|
}
|
|
|
|
|
|
|
|
if (nvl_keys[k].zkey_flags & ZK_OPTIONAL)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
required_keys_found = B_TRUE;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* allow an 'optional' key, everything else is invalid */
|
|
|
|
if (!identified &&
|
|
|
|
(strcmp(name, "optional") != 0 ||
|
|
|
|
type != DATA_TYPE_NVLIST)) {
|
|
|
|
return (SET_ERROR(ZFS_ERR_IOC_ARG_UNAVAIL));
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/* verify that all required keys were found */
|
|
|
|
for (int k = 0; k < vec->zvec_nvl_key_count; k++) {
|
|
|
|
if (nvl_keys[k].zkey_flags & ZK_OPTIONAL)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
if (nvl_keys[k].zkey_flags & ZK_WILDCARDLIST) {
|
2019-09-03 03:56:41 +03:00
|
|
|
/* at least one non-optional key is expected here */
|
2018-09-02 22:14:01 +03:00
|
|
|
if (!required_keys_found)
|
|
|
|
return (SET_ERROR(ZFS_ERR_IOC_ARG_REQUIRED));
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!nvlist_exists(innvl, nvl_keys[k].zkey_name))
|
|
|
|
return (SET_ERROR(ZFS_ERR_IOC_ARG_REQUIRED));
|
|
|
|
}
|
|
|
|
|
|
|
|
return (0);
|
|
|
|
}
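The two passes of zfs_check_input_nvpairs() above (match every input pair against the key table, then confirm that every required key was seen) can be tried outside the kernel with a small sketch like the one below. The types, flag names, and error codes are hypothetical stand-ins for nvlist_t, zfs_ioc_key_t, and the ZFS_ERR_IOC_* values, and the special case for an "optional" nvlist key is deliberately left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; these are not the kernel definitions. */
enum sketch_type { T_ANY, T_STRING, T_UINT64, T_BOOLEAN };
enum { SK_OPTIONAL = 1 << 0, SK_WILDCARDLIST = 1 << 1 };
enum { SK_OK = 0, SK_ARG_UNAVAIL, SK_ARG_REQUIRED, SK_ARG_BADTYPE };

typedef struct {
	const char *name;
	enum sketch_type type;
	int flags;
} sketch_key_t;

typedef struct {
	const char *name;
	enum sketch_type type;
} sketch_pair_t;

/* Return nonzero if an input pair with this name exists. */
static int
sketch_exists(const sketch_pair_t *in, size_t nin, const char *name)
{
	for (size_t i = 0; i < nin; i++) {
		if (strcmp(in[i].name, name) == 0)
			return (1);
	}
	return (0);
}

static int
sketch_check_pairs(const sketch_pair_t *in, size_t nin,
    const sketch_key_t *keys, size_t nkeys)
{
	int required_found = 0;

	/* Pass 1: every input pair must match a documented key and type. */
	for (size_t i = 0; i < nin; i++) {
		int identified = 0;

		for (size_t k = 0; k < nkeys; k++) {
			/* A wildcard key matches any name. */
			if ((keys[k].flags & SK_WILDCARDLIST) == 0 &&
			    strcmp(keys[k].name, in[i].name) != 0)
				continue;
			identified = 1;
			if (keys[k].type != T_ANY &&
			    keys[k].type != in[i].type)
				return (SK_ARG_BADTYPE);
			if (keys[k].flags & SK_OPTIONAL)
				continue;
			required_found = 1;
			break;
		}
		if (!identified)
			return (SK_ARG_UNAVAIL);
	}

	/* Pass 2: every non-optional key must have been supplied. */
	for (size_t k = 0; k < nkeys; k++) {
		if (keys[k].flags & SK_OPTIONAL)
			continue;
		if (keys[k].flags & SK_WILDCARDLIST) {
			if (!required_found)
				return (SK_ARG_REQUIRED);
			continue;
		}
		if (!sketch_exists(in, nin, keys[k].name))
			return (SK_ARG_REQUIRED);
	}
	return (SK_OK);
}
```

With a table holding one required string key and one optional boolean key, an input that carries only the optional key is rejected in the second pass, while a wrong type or an unknown name is rejected in the first.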
|
|
|
|
|
2020-06-15 21:30:37 +03:00
|
|
|
static int
|
2010-08-27 01:24:34 +04:00
|
|
|
pool_status_check(const char *name, zfs_ioc_namecheck_t type,
|
|
|
|
zfs_ioc_poolcheck_t check)
|
2009-07-03 02:44:48 +04:00
|
|
|
{
|
|
|
|
spa_t *spa;
|
|
|
|
int error;
|
|
|
|
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
ASSERT(type == POOL_NAME || type == DATASET_NAME ||
|
|
|
|
type == ENTITY_NAME);
|
2009-07-03 02:44:48 +04:00
|
|
|
|
2010-08-27 01:24:34 +04:00
|
|
|
if (check & POOL_CHECK_NONE)
|
|
|
|
return (0);
|
|
|
|
|
2009-07-03 02:44:48 +04:00
|
|
|
error = spa_open(name, &spa, FTAG);
|
|
|
|
if (error == 0) {
|
2010-08-27 01:24:34 +04:00
|
|
|
if ((check & POOL_CHECK_SUSPENDED) && spa_suspended(spa))
|
2013-03-08 22:41:28 +04:00
|
|
|
error = SET_ERROR(EAGAIN);
|
2010-08-27 01:24:34 +04:00
|
|
|
else if ((check & POOL_CHECK_READONLY) && !spa_writeable(spa))
|
2013-03-08 22:41:28 +04:00
|
|
|
error = SET_ERROR(EROFS);
|
2009-07-03 02:44:48 +04:00
|
|
|
spa_close(spa, FTAG);
|
|
|
|
}
|
|
|
|
return (error);
|
|
|
|
}
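As a rough illustration of how pool_status_check() above maps its check flags onto errno values, here is a standalone sketch. The flag values and the sketch_pool_t structure are assumptions made for the example; the real function opens the pool with spa_open() and queries spa_suspended() and spa_writeable().

```c
#include <errno.h>

/* Assumed flag values for illustration; the real enum is defined elsewhere. */
enum {
	CHK_NONE	= 1 << 0,
	CHK_SUSPENDED	= 1 << 1,
	CHK_READONLY	= 1 << 2
};

/* Hypothetical stand-in for the pool state the real code reads from spa_t. */
typedef struct {
	int suspended;
	int writeable;
} sketch_pool_t;

static int
sketch_status_check(const sketch_pool_t *p, int check)
{
	/* CHK_NONE short-circuits every other check. */
	if (check & CHK_NONE)
		return (0);
	/* Suspension is tested first, so it wins over read-only. */
	if ((check & CHK_SUSPENDED) && p->suspended)
		return (EAGAIN);
	if ((check & CHK_READONLY) && !p->writeable)
		return (EROFS);
	return (0);
}
```

Because the suspended test comes first, a pool that is both suspended and read-only reports EAGAIN when both flags are requested.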
|
|
|
|
|
2020-02-29 01:50:32 +03:00
|
|
|
int
|
2021-07-11 04:00:37 +03:00
|
|
|
zfsdev_getminor(zfs_file_t *fp, minor_t *minorp)
|
2020-02-29 01:50:32 +03:00
|
|
|
{
|
|
|
|
zfsdev_state_t *zs, *fpd;
|
|
|
|
|
|
|
|
ASSERT(!MUTEX_HELD(&zfsdev_state_lock));
|
|
|
|
|
|
|
|
fpd = zfs_file_private(fp);
|
|
|
|
if (fpd == NULL)
|
|
|
|
return (SET_ERROR(EBADF));
|
|
|
|
|
|
|
|
mutex_enter(&zfsdev_state_lock);
|
|
|
|
|
2023-02-07 11:23:45 +03:00
|
|
|
for (zs = &zfsdev_state_listhead; zs != NULL; zs = zs->zs_next) {
|
2020-02-29 01:50:32 +03:00
|
|
|
|
|
|
|
if (zs->zs_minor == -1)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
if (fpd == zs) {
|
|
|
|
*minorp = fpd->zs_minor;
|
|
|
|
mutex_exit(&zfsdev_state_lock);
|
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
mutex_exit(&zfsdev_state_lock);
|
|
|
|
|
|
|
|
return (SET_ERROR(EBADF));
|
|
|
|
}
|
|
|
|
|
2021-03-16 15:44:23 +03:00
|
|
|
void *
|
|
|
|
zfsdev_get_state(minor_t minor, enum zfsdev_state_type which)
|
2010-08-26 22:44:39 +04:00
|
|
|
{
|
|
|
|
zfsdev_state_t *zs;
|
|
|
|
|
2023-02-07 11:23:45 +03:00
|
|
|
for (zs = &zfsdev_state_listhead; zs != NULL; zs = zs->zs_next) {
|
2010-08-26 22:44:39 +04:00
|
|
|
if (zs->zs_minor == minor) {
|
2022-09-14 02:59:33 +03:00
|
|
|
membar_consumer();
|
2010-08-26 22:44:39 +04:00
|
|
|
switch (which) {
|
2013-11-01 23:26:11 +04:00
|
|
|
case ZST_ONEXIT:
|
|
|
|
return (zs->zs_onexit);
|
|
|
|
case ZST_ZEVENT:
|
|
|
|
return (zs->zs_zevent);
|
|
|
|
case ZST_ALL:
|
|
|
|
return (zs);
|
2010-08-26 22:44:39 +04:00
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2013-11-01 23:26:11 +04:00
|
|
|
return (NULL);
|
2010-08-26 22:44:39 +04:00
|
|
|
}
|
|
|
|
|
2010-08-27 01:24:34 +04:00
|
|
|
/*
|
2010-08-26 22:44:39 +04:00
|
|
|
* Find a free minor number. The zfsdev_state_list is expected to
|
|
|
|
* be short since it is only a list of currently open file handles.
|
2010-08-27 01:24:34 +04:00
|
|
|
*/
|
2021-03-16 16:04:58 +03:00
|
|
|
static minor_t
|
2010-08-27 01:24:34 +04:00
|
|
|
zfsdev_minor_alloc(void)
|
|
|
|
{
|
2010-08-26 22:44:39 +04:00
|
|
|
static minor_t last_minor = 0;
|
2010-08-27 01:24:34 +04:00
|
|
|
minor_t m;
|
|
|
|
|
|
|
|
ASSERT(MUTEX_HELD(&zfsdev_state_lock));
|
|
|
|
|
|
|
|
for (m = last_minor + 1; m != last_minor; m++) {
|
|
|
|
if (m > ZFSDEV_MAX_MINOR)
|
|
|
|
m = 1;
|
2021-03-16 15:44:23 +03:00
|
|
|
if (zfsdev_get_state(m, ZST_ALL) == NULL) {
|
2010-08-27 01:24:34 +04:00
|
|
|
last_minor = m;
|
|
|
|
return (m);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return (0);
|
|
|
|
}
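The circular search in zfsdev_minor_alloc() above can be sketched in isolation as follows. This is a variation rather than a copy: it bounds the search with a probe counter instead of the `m != last_minor` test, it tracks usage in a small array instead of walking the state list, and it marks the slot as used itself so the example is self-contained. All names and the tiny maximum are hypothetical.

```c
#include <stdbool.h>

#define	SKETCH_MAX_MINOR 4u	/* deliberately tiny for illustration */

/* Index 0 is never handed out; 0 is the "no minor available" result. */
static bool sketch_in_use[SKETCH_MAX_MINOR + 1];
static unsigned sketch_last_minor = 0;

static unsigned
sketch_minor_alloc(void)
{
	unsigned m = sketch_last_minor;

	/* Probe each candidate at most once, starting after the last hit. */
	for (unsigned probes = 0; probes < SKETCH_MAX_MINOR; probes++) {
		m = (m >= SKETCH_MAX_MINOR) ? 1 : m + 1;	/* wrap to 1 */
		if (!sketch_in_use[m]) {
			sketch_in_use[m] = true;
			sketch_last_minor = m;
			return (m);
		}
	}
	return (0);
}
```

Starting the search just after the most recently allocated number spreads allocations across the range and avoids rescanning low numbers that are likely still in use.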
|
|
|
|
|
2021-03-16 16:04:58 +03:00
|
|
|
int
|
|
|
|
zfsdev_state_init(void *priv)
|
|
|
|
{
|
|
|
|
zfsdev_state_t *zs, *zsprev = NULL;
|
|
|
|
minor_t minor;
|
|
|
|
boolean_t newzs = B_FALSE;
|
|
|
|
|
|
|
|
ASSERT(MUTEX_HELD(&zfsdev_state_lock));
|
|
|
|
|
|
|
|
minor = zfsdev_minor_alloc();
|
|
|
|
if (minor == 0)
|
|
|
|
return (SET_ERROR(ENXIO));
|
|
|
|
|
2023-02-07 11:23:45 +03:00
|
|
|
for (zs = &zfsdev_state_listhead; zs != NULL; zs = zs->zs_next) {
|
2021-03-16 16:04:58 +03:00
|
|
|
if (zs->zs_minor == -1)
|
|
|
|
break;
|
|
|
|
zsprev = zs;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!zs) {
|
|
|
|
zs = kmem_zalloc(sizeof (zfsdev_state_t), KM_SLEEP);
|
|
|
|
newzs = B_TRUE;
|
|
|
|
}
|
|
|
|
|
|
|
|
zfsdev_private_set_state(priv, zs);
|
|
|
|
|
|
|
|
zfs_onexit_init((zfs_onexit_t **)&zs->zs_onexit);
|
|
|
|
zfs_zevent_init((zfs_zevent_t **)&zs->zs_zevent);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* In order to provide for lock-free concurrent read access
|
|
|
|
* to the minor list in zfsdev_get_state(), new entries
|
|
|
|
* must be completely written before linking them into the
|
|
|
|
* list whereas existing entries are already linked; the last
|
|
|
|
* operation must be updating zs_minor (from -1 to the new
|
|
|
|
* value).
|
|
|
|
*/
|
|
|
|
if (newzs) {
|
|
|
|
zs->zs_minor = minor;
|
|
|
|
membar_producer();
|
|
|
|
zsprev->zs_next = zs;
|
|
|
|
} else {
|
|
|
|
membar_producer();
|
|
|
|
zs->zs_minor = minor;
|
|
|
|
}
|
|
|
|
|
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
|
|
|
void
|
|
|
|
zfsdev_state_destroy(void *priv)
|
|
|
|
{
|
|
|
|
zfsdev_state_t *zs = zfsdev_private_get_state(priv);
|
|
|
|
|
|
|
|
ASSERT(zs != NULL);
|
|
|
|
ASSERT3S(zs->zs_minor, >, 0);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The last reference to this zfsdev file descriptor is being dropped.
|
|
|
|
* We don't have to worry about lookup grabbing this state object, and
|
|
|
|
* zfsdev_state_init() will not try to reuse this object until it is
|
|
|
|
* invalidated by setting zs_minor to -1. Invalidation must be done
|
|
|
|
* last, with a memory barrier to ensure ordering. This lets us avoid
|
|
|
|
* taking the global zfsdev state lock around destruction.
|
|
|
|
*/
|
|
|
|
zfs_onexit_destroy(zs->zs_onexit);
|
|
|
|
zfs_zevent_destroy(zs->zs_zevent);
|
|
|
|
zs->zs_onexit = NULL;
|
|
|
|
zs->zs_zevent = NULL;
|
|
|
|
membar_producer();
|
|
|
|
zs->zs_minor = -1;
|
|
|
|
}
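The ordering rule described in the comments of zfsdev_state_init() and zfsdev_state_destroy() (write the payload first, publish zs_minor last; clear the payload first, invalidate zs_minor last) can be illustrated in user space. The sketch below substitutes C11 release/acquire atomics for membar_producer()/membar_consumer(); that substitution is an assumption made for portability, not what the kernel code uses, and all names are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct sketch_state {
	struct sketch_state *_Atomic next;
	atomic_int minor;	/* -1 means "slot unused" */
	void *payload;		/* stands in for zs_onexit / zs_zevent */
} sketch_state_t;

static sketch_state_t sketch_head = { .minor = -1 };

/* Writer: fill in the payload, then publish the minor number last. */
static void
sketch_publish(sketch_state_t *s, int minor, void *payload)
{
	s->payload = payload;
	atomic_store_explicit(&s->minor, minor, memory_order_release);
}

/* Writer: clear the payload, then invalidate the slot last. */
static void
sketch_retire(sketch_state_t *s)
{
	s->payload = NULL;
	atomic_store_explicit(&s->minor, -1, memory_order_release);
}

/* Reader: lock-free walk; acquire pairs with the writer's release. */
static void *
sketch_lookup(int minor)
{
	for (sketch_state_t *s = &sketch_head; s != NULL;
	    s = atomic_load_explicit(&s->next, memory_order_acquire)) {
		if (atomic_load_explicit(&s->minor,
		    memory_order_acquire) == minor)
			return (s->payload);
	}
	return (NULL);
}
```

A reader that observes the published minor number is then guaranteed to also observe the payload written before it, which is the property the in-kernel barriers are there to provide.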
|
|
|
|
|
2019-09-27 20:46:28 +03:00
|
|
|
long
|
2020-06-08 23:57:22 +03:00
|
|
|
zfsdev_ioctl_common(uint_t vecnum, zfs_cmd_t *zc, int flag)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2020-06-08 23:57:22 +03:00
|
|
|
int error, cmd;
|
2013-08-28 15:45:09 +04:00
|
|
|
const zfs_ioc_vec_t *vec;
|
2013-12-24 00:06:34 +04:00
|
|
|
char *saved_poolname = NULL;
|
2020-08-18 19:33:55 +03:00
|
|
|
uint64_t max_nvlist_src_size;
|
2020-06-18 00:30:03 +03:00
|
|
|
size_t saved_poolname_len = 0;
|
2013-08-28 15:45:09 +04:00
|
|
|
nvlist_t *innvl = NULL;
|
2015-03-31 06:43:29 +03:00
|
|
|
fstrans_cookie_t cookie;
|
2021-01-11 20:29:25 +03:00
|
|
|
hrtime_t start_time = gethrtime();
|
2013-08-28 15:45:09 +04:00
|
|
|
|
2019-09-27 20:46:28 +03:00
|
|
|
cmd = vecnum;
|
2019-12-02 21:08:27 +03:00
|
|
|
error = 0;
|
2013-08-28 15:45:09 +04:00
|
|
|
if (vecnum >= sizeof (zfs_ioc_vec) / sizeof (zfs_ioc_vec[0]))
|
2019-12-02 21:08:27 +03:00
|
|
|
return (SET_ERROR(ZFS_ERR_IOC_CMD_UNAVAIL));
|
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
vec = &zfs_ioc_vec[vecnum];
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-12-14 02:49:33 +04:00
|
|
|
/*
|
|
|
|
* The registered ioctl list may be sparse; verify that either
|
|
|
|
* a normal or legacy handler is registered.
|
|
|
|
*/
|
|
|
|
if (vec->zvec_func == NULL && vec->zvec_legacy_func == NULL)
|
2019-12-02 21:08:27 +03:00
|
|
|
return (SET_ERROR(ZFS_ERR_IOC_CMD_UNAVAIL));
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
zc->zc_iflags = flag & FKIOCTL;
|
2020-08-18 19:33:55 +03:00
|
|
|
max_nvlist_src_size = zfs_max_nvlist_src_size_os();
|
|
|
|
if (zc->zc_nvlist_src_size > max_nvlist_src_size) {
|
2016-06-07 19:16:52 +03:00
|
|
|
/*
|
|
|
|
* Make sure the user doesn't pass in an insane value for
|
|
|
|
* zc_nvlist_src_size. We have to check, since we will end
|
|
|
|
* up allocating that much memory inside of get_nvlist(). This
|
|
|
|
* prevents a nefarious user from allocating tons of kernel
|
|
|
|
* memory.
|
|
|
|
*
|
|
|
|
* Also, we return EINVAL instead of ENOMEM here. The reason
|
|
|
|
* is that returning ENOMEM from an ioctl() has a special
|
|
|
|
* connotation: that the user's size value is too small and
|
|
|
|
* needs to be expanded to hold the nvlist. See
|
|
|
|
* zcmd_expand_dst_nvlist() for details.
|
|
|
|
*/
|
|
|
|
error = SET_ERROR(EINVAL); /* User's size too big */
|
|
|
|
|
|
|
|
} else if (zc->zc_nvlist_src_size != 0) {
|
2013-08-28 15:45:09 +04:00
|
|
|
error = get_nvlist(zc->zc_nvlist_src, zc->zc_nvlist_src_size,
|
|
|
|
zc->zc_iflags, &innvl);
|
|
|
|
if (error != 0)
|
|
|
|
goto out;
|
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Ensure that all pool/dataset names are valid before we pass down to
|
|
|
|
* the lower layers.
|
|
|
|
*/
|
2013-08-28 15:45:09 +04:00
|
|
|
zc->zc_name[sizeof (zc->zc_name) - 1] = '\0';
|
|
|
|
switch (vec->zvec_namecheck) {
|
|
|
|
case POOL_NAME:
|
|
|
|
if (pool_namecheck(zc->zc_name, NULL, NULL) != 0)
|
2013-03-08 22:41:28 +04:00
|
|
|
error = SET_ERROR(EINVAL);
|
2013-08-28 15:45:09 +04:00
|
|
|
else
|
2010-08-27 01:24:34 +04:00
|
|
|
error = pool_status_check(zc->zc_name,
|
2013-08-28 15:45:09 +04:00
|
|
|
vec->zvec_namecheck, vec->zvec_pool_check);
|
|
|
|
break;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
case DATASET_NAME:
|
|
|
|
if (dataset_namecheck(zc->zc_name, NULL, NULL) != 0)
|
2013-03-08 22:41:28 +04:00
|
|
|
error = SET_ERROR(EINVAL);
|
2013-08-28 15:45:09 +04:00
|
|
|
else
|
2010-08-27 01:24:34 +04:00
|
|
|
error = pool_status_check(zc->zc_name,
|
2013-08-28 15:45:09 +04:00
|
|
|
vec->zvec_namecheck, vec->zvec_pool_check);
|
|
|
|
break;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2019-06-19 19:48:13 +03:00
|
|
|
case ENTITY_NAME:
|
|
|
|
if (entity_namecheck(zc->zc_name, NULL, NULL) != 0) {
|
|
|
|
error = SET_ERROR(EINVAL);
|
|
|
|
} else {
|
|
|
|
error = pool_status_check(zc->zc_name,
|
|
|
|
vec->zvec_namecheck, vec->zvec_pool_check);
|
|
|
|
}
|
|
|
|
break;
|
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
case NO_NAME:
|
|
|
|
break;
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
2018-09-02 22:14:01 +03:00
|
|
|
/*
|
|
|
|
* Ensure that all input pairs are valid before we pass them down
|
|
|
|
* to the lower layers.
|
|
|
|
*
|
|
|
|
* The vectored functions can use fnvlist_lookup_{type} for any
|
|
|
|
* required pairs since zfs_check_input_nvpairs() confirmed that
|
|
|
|
* they exist and are of the correct type.
|
|
|
|
*/
|
|
|
|
if (error == 0 && vec->zvec_func != NULL) {
|
|
|
|
error = zfs_check_input_nvpairs(innvl, vec);
|
|
|
|
if (error != 0)
|
|
|
|
goto out;
|
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2017-01-31 21:24:23 +03:00
|
|
|
if (error == 0) {
|
2016-05-02 20:00:50 +03:00
|
|
|
cookie = spl_fstrans_mark();
|
2013-08-28 15:45:09 +04:00
|
|
|
error = vec->zvec_secpolicy(zc, innvl, CRED());
|
2016-05-02 20:00:50 +03:00
|
|
|
spl_fstrans_unmark(cookie);
|
|
|
|
}
|
2013-08-28 15:45:09 +04:00
|
|
|
|
|
|
|
if (error != 0)
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
/* legacy ioctls can modify zc_name */
|
2020-06-18 00:30:03 +03:00
|
|
|
/*
|
|
|
|
* Can't use kmem_strdup() as we might truncate the string and
|
|
|
|
* kmem_strfree() would then free with incorrect size.
|
|
|
|
*/
|
|
|
|
saved_poolname_len = strlen(zc->zc_name) + 1;
|
|
|
|
saved_poolname = kmem_alloc(saved_poolname_len, KM_SLEEP);
|
|
|
|
|
|
|
|
strlcpy(saved_poolname, zc->zc_name, saved_poolname_len);
|
|
|
|
saved_poolname[strcspn(saved_poolname, "/@#")] = '\0';
|
2013-08-28 15:45:09 +04:00
|
|
|
|
|
|
|
if (vec->zvec_func != NULL) {
|
|
|
|
		nvlist_t *outnvl;
		int puterror = 0;
		spa_t *spa;
		nvlist_t *lognv = NULL;

		ASSERT(vec->zvec_legacy_func == NULL);

		/*
		 * Add the innvl to the lognv before calling the func,
		 * in case the func changes the innvl.
		 */
		if (vec->zvec_allow_log) {
			lognv = fnvlist_alloc();
			fnvlist_add_string(lognv, ZPOOL_HIST_IOCTL,
			    vec->zvec_name);
			if (!nvlist_empty(innvl)) {
				fnvlist_add_nvlist(lognv, ZPOOL_HIST_INPUT_NVL,
				    innvl);
			}
		}

		outnvl = fnvlist_alloc();
		cookie = spl_fstrans_mark();
		error = vec->zvec_func(zc->zc_name, innvl, outnvl);
		spl_fstrans_unmark(cookie);

		/*
		 * Some commands can partially execute, modify state, and still
		 * return an error. In these cases, attempt to record what
		 * was modified.
		 */
		if ((error == 0 ||
		    (cmd == ZFS_IOC_CHANNEL_PROGRAM && error != EINVAL)) &&
		    vec->zvec_allow_log &&
		    spa_open(zc->zc_name, &spa, FTAG) == 0) {
			if (!nvlist_empty(outnvl)) {
				size_t out_size = fnvlist_size(outnvl);
				if (out_size > zfs_history_output_max) {
					fnvlist_add_int64(lognv,
					    ZPOOL_HIST_OUTPUT_SIZE, out_size);
				} else {
					fnvlist_add_nvlist(lognv,
					    ZPOOL_HIST_OUTPUT_NVL, outnvl);
				}
			}
			if (error != 0) {
				fnvlist_add_int64(lognv, ZPOOL_HIST_ERRNO,
				    error);
			}
			fnvlist_add_int64(lognv, ZPOOL_HIST_ELAPSED_NS,
			    gethrtime() - start_time);
			(void) spa_history_log_nvl(spa, lognv);
			spa_close(spa, FTAG);
		}
		fnvlist_free(lognv);

		if (!nvlist_empty(outnvl) || zc->zc_nvlist_dst_size != 0) {
			int smusherror = 0;
			if (vec->zvec_smush_outnvlist) {
				smusherror = nvlist_smush(outnvl,
				    zc->zc_nvlist_dst_size);
			}
			if (smusherror == 0)
				puterror = put_nvlist(zc, outnvl);
		}

		if (puterror != 0)
			error = puterror;

		nvlist_free(outnvl);
	} else {
		cookie = spl_fstrans_mark();
		error = vec->zvec_legacy_func(zc);
		spl_fstrans_unmark(cookie);
	}

out:
	nvlist_free(innvl);
	if (error == 0 && vec->zvec_allow_log) {
		char *s = tsd_get(zfs_allow_log_key);
		if (s != NULL)
			kmem_strfree(s);
		(void) tsd_set(zfs_allow_log_key, kmem_strdup(saved_poolname));
	}
	if (saved_poolname != NULL)
		kmem_free(saved_poolname, saved_poolname_len);

	return (error);
}

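The pool-name handling in the function above (copy the dataset name, cut it at the first `/`, `@` or `#`, and remember the original allocation size for the later free) can be sketched in user space as follows. This is a minimal illustration only: `sketch_pool_name` is a hypothetical helper, and `malloc`/`free` stand in for the kernel's sized allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: returns a heap copy of the pool component of
 * dsname and stores the size of the allocation in *lenp.  The size is
 * taken before truncation, so it stays valid for a size-aware free even
 * though the string inside the buffer has become shorter.
 */
static char *
sketch_pool_name(const char *dsname, size_t *lenp)
{
	size_t len;
	char *buf;

	assert(dsname != NULL && lenp != NULL);

	len = strlen(dsname) + 1;
	buf = malloc(len);
	if (buf == NULL)
		return (NULL);

	memcpy(buf, dsname, len);
	/* strcspn() returns strlen(buf) when no separator is present. */
	buf[strcspn(buf, "/@#")] = '\0';
	*lenp = len;
	return (buf);
}
```

A sized free that recomputed the length from the truncated string would pass the wrong size, which is the hazard the comment in the function above describes.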
int
zfs_kmod_init(void)
{
	int error;

	if ((error = zvol_init()) != 0)
		return (error);

	spa_init(SPA_MODE_READ | SPA_MODE_WRITE);
	zfs_init();

	zfs_ioctl_init();

	mutex_init(&zfsdev_state_lock, NULL, MUTEX_DEFAULT, NULL);
	zfsdev_state_listhead.zs_minor = -1;

	if ((error = zfsdev_attach()) != 0)
		goto out;

	tsd_create(&zfs_fsyncer_key, NULL);
	tsd_create(&rrw_tsd_key, rrw_tsd_destroy);
	tsd_create(&zfs_allow_log_key, zfs_allow_log_destroy);

	return (0);
out:
	zfs_fini();
	spa_fini();
	zvol_fini();

	return (error);
}

void
zfs_kmod_fini(void)
{
	zfsdev_state_t *zs, *zsnext = NULL;

	zfsdev_detach();

	mutex_destroy(&zfsdev_state_lock);

	for (zs = &zfsdev_state_listhead; zs != NULL; zs = zsnext) {
		zsnext = zs->zs_next;
		if (zs->zs_onexit)
			zfs_onexit_destroy(zs->zs_onexit);
		if (zs->zs_zevent)
			zfs_zevent_destroy(zs->zs_zevent);
		if (zs != &zfsdev_state_listhead)
			kmem_free(zs, sizeof (zfsdev_state_t));
	}

	zfs_ereport_taskq_fini();	/* run before zfs_fini() on Linux */
	zfs_fini();
	spa_fini();
	zvol_fini();

	tsd_destroy(&zfs_fsyncer_key);
	tsd_destroy(&rrw_tsd_key);
	tsd_destroy(&zfs_allow_log_key);
}

Cleanup: 64-bit kernel module parameters should use fixed width types
Various module parameters such as `zfs_arc_max` were originally
`uint64_t` on OpenSolaris/Illumos, but were changed to `unsigned long`
for Linux compatibility because Linux's kernel default module parameter
implementation did not support 64-bit types on 32-bit platforms. This
caused problems when porting OpenZFS to Windows because its LLP64 memory
model made `unsigned long` a 32-bit type on 64-bit, which created the
undesirable situation that parameters that should accept 64-bit values
could not on 64-bit Windows.
Upon inspection, it turns out that the Linux kernel module parameter
interface is extensible, such that we are allowed to define our own
types. Rather than maintaining the original type change via hacks to
continue shrinking module parameters on 32-bit Linux, we implement
support for 64-bit module parameters on Linux.
After doing a review of all 64-bit kernel parameters (found via the man
page and also proposed changes by Andrew Innes), the kernel module
parameters fell into a few groups:
Parameters that were originally 64-bit on Illumos:
* dbuf_cache_max_bytes
* dbuf_metadata_cache_max_bytes
* l2arc_feed_min_ms
* l2arc_feed_secs
* l2arc_headroom
* l2arc_headroom_boost
* l2arc_write_boost
* l2arc_write_max
* metaslab_aliquot
* metaslab_force_ganging
* zfetch_array_rd_sz
* zfs_arc_max
* zfs_arc_meta_limit
* zfs_arc_meta_min
* zfs_arc_min
* zfs_async_block_max_blocks
* zfs_condense_max_obsolete_bytes
* zfs_condense_min_mapping_bytes
* zfs_deadman_checktime_ms
* zfs_deadman_synctime_ms
* zfs_initialize_chunk_size
* zfs_initialize_value
* zfs_lua_max_instrlimit
* zfs_lua_max_memlimit
* zil_slog_bulk
Parameters that were originally 32-bit on Illumos:
* zfs_per_txg_dirty_frees_percent
Parameters that were originally `ssize_t` on Illumos:
* zfs_immediate_write_sz
Note that `ssize_t` is `int32_t` on 32-bit and `int64_t` on 64-bit. It
has been upgraded to 64-bit.
Parameters that were `long`/`unsigned long` because of Linux/FreeBSD
influence:
* l2arc_rebuild_blocks_min_l2size
* zfs_key_max_salt_uses
* zfs_max_log_walking
* zfs_max_logsm_summary_length
* zfs_metaslab_max_size_cache_sec
* zfs_min_metaslabs_to_flush
* zfs_multihost_interval
* zfs_unflushed_log_block_max
* zfs_unflushed_log_block_min
* zfs_unflushed_log_block_pct
* zfs_unflushed_max_mem_amt
* zfs_unflushed_max_mem_ppm
New parameters that do not exist in Illumos:
* l2arc_trim_ahead
* vdev_file_logical_ashift
* vdev_file_physical_ashift
* zfs_arc_dnode_limit
* zfs_arc_dnode_limit_percent
* zfs_arc_dnode_reduce_percent
* zfs_arc_meta_limit_percent
* zfs_arc_sys_free
* zfs_deadman_ziotime_ms
* zfs_delete_blocks
* zfs_history_output_max
* zfs_livelist_max_entries
* zfs_max_async_dedup_frees
* zfs_max_nvlist_src_size
* zfs_rebuild_max_segment
* zfs_rebuild_vdev_limit
* zfs_unflushed_log_txg_max
* zfs_vdev_max_auto_ashift
* zfs_vdev_min_auto_ashift
* zfs_vnops_read_chunk_size
* zvol_max_discard_blocks
Rather than clutter the lists with commentary, the module parameters
that need comments are repeated below.
A few parameters were defined in Linux/FreeBSD specific code, where the
use of ulong/long is not an issue for portability, so we leave them
alone:
* zfs_delete_blocks
* zfs_key_max_salt_uses
* zvol_max_discard_blocks
The documentation for a few parameters was found to be incorrect:
* zfs_deadman_checktime_ms - incorrectly documented as int
* zfs_delete_blocks - not documented as Linux only
* zfs_history_output_max - incorrectly documented as int
* zfs_vnops_read_chunk_size - incorrectly documented as long
* zvol_max_discard_blocks - incorrectly documented as ulong
The documentation for these has been fixed, alongside the changes to
document the switch to fixed width types.
In addition, several kernel module parameters were percentages or held
ashift values, so being 64-bit never made sense for them. They have been
downgraded to 32-bit:
* vdev_file_logical_ashift
* vdev_file_physical_ashift
* zfs_arc_dnode_limit_percent
* zfs_arc_dnode_reduce_percent
* zfs_arc_meta_limit_percent
* zfs_per_txg_dirty_frees_percent
* zfs_unflushed_log_block_pct
* zfs_vdev_max_auto_ashift
* zfs_vdev_min_auto_ashift
Of special note are `zfs_vdev_max_auto_ashift` and
`zfs_vdev_min_auto_ashift`, which were already defined as `uint64_t`,
and passed to the kernel as `ulong`. This is inherently buggy on big
endian 32-bit Linux, since the values would not be written to the
correct locations. 32-bit FreeBSD was unaffected because its sysctl code
correctly treated this as a `uint64_t`.
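The byte-placement problem described above can be illustrated with a small user-space sketch. It models a writer that treats a `uint64_t` variable as a 32-bit quantity; the `sketch_*` names are hypothetical and this is not the kernel's module parameter code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 when the least significant byte is stored first. */
static int
sketch_host_is_little_endian(void)
{
	const uint16_t probe = 1;
	unsigned char first;

	memcpy(&first, &probe, sizeof (first));
	return (first == 1);
}

/*
 * Model a writer that believes the parameter is 32 bits wide while the
 * variable behind it is really a uint64_t: only the first four bytes of
 * the 64-bit slot are overwritten.
 */
static uint64_t
sketch_store_u32_into_u64(uint32_t value)
{
	uint64_t slot = 0;

	assert(sizeof (value) < sizeof (slot));
	memcpy(&slot, &value, sizeof (value));
	return (slot);
}
```

On a little-endian host the first four bytes are the low half, so the stored value reads back unchanged; on a big-endian host they are the high half, so the value reads back shifted left by 32 bits.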
Lastly, a code comment suggests that `zfs_arc_sys_free` is
Linux-specific, but there is nothing to indicate to me that it is
Linux-specific. Nothing was done about that.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Original-patch-by: Andrew Innes <andrew.c12@gmail.com>
Original-patch-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13984
Closes #14004
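As a rough illustration of why the commit above prefers fixed-width types, the sketch below parses a parameter string into a `uint64_t` and checks whether the result would still fit a 32-bit type such as `unsigned long` under LLP64. The `sketch_*` helpers are hypothetical and are not the parameter handlers that OpenZFS registers with the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse an unsigned 64-bit parameter value.
 * Returns 0 on success, EINVAL for malformed input, and ERANGE when
 * the value does not fit the parser's range.
 */
static int
sketch_parse_u64(const char *s, uint64_t *out)
{
	char *end;
	unsigned long long v;

	assert(out != NULL);

	/* strtoull() silently negates "-1", so reject any minus sign. */
	if (s == NULL || *s == '\0' || strchr(s, '-') != NULL)
		return (EINVAL);

	errno = 0;
	v = strtoull(s, &end, 0);
	if (errno == ERANGE)
		return (ERANGE);
	if (end == s || *end != '\0')
		return (EINVAL);

	*out = (uint64_t)v;
	return (0);
}

/* Returns 1 when v would survive being stored in a 32-bit type. */
static int
sketch_fits_u32(uint64_t v)
{
	return (v <= UINT32_MAX);
}
```

A value such as 8589934592 (8 GiB) parses cleanly into a `uint64_t` but fails the 32-bit check, which is the situation the commit message describes for 64-bit Windows.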
ZFS_MODULE_PARAM(zfs, zfs_, max_nvlist_src_size, U64, ZMOD_RW,
	"Maximum size in bytes allowed for src nvlist passed with ZFS ioctls");

ZFS_MODULE_PARAM(zfs, zfs_, history_output_max, U64, ZMOD_RW,
	"Maximum size in bytes of ZFS ioctl output that will be logged");