2008-11-20 23:01:55 +03:00
|
|
|
/*
|
|
|
|
* CDDL HEADER START
|
|
|
|
*
|
|
|
|
* The contents of this file are subject to the terms of the
|
|
|
|
* Common Development and Distribution License (the "License").
|
|
|
|
* You may not use this file except in compliance with the License.
|
|
|
|
*
|
|
|
|
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
|
2022-07-12 00:16:13 +03:00
|
|
|
* or https://opensource.org/licenses/CDDL-1.0.
|
2008-11-20 23:01:55 +03:00
|
|
|
* See the License for the specific language governing permissions
|
|
|
|
* and limitations under the License.
|
|
|
|
*
|
|
|
|
* When distributing Covered Code, include this CDDL HEADER in each
|
|
|
|
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
|
|
|
|
* If applicable, add the following below this CDDL HEADER, with the
|
|
|
|
* fields enclosed by brackets "[]" replaced with your own identifying
|
|
|
|
* information: Portions Copyright [yyyy] [name of copyright owner]
|
|
|
|
*
|
|
|
|
* CDDL HEADER END
|
|
|
|
*/
|
2017-02-03 01:13:41 +03:00
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
/*
|
2010-05-29 00:45:14 +04:00
|
|
|
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
|
zfs promote does not delete livelist of origin
When a clone is promoted, its livelist is no longer accurate, so it is
discarded. If the clone's origin is also a clone (i.e. we are promoting
a clone of a clone), then the origin's livelist is also no longer
accurate, so it should be discarded, but the code doesn't actually do
that.
Consider a pool with:
* Filesystem A
* Clone B, a clone of A
* Clone C, a clone of B
If we promote C, it discards C's livelist. It should discard B's
livelist, but that is not happening. The impact is that when B is
destroyed, we use the livelist to find the blocks to free, but the
livelist is no longer correct so we end up freeing blocks that are still
in use by C. The incorrectly-freed blocks can be reallocated causing
checksum errors. And when C is destroyed it can double-free the
incorrectly-freed blocks.
The problem is that we remove the livelist of `origin_ds->ds_dir`, but
the origin snapshot has already been moved to the promoted dsl_dir. So
this is actually trying to remove the livelist of the promoted dsl_dir,
which was already removed. As explained in a comment in the beginning
of `dsl_dataset_promote_sync()`, we need to use the saved `odd` for the
origin's dsl_dir.
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10652
2020-07-31 18:59:00 +03:00
|
|
|
* Copyright (c) 2011, 2020 by Delphix. All rights reserved.
|
2015-04-01 16:07:48 +03:00
|
|
|
* Copyright (c) 2014, Joyent, Inc. All rights reserved.
|
2014-02-05 21:34:18 +04:00
|
|
|
* Copyright (c) 2014 RackTop Systems.
|
2015-04-02 06:44:32 +03:00
|
|
|
* Copyright (c) 2014 Spectra Logic Corporation, All rights reserved.
|
2014-03-22 13:07:14 +04:00
|
|
|
* Copyright (c) 2016 Actifio, Inc. All rights reserved.
|
2016-06-09 22:29:09 +03:00
|
|
|
* Copyright 2016, OmniTI Computer Consulting, Inc. All rights reserved.
|
2017-02-03 01:13:41 +03:00
|
|
|
* Copyright 2017 Nexenta Systems, Inc.
|
Add zstd support to zfs
This PR adds two new compression types, based on ZStandard:
- zstd: A basic ZStandard compression algorithm Available compression.
Levels for zstd are zstd-1 through zstd-19, where the compression
increases with every level, but speed decreases.
- zstd-fast: A faster version of the ZStandard compression algorithm
zstd-fast is basically a "negative" level of zstd. The compression
decreases with every level, but speed increases.
Available compression levels for zstd-fast:
- zstd-fast-1 through zstd-fast-10
- zstd-fast-20 through zstd-fast-100 (in increments of 10)
- zstd-fast-500 and zstd-fast-1000
For more information check the man page.
Implementation details:
Rather than treat each level of zstd as a different algorithm (as was
done historically with gzip), the block pointer `enum zio_compress`
value is simply zstd for all levels, including zstd-fast, since they all
use the same decompression function.
The compress= property (a 64bit unsigned integer) uses the lower 7 bits
to store the compression algorithm (matching the number of bits used in
a block pointer, as the 8th bit was borrowed for embedded block
pointers). The upper bits are used to store the compression level.
It is necessary to be able to determine what compression level was used
when later reading a block back, so the concept used in LZ4, where the
first 32bits of the on-disk value are the size of the compressed data
(since the allocation is rounded up to the nearest ashift), was
extended, and we store the version of ZSTD and the level as well as the
compressed size. This value is returned when decompressing a block, so
that if the block needs to be recompressed (L2ARC, nop-write, etc), that
the same parameters will be used to result in the matching checksum.
All of the internal ZFS code ( `arc_buf_hdr_t`, `objset_t`,
`zio_prop_t`, etc.) uses the separated _compress and _complevel
variables. Only the properties ZAP contains the combined/bit-shifted
value. The combined value is split when the compression_changed_cb()
callback is called, and sets both objset members (os_compress and
os_complevel).
The userspace tools all use the combined/bit-shifted value.
Additional notes:
zdb can now also decode the ZSTD compression header (flag -Z) and
inspect the size, version and compression level saved in that header.
For each record, if it is ZSTD compressed, the parameters of the decoded
compression header get printed.
ZSTD is included with all current tests and new tests are added
as-needed.
Per-dataset feature flags now get activated when the property is set.
If a compression algorithm requires a feature flag, zfs activates the
feature when the property is set, rather than waiting for the first
block to be born. This is currently only used by zstd but can be
extended as needed.
Portions-Sponsored-By: The FreeBSD Foundation
Co-authored-by: Allan Jude <allanjude@freebsd.org>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Co-authored-by: Michael Niewöhner <foss@mniewoehner.de>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes #6247
Closes #9024
Closes #10277
Closes #10278
2020-08-18 20:10:17 +03:00
|
|
|
* Copyright (c) 2019, Klara Inc.
|
|
|
|
* Copyright (c) 2019, Allan Jude
|
|
|
|
* Copyright (c) 2020 The FreeBSD Foundation [1]
|
|
|
|
*
|
|
|
|
* [1] Portions of this software were developed by Allan Jude
|
|
|
|
* under sponsorship from the FreeBSD Foundation.
|
2008-11-20 23:01:55 +03:00
|
|
|
*/
|
|
|
|
|
|
|
|
#include <sys/dmu_objset.h>
|
|
|
|
#include <sys/dsl_dataset.h>
|
|
|
|
#include <sys/dsl_dir.h>
|
|
|
|
#include <sys/dsl_prop.h>
|
|
|
|
#include <sys/dsl_synctask.h>
|
|
|
|
#include <sys/dmu_traverse.h>
|
2012-05-10 02:05:14 +04:00
|
|
|
#include <sys/dmu_impl.h>
|
2008-11-20 23:01:55 +03:00
|
|
|
#include <sys/dmu_tx.h>
|
|
|
|
#include <sys/arc.h>
|
|
|
|
#include <sys/zio.h>
|
|
|
|
#include <sys/zap.h>
|
2012-12-14 03:24:15 +04:00
|
|
|
#include <sys/zfeature.h>
|
2008-11-20 23:01:55 +03:00
|
|
|
#include <sys/unique.h>
|
|
|
|
#include <sys/zfs_context.h>
|
|
|
|
#include <sys/zfs_ioctl.h>
|
|
|
|
#include <sys/spa.h>
|
2016-12-17 01:11:29 +03:00
|
|
|
#include <sys/spa_impl.h>
|
OpenZFS 7614, 9064 - zfs device evacuation/removal
OpenZFS 7614 - zfs device evacuation/removal
OpenZFS 9064 - remove_mirror should wait for device removal to complete
This project allows top-level vdevs to be removed from the storage pool
with "zpool remove", reducing the total amount of storage in the pool.
This operation copies all allocated regions of the device to be removed
onto other devices, recording the mapping from old to new location.
After the removal is complete, read and free operations to the removed
(now "indirect") vdev must be remapped and performed at the new location
on disk. The indirect mapping table is kept in memory whenever the pool
is loaded, so there is minimal performance overhead when doing operations
on the indirect vdev.
The size of the in-memory mapping table will be reduced when its entries
become "obsolete" because they are no longer used by any block pointers
in the pool. An entry becomes obsolete when all the blocks that use
it are freed. An entry can also become obsolete when all the snapshots
that reference it are deleted, and the block pointers that reference it
have been "remapped" in all filesystems/zvols (and clones). Whenever an
indirect block is written, all the block pointers in it will be "remapped"
to their new (concrete) locations if possible. This process can be
accelerated by using the "zfs remap" command to proactively rewrite all
indirect blocks that reference indirect (removed) vdevs.
Note that when a device is removed, we do not verify the checksum of
the data that is copied. This makes the process much faster, but if it
were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be
possible to copy the wrong data, when we have the correct data on e.g.
the other side of the mirror.
At the moment, only mirrors and simple top-level vdevs can be removed
and no removal is allowed if any of the top-level vdevs are raidz.
Porting Notes:
* Avoid zero-sized kmem_alloc() in vdev_compact_children().
The device evacuation code adds a dependency that
vdev_compact_children() be able to properly empty the vdev_child
array by setting it to NULL and zeroing vdev_children. Under Linux,
kmem_alloc() and related functions return a sentinel pointer rather
than NULL for zero-sized allocations.
* Remove comment regarding "mpt" driver where zfs_remove_max_segment
is initialized to SPA_MAXBLOCKSIZE.
Change zfs_condense_indirect_commit_entry_delay_ticks to
zfs_condense_indirect_commit_entry_delay_ms for consistency with
most other tunables in which delays are specified in ms.
* ZTS changes:
Use set_tunable rather than mdb
Use zpool sync as appropriate
Use sync_pool instead of sync
Kill jobs during test_removal_with_operation to allow unmount/export
Don't add non-disk names such as "mirror" or "raidz" to $DISKS
Use $TEST_BASE_DIR instead of /tmp
Increase HZ from 100 to 1000 which is more common on Linux
removal_multiple_indirection.ksh
Reduce iterations in order to not time out on the code
coverage builders.
removal_resume_export:
Functionally, the test case is correct but there exists a race
where the kernel thread hasn't been fully started yet and is
not visible. Wait for up to 1 second for the removal thread
to be started before giving up on it. Also, increase the
amount of data copied in order that the removal not finish
before the export has a chance to fail.
* MMP compatibility, the concept of concrete versus non-concrete devices
has slightly changed the semantics of vdev_writeable(). Update
mmp_random_leaf_impl() accordingly.
* Updated dbuf_remap() to handle the org.zfsonlinux:large_dnode pool
feature which is not supported by OpenZFS.
* Added support for new vdev removal tracepoints.
* Test cases removal_with_zdb and removal_condense_export have been
intentionally disabled. When run manually they pass as intended,
but when running in the automated test environment they produce
unreliable results on the latest Fedora release.
They may work better once the upstream pool import refectoring is
merged into ZoL at which point they will be re-enabled.
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Alex Reece <alex@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Tim Chase <tim@chase2k.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Garrett D'Amore <garrett@damore.org>
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
OpenZFS-issue: https://www.illumos.org/issues/7614
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f539f1eb
Closes #6900
2016-09-22 19:30:13 +03:00
|
|
|
#include <sys/vdev.h>
|
2008-12-03 23:09:06 +03:00
|
|
|
#include <sys/zfs_znode.h>
|
2010-08-27 01:24:34 +04:00
|
|
|
#include <sys/zfs_onexit.h>
|
2009-08-18 22:43:27 +04:00
|
|
|
#include <sys/zvol.h>
|
2010-05-29 00:45:14 +04:00
|
|
|
#include <sys/dsl_scan.h>
|
|
|
|
#include <sys/dsl_deadlist.h>
|
2013-09-04 16:00:57 +04:00
|
|
|
#include <sys/dsl_destroy.h>
|
|
|
|
#include <sys/dsl_userhold.h>
|
2013-12-12 02:33:41 +04:00
|
|
|
#include <sys/dsl_bookmark.h>
|
2016-06-07 19:16:52 +03:00
|
|
|
#include <sys/policy.h>
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
#include <sys/dmu_send.h>
|
2018-10-10 00:05:13 +03:00
|
|
|
#include <sys/dmu_recv.h>
|
2016-01-07 00:22:48 +03:00
|
|
|
#include <sys/zio_compress.h>
|
|
|
|
#include <zfs_fletcher.h>
|
2016-06-16 01:47:05 +03:00
|
|
|
#include <sys/zio_checksum.h>
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2014-11-03 23:15:08 +03:00
|
|
|
/*
|
|
|
|
* The SPA supports block sizes up to 16MB. However, very large blocks
|
|
|
|
* can have an impact on i/o latency (e.g. tying up a spinning disk for
|
|
|
|
* ~300ms), and also potentially on the memory allocator. Therefore,
|
2022-04-29 01:12:24 +03:00
|
|
|
* we did not allow the recordsize to be set larger than zfs_max_recordsize
|
|
|
|
* (former default: 1MB). Larger blocks could be created by changing this
|
|
|
|
* tunable, and pools with larger blocks could always be imported and used,
|
|
|
|
* regardless of this setting.
|
|
|
|
*
|
|
|
|
* We do, however, still limit it by default to 1M on x86_32, because Linux's
|
|
|
|
* 3/1 memory split doesn't leave much room for 16M chunks.
|
2014-11-03 23:15:08 +03:00
|
|
|
*/
|
2022-04-29 01:12:24 +03:00
|
|
|
#ifdef _ILP32
|
Cleanup: Specify unsignedness on things that should not be signed
In #13871, zfs_vdev_aggregation_limit_non_rotating and
zfs_vdev_aggregation_limit being signed was pointed out as a possible
reason not to eliminate an unnecessary MAX(unsigned, 0) since the
unsigned value was assigned from them.
There is no reason for these module parameters to be signed and upon
inspection, it was found that there are a number of other module
parameters that are signed, but should not be, so we make them unsigned.
Making them unsigned made it clear that some other variables in the code
should also be unsigned, so we also make those unsigned. This prevents
users from setting negative values that could potentially cause bad
behaviors. It also makes the code slightly easier to understand.
Mostly module parameters that deal with timeouts, limits, bitshifts and
percentages are made unsigned by this. Any that are boolean are left
signed, since whether booleans should be considered signed or unsigned
does not matter.
Making zfs_arc_lotsfree_percent unsigned caused a
`zfs_arc_lotsfree_percent >= 0` check to become redundant, so it was
removed. Removing the check was also necessary to prevent a compiler
error from -Werror=type-limits.
Several end of line comments had to be moved to their own lines because
replacing int with uint_t caused us to exceed the 80 character limit
enforced by cstyle.pl.
The following were kept signed because they are passed to
taskq_create(), which expects signed values and modifying the
OpenSolaris/Illumos DDI is out of scope of this patch:
* metaslab_load_pct
* zfs_sync_taskq_batch_pct
* zfs_zil_clean_taskq_nthr_pct
* zfs_zil_clean_taskq_minalloc
* zfs_zil_clean_taskq_maxalloc
* zfs_arc_prune_task_threads
Also, negative values in those parameters was found to be harmless.
The following were left signed because either negative values make
sense, or more analysis was needed to determine whether negative values
should be disallowed:
* zfs_metaslab_switch_threshold
* zfs_pd_bytes_max
* zfs_livelist_min_percent_shared
zfs_multihost_history was made static to be consistent with other
parameters.
A number of module parameters were marked as signed, but in reality
referenced unsigned variables. upgrade_errlog_limit is one of the
numerous examples. In the case of zfs_vdev_async_read_max_active, it was
already uint32_t, but zdb had an extern int declaration for it.
Interestingly, the documentation in zfs.4 was right for
upgrade_errlog_limit despite the module parameter being wrongly marked,
while the documentation for zfs_vdev_async_read_max_active (and friends)
was wrong. It was also wrong for zstd_abort_size, which was unsigned,
but was documented as signed.
Also, the documentation in zfs.4 incorrectly described the following
parameters as ulong when they were int:
* zfs_arc_meta_adjust_restarts
* zfs_override_estimate_recordsize
They are now uint_t as of this patch and thus the man page has been
updated to describe them as uint.
dbuf_state_index was left alone since it does nothing and perhaps should
be removed in another patch.
If any module parameters were missed, they were not found by `grep -r
'ZFS_MODULE_PARAM' | grep ', INT'`. I did find a few that grep missed,
but only because they were in files that had hits.
This patch intentionally did not attempt to address whether some of
these module parameters should be elevated to 64-bit parameters, because
the length of a long on 32-bit is 32-bit.
Lastly, it was pointed out during review that uint_t is a better match
for these variables than uint32_t because FreeBSD kernel parameter
definitions are designed for uint_t, whose bit width can change in
future memory models. As a result, we change the existing parameters
that are uint32_t to use uint_t.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13875
2022-09-28 02:42:41 +03:00
|
|
|
uint_t zfs_max_recordsize = 1 * 1024 * 1024;
|
2022-04-29 01:12:24 +03:00
|
|
|
#else
|
Cleanup: Specify unsignedness on things that should not be signed
In #13871, zfs_vdev_aggregation_limit_non_rotating and
zfs_vdev_aggregation_limit being signed was pointed out as a possible
reason not to eliminate an unnecessary MAX(unsigned, 0) since the
unsigned value was assigned from them.
There is no reason for these module parameters to be signed and upon
inspection, it was found that there are a number of other module
parameters that are signed, but should not be, so we make them unsigned.
Making them unsigned made it clear that some other variables in the code
should also be unsigned, so we also make those unsigned. This prevents
users from setting negative values that could potentially cause bad
behaviors. It also makes the code slightly easier to understand.
Mostly module parameters that deal with timeouts, limits, bitshifts and
percentages are made unsigned by this. Any that are boolean are left
signed, since whether booleans should be considered signed or unsigned
does not matter.
Making zfs_arc_lotsfree_percent unsigned caused a
`zfs_arc_lotsfree_percent >= 0` check to become redundant, so it was
removed. Removing the check was also necessary to prevent a compiler
error from -Werror=type-limits.
Several end of line comments had to be moved to their own lines because
replacing int with uint_t caused us to exceed the 80 character limit
enforced by cstyle.pl.
The following were kept signed because they are passed to
taskq_create(), which expects signed values and modifying the
OpenSolaris/Illumos DDI is out of scope of this patch:
* metaslab_load_pct
* zfs_sync_taskq_batch_pct
* zfs_zil_clean_taskq_nthr_pct
* zfs_zil_clean_taskq_minalloc
* zfs_zil_clean_taskq_maxalloc
* zfs_arc_prune_task_threads
Also, negative values in those parameters was found to be harmless.
The following were left signed because either negative values make
sense, or more analysis was needed to determine whether negative values
should be disallowed:
* zfs_metaslab_switch_threshold
* zfs_pd_bytes_max
* zfs_livelist_min_percent_shared
zfs_multihost_history was made static to be consistent with other
parameters.
A number of module parameters were marked as signed, but in reality
referenced unsigned variables. upgrade_errlog_limit is one of the
numerous examples. In the case of zfs_vdev_async_read_max_active, it was
already uint32_t, but zdb had an extern int declaration for it.
Interestingly, the documentation in zfs.4 was right for
upgrade_errlog_limit despite the module parameter being wrongly marked,
while the documentation for zfs_vdev_async_read_max_active (and friends)
was wrong. It was also wrong for zstd_abort_size, which was unsigned,
but was documented as signed.
Also, the documentation in zfs.4 incorrectly described the following
parameters as ulong when they were int:
* zfs_arc_meta_adjust_restarts
* zfs_override_estimate_recordsize
They are now uint_t as of this patch and thus the man page has been
updated to describe them as uint.
dbuf_state_index was left alone since it does nothing and perhaps should
be removed in another patch.
If any module parameters were missed, they were not found by `grep -r
'ZFS_MODULE_PARAM' | grep ', INT'`. I did find a few that grep missed,
but only because they were in files that had hits.
This patch intentionally did not attempt to address whether some of
these module parameters should be elevated to 64-bit parameters, because
the length of a long on 32-bit is 32-bit.
Lastly, it was pointed out during review that uint_t is a better match
for these variables than uint32_t because FreeBSD kernel parameter
definitions are designed for uint_t, whose bit width can change in
future memory models. As a result, we change the existing parameters
that are uint32_t to use uint_t.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13875
2022-09-28 02:42:41 +03:00
|
|
|
uint_t zfs_max_recordsize = 16 * 1024 * 1024;
|
2022-04-29 01:12:24 +03:00
|
|
|
#endif
|
2022-01-15 02:37:55 +03:00
|
|
|
static int zfs_allow_redacted_dataset_mount = 0;
|
2014-11-03 23:15:08 +03:00
|
|
|
|
2022-08-22 22:36:22 +03:00
|
|
|
int zfs_snapshot_history_enabled = 1;
|
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
#define SWITCH64(x, y) \
|
|
|
|
{ \
|
|
|
|
uint64_t __tmp = (x); \
|
|
|
|
(x) = (y); \
|
|
|
|
(y) = __tmp; \
|
|
|
|
}
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
#define DS_REF_MAX (1ULL << 62)
|
|
|
|
|
OpenZFS 7614, 9064 - zfs device evacuation/removal
OpenZFS 7614 - zfs device evacuation/removal
OpenZFS 9064 - remove_mirror should wait for device removal to complete
This project allows top-level vdevs to be removed from the storage pool
with "zpool remove", reducing the total amount of storage in the pool.
This operation copies all allocated regions of the device to be removed
onto other devices, recording the mapping from old to new location.
After the removal is complete, read and free operations to the removed
(now "indirect") vdev must be remapped and performed at the new location
on disk. The indirect mapping table is kept in memory whenever the pool
is loaded, so there is minimal performance overhead when doing operations
on the indirect vdev.
The size of the in-memory mapping table will be reduced when its entries
become "obsolete" because they are no longer used by any block pointers
in the pool. An entry becomes obsolete when all the blocks that use
it are freed. An entry can also become obsolete when all the snapshots
that reference it are deleted, and the block pointers that reference it
have been "remapped" in all filesystems/zvols (and clones). Whenever an
indirect block is written, all the block pointers in it will be "remapped"
to their new (concrete) locations if possible. This process can be
accelerated by using the "zfs remap" command to proactively rewrite all
indirect blocks that reference indirect (removed) vdevs.
Note that when a device is removed, we do not verify the checksum of
the data that is copied. This makes the process much faster, but if it
were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be
possible to copy the wrong data, when we have the correct data on e.g.
the other side of the mirror.
At the moment, only mirrors and simple top-level vdevs can be removed
and no removal is allowed if any of the top-level vdevs are raidz.
Porting Notes:
* Avoid zero-sized kmem_alloc() in vdev_compact_children().
The device evacuation code adds a dependency that
vdev_compact_children() be able to properly empty the vdev_child
array by setting it to NULL and zeroing vdev_children. Under Linux,
kmem_alloc() and related functions return a sentinel pointer rather
than NULL for zero-sized allocations.
* Remove comment regarding "mpt" driver where zfs_remove_max_segment
is initialized to SPA_MAXBLOCKSIZE.
Change zfs_condense_indirect_commit_entry_delay_ticks to
zfs_condense_indirect_commit_entry_delay_ms for consistency with
most other tunables in which delays are specified in ms.
* ZTS changes:
Use set_tunable rather than mdb
Use zpool sync as appropriate
Use sync_pool instead of sync
Kill jobs during test_removal_with_operation to allow unmount/export
Don't add non-disk names such as "mirror" or "raidz" to $DISKS
Use $TEST_BASE_DIR instead of /tmp
Increase HZ from 100 to 1000 which is more common on Linux
removal_multiple_indirection.ksh
Reduce iterations in order to not time out on the code
coverage builders.
removal_resume_export:
Functionally, the test case is correct but there exists a race
where the kernel thread hasn't been fully started yet and is
not visible. Wait for up to 1 second for the removal thread
to be started before giving up on it. Also, increase the
amount of data copied in order that the removal not finish
before the export has a chance to fail.
* MMP compatibility, the concept of concrete versus non-concrete devices
has slightly changed the semantics of vdev_writeable(). Update
mmp_random_leaf_impl() accordingly.
* Updated dbuf_remap() to handle the org.zfsonlinux:large_dnode pool
feature which is not supported by OpenZFS.
* Added support for new vdev removal tracepoints.
* Test cases removal_with_zdb and removal_condense_export have been
intentionally disabled. When run manually they pass as intended,
but when running in the automated test environment they produce
unreliable results on the latest Fedora release.
They may work better once the upstream pool import refectoring is
merged into ZoL at which point they will be re-enabled.
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Alex Reece <alex@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Tim Chase <tim@chase2k.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Garrett D'Amore <garrett@damore.org>
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
OpenZFS-issue: https://www.illumos.org/issues/7614
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f539f1eb
Closes #6900
2016-09-22 19:30:13 +03:00
|
|
|
static void dsl_dataset_set_remap_deadlist_object(dsl_dataset_t *ds,
|
|
|
|
uint64_t obj, dmu_tx_t *tx);
|
|
|
|
static void dsl_dataset_unset_remap_deadlist_object(dsl_dataset_t *ds,
|
|
|
|
dmu_tx_t *tx);
|
|
|
|
|
2018-10-16 21:15:04 +03:00
|
|
|
static void unload_zfeature(dsl_dataset_t *ds, spa_feature_t f);
|
|
|
|
|
Cleanup: Specify unsignedness on things that should not be signed
In #13871, zfs_vdev_aggregation_limit_non_rotating and
zfs_vdev_aggregation_limit being signed was pointed out as a possible
reason not to eliminate an unnecessary MAX(unsigned, 0) since the
unsigned value was assigned from them.
There is no reason for these module parameters to be signed and upon
inspection, it was found that there are a number of other module
parameters that are signed, but should not be, so we make them unsigned.
Making them unsigned made it clear that some other variables in the code
should also be unsigned, so we also make those unsigned. This prevents
users from setting negative values that could potentially cause bad
behaviors. It also makes the code slightly easier to understand.
Mostly module parameters that deal with timeouts, limits, bitshifts and
percentages are made unsigned by this. Any that are boolean are left
signed, since whether booleans should be considered signed or unsigned
does not matter.
Making zfs_arc_lotsfree_percent unsigned caused a
`zfs_arc_lotsfree_percent >= 0` check to become redundant, so it was
removed. Removing the check was also necessary to prevent a compiler
error from -Werror=type-limits.
Several end of line comments had to be moved to their own lines because
replacing int with uint_t caused us to exceed the 80 character limit
enforced by cstyle.pl.
The following were kept signed because they are passed to
taskq_create(), which expects signed values and modifying the
OpenSolaris/Illumos DDI is out of scope of this patch:
* metaslab_load_pct
* zfs_sync_taskq_batch_pct
* zfs_zil_clean_taskq_nthr_pct
* zfs_zil_clean_taskq_minalloc
* zfs_zil_clean_taskq_maxalloc
* zfs_arc_prune_task_threads
Also, negative values in those parameters was found to be harmless.
The following were left signed because either negative values make
sense, or more analysis was needed to determine whether negative values
should be disallowed:
* zfs_metaslab_switch_threshold
* zfs_pd_bytes_max
* zfs_livelist_min_percent_shared
zfs_multihost_history was made static to be consistent with other
parameters.
A number of module parameters were marked as signed, but in reality
referenced unsigned variables. upgrade_errlog_limit is one of the
numerous examples. In the case of zfs_vdev_async_read_max_active, it was
already uint32_t, but zdb had an extern int declaration for it.
Interestingly, the documentation in zfs.4 was right for
upgrade_errlog_limit despite the module parameter being wrongly marked,
while the documentation for zfs_vdev_async_read_max_active (and friends)
was wrong. It was also wrong for zstd_abort_size, which was unsigned,
but was documented as signed.
Also, the documentation in zfs.4 incorrectly described the following
parameters as ulong when they were int:
* zfs_arc_meta_adjust_restarts
* zfs_override_estimate_recordsize
They are now uint_t as of this patch and thus the man page has been
updated to describe them as uint.
dbuf_state_index was left alone since it does nothing and perhaps should
be removed in another patch.
If any module parameters were missed, they were not found by `grep -r
'ZFS_MODULE_PARAM' | grep ', INT'`. I did find a few that grep missed,
but only because they were in files that had hits.
This patch intentionally did not attempt to address whether some of
these module parameters should be elevated to 64-bit parameters, because
the length of a long on 32-bit is 32-bit.
Lastly, it was pointed out during review that uint_t is a better match
for these variables than uint32_t because FreeBSD kernel parameter
definitions are designed for uint_t, whose bit width can change in
future memory models. As a result, we change the existing parameters
that are uint32_t to use uint_t.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13875
2022-09-28 02:42:41 +03:00
|
|
|
extern uint_t spa_asize_inflation;
|
2016-06-09 22:29:09 +03:00
|
|
|
|
2016-11-22 02:09:54 +03:00
|
|
|
static zil_header_t zero_zil;
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
/*
|
2017-01-03 20:31:18 +03:00
|
|
|
* Figure out how much of this delta should be propagated to the dsl_dir
|
2008-11-20 23:01:55 +03:00
|
|
|
* layer. If there's a refreservation, that space has already been
|
|
|
|
* partially accounted for in our ancestors.
|
|
|
|
*/
|
|
|
|
static int64_t
|
|
|
|
parent_delta(dsl_dataset_t *ds, int64_t delta)
|
|
|
|
{
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys_t *ds_phys;
|
2008-11-20 23:01:55 +03:00
|
|
|
uint64_t old_bytes, new_bytes;
|
|
|
|
|
|
|
|
if (ds->ds_reserved == 0)
|
|
|
|
return (delta);
|
|
|
|
|
2015-04-01 18:14:34 +03:00
|
|
|
ds_phys = dsl_dataset_phys(ds);
|
|
|
|
old_bytes = MAX(ds_phys->ds_unique_bytes, ds->ds_reserved);
|
|
|
|
new_bytes = MAX(ds_phys->ds_unique_bytes + delta, ds->ds_reserved);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
ASSERT3U(ABS((int64_t)(new_bytes - old_bytes)), <=, ABS(delta));
|
|
|
|
return (new_bytes - old_bytes);
|
|
|
|
}
|
|
|
|
|
|
|
|
void
|
2010-05-29 00:45:14 +04:00
|
|
|
dsl_dataset_block_born(dsl_dataset_t *ds, const blkptr_t *bp, dmu_tx_t *tx)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2019-07-26 20:54:14 +03:00
|
|
|
spa_t *spa = dmu_tx_pool(tx)->dp_spa;
|
|
|
|
int used = bp_get_dsize_sync(spa, bp);
|
|
|
|
int compressed = BP_GET_PSIZE(bp);
|
|
|
|
int uncompressed = BP_GET_UCSIZE(bp);
|
2008-11-20 23:01:55 +03:00
|
|
|
int64_t delta;
|
2022-08-25 23:33:32 +03:00
|
|
|
spa_feature_t f;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
dprintf_bp(bp, "ds=%p", ds);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
ASSERT(dmu_tx_is_syncing(tx));
|
|
|
|
/* It could have been compressed away to nothing */
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
if (BP_IS_HOLE(bp) || BP_IS_REDACTED(bp))
|
2008-11-20 23:01:55 +03:00
|
|
|
return;
|
|
|
|
ASSERT(BP_GET_TYPE(bp) != DMU_OT_NONE);
|
2012-12-14 03:24:15 +04:00
|
|
|
ASSERT(DMU_OT_IS_VALID(BP_GET_TYPE(bp)));
|
2008-11-20 23:01:55 +03:00
|
|
|
if (ds == NULL) {
|
2012-12-15 04:13:40 +04:00
|
|
|
dsl_pool_mos_diduse_space(tx->tx_pool,
|
|
|
|
used, compressed, uncompressed);
|
2008-11-20 23:01:55 +03:00
|
|
|
return;
|
|
|
|
}
|
2010-05-29 00:45:14 +04:00
|
|
|
|
2016-11-22 02:09:54 +03:00
|
|
|
ASSERT3U(bp->blk_birth, >, dsl_dataset_phys(ds)->ds_prev_snap_txg);
|
2013-08-22 21:51:47 +04:00
|
|
|
dmu_buf_will_dirty(ds->ds_dbuf, tx);
|
2008-11-20 23:01:55 +03:00
|
|
|
mutex_enter(&ds->ds_lock);
|
|
|
|
delta = parent_delta(ds, used);
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(ds)->ds_referenced_bytes += used;
|
|
|
|
dsl_dataset_phys(ds)->ds_compressed_bytes += compressed;
|
|
|
|
dsl_dataset_phys(ds)->ds_uncompressed_bytes += uncompressed;
|
|
|
|
dsl_dataset_phys(ds)->ds_unique_bytes += used;
|
2016-06-16 01:47:05 +03:00
|
|
|
|
2022-08-25 23:33:32 +03:00
|
|
|
if (BP_GET_LSIZE(bp) > SPA_OLD_MAXBLOCKSIZE) {
|
|
|
|
ds->ds_feature_activation[SPA_FEATURE_LARGE_BLOCKS] =
|
|
|
|
(void *)B_TRUE;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
f = zio_checksum_to_feature(BP_GET_CHECKSUM(bp));
|
|
|
|
if (f != SPA_FEATURE_NONE) {
|
|
|
|
ASSERT3S(spa_feature_table[f].fi_type, ==,
|
|
|
|
ZFEATURE_TYPE_BOOLEAN);
|
|
|
|
ds->ds_feature_activation[f] = (void *)B_TRUE;
|
|
|
|
}
|
|
|
|
|
|
|
|
f = zio_compress_to_feature(BP_GET_COMPRESS(bp));
|
|
|
|
if (f != SPA_FEATURE_NONE) {
|
|
|
|
ASSERT3S(spa_feature_table[f].fi_type, ==,
|
|
|
|
ZFEATURE_TYPE_BOOLEAN);
|
|
|
|
ds->ds_feature_activation[f] = (void *)B_TRUE;
|
|
|
|
}
|
2016-06-16 01:47:05 +03:00
|
|
|
|
2019-07-26 20:54:14 +03:00
|
|
|
/*
|
|
|
|
* Track block for livelist, but ignore embedded blocks because
|
|
|
|
* they do not need to be freed.
|
|
|
|
*/
|
|
|
|
if (dsl_deadlist_is_open(&ds->ds_dir->dd_livelist) &&
|
|
|
|
bp->blk_birth > ds->ds_dir->dd_origin_txg &&
|
|
|
|
!(BP_IS_EMBEDDED(bp))) {
|
|
|
|
ASSERT(dsl_dir_is_clone(ds->ds_dir));
|
|
|
|
ASSERT(spa_feature_is_enabled(spa,
|
|
|
|
SPA_FEATURE_LIVELIST));
|
|
|
|
bplist_append(&ds->ds_dir->dd_pending_allocs, bp);
|
|
|
|
}
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
mutex_exit(&ds->ds_lock);
|
2021-07-16 22:39:24 +03:00
|
|
|
dsl_dir_diduse_transfer_space(ds->ds_dir, delta,
|
|
|
|
compressed, uncompressed, used,
|
2008-12-03 23:09:06 +03:00
|
|
|
DD_USED_REFRSRV, DD_USED_HEAD, tx);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
OpenZFS 7614, 9064 - zfs device evacuation/removal
OpenZFS 7614 - zfs device evacuation/removal
OpenZFS 9064 - remove_mirror should wait for device removal to complete
This project allows top-level vdevs to be removed from the storage pool
with "zpool remove", reducing the total amount of storage in the pool.
This operation copies all allocated regions of the device to be removed
onto other devices, recording the mapping from old to new location.
After the removal is complete, read and free operations to the removed
(now "indirect") vdev must be remapped and performed at the new location
on disk. The indirect mapping table is kept in memory whenever the pool
is loaded, so there is minimal performance overhead when doing operations
on the indirect vdev.
The size of the in-memory mapping table will be reduced when its entries
become "obsolete" because they are no longer used by any block pointers
in the pool. An entry becomes obsolete when all the blocks that use
it are freed. An entry can also become obsolete when all the snapshots
that reference it are deleted, and the block pointers that reference it
have been "remapped" in all filesystems/zvols (and clones). Whenever an
indirect block is written, all the block pointers in it will be "remapped"
to their new (concrete) locations if possible. This process can be
accelerated by using the "zfs remap" command to proactively rewrite all
indirect blocks that reference indirect (removed) vdevs.
Note that when a device is removed, we do not verify the checksum of
the data that is copied. This makes the process much faster, but if it
were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be
possible to copy the wrong data, when we have the correct data on e.g.
the other side of the mirror.
At the moment, only mirrors and simple top-level vdevs can be removed
and no removal is allowed if any of the top-level vdevs are raidz.
Porting Notes:
* Avoid zero-sized kmem_alloc() in vdev_compact_children().
The device evacuation code adds a dependency that
vdev_compact_children() be able to properly empty the vdev_child
array by setting it to NULL and zeroing vdev_children. Under Linux,
kmem_alloc() and related functions return a sentinel pointer rather
than NULL for zero-sized allocations.
* Remove comment regarding "mpt" driver where zfs_remove_max_segment
is initialized to SPA_MAXBLOCKSIZE.
Change zfs_condense_indirect_commit_entry_delay_ticks to
zfs_condense_indirect_commit_entry_delay_ms for consistency with
most other tunables in which delays are specified in ms.
* ZTS changes:
Use set_tunable rather than mdb
Use zpool sync as appropriate
Use sync_pool instead of sync
Kill jobs during test_removal_with_operation to allow unmount/export
Don't add non-disk names such as "mirror" or "raidz" to $DISKS
Use $TEST_BASE_DIR instead of /tmp
Increase HZ from 100 to 1000 which is more common on Linux
removal_multiple_indirection.ksh
Reduce iterations in order to not time out on the code
coverage builders.
removal_resume_export:
Functionally, the test case is correct but there exists a race
where the kernel thread hasn't been fully started yet and is
not visible. Wait for up to 1 second for the removal thread
to be started before giving up on it. Also, increase the
amount of data copied in order that the removal not finish
before the export has a chance to fail.
* MMP compatibility, the concept of concrete versus non-concrete devices
has slightly changed the semantics of vdev_writeable(). Update
mmp_random_leaf_impl() accordingly.
* Updated dbuf_remap() to handle the org.zfsonlinux:large_dnode pool
feature which is not supported by OpenZFS.
* Added support for new vdev removal tracepoints.
* Test cases removal_with_zdb and removal_condense_export have been
intentionally disabled. When run manually they pass as intended,
but when running in the automated test environment they produce
unreliable results on the latest Fedora release.
They may work better once the upstream pool import refectoring is
merged into ZoL at which point they will be re-enabled.
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Alex Reece <alex@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Tim Chase <tim@chase2k.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Garrett D'Amore <garrett@damore.org>
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
OpenZFS-issue: https://www.illumos.org/issues/7614
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f539f1eb
Closes #6900
2016-09-22 19:30:13 +03:00
|
|
|
/*
|
|
|
|
* Called when the specified segment has been remapped, and is thus no
|
|
|
|
* longer referenced in the head dataset. The vdev must be indirect.
|
|
|
|
*
|
|
|
|
* If the segment is referenced by a snapshot, put it on the remap deadlist.
|
|
|
|
* Otherwise, add this segment to the obsolete spacemap.
|
|
|
|
*/
|
|
|
|
void
|
|
|
|
dsl_dataset_block_remapped(dsl_dataset_t *ds, uint64_t vdev, uint64_t offset,
|
|
|
|
uint64_t size, uint64_t birth, dmu_tx_t *tx)
|
|
|
|
{
|
|
|
|
spa_t *spa = ds->ds_dir->dd_pool->dp_spa;
|
|
|
|
|
|
|
|
ASSERT(dmu_tx_is_syncing(tx));
|
|
|
|
ASSERT(birth <= tx->tx_txg);
|
|
|
|
ASSERT(!ds->ds_is_snapshot);
|
|
|
|
|
|
|
|
if (birth > dsl_dataset_phys(ds)->ds_prev_snap_txg) {
|
|
|
|
spa_vdev_indirect_mark_obsolete(spa, vdev, offset, size, tx);
|
|
|
|
} else {
|
|
|
|
blkptr_t fakebp;
|
|
|
|
dva_t *dva = &fakebp.blk_dva[0];
|
|
|
|
|
|
|
|
ASSERT(ds != NULL);
|
|
|
|
|
|
|
|
mutex_enter(&ds->ds_remap_deadlist_lock);
|
|
|
|
if (!dsl_dataset_remap_deadlist_exists(ds)) {
|
|
|
|
dsl_dataset_create_remap_deadlist(ds, tx);
|
|
|
|
}
|
|
|
|
mutex_exit(&ds->ds_remap_deadlist_lock);
|
|
|
|
|
|
|
|
BP_ZERO(&fakebp);
|
|
|
|
fakebp.blk_birth = birth;
|
|
|
|
DVA_SET_VDEV(dva, vdev);
|
|
|
|
DVA_SET_OFFSET(dva, offset);
|
|
|
|
DVA_SET_ASIZE(dva, size);
|
2019-07-26 20:54:14 +03:00
|
|
|
dsl_deadlist_insert(&ds->ds_remap_deadlist, &fakebp, B_FALSE,
|
|
|
|
tx);
|
OpenZFS 7614, 9064 - zfs device evacuation/removal
OpenZFS 7614 - zfs device evacuation/removal
OpenZFS 9064 - remove_mirror should wait for device removal to complete
This project allows top-level vdevs to be removed from the storage pool
with "zpool remove", reducing the total amount of storage in the pool.
This operation copies all allocated regions of the device to be removed
onto other devices, recording the mapping from old to new location.
After the removal is complete, read and free operations to the removed
(now "indirect") vdev must be remapped and performed at the new location
on disk. The indirect mapping table is kept in memory whenever the pool
is loaded, so there is minimal performance overhead when doing operations
on the indirect vdev.
The size of the in-memory mapping table will be reduced when its entries
become "obsolete" because they are no longer used by any block pointers
in the pool. An entry becomes obsolete when all the blocks that use
it are freed. An entry can also become obsolete when all the snapshots
that reference it are deleted, and the block pointers that reference it
have been "remapped" in all filesystems/zvols (and clones). Whenever an
indirect block is written, all the block pointers in it will be "remapped"
to their new (concrete) locations if possible. This process can be
accelerated by using the "zfs remap" command to proactively rewrite all
indirect blocks that reference indirect (removed) vdevs.
Note that when a device is removed, we do not verify the checksum of
the data that is copied. This makes the process much faster, but if it
were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be
possible to copy the wrong data, when we have the correct data on e.g.
the other side of the mirror.
At the moment, only mirrors and simple top-level vdevs can be removed
and no removal is allowed if any of the top-level vdevs are raidz.
Porting Notes:
* Avoid zero-sized kmem_alloc() in vdev_compact_children().
The device evacuation code adds a dependency that
vdev_compact_children() be able to properly empty the vdev_child
array by setting it to NULL and zeroing vdev_children. Under Linux,
kmem_alloc() and related functions return a sentinel pointer rather
than NULL for zero-sized allocations.
* Remove comment regarding "mpt" driver where zfs_remove_max_segment
is initialized to SPA_MAXBLOCKSIZE.
Change zfs_condense_indirect_commit_entry_delay_ticks to
zfs_condense_indirect_commit_entry_delay_ms for consistency with
most other tunables in which delays are specified in ms.
* ZTS changes:
Use set_tunable rather than mdb
Use zpool sync as appropriate
Use sync_pool instead of sync
Kill jobs during test_removal_with_operation to allow unmount/export
Don't add non-disk names such as "mirror" or "raidz" to $DISKS
Use $TEST_BASE_DIR instead of /tmp
Increase HZ from 100 to 1000 which is more common on Linux
removal_multiple_indirection.ksh
Reduce iterations in order to not time out on the code
coverage builders.
removal_resume_export:
Functionally, the test case is correct but there exists a race
where the kernel thread hasn't been fully started yet and is
not visible. Wait for up to 1 second for the removal thread
to be started before giving up on it. Also, increase the
amount of data copied in order that the removal not finish
before the export has a chance to fail.
* MMP compatibility, the concept of concrete versus non-concrete devices
has slightly changed the semantics of vdev_writeable(). Update
mmp_random_leaf_impl() accordingly.
* Updated dbuf_remap() to handle the org.zfsonlinux:large_dnode pool
feature which is not supported by OpenZFS.
* Added support for new vdev removal tracepoints.
* Test cases removal_with_zdb and removal_condense_export have been
intentionally disabled. When run manually they pass as intended,
but when running in the automated test environment they produce
unreliable results on the latest Fedora release.
They may work better once the upstream pool import refectoring is
merged into ZoL at which point they will be re-enabled.
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Alex Reece <alex@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Tim Chase <tim@chase2k.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Garrett D'Amore <garrett@damore.org>
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
OpenZFS-issue: https://www.illumos.org/issues/7614
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f539f1eb
Closes #6900
2016-09-22 19:30:13 +03:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2008-12-03 23:09:06 +03:00
|
|
|
int
|
2010-05-29 00:45:14 +04:00
|
|
|
dsl_dataset_block_kill(dsl_dataset_t *ds, const blkptr_t *bp, dmu_tx_t *tx,
|
|
|
|
boolean_t async)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2016-12-17 01:11:29 +03:00
|
|
|
spa_t *spa = dmu_tx_pool(tx)->dp_spa;
|
|
|
|
|
|
|
|
int used = bp_get_dsize_sync(spa, bp);
|
2013-12-09 22:37:51 +04:00
|
|
|
int compressed = BP_GET_PSIZE(bp);
|
|
|
|
int uncompressed = BP_GET_UCSIZE(bp);
|
2010-08-26 20:52:39 +04:00
|
|
|
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
if (BP_IS_HOLE(bp) || BP_IS_REDACTED(bp))
|
2008-12-03 23:09:06 +03:00
|
|
|
return (0);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
ASSERT(dmu_tx_is_syncing(tx));
|
|
|
|
ASSERT(bp->blk_birth <= tx->tx_txg);
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
if (ds == NULL) {
|
2010-05-29 00:45:14 +04:00
|
|
|
dsl_free(tx->tx_pool, tx->tx_txg, bp);
|
2012-12-15 04:13:40 +04:00
|
|
|
dsl_pool_mos_diduse_space(tx->tx_pool,
|
|
|
|
-used, -compressed, -uncompressed);
|
2008-12-03 23:09:06 +03:00
|
|
|
return (used);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
ASSERT3P(tx->tx_pool, ==, ds->ds_dir->dd_pool);
|
|
|
|
|
2015-04-02 06:44:32 +03:00
|
|
|
ASSERT(!ds->ds_is_snapshot);
|
2008-11-20 23:01:55 +03:00
|
|
|
dmu_buf_will_dirty(ds->ds_dbuf, tx);
|
|
|
|
|
2019-07-26 20:54:14 +03:00
|
|
|
/*
|
|
|
|
* Track block for livelist, but ignore embedded blocks because
|
|
|
|
* they do not need to be freed.
|
|
|
|
*/
|
|
|
|
if (dsl_deadlist_is_open(&ds->ds_dir->dd_livelist) &&
|
|
|
|
bp->blk_birth > ds->ds_dir->dd_origin_txg &&
|
|
|
|
!(BP_IS_EMBEDDED(bp))) {
|
|
|
|
ASSERT(dsl_dir_is_clone(ds->ds_dir));
|
|
|
|
ASSERT(spa_feature_is_enabled(spa,
|
|
|
|
SPA_FEATURE_LIVELIST));
|
|
|
|
bplist_append(&ds->ds_dir->dd_pending_frees, bp);
|
|
|
|
}
|
|
|
|
|
2015-04-01 18:14:34 +03:00
|
|
|
if (bp->blk_birth > dsl_dataset_phys(ds)->ds_prev_snap_txg) {
|
2008-11-20 23:01:55 +03:00
|
|
|
int64_t delta;
|
|
|
|
|
2021-06-23 07:53:45 +03:00
|
|
|
dprintf_bp(bp, "freeing ds=%llu", (u_longlong_t)ds->ds_object);
|
2010-05-29 00:45:14 +04:00
|
|
|
dsl_free(tx->tx_pool, tx->tx_txg, bp);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
mutex_enter(&ds->ds_lock);
|
2015-04-01 18:14:34 +03:00
|
|
|
ASSERT(dsl_dataset_phys(ds)->ds_unique_bytes >= used ||
|
2008-11-20 23:01:55 +03:00
|
|
|
!DS_UNIQUE_IS_ACCURATE(ds));
|
|
|
|
delta = parent_delta(ds, -used);
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(ds)->ds_unique_bytes -= used;
|
2008-11-20 23:01:55 +03:00
|
|
|
mutex_exit(&ds->ds_lock);
|
2021-07-16 22:39:24 +03:00
|
|
|
dsl_dir_diduse_transfer_space(ds->ds_dir,
|
|
|
|
delta, -compressed, -uncompressed, -used,
|
2008-12-03 23:09:06 +03:00
|
|
|
DD_USED_REFRSRV, DD_USED_HEAD, tx);
|
2008-11-20 23:01:55 +03:00
|
|
|
} else {
|
|
|
|
dprintf_bp(bp, "putting on dead list: %s", "");
|
2010-05-29 00:45:14 +04:00
|
|
|
if (async) {
|
|
|
|
/*
|
|
|
|
* We are here as part of zio's write done callback,
|
|
|
|
* which means we're a zio interrupt thread. We can't
|
|
|
|
* call dsl_deadlist_insert() now because it may block
|
|
|
|
* waiting for I/O. Instead, put bp on the deferred
|
|
|
|
* queue and let dsl_pool_sync() finish the job.
|
|
|
|
*/
|
|
|
|
bplist_append(&ds->ds_pending_deadlist, bp);
|
|
|
|
} else {
|
2019-07-26 20:54:14 +03:00
|
|
|
dsl_deadlist_insert(&ds->ds_deadlist, bp, B_FALSE, tx);
|
2010-05-29 00:45:14 +04:00
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
ASSERT3U(ds->ds_prev->ds_object, ==,
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(ds)->ds_prev_snap_obj);
|
|
|
|
ASSERT(dsl_dataset_phys(ds->ds_prev)->ds_num_children > 0);
|
2008-11-20 23:01:55 +03:00
|
|
|
/* if (bp->blk_birth > prev prev snap txg) prev unique += bs */
|
2015-04-01 18:14:34 +03:00
|
|
|
if (dsl_dataset_phys(ds->ds_prev)->ds_next_snap_obj ==
|
2008-11-20 23:01:55 +03:00
|
|
|
ds->ds_object && bp->blk_birth >
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(ds->ds_prev)->ds_prev_snap_txg) {
|
2008-11-20 23:01:55 +03:00
|
|
|
dmu_buf_will_dirty(ds->ds_prev->ds_dbuf, tx);
|
|
|
|
mutex_enter(&ds->ds_prev->ds_lock);
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(ds->ds_prev)->ds_unique_bytes += used;
|
2008-11-20 23:01:55 +03:00
|
|
|
mutex_exit(&ds->ds_prev->ds_lock);
|
|
|
|
}
|
2010-05-29 00:45:14 +04:00
|
|
|
if (bp->blk_birth > ds->ds_dir->dd_origin_txg) {
|
2008-12-03 23:09:06 +03:00
|
|
|
dsl_dir_transfer_space(ds->ds_dir, used,
|
|
|
|
DD_USED_HEAD, DD_USED_SNAP, tx);
|
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
|
|
|
|
dsl_bookmark_block_killed(ds, bp, tx);
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
mutex_enter(&ds->ds_lock);
|
2015-04-01 18:14:34 +03:00
|
|
|
ASSERT3U(dsl_dataset_phys(ds)->ds_referenced_bytes, >=, used);
|
|
|
|
dsl_dataset_phys(ds)->ds_referenced_bytes -= used;
|
|
|
|
ASSERT3U(dsl_dataset_phys(ds)->ds_compressed_bytes, >=, compressed);
|
|
|
|
dsl_dataset_phys(ds)->ds_compressed_bytes -= compressed;
|
|
|
|
ASSERT3U(dsl_dataset_phys(ds)->ds_uncompressed_bytes, >=, uncompressed);
|
|
|
|
dsl_dataset_phys(ds)->ds_uncompressed_bytes -= uncompressed;
|
2008-11-20 23:01:55 +03:00
|
|
|
mutex_exit(&ds->ds_lock);
|
2008-12-03 23:09:06 +03:00
|
|
|
|
|
|
|
return (used);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
2018-10-16 21:15:04 +03:00
|
|
|
struct feature_type_uint64_array_arg {
|
|
|
|
uint64_t length;
|
|
|
|
uint64_t *array;
|
|
|
|
};
|
|
|
|
|
|
|
|
static void
|
|
|
|
unload_zfeature(dsl_dataset_t *ds, spa_feature_t f)
|
|
|
|
{
|
|
|
|
switch (spa_feature_table[f].fi_type) {
|
|
|
|
case ZFEATURE_TYPE_BOOLEAN:
|
|
|
|
break;
|
|
|
|
case ZFEATURE_TYPE_UINT64_ARRAY:
|
|
|
|
{
|
|
|
|
struct feature_type_uint64_array_arg *ftuaa = ds->ds_feature[f];
|
|
|
|
kmem_free(ftuaa->array, ftuaa->length * sizeof (uint64_t));
|
|
|
|
kmem_free(ftuaa, sizeof (*ftuaa));
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
default:
|
|
|
|
panic("Invalid zfeature type %d", spa_feature_table[f].fi_type);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
load_zfeature(objset_t *mos, dsl_dataset_t *ds, spa_feature_t f)
|
|
|
|
{
|
|
|
|
int err = 0;
|
|
|
|
switch (spa_feature_table[f].fi_type) {
|
|
|
|
case ZFEATURE_TYPE_BOOLEAN:
|
|
|
|
err = zap_contains(mos, ds->ds_object,
|
|
|
|
spa_feature_table[f].fi_guid);
|
|
|
|
if (err == 0) {
|
|
|
|
ds->ds_feature[f] = (void *)B_TRUE;
|
|
|
|
} else {
|
|
|
|
ASSERT3U(err, ==, ENOENT);
|
|
|
|
err = 0;
|
|
|
|
}
|
|
|
|
break;
|
|
|
|
case ZFEATURE_TYPE_UINT64_ARRAY:
|
|
|
|
{
|
|
|
|
uint64_t int_size, num_int;
|
|
|
|
uint64_t *data;
|
|
|
|
err = zap_length(mos, ds->ds_object,
|
|
|
|
spa_feature_table[f].fi_guid, &int_size, &num_int);
|
|
|
|
if (err != 0) {
|
|
|
|
ASSERT3U(err, ==, ENOENT);
|
|
|
|
err = 0;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
ASSERT3U(int_size, ==, sizeof (uint64_t));
|
|
|
|
data = kmem_alloc(int_size * num_int, KM_SLEEP);
|
|
|
|
VERIFY0(zap_lookup(mos, ds->ds_object,
|
|
|
|
spa_feature_table[f].fi_guid, int_size, num_int, data));
|
|
|
|
struct feature_type_uint64_array_arg *ftuaa =
|
|
|
|
kmem_alloc(sizeof (*ftuaa), KM_SLEEP);
|
|
|
|
ftuaa->length = num_int;
|
|
|
|
ftuaa->array = data;
|
|
|
|
ds->ds_feature[f] = ftuaa;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
default:
|
|
|
|
panic("Invalid zfeature type %d", spa_feature_table[f].fi_type);
|
|
|
|
}
|
|
|
|
return (err);
|
|
|
|
}
|
|
|
|
|
2017-01-27 01:43:28 +03:00
|
|
|
/*
|
2019-09-03 03:56:41 +03:00
|
|
|
* We have to release the fsid synchronously or we risk that a subsequent
|
2017-01-27 01:43:28 +03:00
|
|
|
* mount of the same dataset will fail to unique_insert the fsid. This
|
|
|
|
* failure would manifest itself as the fsid of this dataset changing
|
|
|
|
* between mounts which makes NFS clients quite unhappy.
|
|
|
|
*/
|
2008-11-20 23:01:55 +03:00
|
|
|
static void
|
2017-01-27 01:43:28 +03:00
|
|
|
dsl_dataset_evict_sync(void *dbu)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2015-04-02 06:44:32 +03:00
|
|
|
dsl_dataset_t *ds = dbu;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
ASSERT(ds->ds_owner == NULL);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
unique_remove(ds->ds_fsid_guid);
|
2017-01-27 01:43:28 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
static void
|
|
|
|
dsl_dataset_evict_async(void *dbu)
|
|
|
|
{
|
|
|
|
dsl_dataset_t *ds = dbu;
|
|
|
|
|
|
|
|
ASSERT(ds->ds_owner == NULL);
|
|
|
|
|
|
|
|
ds->ds_dbuf = NULL;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
if (ds->ds_objset != NULL)
|
|
|
|
dmu_objset_evict(ds->ds_objset);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
if (ds->ds_prev) {
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_rele(ds->ds_prev, ds);
|
2008-11-20 23:01:55 +03:00
|
|
|
ds->ds_prev = NULL;
|
|
|
|
}
|
|
|
|
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
dsl_bookmark_fini_ds(ds);
|
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
bplist_destroy(&ds->ds_pending_deadlist);
|
OpenZFS 7614, 9064 - zfs device evacuation/removal
OpenZFS 7614 - zfs device evacuation/removal
OpenZFS 9064 - remove_mirror should wait for device removal to complete
This project allows top-level vdevs to be removed from the storage pool
with "zpool remove", reducing the total amount of storage in the pool.
This operation copies all allocated regions of the device to be removed
onto other devices, recording the mapping from old to new location.
After the removal is complete, read and free operations to the removed
(now "indirect") vdev must be remapped and performed at the new location
on disk. The indirect mapping table is kept in memory whenever the pool
is loaded, so there is minimal performance overhead when doing operations
on the indirect vdev.
The size of the in-memory mapping table will be reduced when its entries
become "obsolete" because they are no longer used by any block pointers
in the pool. An entry becomes obsolete when all the blocks that use
it are freed. An entry can also become obsolete when all the snapshots
that reference it are deleted, and the block pointers that reference it
have been "remapped" in all filesystems/zvols (and clones). Whenever an
indirect block is written, all the block pointers in it will be "remapped"
to their new (concrete) locations if possible. This process can be
accelerated by using the "zfs remap" command to proactively rewrite all
indirect blocks that reference indirect (removed) vdevs.
Note that when a device is removed, we do not verify the checksum of
the data that is copied. This makes the process much faster, but if it
were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be
possible to copy the wrong data, when we have the correct data on e.g.
the other side of the mirror.
At the moment, only mirrors and simple top-level vdevs can be removed
and no removal is allowed if any of the top-level vdevs are raidz.
Porting Notes:
* Avoid zero-sized kmem_alloc() in vdev_compact_children().
The device evacuation code adds a dependency that
vdev_compact_children() be able to properly empty the vdev_child
array by setting it to NULL and zeroing vdev_children. Under Linux,
kmem_alloc() and related functions return a sentinel pointer rather
than NULL for zero-sized allocations.
* Remove comment regarding "mpt" driver where zfs_remove_max_segment
is initialized to SPA_MAXBLOCKSIZE.
Change zfs_condense_indirect_commit_entry_delay_ticks to
zfs_condense_indirect_commit_entry_delay_ms for consistency with
most other tunables in which delays are specified in ms.
* ZTS changes:
Use set_tunable rather than mdb
Use zpool sync as appropriate
Use sync_pool instead of sync
Kill jobs during test_removal_with_operation to allow unmount/export
Don't add non-disk names such as "mirror" or "raidz" to $DISKS
Use $TEST_BASE_DIR instead of /tmp
Increase HZ from 100 to 1000 which is more common on Linux
removal_multiple_indirection.ksh
Reduce iterations in order to not time out on the code
coverage builders.
removal_resume_export:
Functionally, the test case is correct but there exists a race
where the kernel thread hasn't been fully started yet and is
not visible. Wait for up to 1 second for the removal thread
to be started before giving up on it. Also, increase the
amount of data copied in order that the removal not finish
before the export has a chance to fail.
* MMP compatibility, the concept of concrete versus non-concrete devices
has slightly changed the semantics of vdev_writeable(). Update
mmp_random_leaf_impl() accordingly.
* Updated dbuf_remap() to handle the org.zfsonlinux:large_dnode pool
feature which is not supported by OpenZFS.
* Added support for new vdev removal tracepoints.
* Test cases removal_with_zdb and removal_condense_export have been
intentionally disabled. When run manually they pass as intended,
but when running in the automated test environment they produce
unreliable results on the latest Fedora release.
They may work better once the upstream pool import refectoring is
merged into ZoL at which point they will be re-enabled.
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Alex Reece <alex@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Tim Chase <tim@chase2k.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Garrett D'Amore <garrett@damore.org>
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
OpenZFS-issue: https://www.illumos.org/issues/7614
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f539f1eb
Closes #6900
2016-09-22 19:30:13 +03:00
|
|
|
if (dsl_deadlist_is_open(&ds->ds_deadlist))
|
2010-05-29 00:45:14 +04:00
|
|
|
dsl_deadlist_close(&ds->ds_deadlist);
|
OpenZFS 7614, 9064 - zfs device evacuation/removal
OpenZFS 7614 - zfs device evacuation/removal
OpenZFS 9064 - remove_mirror should wait for device removal to complete
This project allows top-level vdevs to be removed from the storage pool
with "zpool remove", reducing the total amount of storage in the pool.
This operation copies all allocated regions of the device to be removed
onto other devices, recording the mapping from old to new location.
After the removal is complete, read and free operations to the removed
(now "indirect") vdev must be remapped and performed at the new location
on disk. The indirect mapping table is kept in memory whenever the pool
is loaded, so there is minimal performance overhead when doing operations
on the indirect vdev.
The size of the in-memory mapping table will be reduced when its entries
become "obsolete" because they are no longer used by any block pointers
in the pool. An entry becomes obsolete when all the blocks that use
it are freed. An entry can also become obsolete when all the snapshots
that reference it are deleted, and the block pointers that reference it
have been "remapped" in all filesystems/zvols (and clones). Whenever an
indirect block is written, all the block pointers in it will be "remapped"
to their new (concrete) locations if possible. This process can be
accelerated by using the "zfs remap" command to proactively rewrite all
indirect blocks that reference indirect (removed) vdevs.
Note that when a device is removed, we do not verify the checksum of
the data that is copied. This makes the process much faster, but if it
were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be
possible to copy the wrong data, when we have the correct data on e.g.
the other side of the mirror.
At the moment, only mirrors and simple top-level vdevs can be removed
and no removal is allowed if any of the top-level vdevs are raidz.
Porting Notes:
* Avoid zero-sized kmem_alloc() in vdev_compact_children().
The device evacuation code adds a dependency that
vdev_compact_children() be able to properly empty the vdev_child
array by setting it to NULL and zeroing vdev_children. Under Linux,
kmem_alloc() and related functions return a sentinel pointer rather
than NULL for zero-sized allocations.
* Remove comment regarding "mpt" driver where zfs_remove_max_segment
is initialized to SPA_MAXBLOCKSIZE.
Change zfs_condense_indirect_commit_entry_delay_ticks to
zfs_condense_indirect_commit_entry_delay_ms for consistency with
most other tunables in which delays are specified in ms.
* ZTS changes:
Use set_tunable rather than mdb
Use zpool sync as appropriate
Use sync_pool instead of sync
Kill jobs during test_removal_with_operation to allow unmount/export
Don't add non-disk names such as "mirror" or "raidz" to $DISKS
Use $TEST_BASE_DIR instead of /tmp
Increase HZ from 100 to 1000 which is more common on Linux
removal_multiple_indirection.ksh
Reduce iterations in order to not time out on the code
coverage builders.
removal_resume_export:
Functionally, the test case is correct but there exists a race
where the kernel thread hasn't been fully started yet and is
not visible. Wait for up to 1 second for the removal thread
to be started before giving up on it. Also, increase the
amount of data copied in order that the removal not finish
before the export has a chance to fail.
* MMP compatibility, the concept of concrete versus non-concrete devices
has slightly changed the semantics of vdev_writeable(). Update
mmp_random_leaf_impl() accordingly.
* Updated dbuf_remap() to handle the org.zfsonlinux:large_dnode pool
feature which is not supported by OpenZFS.
* Added support for new vdev removal tracepoints.
* Test cases removal_with_zdb and removal_condense_export have been
intentionally disabled. When run manually they pass as intended,
but when running in the automated test environment they produce
unreliable results on the latest Fedora release.
They may work better once the upstream pool import refectoring is
merged into ZoL at which point they will be re-enabled.
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Alex Reece <alex@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Tim Chase <tim@chase2k.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Garrett D'Amore <garrett@damore.org>
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
OpenZFS-issue: https://www.illumos.org/issues/7614
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f539f1eb
Closes #6900
2016-09-22 19:30:13 +03:00
|
|
|
if (dsl_deadlist_is_open(&ds->ds_remap_deadlist))
|
|
|
|
dsl_deadlist_close(&ds->ds_remap_deadlist);
|
2008-12-03 23:09:06 +03:00
|
|
|
if (ds->ds_dir)
|
2015-04-02 06:44:32 +03:00
|
|
|
dsl_dir_async_rele(ds->ds_dir, ds);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
ASSERT(!list_link_active(&ds->ds_synced_link));
|
|
|
|
|
2018-10-16 21:15:04 +03:00
|
|
|
for (spa_feature_t f = 0; f < SPA_FEATURES; f++) {
|
|
|
|
if (dsl_dataset_feature_is_active(ds, f))
|
|
|
|
unload_zfeature(ds, f);
|
|
|
|
}
|
|
|
|
|
2015-11-05 02:00:58 +03:00
|
|
|
list_destroy(&ds->ds_prop_cbs);
|
2008-11-20 23:01:55 +03:00
|
|
|
mutex_destroy(&ds->ds_lock);
|
|
|
|
mutex_destroy(&ds->ds_opening_lock);
|
2015-04-01 16:49:14 +03:00
|
|
|
mutex_destroy(&ds->ds_sendstream_lock);
|
OpenZFS 7614, 9064 - zfs device evacuation/removal
OpenZFS 7614 - zfs device evacuation/removal
OpenZFS 9064 - remove_mirror should wait for device removal to complete
This project allows top-level vdevs to be removed from the storage pool
with "zpool remove", reducing the total amount of storage in the pool.
This operation copies all allocated regions of the device to be removed
onto other devices, recording the mapping from old to new location.
After the removal is complete, read and free operations to the removed
(now "indirect") vdev must be remapped and performed at the new location
on disk. The indirect mapping table is kept in memory whenever the pool
is loaded, so there is minimal performance overhead when doing operations
on the indirect vdev.
The size of the in-memory mapping table will be reduced when its entries
become "obsolete" because they are no longer used by any block pointers
in the pool. An entry becomes obsolete when all the blocks that use
it are freed. An entry can also become obsolete when all the snapshots
that reference it are deleted, and the block pointers that reference it
have been "remapped" in all filesystems/zvols (and clones). Whenever an
indirect block is written, all the block pointers in it will be "remapped"
to their new (concrete) locations if possible. This process can be
accelerated by using the "zfs remap" command to proactively rewrite all
indirect blocks that reference indirect (removed) vdevs.
Note that when a device is removed, we do not verify the checksum of
the data that is copied. This makes the process much faster, but if it
were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be
possible to copy the wrong data, when we have the correct data on e.g.
the other side of the mirror.
At the moment, only mirrors and simple top-level vdevs can be removed
and no removal is allowed if any of the top-level vdevs are raidz.
Porting Notes:
* Avoid zero-sized kmem_alloc() in vdev_compact_children().
The device evacuation code adds a dependency that
vdev_compact_children() be able to properly empty the vdev_child
array by setting it to NULL and zeroing vdev_children. Under Linux,
kmem_alloc() and related functions return a sentinel pointer rather
than NULL for zero-sized allocations.
* Remove comment regarding "mpt" driver where zfs_remove_max_segment
is initialized to SPA_MAXBLOCKSIZE.
Change zfs_condense_indirect_commit_entry_delay_ticks to
zfs_condense_indirect_commit_entry_delay_ms for consistency with
most other tunables in which delays are specified in ms.
* ZTS changes:
Use set_tunable rather than mdb
Use zpool sync as appropriate
Use sync_pool instead of sync
Kill jobs during test_removal_with_operation to allow unmount/export
Don't add non-disk names such as "mirror" or "raidz" to $DISKS
Use $TEST_BASE_DIR instead of /tmp
Increase HZ from 100 to 1000 which is more common on Linux
removal_multiple_indirection.ksh
Reduce iterations in order to not time out on the code
coverage builders.
removal_resume_export:
Functionally, the test case is correct but there exists a race
where the kernel thread hasn't been fully started yet and is
not visible. Wait for up to 1 second for the removal thread
to be started before giving up on it. Also, increase the
amount of data copied in order that the removal not finish
before the export has a chance to fail.
* MMP compatibility, the concept of concrete versus non-concrete devices
has slightly changed the semantics of vdev_writeable(). Update
mmp_random_leaf_impl() accordingly.
* Updated dbuf_remap() to handle the org.zfsonlinux:large_dnode pool
feature which is not supported by OpenZFS.
* Added support for new vdev removal tracepoints.
* Test cases removal_with_zdb and removal_condense_export have been
intentionally disabled. When run manually they pass as intended,
but when running in the automated test environment they produce
unreliable results on the latest Fedora release.
They may work better once the upstream pool import refectoring is
merged into ZoL at which point they will be re-enabled.
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Alex Reece <alex@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Tim Chase <tim@chase2k.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Garrett D'Amore <garrett@damore.org>
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
OpenZFS-issue: https://www.illumos.org/issues/7614
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f539f1eb
Closes #6900
2016-09-22 19:30:13 +03:00
|
|
|
mutex_destroy(&ds->ds_remap_deadlist_lock);
|
2018-10-01 20:42:05 +03:00
|
|
|
zfs_refcount_destroy(&ds->ds_longholds);
|
2017-01-27 22:43:42 +03:00
|
|
|
rrw_destroy(&ds->ds_bp_rwlock);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
kmem_free(ds, sizeof (dsl_dataset_t));
|
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
int
|
2008-11-20 23:01:55 +03:00
|
|
|
dsl_dataset_get_snapname(dsl_dataset_t *ds)
|
|
|
|
{
|
|
|
|
dsl_dataset_phys_t *headphys;
|
|
|
|
int err;
|
|
|
|
dmu_buf_t *headdbuf;
|
|
|
|
dsl_pool_t *dp = ds->ds_dir->dd_pool;
|
|
|
|
objset_t *mos = dp->dp_meta_objset;
|
|
|
|
|
|
|
|
if (ds->ds_snapname[0])
|
|
|
|
return (0);
|
2015-04-01 18:14:34 +03:00
|
|
|
if (dsl_dataset_phys(ds)->ds_next_snap_obj == 0)
|
2008-11-20 23:01:55 +03:00
|
|
|
return (0);
|
|
|
|
|
2015-04-01 18:14:34 +03:00
|
|
|
err = dmu_bonus_hold(mos, dsl_dir_phys(ds->ds_dir)->dd_head_dataset_obj,
|
2008-11-20 23:01:55 +03:00
|
|
|
FTAG, &headdbuf);
|
2013-09-04 16:00:57 +04:00
|
|
|
if (err != 0)
|
2008-11-20 23:01:55 +03:00
|
|
|
return (err);
|
|
|
|
headphys = headdbuf->db_data;
|
|
|
|
err = zap_value_search(dp->dp_meta_objset,
|
|
|
|
headphys->ds_snapnames_zapobj, ds->ds_object, 0, ds->ds_snapname);
|
2015-08-23 17:58:11 +03:00
|
|
|
if (err != 0 && zfs_recover == B_TRUE) {
|
|
|
|
err = 0;
|
|
|
|
(void) snprintf(ds->ds_snapname, sizeof (ds->ds_snapname),
|
|
|
|
"SNAPOBJ=%llu-ERR=%d",
|
|
|
|
(unsigned long long)ds->ds_object, err);
|
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
dmu_buf_rele(headdbuf, FTAG);
|
|
|
|
return (err);
|
|
|
|
}
|
|
|
|
|
2013-01-26 02:57:53 +04:00
|
|
|
int
|
2008-12-03 23:09:06 +03:00
|
|
|
dsl_dataset_snap_lookup(dsl_dataset_t *ds, const char *name, uint64_t *value)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2008-12-03 23:09:06 +03:00
|
|
|
objset_t *mos = ds->ds_dir->dd_pool->dp_meta_objset;
|
2015-04-01 18:14:34 +03:00
|
|
|
uint64_t snapobj = dsl_dataset_phys(ds)->ds_snapnames_zapobj;
|
2017-02-03 01:13:41 +03:00
|
|
|
matchtype_t mt = 0;
|
2008-11-20 23:01:55 +03:00
|
|
|
int err;
|
|
|
|
|
2015-04-01 18:14:34 +03:00
|
|
|
if (dsl_dataset_phys(ds)->ds_flags & DS_FLAG_CI_DATASET)
|
2017-02-03 01:13:41 +03:00
|
|
|
mt = MT_NORMALIZE;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2008-12-03 23:09:06 +03:00
|
|
|
err = zap_lookup_norm(mos, snapobj, name, 8, 1,
|
2008-11-20 23:01:55 +03:00
|
|
|
value, mt, NULL, 0, NULL);
|
2017-02-03 01:13:41 +03:00
|
|
|
if (err == ENOTSUP && (mt & MT_NORMALIZE))
|
2008-12-03 23:09:06 +03:00
|
|
|
err = zap_lookup(mos, snapobj, name, 8, 1, value);
|
2008-11-20 23:01:55 +03:00
|
|
|
return (err);
|
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
int
|
2015-04-01 16:07:48 +03:00
|
|
|
dsl_dataset_snap_remove(dsl_dataset_t *ds, const char *name, dmu_tx_t *tx,
|
|
|
|
boolean_t adj_cnt)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2008-12-03 23:09:06 +03:00
|
|
|
objset_t *mos = ds->ds_dir->dd_pool->dp_meta_objset;
|
2015-04-01 18:14:34 +03:00
|
|
|
uint64_t snapobj = dsl_dataset_phys(ds)->ds_snapnames_zapobj;
|
2017-02-03 01:13:41 +03:00
|
|
|
matchtype_t mt = 0;
|
2008-11-20 23:01:55 +03:00
|
|
|
int err;
|
|
|
|
|
2022-08-03 02:45:30 +03:00
|
|
|
dsl_dir_snap_cmtime_update(ds->ds_dir, tx);
|
2010-05-29 00:45:14 +04:00
|
|
|
|
2015-04-01 18:14:34 +03:00
|
|
|
if (dsl_dataset_phys(ds)->ds_flags & DS_FLAG_CI_DATASET)
|
2017-02-03 01:13:41 +03:00
|
|
|
mt = MT_NORMALIZE;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2008-12-03 23:09:06 +03:00
|
|
|
err = zap_remove_norm(mos, snapobj, name, mt, tx);
|
2017-02-03 01:13:41 +03:00
|
|
|
if (err == ENOTSUP && (mt & MT_NORMALIZE))
|
2008-12-03 23:09:06 +03:00
|
|
|
err = zap_remove(mos, snapobj, name, tx);
|
2015-04-01 16:07:48 +03:00
|
|
|
|
|
|
|
if (err == 0 && adj_cnt)
|
|
|
|
dsl_fs_ss_count_adjust(ds->ds_dir, -1,
|
|
|
|
DD_FIELD_SNAPSHOT_COUNT, tx);
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
return (err);
|
|
|
|
}
|
|
|
|
|
2015-04-02 14:59:15 +03:00
|
|
|
boolean_t
|
2022-04-19 21:38:30 +03:00
|
|
|
dsl_dataset_try_add_ref(dsl_pool_t *dp, dsl_dataset_t *ds, const void *tag)
|
2015-04-02 14:59:15 +03:00
|
|
|
{
|
2015-04-02 15:00:07 +03:00
|
|
|
dmu_buf_t *dbuf = ds->ds_dbuf;
|
|
|
|
boolean_t result = B_FALSE;
|
|
|
|
|
|
|
|
if (dbuf != NULL && dmu_buf_try_add_ref(dbuf, dp->dp_meta_objset,
|
|
|
|
ds->ds_object, DMU_BONUS_BLKID, tag)) {
|
|
|
|
|
|
|
|
if (ds == dmu_buf_get_user(dbuf))
|
|
|
|
result = B_TRUE;
|
|
|
|
else
|
|
|
|
dmu_buf_rele(dbuf, tag);
|
|
|
|
}
|
|
|
|
|
|
|
|
return (result);
|
2015-04-02 14:59:15 +03:00
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
int
|
2022-04-19 21:38:30 +03:00
|
|
|
dsl_dataset_hold_obj(dsl_pool_t *dp, uint64_t dsobj, const void *tag,
|
2018-10-03 19:47:11 +03:00
|
|
|
dsl_dataset_t **dsp)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
|
|
|
objset_t *mos = dp->dp_meta_objset;
|
|
|
|
dmu_buf_t *dbuf;
|
|
|
|
dsl_dataset_t *ds;
|
|
|
|
int err;
|
2010-08-27 01:24:34 +04:00
|
|
|
dmu_object_info_t doi;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
ASSERT(dsl_pool_config_held(dp));
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
err = dmu_bonus_hold(mos, dsobj, tag, &dbuf);
|
2013-09-04 16:00:57 +04:00
|
|
|
if (err != 0)
|
2008-11-20 23:01:55 +03:00
|
|
|
return (err);
|
2010-08-27 01:24:34 +04:00
|
|
|
|
|
|
|
/* Make sure dsobj has the correct object type. */
|
|
|
|
dmu_object_info_from_db(dbuf, &doi);
|
2013-10-08 21:13:05 +04:00
|
|
|
if (doi.doi_bonus_type != DMU_OT_DSL_DATASET) {
|
2013-06-11 21:13:38 +04:00
|
|
|
dmu_buf_rele(dbuf, tag);
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EINVAL));
|
2013-06-11 21:13:38 +04:00
|
|
|
}
|
2010-08-27 01:24:34 +04:00
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
ds = dmu_buf_get_user(dbuf);
|
|
|
|
if (ds == NULL) {
|
2010-08-26 20:58:04 +04:00
|
|
|
dsl_dataset_t *winner = NULL;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2014-11-21 03:09:39 +03:00
|
|
|
ds = kmem_zalloc(sizeof (dsl_dataset_t), KM_SLEEP);
|
2008-11-20 23:01:55 +03:00
|
|
|
ds->ds_dbuf = dbuf;
|
|
|
|
ds->ds_object = dsobj;
|
2015-04-02 06:44:32 +03:00
|
|
|
ds->ds_is_snapshot = dsl_dataset_phys(ds)->ds_num_children != 0;
|
2010-08-26 21:26:44 +04:00
|
|
|
list_link_init(&ds->ds_synced_link);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
OpenZFS 7614, 9064 - zfs device evacuation/removal
OpenZFS 7614 - zfs device evacuation/removal
OpenZFS 9064 - remove_mirror should wait for device removal to complete
This project allows top-level vdevs to be removed from the storage pool
with "zpool remove", reducing the total amount of storage in the pool.
This operation copies all allocated regions of the device to be removed
onto other devices, recording the mapping from old to new location.
After the removal is complete, read and free operations to the removed
(now "indirect") vdev must be remapped and performed at the new location
on disk. The indirect mapping table is kept in memory whenever the pool
is loaded, so there is minimal performance overhead when doing operations
on the indirect vdev.
The size of the in-memory mapping table will be reduced when its entries
become "obsolete" because they are no longer used by any block pointers
in the pool. An entry becomes obsolete when all the blocks that use
it are freed. An entry can also become obsolete when all the snapshots
that reference it are deleted, and the block pointers that reference it
have been "remapped" in all filesystems/zvols (and clones). Whenever an
indirect block is written, all the block pointers in it will be "remapped"
to their new (concrete) locations if possible. This process can be
accelerated by using the "zfs remap" command to proactively rewrite all
indirect blocks that reference indirect (removed) vdevs.
Note that when a device is removed, we do not verify the checksum of
the data that is copied. This makes the process much faster, but if it
were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be
possible to copy the wrong data, when we have the correct data on e.g.
the other side of the mirror.
At the moment, only mirrors and simple top-level vdevs can be removed
and no removal is allowed if any of the top-level vdevs are raidz.
Porting Notes:
* Avoid zero-sized kmem_alloc() in vdev_compact_children().
The device evacuation code adds a dependency that
vdev_compact_children() be able to properly empty the vdev_child
array by setting it to NULL and zeroing vdev_children. Under Linux,
kmem_alloc() and related functions return a sentinel pointer rather
than NULL for zero-sized allocations.
* Remove comment regarding "mpt" driver where zfs_remove_max_segment
is initialized to SPA_MAXBLOCKSIZE.
Change zfs_condense_indirect_commit_entry_delay_ticks to
zfs_condense_indirect_commit_entry_delay_ms for consistency with
most other tunables in which delays are specified in ms.
* ZTS changes:
Use set_tunable rather than mdb
Use zpool sync as appropriate
Use sync_pool instead of sync
Kill jobs during test_removal_with_operation to allow unmount/export
Don't add non-disk names such as "mirror" or "raidz" to $DISKS
Use $TEST_BASE_DIR instead of /tmp
Increase HZ from 100 to 1000 which is more common on Linux
removal_multiple_indirection.ksh
Reduce iterations in order to not time out on the code
coverage builders.
removal_resume_export:
Functionally, the test case is correct but there exists a race
where the kernel thread hasn't been fully started yet and is
not visible. Wait for up to 1 second for the removal thread
to be started before giving up on it. Also, increase the
amount of data copied in order that the removal not finish
before the export has a chance to fail.
* MMP compatibility, the concept of concrete versus non-concrete devices
has slightly changed the semantics of vdev_writeable(). Update
mmp_random_leaf_impl() accordingly.
* Updated dbuf_remap() to handle the org.zfsonlinux:large_dnode pool
feature which is not supported by OpenZFS.
* Added support for new vdev removal tracepoints.
* Test cases removal_with_zdb and removal_condense_export have been
intentionally disabled. When run manually they pass as intended,
but when running in the automated test environment they produce
unreliable results on the latest Fedora release.
They may work better once the upstream pool import refectoring is
merged into ZoL at which point they will be re-enabled.
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Alex Reece <alex@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Tim Chase <tim@chase2k.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Garrett D'Amore <garrett@damore.org>
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
OpenZFS-issue: https://www.illumos.org/issues/7614
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f539f1eb
Closes #6900
2016-09-22 19:30:13 +03:00
|
|
|
err = dsl_dir_hold_obj(dp, dsl_dataset_phys(ds)->ds_dir_obj,
|
|
|
|
NULL, ds, &ds->ds_dir);
|
|
|
|
if (err != 0) {
|
|
|
|
kmem_free(ds, sizeof (dsl_dataset_t));
|
|
|
|
dmu_buf_rele(dbuf, tag);
|
|
|
|
return (err);
|
|
|
|
}
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
mutex_init(&ds->ds_lock, NULL, MUTEX_DEFAULT, NULL);
|
|
|
|
mutex_init(&ds->ds_opening_lock, NULL, MUTEX_DEFAULT, NULL);
|
2012-05-10 02:05:14 +04:00
|
|
|
mutex_init(&ds->ds_sendstream_lock, NULL, MUTEX_DEFAULT, NULL);
|
OpenZFS 7614, 9064 - zfs device evacuation/removal
OpenZFS 7614 - zfs device evacuation/removal
OpenZFS 9064 - remove_mirror should wait for device removal to complete
This project allows top-level vdevs to be removed from the storage pool
with "zpool remove", reducing the total amount of storage in the pool.
This operation copies all allocated regions of the device to be removed
onto other devices, recording the mapping from old to new location.
After the removal is complete, read and free operations to the removed
(now "indirect") vdev must be remapped and performed at the new location
on disk. The indirect mapping table is kept in memory whenever the pool
is loaded, so there is minimal performance overhead when doing operations
on the indirect vdev.
The size of the in-memory mapping table will be reduced when its entries
become "obsolete" because they are no longer used by any block pointers
in the pool. An entry becomes obsolete when all the blocks that use
it are freed. An entry can also become obsolete when all the snapshots
that reference it are deleted, and the block pointers that reference it
have been "remapped" in all filesystems/zvols (and clones). Whenever an
indirect block is written, all the block pointers in it will be "remapped"
to their new (concrete) locations if possible. This process can be
accelerated by using the "zfs remap" command to proactively rewrite all
indirect blocks that reference indirect (removed) vdevs.
Note that when a device is removed, we do not verify the checksum of
the data that is copied. This makes the process much faster, but if it
were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be
possible to copy the wrong data, when we have the correct data on e.g.
the other side of the mirror.
At the moment, only mirrors and simple top-level vdevs can be removed
and no removal is allowed if any of the top-level vdevs are raidz.
Porting Notes:
* Avoid zero-sized kmem_alloc() in vdev_compact_children().
The device evacuation code adds a dependency that
vdev_compact_children() be able to properly empty the vdev_child
array by setting it to NULL and zeroing vdev_children. Under Linux,
kmem_alloc() and related functions return a sentinel pointer rather
than NULL for zero-sized allocations.
* Remove comment regarding "mpt" driver where zfs_remove_max_segment
is initialized to SPA_MAXBLOCKSIZE.
Change zfs_condense_indirect_commit_entry_delay_ticks to
zfs_condense_indirect_commit_entry_delay_ms for consistency with
most other tunables in which delays are specified in ms.
* ZTS changes:
Use set_tunable rather than mdb
Use zpool sync as appropriate
Use sync_pool instead of sync
Kill jobs during test_removal_with_operation to allow unmount/export
Don't add non-disk names such as "mirror" or "raidz" to $DISKS
Use $TEST_BASE_DIR instead of /tmp
Increase HZ from 100 to 1000 which is more common on Linux
removal_multiple_indirection.ksh
Reduce iterations in order to not time out on the code
coverage builders.
removal_resume_export:
Functionally, the test case is correct but there exists a race
where the kernel thread hasn't been fully started yet and is
not visible. Wait for up to 1 second for the removal thread
to be started before giving up on it. Also, increase the
amount of data copied in order that the removal not finish
before the export has a chance to fail.
* MMP compatibility, the concept of concrete versus non-concrete devices
has slightly changed the semantics of vdev_writeable(). Update
mmp_random_leaf_impl() accordingly.
* Updated dbuf_remap() to handle the org.zfsonlinux:large_dnode pool
feature which is not supported by OpenZFS.
* Added support for new vdev removal tracepoints.
* Test cases removal_with_zdb and removal_condense_export have been
intentionally disabled. When run manually they pass as intended,
but when running in the automated test environment they produce
unreliable results on the latest Fedora release.
They may work better once the upstream pool import refectoring is
merged into ZoL at which point they will be re-enabled.
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Alex Reece <alex@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Tim Chase <tim@chase2k.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Garrett D'Amore <garrett@damore.org>
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
OpenZFS-issue: https://www.illumos.org/issues/7614
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f539f1eb
Closes #6900
2016-09-22 19:30:13 +03:00
|
|
|
mutex_init(&ds->ds_remap_deadlist_lock,
|
|
|
|
NULL, MUTEX_DEFAULT, NULL);
|
2017-01-27 22:43:42 +03:00
|
|
|
rrw_init(&ds->ds_bp_rwlock, B_FALSE);
|
2018-10-01 20:42:05 +03:00
|
|
|
zfs_refcount_create(&ds->ds_longholds);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
bplist_create(&ds->ds_pending_deadlist);
|
|
|
|
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
list_create(&ds->ds_sendstreams, sizeof (dmu_sendstatus_t),
|
|
|
|
offsetof(dmu_sendstatus_t, dss_link));
|
2012-05-10 02:05:14 +04:00
|
|
|
|
2015-11-05 02:00:58 +03:00
|
|
|
list_create(&ds->ds_prop_cbs, sizeof (dsl_prop_cb_record_t),
|
|
|
|
offsetof(dsl_prop_cb_record_t, cbr_ds_node));
|
|
|
|
|
2014-11-03 23:15:08 +03:00
|
|
|
if (doi.doi_type == DMU_OTN_ZAP_METADATA) {
|
2015-07-24 19:53:55 +03:00
|
|
|
spa_feature_t f;
|
|
|
|
|
|
|
|
for (f = 0; f < SPA_FEATURES; f++) {
|
|
|
|
if (!(spa_feature_table[f].fi_flags &
|
|
|
|
ZFEATURE_FLAG_PER_DATASET))
|
|
|
|
continue;
|
2018-10-16 21:15:04 +03:00
|
|
|
err = load_zfeature(mos, ds, f);
|
2015-05-12 20:23:45 +03:00
|
|
|
}
|
2014-11-03 23:15:08 +03:00
|
|
|
}
|
|
|
|
|
2015-04-02 06:44:32 +03:00
|
|
|
if (!ds->ds_is_snapshot) {
|
2008-11-20 23:01:55 +03:00
|
|
|
ds->ds_snapname[0] = '\0';
|
2015-04-01 18:14:34 +03:00
|
|
|
if (dsl_dataset_phys(ds)->ds_prev_snap_obj != 0) {
|
2013-09-04 16:00:57 +04:00
|
|
|
err = dsl_dataset_hold_obj(dp,
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(ds)->ds_prev_snap_obj,
|
2008-12-03 23:09:06 +03:00
|
|
|
ds, &ds->ds_prev);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
2022-09-24 02:52:03 +03:00
|
|
|
if (err != 0)
|
|
|
|
goto after_dsl_bookmark_fini;
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
err = dsl_bookmark_init_ds(ds);
|
2009-08-18 22:43:27 +04:00
|
|
|
} else {
|
|
|
|
if (zfs_flags & ZFS_DEBUG_SNAPNAMES)
|
|
|
|
err = dsl_dataset_get_snapname(ds);
|
2015-04-01 18:14:34 +03:00
|
|
|
if (err == 0 &&
|
|
|
|
dsl_dataset_phys(ds)->ds_userrefs_obj != 0) {
|
2009-08-18 22:43:27 +04:00
|
|
|
err = zap_count(
|
|
|
|
ds->ds_dir->dd_pool->dp_meta_objset,
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(ds)->ds_userrefs_obj,
|
2009-08-18 22:43:27 +04:00
|
|
|
&ds->ds_userrefs);
|
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
2015-04-02 06:44:32 +03:00
|
|
|
if (err == 0 && !ds->ds_is_snapshot) {
|
2013-09-04 16:00:57 +04:00
|
|
|
err = dsl_prop_get_int_ds(ds,
|
|
|
|
zfs_prop_to_name(ZFS_PROP_REFRESERVATION),
|
|
|
|
&ds->ds_reserved);
|
2008-11-20 23:01:55 +03:00
|
|
|
if (err == 0) {
|
2013-09-04 16:00:57 +04:00
|
|
|
err = dsl_prop_get_int_ds(ds,
|
|
|
|
zfs_prop_to_name(ZFS_PROP_REFQUOTA),
|
|
|
|
&ds->ds_quota);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
} else {
|
|
|
|
ds->ds_reserved = ds->ds_quota = 0;
|
|
|
|
}
|
|
|
|
|
2019-02-04 22:24:55 +03:00
|
|
|
if (err == 0 && ds->ds_dir->dd_crypto_obj != 0 &&
|
|
|
|
ds->ds_is_snapshot &&
|
|
|
|
zap_contains(mos, dsobj, DS_FIELD_IVSET_GUID) != 0) {
|
|
|
|
dp->dp_spa->spa_errata =
|
|
|
|
ZPOOL_ERRATA_ZOL_8308_ENCRYPTION;
|
|
|
|
}
|
|
|
|
|
OpenZFS 7614, 9064 - zfs device evacuation/removal
OpenZFS 7614 - zfs device evacuation/removal
OpenZFS 9064 - remove_mirror should wait for device removal to complete
This project allows top-level vdevs to be removed from the storage pool
with "zpool remove", reducing the total amount of storage in the pool.
This operation copies all allocated regions of the device to be removed
onto other devices, recording the mapping from old to new location.
After the removal is complete, read and free operations to the removed
(now "indirect") vdev must be remapped and performed at the new location
on disk. The indirect mapping table is kept in memory whenever the pool
is loaded, so there is minimal performance overhead when doing operations
on the indirect vdev.
The size of the in-memory mapping table will be reduced when its entries
become "obsolete" because they are no longer used by any block pointers
in the pool. An entry becomes obsolete when all the blocks that use
it are freed. An entry can also become obsolete when all the snapshots
that reference it are deleted, and the block pointers that reference it
have been "remapped" in all filesystems/zvols (and clones). Whenever an
indirect block is written, all the block pointers in it will be "remapped"
to their new (concrete) locations if possible. This process can be
accelerated by using the "zfs remap" command to proactively rewrite all
indirect blocks that reference indirect (removed) vdevs.
Note that when a device is removed, we do not verify the checksum of
the data that is copied. This makes the process much faster, but if it
were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be
possible to copy the wrong data, when we have the correct data on e.g.
the other side of the mirror.
At the moment, only mirrors and simple top-level vdevs can be removed
and no removal is allowed if any of the top-level vdevs are raidz.
Porting Notes:
* Avoid zero-sized kmem_alloc() in vdev_compact_children().
The device evacuation code adds a dependency that
vdev_compact_children() be able to properly empty the vdev_child
array by setting it to NULL and zeroing vdev_children. Under Linux,
kmem_alloc() and related functions return a sentinel pointer rather
than NULL for zero-sized allocations.
* Remove comment regarding "mpt" driver where zfs_remove_max_segment
is initialized to SPA_MAXBLOCKSIZE.
Change zfs_condense_indirect_commit_entry_delay_ticks to
zfs_condense_indirect_commit_entry_delay_ms for consistency with
most other tunables in which delays are specified in ms.
* ZTS changes:
Use set_tunable rather than mdb
Use zpool sync as appropriate
Use sync_pool instead of sync
Kill jobs during test_removal_with_operation to allow unmount/export
Don't add non-disk names such as "mirror" or "raidz" to $DISKS
Use $TEST_BASE_DIR instead of /tmp
Increase HZ from 100 to 1000 which is more common on Linux
removal_multiple_indirection.ksh
Reduce iterations in order to not time out on the code
coverage builders.
removal_resume_export:
Functionally, the test case is correct but there exists a race
where the kernel thread hasn't been fully started yet and is
not visible. Wait for up to 1 second for the removal thread
to be started before giving up on it. Also, increase the
amount of data copied in order that the removal not finish
before the export has a chance to fail.
* MMP compatibility, the concept of concrete versus non-concrete devices
has slightly changed the semantics of vdev_writeable(). Update
mmp_random_leaf_impl() accordingly.
* Updated dbuf_remap() to handle the org.zfsonlinux:large_dnode pool
feature which is not supported by OpenZFS.
* Added support for new vdev removal tracepoints.
* Test cases removal_with_zdb and removal_condense_export have been
intentionally disabled. When run manually they pass as intended,
but when running in the automated test environment they produce
unreliable results on the latest Fedora release.
They may work better once the upstream pool import refectoring is
merged into ZoL at which point they will be re-enabled.
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Alex Reece <alex@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Tim Chase <tim@chase2k.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Garrett D'Amore <garrett@damore.org>
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
OpenZFS-issue: https://www.illumos.org/issues/7614
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f539f1eb
Closes #6900
2016-09-22 19:30:13 +03:00
|
|
|
dsl_deadlist_open(&ds->ds_deadlist,
|
|
|
|
mos, dsl_dataset_phys(ds)->ds_deadlist_obj);
|
|
|
|
uint64_t remap_deadlist_obj =
|
|
|
|
dsl_dataset_get_remap_deadlist_object(ds);
|
|
|
|
if (remap_deadlist_obj != 0) {
|
|
|
|
dsl_deadlist_open(&ds->ds_remap_deadlist, mos,
|
|
|
|
remap_deadlist_obj);
|
|
|
|
}
|
|
|
|
|
2017-01-27 01:43:28 +03:00
|
|
|
dmu_buf_init_user(&ds->ds_dbu, dsl_dataset_evict_sync,
|
|
|
|
dsl_dataset_evict_async, &ds->ds_dbuf);
|
2015-04-02 06:44:32 +03:00
|
|
|
if (err == 0)
|
|
|
|
winner = dmu_buf_set_user_ie(dbuf, &ds->ds_dbu);
|
|
|
|
|
|
|
|
if (err != 0 || winner != NULL) {
|
2010-05-29 00:45:14 +04:00
|
|
|
dsl_deadlist_close(&ds->ds_deadlist);
|
OpenZFS 7614, 9064 - zfs device evacuation/removal
OpenZFS 7614 - zfs device evacuation/removal
OpenZFS 9064 - remove_mirror should wait for device removal to complete
This project allows top-level vdevs to be removed from the storage pool
with "zpool remove", reducing the total amount of storage in the pool.
This operation copies all allocated regions of the device to be removed
onto other devices, recording the mapping from old to new location.
After the removal is complete, read and free operations to the removed
(now "indirect") vdev must be remapped and performed at the new location
on disk. The indirect mapping table is kept in memory whenever the pool
is loaded, so there is minimal performance overhead when doing operations
on the indirect vdev.
The size of the in-memory mapping table will be reduced when its entries
become "obsolete" because they are no longer used by any block pointers
in the pool. An entry becomes obsolete when all the blocks that use
it are freed. An entry can also become obsolete when all the snapshots
that reference it are deleted, and the block pointers that reference it
have been "remapped" in all filesystems/zvols (and clones). Whenever an
indirect block is written, all the block pointers in it will be "remapped"
to their new (concrete) locations if possible. This process can be
accelerated by using the "zfs remap" command to proactively rewrite all
indirect blocks that reference indirect (removed) vdevs.
Note that when a device is removed, we do not verify the checksum of
the data that is copied. This makes the process much faster, but if it
were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be
possible to copy the wrong data, when we have the correct data on e.g.
the other side of the mirror.
At the moment, only mirrors and simple top-level vdevs can be removed
and no removal is allowed if any of the top-level vdevs are raidz.
Porting Notes:
* Avoid zero-sized kmem_alloc() in vdev_compact_children().
The device evacuation code adds a dependency that
vdev_compact_children() be able to properly empty the vdev_child
array by setting it to NULL and zeroing vdev_children. Under Linux,
kmem_alloc() and related functions return a sentinel pointer rather
than NULL for zero-sized allocations.
* Remove comment regarding "mpt" driver where zfs_remove_max_segment
is initialized to SPA_MAXBLOCKSIZE.
Change zfs_condense_indirect_commit_entry_delay_ticks to
zfs_condense_indirect_commit_entry_delay_ms for consistency with
most other tunables in which delays are specified in ms.
* ZTS changes:
Use set_tunable rather than mdb
Use zpool sync as appropriate
Use sync_pool instead of sync
Kill jobs during test_removal_with_operation to allow unmount/export
Don't add non-disk names such as "mirror" or "raidz" to $DISKS
Use $TEST_BASE_DIR instead of /tmp
Increase HZ from 100 to 1000 which is more common on Linux
removal_multiple_indirection.ksh
Reduce iterations in order to not time out on the code
coverage builders.
removal_resume_export:
Functionally, the test case is correct but there exists a race
where the kernel thread hasn't been fully started yet and is
not visible. Wait for up to 1 second for the removal thread
to be started before giving up on it. Also, increase the
amount of data copied in order that the removal not finish
before the export has a chance to fail.
* MMP compatibility, the concept of concrete versus non-concrete devices
has slightly changed the semantics of vdev_writeable(). Update
mmp_random_leaf_impl() accordingly.
* Updated dbuf_remap() to handle the org.zfsonlinux:large_dnode pool
feature which is not supported by OpenZFS.
* Added support for new vdev removal tracepoints.
* Test cases removal_with_zdb and removal_condense_export have been
intentionally disabled. When run manually they pass as intended,
but when running in the automated test environment they produce
unreliable results on the latest Fedora release.
They may work better once the upstream pool import refectoring is
merged into ZoL at which point they will be re-enabled.
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Alex Reece <alex@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Tim Chase <tim@chase2k.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Garrett D'Amore <garrett@damore.org>
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
OpenZFS-issue: https://www.illumos.org/issues/7614
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f539f1eb
Closes #6900
2016-09-22 19:30:13 +03:00
|
|
|
if (dsl_deadlist_is_open(&ds->ds_remap_deadlist))
|
|
|
|
dsl_deadlist_close(&ds->ds_remap_deadlist);
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
dsl_bookmark_fini_ds(ds);
|
2022-09-24 02:52:03 +03:00
|
|
|
after_dsl_bookmark_fini:
|
2008-12-03 23:09:06 +03:00
|
|
|
if (ds->ds_prev)
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_rele(ds->ds_prev, ds);
|
|
|
|
dsl_dir_rele(ds->ds_dir, ds);
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
for (spa_feature_t f = 0; f < SPA_FEATURES; f++) {
|
|
|
|
if (dsl_dataset_feature_is_active(ds, f))
|
|
|
|
unload_zfeature(ds, f);
|
|
|
|
}
|
|
|
|
|
2019-03-19 06:34:30 +03:00
|
|
|
list_destroy(&ds->ds_prop_cbs);
|
|
|
|
list_destroy(&ds->ds_sendstreams);
|
2022-09-24 02:52:03 +03:00
|
|
|
bplist_destroy(&ds->ds_pending_deadlist);
|
2008-11-20 23:01:55 +03:00
|
|
|
mutex_destroy(&ds->ds_lock);
|
|
|
|
mutex_destroy(&ds->ds_opening_lock);
|
2015-04-01 16:49:14 +03:00
|
|
|
mutex_destroy(&ds->ds_sendstream_lock);
|
2019-03-19 06:34:30 +03:00
|
|
|
mutex_destroy(&ds->ds_remap_deadlist_lock);
|
2018-10-01 20:42:05 +03:00
|
|
|
zfs_refcount_destroy(&ds->ds_longholds);
|
2019-03-19 06:34:30 +03:00
|
|
|
rrw_destroy(&ds->ds_bp_rwlock);
|
2008-11-20 23:01:55 +03:00
|
|
|
kmem_free(ds, sizeof (dsl_dataset_t));
|
2013-09-04 16:00:57 +04:00
|
|
|
if (err != 0) {
|
2008-11-20 23:01:55 +03:00
|
|
|
dmu_buf_rele(dbuf, tag);
|
|
|
|
return (err);
|
|
|
|
}
|
|
|
|
ds = winner;
|
|
|
|
} else {
|
|
|
|
ds->ds_fsid_guid =
|
2015-04-01 18:14:34 +03:00
|
|
|
unique_insert(dsl_dataset_phys(ds)->ds_fsid_guid);
|
2017-01-27 01:43:28 +03:00
|
|
|
if (ds->ds_fsid_guid !=
|
|
|
|
dsl_dataset_phys(ds)->ds_fsid_guid) {
|
|
|
|
zfs_dbgmsg("ds_fsid_guid changed from "
|
|
|
|
"%llx to %llx for pool %s dataset id %llu",
|
|
|
|
(long long)
|
|
|
|
dsl_dataset_phys(ds)->ds_fsid_guid,
|
|
|
|
(long long)ds->ds_fsid_guid,
|
|
|
|
spa_name(dp->dp_spa),
|
2021-06-23 07:53:45 +03:00
|
|
|
(u_longlong_t)dsobj);
|
2017-01-27 01:43:28 +03:00
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
}
|
2018-10-03 19:47:11 +03:00
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
ASSERT3P(ds->ds_dbuf, ==, dbuf);
|
2015-04-01 18:14:34 +03:00
|
|
|
ASSERT3P(dsl_dataset_phys(ds), ==, dbuf->db_data);
|
|
|
|
ASSERT(dsl_dataset_phys(ds)->ds_prev_snap_obj != 0 ||
|
2008-12-03 23:09:06 +03:00
|
|
|
spa_version(dp->dp_spa) < SPA_VERSION_ORIGIN ||
|
|
|
|
dp->dp_origin_snap == NULL || ds == dp->dp_origin_snap);
|
2008-11-20 23:01:55 +03:00
|
|
|
*dsp = ds;
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decyrpt, and authenticate protected datasets.
Each object set maintains a Merkel tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 20:36:48 +03:00
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
2008-12-03 23:09:06 +03:00
|
|
|
int
|
2018-10-03 19:47:11 +03:00
|
|
|
dsl_dataset_create_key_mapping(dsl_dataset_t *ds)
|
|
|
|
{
|
|
|
|
dsl_dir_t *dd = ds->ds_dir;
|
|
|
|
|
|
|
|
if (dd->dd_crypto_obj == 0)
|
|
|
|
return (0);
|
|
|
|
|
|
|
|
return (spa_keystore_create_mapping(dd->dd_pool->dp_spa,
|
|
|
|
ds, ds, &ds->ds_key_mapping));
|
|
|
|
}
|
|
|
|
|
|
|
|
int
|
|
|
|
dsl_dataset_hold_obj_flags(dsl_pool_t *dp, uint64_t dsobj,
|
2022-04-19 21:38:30 +03:00
|
|
|
ds_hold_flags_t flags, const void *tag, dsl_dataset_t **dsp)
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decyrpt, and authenticate protected datasets.
Each object set maintains a Merkel tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 20:36:48 +03:00
|
|
|
{
|
2018-10-03 19:47:11 +03:00
|
|
|
int err;
|
|
|
|
|
|
|
|
err = dsl_dataset_hold_obj(dp, dsobj, tag, dsp);
|
|
|
|
if (err != 0)
|
|
|
|
return (err);
|
|
|
|
|
|
|
|
ASSERT3P(*dsp, !=, NULL);
|
|
|
|
|
|
|
|
if (flags & DS_HOLD_FLAG_DECRYPT) {
|
|
|
|
err = dsl_dataset_create_key_mapping(*dsp);
|
|
|
|
if (err != 0)
|
|
|
|
dsl_dataset_rele(*dsp, tag);
|
|
|
|
}
|
|
|
|
|
|
|
|
return (err);
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decyrpt, and authenticate protected datasets.
Each object set maintains a Merkel tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 20:36:48 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
int
|
|
|
|
dsl_dataset_hold_flags(dsl_pool_t *dp, const char *name, ds_hold_flags_t flags,
|
2022-04-19 21:38:30 +03:00
|
|
|
const void *tag, dsl_dataset_t **dsp)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
|
|
|
dsl_dir_t *dd;
|
2008-12-03 23:09:06 +03:00
|
|
|
const char *snapname;
|
2008-11-20 23:01:55 +03:00
|
|
|
uint64_t obj;
|
|
|
|
int err = 0;
|
2015-12-22 04:31:57 +03:00
|
|
|
dsl_dataset_t *ds;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
err = dsl_dir_hold(dp, name, FTAG, &dd, &snapname);
|
|
|
|
if (err != 0)
|
2008-11-20 23:01:55 +03:00
|
|
|
return (err);
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
ASSERT(dsl_pool_config_held(dp));
|
2015-04-01 18:14:34 +03:00
|
|
|
obj = dsl_dir_phys(dd)->dd_head_dataset_obj;
|
2013-09-04 16:00:57 +04:00
|
|
|
if (obj != 0)
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decyrpt, and authenticate protected datasets.
Each object set maintains a Merkel tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 20:36:48 +03:00
|
|
|
err = dsl_dataset_hold_obj_flags(dp, obj, flags, tag, &ds);
|
2008-12-03 23:09:06 +03:00
|
|
|
else
|
2013-03-08 22:41:28 +04:00
|
|
|
err = SET_ERROR(ENOENT);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2008-12-03 23:09:06 +03:00
|
|
|
/* we may be looking for a snapshot */
|
|
|
|
if (err == 0 && snapname != NULL) {
|
2015-12-22 04:31:57 +03:00
|
|
|
dsl_dataset_t *snap_ds;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2008-12-03 23:09:06 +03:00
|
|
|
if (*snapname++ != '@') {
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decyrpt, and authenticate protected datasets.
Each object set maintains a Merkel tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 20:36:48 +03:00
|
|
|
dsl_dataset_rele_flags(ds, flags, tag);
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dir_rele(dd, FTAG);
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(ENOENT));
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
2008-12-03 23:09:06 +03:00
|
|
|
dprintf("looking for snapshot '%s'\n", snapname);
|
2015-12-22 04:31:57 +03:00
|
|
|
err = dsl_dataset_snap_lookup(ds, snapname, &obj);
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decyrpt, and authenticate protected datasets.
Each object set maintains a Merkel tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 20:36:48 +03:00
|
|
|
if (err == 0) {
|
|
|
|
err = dsl_dataset_hold_obj_flags(dp, obj, flags, tag,
|
|
|
|
&snap_ds);
|
|
|
|
}
|
|
|
|
dsl_dataset_rele_flags(ds, flags, tag);
|
2008-12-03 23:09:06 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
if (err == 0) {
|
2015-12-22 04:31:57 +03:00
|
|
|
mutex_enter(&snap_ds->ds_lock);
|
|
|
|
if (snap_ds->ds_snapname[0] == 0)
|
|
|
|
(void) strlcpy(snap_ds->ds_snapname, snapname,
|
|
|
|
sizeof (snap_ds->ds_snapname));
|
|
|
|
mutex_exit(&snap_ds->ds_lock);
|
|
|
|
ds = snap_ds;
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
}
|
2015-12-22 04:31:57 +03:00
|
|
|
if (err == 0)
|
|
|
|
*dsp = ds;
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dir_rele(dd, FTAG);
|
2008-11-20 23:01:55 +03:00
|
|
|
return (err);
|
|
|
|
}
|
|
|
|
|
|
|
|
int
|
2022-04-19 21:38:30 +03:00
|
|
|
dsl_dataset_hold(dsl_pool_t *dp, const char *name, const void *tag,
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decyrpt, and authenticate protected datasets.
Each object set maintains a Merkel tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 20:36:48 +03:00
|
|
|
dsl_dataset_t **dsp)
|
|
|
|
{
|
|
|
|
return (dsl_dataset_hold_flags(dp, name, 0, tag, dsp));
|
|
|
|
}
|
|
|
|
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
static int
|
|
|
|
dsl_dataset_own_obj_impl(dsl_pool_t *dp, uint64_t dsobj, ds_hold_flags_t flags,
|
2022-04-19 21:38:30 +03:00
|
|
|
const void *tag, boolean_t override, dsl_dataset_t **dsp)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decyrpt, and authenticate protected datasets.
Each object set maintains a Merkel tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 20:36:48 +03:00
|
|
|
int err = dsl_dataset_hold_obj_flags(dp, dsobj, flags, tag, dsp);
|
2013-09-04 16:00:57 +04:00
|
|
|
if (err != 0)
|
|
|
|
return (err);
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
if (!dsl_dataset_tryown(*dsp, tag, override)) {
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decyrpt, and authenticate protected datasets.
Each object set maintains a Merkel tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 20:36:48 +03:00
|
|
|
dsl_dataset_rele_flags(*dsp, flags, tag);
|
2013-09-04 16:00:57 +04:00
|
|
|
*dsp = NULL;
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EBUSY));
|
2013-09-04 16:00:57 +04:00
|
|
|
}
|
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
int
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
dsl_dataset_own_obj(dsl_pool_t *dp, uint64_t dsobj, ds_hold_flags_t flags,
|
2022-04-19 21:38:30 +03:00
|
|
|
const void *tag, dsl_dataset_t **dsp)
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
{
|
|
|
|
return (dsl_dataset_own_obj_impl(dp, dsobj, flags, tag, B_FALSE, dsp));
|
|
|
|
}
|
|
|
|
|
|
|
|
int
|
|
|
|
dsl_dataset_own_obj_force(dsl_pool_t *dp, uint64_t dsobj,
|
2022-04-19 21:38:30 +03:00
|
|
|
ds_hold_flags_t flags, const void *tag, dsl_dataset_t **dsp)
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
{
|
|
|
|
return (dsl_dataset_own_obj_impl(dp, dsobj, flags, tag, B_TRUE, dsp));
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
dsl_dataset_own_impl(dsl_pool_t *dp, const char *name, ds_hold_flags_t flags,
|
2022-04-19 21:38:30 +03:00
|
|
|
const void *tag, boolean_t override, dsl_dataset_t **dsp)
|
2013-09-04 16:00:57 +04:00
|
|
|
{
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decyrpt, and authenticate protected datasets.
Each object set maintains a Merkel tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 20:36:48 +03:00
|
|
|
int err = dsl_dataset_hold_flags(dp, name, flags, tag, dsp);
|
2013-09-04 16:00:57 +04:00
|
|
|
if (err != 0)
|
2008-12-03 23:09:06 +03:00
|
|
|
return (err);
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
if (!dsl_dataset_tryown(*dsp, tag, override)) {
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decyrpt, and authenticate protected datasets.
Each object set maintains a Merkel tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 20:36:48 +03:00
|
|
|
dsl_dataset_rele_flags(*dsp, flags, tag);
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EBUSY));
|
2008-12-03 23:09:06 +03:00
|
|
|
}
|
|
|
|
return (0);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
int
|
|
|
|
dsl_dataset_own_force(dsl_pool_t *dp, const char *name, ds_hold_flags_t flags,
|
2022-04-19 21:38:30 +03:00
|
|
|
const void *tag, dsl_dataset_t **dsp)
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
{
|
|
|
|
return (dsl_dataset_own_impl(dp, name, flags, tag, B_TRUE, dsp));
|
|
|
|
}
|
|
|
|
|
|
|
|
int
|
|
|
|
dsl_dataset_own(dsl_pool_t *dp, const char *name, ds_hold_flags_t flags,
|
2022-04-19 21:38:30 +03:00
|
|
|
const void *tag, dsl_dataset_t **dsp)
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
{
|
|
|
|
return (dsl_dataset_own_impl(dp, name, flags, tag, B_FALSE, dsp));
|
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
/*
|
|
|
|
* See the comment above dsl_pool_hold() for details. In summary, a long
|
|
|
|
* hold is used to prevent destruction of a dataset while the pool hold
|
|
|
|
* is dropped, allowing other concurrent operations (e.g. spa_sync()).
|
|
|
|
*
|
|
|
|
* The dataset and pool must be held when this function is called. After it
|
|
|
|
* is called, the pool hold may be released while the dataset is still held
|
|
|
|
* and accessed.
|
|
|
|
*/
|
|
|
|
void
|
2022-01-15 02:37:55 +03:00
|
|
|
dsl_dataset_long_hold(dsl_dataset_t *ds, const void *tag)
|
2013-09-04 16:00:57 +04:00
|
|
|
{
|
|
|
|
ASSERT(dsl_pool_config_held(ds->ds_dir->dd_pool));
|
2018-09-26 20:29:26 +03:00
|
|
|
(void) zfs_refcount_add(&ds->ds_longholds, tag);
|
2013-09-04 16:00:57 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
void
|
2022-01-15 02:37:55 +03:00
|
|
|
dsl_dataset_long_rele(dsl_dataset_t *ds, const void *tag)
|
2013-09-04 16:00:57 +04:00
|
|
|
{
|
2018-10-01 20:42:05 +03:00
|
|
|
(void) zfs_refcount_remove(&ds->ds_longholds, tag);
|
2013-09-04 16:00:57 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
/* Return B_TRUE if there are any long holds on this dataset. */
|
|
|
|
boolean_t
|
|
|
|
dsl_dataset_long_held(dsl_dataset_t *ds)
|
|
|
|
{
|
2018-10-01 20:42:05 +03:00
|
|
|
return (!zfs_refcount_is_zero(&ds->ds_longholds));
|
2013-09-04 16:00:57 +04:00
|
|
|
}
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
void
|
|
|
|
dsl_dataset_name(dsl_dataset_t *ds, char *name)
|
|
|
|
{
|
|
|
|
if (ds == NULL) {
|
2020-06-07 21:42:12 +03:00
|
|
|
(void) strlcpy(name, "mos", ZFS_MAX_DATASET_NAME_LEN);
|
2008-11-20 23:01:55 +03:00
|
|
|
} else {
|
|
|
|
dsl_dir_name(ds->ds_dir, name);
|
2013-09-04 16:00:57 +04:00
|
|
|
VERIFY0(dsl_dataset_get_snapname(ds));
|
2008-11-20 23:01:55 +03:00
|
|
|
if (ds->ds_snapname[0]) {
|
2016-06-16 00:28:36 +03:00
|
|
|
VERIFY3U(strlcat(name, "@", ZFS_MAX_DATASET_NAME_LEN),
|
|
|
|
<, ZFS_MAX_DATASET_NAME_LEN);
|
2008-12-03 23:09:06 +03:00
|
|
|
/*
|
|
|
|
* We use a "recursive" mutex so that we
|
|
|
|
* can call dprintf_ds() with ds_lock held.
|
|
|
|
*/
|
2008-11-20 23:01:55 +03:00
|
|
|
if (!MUTEX_HELD(&ds->ds_lock)) {
|
|
|
|
mutex_enter(&ds->ds_lock);
|
2016-06-16 00:28:36 +03:00
|
|
|
VERIFY3U(strlcat(name, ds->ds_snapname,
|
|
|
|
ZFS_MAX_DATASET_NAME_LEN), <,
|
|
|
|
ZFS_MAX_DATASET_NAME_LEN);
|
2008-11-20 23:01:55 +03:00
|
|
|
mutex_exit(&ds->ds_lock);
|
|
|
|
} else {
|
2016-06-16 00:28:36 +03:00
|
|
|
VERIFY3U(strlcat(name, ds->ds_snapname,
|
|
|
|
ZFS_MAX_DATASET_NAME_LEN), <,
|
|
|
|
ZFS_MAX_DATASET_NAME_LEN);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2016-06-16 00:28:36 +03:00
|
|
|
int
|
|
|
|
dsl_dataset_namelen(dsl_dataset_t *ds)
|
|
|
|
{
|
|
|
|
VERIFY0(dsl_dataset_get_snapname(ds));
|
|
|
|
mutex_enter(&ds->ds_lock);
|
2017-11-04 23:25:13 +03:00
|
|
|
int len = strlen(ds->ds_snapname);
|
2019-03-11 19:11:04 +03:00
|
|
|
mutex_exit(&ds->ds_lock);
|
2016-12-09 21:52:08 +03:00
|
|
|
/* add '@' if ds is a snap */
|
|
|
|
if (len > 0)
|
|
|
|
len++;
|
|
|
|
len += dsl_dir_namelen(ds->ds_dir);
|
2016-06-16 00:28:36 +03:00
|
|
|
return (len);
|
|
|
|
}
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
void
|
2022-04-19 21:38:30 +03:00
|
|
|
dsl_dataset_rele(dsl_dataset_t *ds, const void *tag)
|
2008-12-03 23:09:06 +03:00
|
|
|
{
|
2013-09-04 16:00:57 +04:00
|
|
|
dmu_buf_rele(ds->ds_dbuf, tag);
|
2008-12-03 23:09:06 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
void
|
2018-10-03 19:47:11 +03:00
|
|
|
dsl_dataset_remove_key_mapping(dsl_dataset_t *ds)
|
|
|
|
{
|
|
|
|
dsl_dir_t *dd = ds->ds_dir;
|
|
|
|
|
|
|
|
if (dd == NULL || dd->dd_crypto_obj == 0)
|
|
|
|
return;
|
|
|
|
|
|
|
|
(void) spa_keystore_remove_mapping(dd->dd_pool->dp_spa,
|
|
|
|
ds->ds_object, ds);
|
|
|
|
}
|
|
|
|
|
|
|
|
void
|
2022-04-19 21:38:30 +03:00
|
|
|
dsl_dataset_rele_flags(dsl_dataset_t *ds, ds_hold_flags_t flags,
|
|
|
|
const void *tag)
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decyrpt, and authenticate protected datasets.
Each object set maintains a Merkel tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 20:36:48 +03:00
|
|
|
{
|
2018-10-03 19:47:11 +03:00
|
|
|
if (flags & DS_HOLD_FLAG_DECRYPT)
|
|
|
|
dsl_dataset_remove_key_mapping(ds);
|
|
|
|
|
|
|
|
dsl_dataset_rele(ds, tag);
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decyrpt, and authenticate protected datasets.
Each object set maintains a Merkel tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 20:36:48 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
void
|
2022-04-19 21:38:30 +03:00
|
|
|
dsl_dataset_disown(dsl_dataset_t *ds, ds_hold_flags_t flags, const void *tag)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2015-04-01 18:13:28 +03:00
|
|
|
ASSERT3P(ds->ds_owner, ==, tag);
|
|
|
|
ASSERT(ds->ds_dbuf != NULL);
|
2008-12-03 23:09:06 +03:00
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
mutex_enter(&ds->ds_lock);
|
2008-12-03 23:09:06 +03:00
|
|
|
ds->ds_owner = NULL;
|
2008-11-20 23:01:55 +03:00
|
|
|
mutex_exit(&ds->ds_lock);
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_long_rele(ds, tag);
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decyrpt, and authenticate protected datasets.
Each object set maintains a Merkel tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 20:36:48 +03:00
|
|
|
dsl_dataset_rele_flags(ds, flags, tag);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
boolean_t
|
2022-04-19 21:38:30 +03:00
|
|
|
dsl_dataset_tryown(dsl_dataset_t *ds, const void *tag, boolean_t override)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2008-12-03 23:09:06 +03:00
|
|
|
boolean_t gotit = FALSE;
|
|
|
|
|
2016-01-07 00:22:48 +03:00
|
|
|
ASSERT(dsl_pool_config_held(ds->ds_dir->dd_pool));
|
2008-11-20 23:01:55 +03:00
|
|
|
mutex_enter(&ds->ds_lock);
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
if (ds->ds_owner == NULL && (override || !(DS_IS_INCONSISTENT(ds) ||
|
|
|
|
(dsl_dataset_feature_is_active(ds,
|
|
|
|
SPA_FEATURE_REDACTED_DATASETS) &&
|
|
|
|
!zfs_allow_redacted_dataset_mount)))) {
|
2010-05-29 00:45:14 +04:00
|
|
|
ds->ds_owner = tag;
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_long_hold(ds, tag);
|
2008-12-03 23:09:06 +03:00
|
|
|
gotit = TRUE;
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
mutex_exit(&ds->ds_lock);
|
2008-12-03 23:09:06 +03:00
|
|
|
return (gotit);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
2016-01-07 00:22:48 +03:00
|
|
|
boolean_t
|
|
|
|
dsl_dataset_has_owner(dsl_dataset_t *ds)
|
|
|
|
{
|
|
|
|
boolean_t rv;
|
|
|
|
mutex_enter(&ds->ds_lock);
|
|
|
|
rv = (ds->ds_owner != NULL);
|
|
|
|
mutex_exit(&ds->ds_lock);
|
|
|
|
return (rv);
|
|
|
|
}
|
|
|
|
|
2023-02-14 03:37:46 +03:00
|
|
|
boolean_t
|
2018-10-16 21:15:04 +03:00
|
|
|
zfeature_active(spa_feature_t f, void *arg)
|
|
|
|
{
|
|
|
|
switch (spa_feature_table[f].fi_type) {
|
|
|
|
case ZFEATURE_TYPE_BOOLEAN: {
|
2020-08-18 01:40:17 +03:00
|
|
|
boolean_t val = (boolean_t)(uintptr_t)arg;
|
2018-10-16 21:15:04 +03:00
|
|
|
ASSERT(val == B_FALSE || val == B_TRUE);
|
|
|
|
return (val);
|
|
|
|
}
|
|
|
|
case ZFEATURE_TYPE_UINT64_ARRAY:
|
|
|
|
/*
|
|
|
|
* In this case, arg is a uint64_t array. The feature is active
|
|
|
|
* if the array is non-null.
|
|
|
|
*/
|
|
|
|
return (arg != NULL);
|
|
|
|
default:
|
|
|
|
panic("Invalid zfeature type %d", spa_feature_table[f].fi_type);
|
|
|
|
return (B_FALSE);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
boolean_t
|
|
|
|
dsl_dataset_feature_is_active(dsl_dataset_t *ds, spa_feature_t f)
|
|
|
|
{
|
|
|
|
return (zfeature_active(f, ds->ds_feature[f]));
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The buffers passed out by this function are references to internal buffers;
|
|
|
|
* they should not be freed by callers of this function, and they should not be
|
|
|
|
* used after the dataset has been released.
|
|
|
|
*/
|
|
|
|
boolean_t
|
|
|
|
dsl_dataset_get_uint64_array_feature(dsl_dataset_t *ds, spa_feature_t f,
|
|
|
|
uint64_t *outlength, uint64_t **outp)
|
|
|
|
{
|
|
|
|
VERIFY(spa_feature_table[f].fi_type & ZFEATURE_TYPE_UINT64_ARRAY);
|
|
|
|
if (!dsl_dataset_feature_is_active(ds, f)) {
|
|
|
|
return (B_FALSE);
|
|
|
|
}
|
|
|
|
struct feature_type_uint64_array_arg *ftuaa = ds->ds_feature[f];
|
|
|
|
*outp = ftuaa->array;
|
|
|
|
*outlength = ftuaa->length;
|
|
|
|
return (B_TRUE);
|
|
|
|
}
|
|
|
|
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decyrpt, and authenticate protected datasets.
Each object set maintains a Merkel tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 20:36:48 +03:00
|
|
|
void
|
2018-10-16 21:15:04 +03:00
|
|
|
dsl_dataset_activate_feature(uint64_t dsobj, spa_feature_t f, void *arg,
|
|
|
|
dmu_tx_t *tx)
|
2015-07-24 19:53:55 +03:00
|
|
|
{
|
|
|
|
spa_t *spa = dmu_tx_pool(tx)->dp_spa;
|
|
|
|
objset_t *mos = dmu_tx_pool(tx)->dp_meta_objset;
|
|
|
|
uint64_t zero = 0;
|
|
|
|
|
|
|
|
VERIFY(spa_feature_table[f].fi_flags & ZFEATURE_FLAG_PER_DATASET);
|
|
|
|
|
|
|
|
spa_feature_incr(spa, f, tx);
|
|
|
|
dmu_object_zapify(mos, dsobj, DMU_OT_DSL_DATASET, tx);
|
|
|
|
|
2018-10-16 21:15:04 +03:00
|
|
|
switch (spa_feature_table[f].fi_type) {
|
|
|
|
case ZFEATURE_TYPE_BOOLEAN:
|
2020-08-18 01:40:17 +03:00
|
|
|
ASSERT3S((boolean_t)(uintptr_t)arg, ==, B_TRUE);
|
2018-10-16 21:15:04 +03:00
|
|
|
VERIFY0(zap_add(mos, dsobj, spa_feature_table[f].fi_guid,
|
|
|
|
sizeof (zero), 1, &zero, tx));
|
|
|
|
break;
|
|
|
|
case ZFEATURE_TYPE_UINT64_ARRAY:
|
|
|
|
{
|
|
|
|
struct feature_type_uint64_array_arg *ftuaa = arg;
|
|
|
|
VERIFY0(zap_add(mos, dsobj, spa_feature_table[f].fi_guid,
|
|
|
|
sizeof (uint64_t), ftuaa->length, ftuaa->array, tx));
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
default:
|
|
|
|
panic("Invalid zfeature type %d", spa_feature_table[f].fi_type);
|
|
|
|
}
|
2015-07-24 19:53:55 +03:00
|
|
|
}
|
|
|
|
|
2019-12-09 23:26:33 +03:00
|
|
|
static void
|
2018-10-16 21:15:04 +03:00
|
|
|
dsl_dataset_deactivate_feature_impl(dsl_dataset_t *ds, spa_feature_t f,
|
|
|
|
dmu_tx_t *tx)
|
2015-07-24 19:53:55 +03:00
|
|
|
{
|
|
|
|
spa_t *spa = dmu_tx_pool(tx)->dp_spa;
|
|
|
|
objset_t *mos = dmu_tx_pool(tx)->dp_meta_objset;
|
2018-10-16 21:15:04 +03:00
|
|
|
uint64_t dsobj = ds->ds_object;
|
2015-07-24 19:53:55 +03:00
|
|
|
|
|
|
|
VERIFY(spa_feature_table[f].fi_flags & ZFEATURE_FLAG_PER_DATASET);
|
|
|
|
|
|
|
|
VERIFY0(zap_remove(mos, dsobj, spa_feature_table[f].fi_guid, tx));
|
|
|
|
spa_feature_decr(spa, f, tx);
|
2018-10-16 21:15:04 +03:00
|
|
|
ds->ds_feature[f] = NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
void
|
|
|
|
dsl_dataset_deactivate_feature(dsl_dataset_t *ds, spa_feature_t f, dmu_tx_t *tx)
|
|
|
|
{
|
|
|
|
unload_zfeature(ds, f);
|
|
|
|
dsl_dataset_deactivate_feature_impl(ds, f, tx);
|
2015-07-24 19:53:55 +03:00
|
|
|
}
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
uint64_t
|
2008-12-03 23:09:06 +03:00
|
|
|
dsl_dataset_create_sync_dd(dsl_dir_t *dd, dsl_dataset_t *origin,
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decyrpt, and authenticate protected datasets.
Each object set maintains a Merkel tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 20:36:48 +03:00
|
|
|
dsl_crypto_params_t *dcp, uint64_t flags, dmu_tx_t *tx)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
|
|
|
dsl_pool_t *dp = dd->dd_pool;
|
|
|
|
dmu_buf_t *dbuf;
|
|
|
|
dsl_dataset_phys_t *dsphys;
|
|
|
|
uint64_t dsobj;
|
|
|
|
objset_t *mos = dp->dp_meta_objset;
|
|
|
|
|
2008-12-03 23:09:06 +03:00
|
|
|
if (origin == NULL)
|
|
|
|
origin = dp->dp_origin_snap;
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
ASSERT(origin == NULL || origin->ds_dir->dd_pool == dp);
|
2015-04-01 18:14:34 +03:00
|
|
|
ASSERT(origin == NULL || dsl_dataset_phys(origin)->ds_num_children > 0);
|
2008-11-20 23:01:55 +03:00
|
|
|
ASSERT(dmu_tx_is_syncing(tx));
|
2015-04-01 18:14:34 +03:00
|
|
|
ASSERT(dsl_dir_phys(dd)->dd_head_dataset_obj == 0);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
dsobj = dmu_object_alloc(mos, DMU_OT_DSL_DATASET, 0,
|
|
|
|
DMU_OT_DSL_DATASET, sizeof (dsl_dataset_phys_t), tx);
|
2013-09-04 16:00:57 +04:00
|
|
|
VERIFY0(dmu_bonus_hold(mos, dsobj, FTAG, &dbuf));
|
2008-11-20 23:01:55 +03:00
|
|
|
dmu_buf_will_dirty(dbuf, tx);
|
|
|
|
dsphys = dbuf->db_data;
|
2022-02-25 16:26:54 +03:00
|
|
|
memset(dsphys, 0, sizeof (dsl_dataset_phys_t));
|
2008-11-20 23:01:55 +03:00
|
|
|
dsphys->ds_dir_obj = dd->dd_object;
|
|
|
|
dsphys->ds_flags = flags;
|
|
|
|
dsphys->ds_fsid_guid = unique_create();
|
|
|
|
(void) random_get_pseudo_bytes((void*)&dsphys->ds_guid,
|
|
|
|
sizeof (dsphys->ds_guid));
|
|
|
|
dsphys->ds_snapnames_zapobj =
|
|
|
|
zap_create_norm(mos, U8_TEXTPREP_TOUPPER, DMU_OT_DSL_DS_SNAP_MAP,
|
|
|
|
DMU_OT_NONE, 0, tx);
|
|
|
|
dsphys->ds_creation_time = gethrestime_sec();
|
2008-12-03 23:09:06 +03:00
|
|
|
dsphys->ds_creation_txg = tx->tx_txg == TXG_INITIAL ? 1 : tx->tx_txg;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
if (origin == NULL) {
|
|
|
|
dsphys->ds_deadlist_obj = dsl_deadlist_alloc(mos, tx);
|
|
|
|
} else {
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_t *ohds; /* head of the origin snapshot */
|
2010-05-29 00:45:14 +04:00
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
dsphys->ds_prev_snap_obj = origin->ds_object;
|
|
|
|
dsphys->ds_prev_snap_txg =
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(origin)->ds_creation_txg;
|
2012-12-14 03:24:15 +04:00
|
|
|
dsphys->ds_referenced_bytes =
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(origin)->ds_referenced_bytes;
|
2008-11-20 23:01:55 +03:00
|
|
|
dsphys->ds_compressed_bytes =
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(origin)->ds_compressed_bytes;
|
2008-11-20 23:01:55 +03:00
|
|
|
dsphys->ds_uncompressed_bytes =
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(origin)->ds_uncompressed_bytes;
|
2017-01-27 22:43:42 +03:00
|
|
|
rrw_enter(&origin->ds_bp_rwlock, RW_READER, FTAG);
|
2015-04-01 18:14:34 +03:00
|
|
|
dsphys->ds_bp = dsl_dataset_phys(origin)->ds_bp;
|
2017-01-27 22:43:42 +03:00
|
|
|
rrw_exit(&origin->ds_bp_rwlock, FTAG);
|
2014-09-12 07:45:50 +04:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Inherit flags that describe the dataset's contents
|
|
|
|
* (INCONSISTENT) or properties (Case Insensitive).
|
|
|
|
*/
|
2015-04-01 18:14:34 +03:00
|
|
|
dsphys->ds_flags |= dsl_dataset_phys(origin)->ds_flags &
|
2014-09-12 07:45:50 +04:00
|
|
|
(DS_FLAG_INCONSISTENT | DS_FLAG_CI_DATASET);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2017-11-04 23:25:13 +03:00
|
|
|
for (spa_feature_t f = 0; f < SPA_FEATURES; f++) {
|
2018-10-16 21:15:04 +03:00
|
|
|
if (zfeature_active(f, origin->ds_feature[f])) {
|
|
|
|
dsl_dataset_activate_feature(dsobj, f,
|
|
|
|
origin->ds_feature[f], tx);
|
|
|
|
}
|
2015-07-24 19:53:55 +03:00
|
|
|
}
|
2014-11-03 23:15:08 +03:00
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
dmu_buf_will_dirty(origin->ds_dbuf, tx);
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(origin)->ds_num_children++;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
VERIFY0(dsl_dataset_hold_obj(dp,
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dir_phys(origin->ds_dir)->dd_head_dataset_obj,
|
|
|
|
FTAG, &ohds));
|
2010-05-29 00:45:14 +04:00
|
|
|
dsphys->ds_deadlist_obj = dsl_deadlist_clone(&ohds->ds_deadlist,
|
|
|
|
dsphys->ds_prev_snap_txg, dsphys->ds_prev_snap_obj, tx);
|
|
|
|
dsl_dataset_rele(ohds, FTAG);
|
|
|
|
|
2008-12-03 23:09:06 +03:00
|
|
|
if (spa_version(dp->dp_spa) >= SPA_VERSION_NEXT_CLONES) {
|
2015-04-01 18:14:34 +03:00
|
|
|
if (dsl_dataset_phys(origin)->ds_next_clones_obj == 0) {
|
|
|
|
dsl_dataset_phys(origin)->ds_next_clones_obj =
|
2008-12-03 23:09:06 +03:00
|
|
|
zap_create(mos,
|
|
|
|
DMU_OT_NEXT_CLONES, DMU_OT_NONE, 0, tx);
|
|
|
|
}
|
2013-09-04 16:00:57 +04:00
|
|
|
VERIFY0(zap_add_int(mos,
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(origin)->ds_next_clones_obj,
|
|
|
|
dsobj, tx));
|
2008-12-03 23:09:06 +03:00
|
|
|
}
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
dmu_buf_will_dirty(dd->dd_dbuf, tx);
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dir_phys(dd)->dd_origin_obj = origin->ds_object;
|
2010-05-29 00:45:14 +04:00
|
|
|
if (spa_version(dp->dp_spa) >= SPA_VERSION_DIR_CLONES) {
|
2015-04-01 18:14:34 +03:00
|
|
|
if (dsl_dir_phys(origin->ds_dir)->dd_clones == 0) {
|
2010-05-29 00:45:14 +04:00
|
|
|
dmu_buf_will_dirty(origin->ds_dir->dd_dbuf, tx);
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dir_phys(origin->ds_dir)->dd_clones =
|
2010-05-29 00:45:14 +04:00
|
|
|
zap_create(mos,
|
|
|
|
DMU_OT_DSL_CLONES, DMU_OT_NONE, 0, tx);
|
|
|
|
}
|
2013-09-04 16:00:57 +04:00
|
|
|
VERIFY0(zap_add_int(mos,
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dir_phys(origin->ds_dir)->dd_clones,
|
|
|
|
dsobj, tx));
|
2010-05-29 00:45:14 +04:00
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decyrpt, and authenticate protected datasets.
Each object set maintains a Merkel tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 20:36:48 +03:00
|
|
|
/* handle encryption */
|
|
|
|
dsl_dataset_create_crypt_sync(dsobj, dd, origin, dcp, tx);
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
if (spa_version(dp->dp_spa) >= SPA_VERSION_UNIQUE_ACCURATE)
|
|
|
|
dsphys->ds_flags |= DS_FLAG_UNIQUE_ACCURATE;
|
|
|
|
|
|
|
|
dmu_buf_rele(dbuf, FTAG);
|
|
|
|
|
|
|
|
dmu_buf_will_dirty(dd->dd_dbuf, tx);
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dir_phys(dd)->dd_head_dataset_obj = dsobj;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
return (dsobj);
|
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
static void
|
|
|
|
dsl_dataset_zero_zil(dsl_dataset_t *ds, dmu_tx_t *tx)
|
|
|
|
{
|
|
|
|
objset_t *os;
|
|
|
|
|
|
|
|
VERIFY0(dmu_objset_from_ds(ds, &os));
|
2022-02-25 16:26:54 +03:00
|
|
|
if (memcmp(&os->os_zil_header, &zero_zil, sizeof (zero_zil)) != 0) {
|
2016-11-22 02:09:54 +03:00
|
|
|
dsl_pool_t *dp = ds->ds_dir->dd_pool;
|
|
|
|
zio_t *zio;
|
|
|
|
|
2022-02-25 16:26:54 +03:00
|
|
|
memset(&os->os_zil_header, 0, sizeof (os->os_zil_header));
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decyrpt, and authenticate protected datasets.
Each object set maintains a Merkel tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 20:36:48 +03:00
|
|
|
if (os->os_encrypted)
|
2018-02-01 23:37:24 +03:00
|
|
|
os->os_next_write_raw[tx->tx_txg & TXG_MASK] = B_TRUE;
|
2016-11-22 02:09:54 +03:00
|
|
|
|
|
|
|
zio = zio_root(dp->dp_spa, NULL, NULL, ZIO_FLAG_MUSTSUCCEED);
|
|
|
|
dsl_dataset_sync(ds, zio, tx);
|
|
|
|
VERIFY0(zio_wait(zio));
|
|
|
|
dsl_dataset_sync_done(ds, tx);
|
|
|
|
}
|
2013-09-04 16:00:57 +04:00
|
|
|
}
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
uint64_t
|
|
|
|
dsl_dataset_create_sync(dsl_dir_t *pdd, const char *lastname,
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decyrpt, and authenticate protected datasets.
Each object set maintains a Merkel tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 20:36:48 +03:00
|
|
|
dsl_dataset_t *origin, uint64_t flags, cred_t *cr,
|
|
|
|
dsl_crypto_params_t *dcp, dmu_tx_t *tx)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
|
|
|
dsl_pool_t *dp = pdd->dd_pool;
|
|
|
|
uint64_t dsobj, ddobj;
|
|
|
|
dsl_dir_t *dd;
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
ASSERT(dmu_tx_is_syncing(tx));
|
2008-11-20 23:01:55 +03:00
|
|
|
ASSERT(lastname[0] != '@');
|
2019-07-26 20:54:14 +03:00
|
|
|
/*
|
|
|
|
* Filesystems will eventually have their origin set to dp_origin_snap,
|
|
|
|
* but that's taken care of in dsl_dataset_create_sync_dd. When
|
|
|
|
* creating a filesystem, this function is called with origin equal to
|
|
|
|
* NULL.
|
|
|
|
*/
|
|
|
|
if (origin != NULL)
|
|
|
|
ASSERT3P(origin, !=, dp->dp_origin_snap);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2008-12-03 23:09:06 +03:00
|
|
|
ddobj = dsl_dir_create_sync(dp, pdd, lastname, tx);
|
2013-09-04 16:00:57 +04:00
|
|
|
VERIFY0(dsl_dir_hold_obj(dp, ddobj, lastname, FTAG, &dd));
|
2008-11-20 23:01:55 +03:00
|
|
|
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decyrpt, and authenticate protected datasets.
Each object set maintains a Merkel tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 20:36:48 +03:00
|
|
|
dsobj = dsl_dataset_create_sync_dd(dd, origin, dcp,
|
2013-09-04 16:00:57 +04:00
|
|
|
flags & ~DS_CREATE_FLAG_NODIRTY, tx);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
dsl_deleg_set_create_perms(dd, tx, cr);
|
|
|
|
|
2019-07-26 20:54:14 +03:00
|
|
|
/*
|
|
|
|
* If we are creating a clone and the livelist feature is enabled,
|
|
|
|
* add the entry DD_FIELD_LIVELIST to ZAP.
|
|
|
|
*/
|
|
|
|
if (origin != NULL &&
|
|
|
|
spa_feature_is_enabled(dp->dp_spa, SPA_FEATURE_LIVELIST)) {
|
|
|
|
objset_t *mos = dd->dd_pool->dp_meta_objset;
|
|
|
|
dsl_dir_zapify(dd, tx);
|
|
|
|
uint64_t obj = dsl_deadlist_alloc(mos, tx);
|
|
|
|
VERIFY0(zap_add(mos, dd->dd_object, DD_FIELD_LIVELIST,
|
|
|
|
sizeof (uint64_t), 1, &obj, tx));
|
|
|
|
spa_feature_incr(dp->dp_spa, SPA_FEATURE_LIVELIST, tx);
|
|
|
|
}
|
|
|
|
|
2015-04-01 16:07:48 +03:00
|
|
|
/*
|
|
|
|
* Since we're creating a new node we know it's a leaf, so we can
|
|
|
|
* initialize the counts if the limit feature is active.
|
|
|
|
*/
|
|
|
|
if (spa_feature_is_active(dp->dp_spa, SPA_FEATURE_FS_SS_LIMIT)) {
|
|
|
|
uint64_t cnt = 0;
|
|
|
|
objset_t *os = dd->dd_pool->dp_meta_objset;
|
|
|
|
|
|
|
|
dsl_dir_zapify(dd, tx);
|
|
|
|
VERIFY0(zap_add(os, dd->dd_object, DD_FIELD_FILESYSTEM_COUNT,
|
|
|
|
sizeof (cnt), 1, &cnt, tx));
|
|
|
|
VERIFY0(zap_add(os, dd->dd_object, DD_FIELD_SNAPSHOT_COUNT,
|
|
|
|
sizeof (cnt), 1, &cnt, tx));
|
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dir_rele(dd, FTAG);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2010-08-27 01:24:34 +04:00
|
|
|
/*
|
|
|
|
* If we are creating a clone, make sure we zero out any stale
|
|
|
|
* data from the origin snapshots zil header.
|
|
|
|
*/
|
2013-09-04 16:00:57 +04:00
|
|
|
if (origin != NULL && !(flags & DS_CREATE_FLAG_NODIRTY)) {
|
2010-08-27 01:24:34 +04:00
|
|
|
dsl_dataset_t *ds;
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
VERIFY0(dsl_dataset_hold_obj(dp, dsobj, FTAG, &ds));
|
|
|
|
dsl_dataset_zero_zil(ds, tx);
|
2010-08-27 01:24:34 +04:00
|
|
|
dsl_dataset_rele(ds, FTAG);
|
|
|
|
}
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
return (dsobj);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2013-09-04 16:00:57 +04:00
|
|
|
* The unique space in the head dataset can be calculated by subtracting
|
|
|
|
* the space used in the most recent snapshot, that is still being used
|
|
|
|
* in this file system, from the space currently in use. To figure out
|
|
|
|
* the space in the most recent snapshot still in use, we need to take
|
|
|
|
* the total space used in the snapshot and subtract out the space that
|
|
|
|
* has been freed up since the snapshot was taken.
|
2008-11-20 23:01:55 +03:00
|
|
|
*/
|
2013-09-04 16:00:57 +04:00
|
|
|
void
|
|
|
|
dsl_dataset_recalc_head_uniq(dsl_dataset_t *ds)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2013-09-04 16:00:57 +04:00
|
|
|
uint64_t mrs_used;
|
|
|
|
uint64_t dlused, dlcomp, dluncomp;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2015-04-02 06:44:32 +03:00
|
|
|
ASSERT(!ds->ds_is_snapshot);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2015-04-01 18:14:34 +03:00
|
|
|
if (dsl_dataset_phys(ds)->ds_prev_snap_obj != 0)
|
|
|
|
mrs_used = dsl_dataset_phys(ds->ds_prev)->ds_referenced_bytes;
|
2013-09-04 16:00:57 +04:00
|
|
|
else
|
|
|
|
mrs_used = 0;
|
2009-08-18 22:43:27 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_deadlist_space(&ds->ds_deadlist, &dlused, &dlcomp, &dluncomp);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
ASSERT3U(dlused, <=, mrs_used);
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(ds)->ds_unique_bytes =
|
|
|
|
dsl_dataset_phys(ds)->ds_referenced_bytes - (mrs_used - dlused);
|
2011-11-17 22:14:36 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
if (spa_version(ds->ds_dir->dd_pool->dp_spa) >=
|
|
|
|
SPA_VERSION_UNIQUE_ACCURATE)
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(ds)->ds_flags |= DS_FLAG_UNIQUE_ACCURATE;
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
void
|
|
|
|
dsl_dataset_remove_from_next_clones(dsl_dataset_t *ds, uint64_t obj,
|
|
|
|
dmu_tx_t *tx)
|
2009-08-18 22:43:27 +04:00
|
|
|
{
|
2013-09-04 16:00:57 +04:00
|
|
|
objset_t *mos = ds->ds_dir->dd_pool->dp_meta_objset;
|
2019-12-05 23:37:00 +03:00
|
|
|
uint64_t count __maybe_unused;
|
2017-11-04 23:25:13 +03:00
|
|
|
int err;
|
2013-09-04 16:00:57 +04:00
|
|
|
|
2015-04-01 18:14:34 +03:00
|
|
|
ASSERT(dsl_dataset_phys(ds)->ds_num_children >= 2);
|
|
|
|
err = zap_remove_int(mos, dsl_dataset_phys(ds)->ds_next_clones_obj,
|
|
|
|
obj, tx);
|
2013-09-04 16:00:57 +04:00
|
|
|
/*
|
|
|
|
* The err should not be ENOENT, but a bug in a previous version
|
|
|
|
* of the code could cause upgrade_clones_cb() to not set
|
|
|
|
* ds_next_snap_obj when it should, leading to a missing entry.
|
|
|
|
* If we knew that the pool was created after
|
|
|
|
* SPA_VERSION_NEXT_CLONES, we could assert that it isn't
|
|
|
|
* ENOENT. However, at least we can check that we don't have
|
|
|
|
* too many entries in the next_clones_obj even after failing to
|
|
|
|
* remove this one.
|
|
|
|
*/
|
|
|
|
if (err != ENOENT)
|
|
|
|
VERIFY0(err);
|
2015-04-01 18:14:34 +03:00
|
|
|
ASSERT0(zap_count(mos, dsl_dataset_phys(ds)->ds_next_clones_obj,
|
2013-09-04 16:00:57 +04:00
|
|
|
&count));
|
2015-04-01 18:14:34 +03:00
|
|
|
ASSERT3U(count, <=, dsl_dataset_phys(ds)->ds_num_children - 2);
|
2013-09-04 16:00:57 +04:00
|
|
|
}
|
2009-08-18 22:43:27 +04:00
|
|
|
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
blkptr_t *
|
|
|
|
dsl_dataset_get_blkptr(dsl_dataset_t *ds)
|
|
|
|
{
|
2015-04-01 18:14:34 +03:00
|
|
|
return (&dsl_dataset_phys(ds)->ds_bp);
|
2009-08-18 22:43:27 +04:00
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
spa_t *
|
|
|
|
dsl_dataset_get_spa(dsl_dataset_t *ds)
|
|
|
|
{
|
|
|
|
return (ds->ds_dir->dd_pool->dp_spa);
|
2009-08-18 22:43:27 +04:00
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
void
|
|
|
|
dsl_dataset_dirty(dsl_dataset_t *ds, dmu_tx_t *tx)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_pool_t *dp;
|
2009-08-18 22:43:27 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
if (ds == NULL) /* this is the meta-objset */
|
|
|
|
return;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
ASSERT(ds->ds_objset != NULL);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2015-04-01 18:14:34 +03:00
|
|
|
if (dsl_dataset_phys(ds)->ds_next_snap_obj != 0)
|
2008-11-20 23:01:55 +03:00
|
|
|
panic("dirtying snapshot!");
|
|
|
|
|
2016-11-22 02:09:54 +03:00
|
|
|
/* Must not dirty a dataset in the same txg where it got snapshotted. */
|
|
|
|
ASSERT3U(tx->tx_txg, >, dsl_dataset_phys(ds)->ds_prev_snap_txg);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2016-11-22 02:09:54 +03:00
|
|
|
dp = ds->ds_dir->dd_pool;
|
2013-09-04 16:00:57 +04:00
|
|
|
if (txg_list_add(&dp->dp_dirty_datasets, ds, tx->tx_txg)) {
|
2018-10-03 19:47:11 +03:00
|
|
|
objset_t *os = ds->ds_objset;
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
/* up the hold count until we can be written out */
|
|
|
|
dmu_buf_add_ref(ds->ds_dbuf, ds);
|
2018-10-03 19:47:11 +03:00
|
|
|
|
|
|
|
/* if this dataset is encrypted, grab a reference to the DCK */
|
|
|
|
if (ds->ds_dir->dd_crypto_obj != 0 &&
|
|
|
|
!os->os_raw_receive &&
|
|
|
|
!os->os_next_write_raw[tx->tx_txg & TXG_MASK]) {
|
|
|
|
ASSERT3P(ds->ds_key_mapping, !=, NULL);
|
|
|
|
key_mapping_add_ref(ds->ds_key_mapping, ds);
|
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_snapshot_reserve_space(dsl_dataset_t *ds, dmu_tx_t *tx)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2013-09-04 16:00:57 +04:00
|
|
|
uint64_t asize;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
if (!dmu_tx_is_syncing(tx))
|
2008-12-03 23:09:06 +03:00
|
|
|
return (0);
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
/*
|
2013-09-04 16:00:57 +04:00
|
|
|
* If there's an fs-only reservation, any blocks that might become
|
|
|
|
* owned by the snapshot dataset must be accommodated by space
|
|
|
|
* outside of the reservation.
|
2008-11-20 23:01:55 +03:00
|
|
|
*/
|
2013-09-04 16:00:57 +04:00
|
|
|
ASSERT(ds->ds_reserved == 0 || DS_UNIQUE_IS_ACCURATE(ds));
|
2015-04-01 18:14:34 +03:00
|
|
|
asize = MIN(dsl_dataset_phys(ds)->ds_unique_bytes, ds->ds_reserved);
|
2013-09-04 16:00:57 +04:00
|
|
|
if (asize > dsl_dir_space_available(ds->ds_dir, NULL, 0, TRUE))
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(ENOSPC));
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
/*
|
2013-09-04 16:00:57 +04:00
|
|
|
* Propagate any reserved space for this snapshot to other
|
|
|
|
* snapshot checks in this sync group.
|
2008-11-20 23:01:55 +03:00
|
|
|
*/
|
2013-09-04 16:00:57 +04:00
|
|
|
if (asize > 0)
|
|
|
|
dsl_dir_willuse_space(ds->ds_dir, asize, tx);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
|
|
|
int
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_snapshot_check_impl(dsl_dataset_t *ds, const char *snapname,
|
2020-07-12 03:18:02 +03:00
|
|
|
dmu_tx_t *tx, boolean_t recv, uint64_t cnt, cred_t *cr, proc_t *proc)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2013-09-04 16:00:57 +04:00
|
|
|
int error;
|
|
|
|
uint64_t value;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
ds->ds_trysnap_txg = tx->tx_txg;
|
2008-12-03 23:09:06 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
if (!dmu_tx_is_syncing(tx))
|
2009-08-18 22:43:27 +04:00
|
|
|
return (0);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
/*
|
2013-09-04 16:00:57 +04:00
|
|
|
* We don't allow multiple snapshots of the same txg. If there
|
|
|
|
* is already one, try again.
|
2008-11-20 23:01:55 +03:00
|
|
|
*/
|
2015-04-01 18:14:34 +03:00
|
|
|
if (dsl_dataset_phys(ds)->ds_prev_snap_txg >= tx->tx_txg)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EAGAIN));
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
/*
|
2013-09-04 16:00:57 +04:00
|
|
|
* Check for conflicting snapshot name.
|
2008-11-20 23:01:55 +03:00
|
|
|
*/
|
2013-09-04 16:00:57 +04:00
|
|
|
error = dsl_dataset_snap_lookup(ds, snapname, &value);
|
|
|
|
if (error == 0)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EEXIST));
|
2013-09-04 16:00:57 +04:00
|
|
|
if (error != ENOENT)
|
|
|
|
return (error);
|
2009-08-18 22:43:27 +04:00
|
|
|
|
2013-07-27 21:51:50 +04:00
|
|
|
/*
|
|
|
|
* We don't allow taking snapshots of inconsistent datasets, such as
|
|
|
|
* those into which we are currently receiving. However, if we are
|
|
|
|
* creating this snapshot as part of a receive, this check will be
|
|
|
|
* executed atomically with respect to the completion of the receive
|
|
|
|
* itself but prior to the clearing of DS_FLAG_INCONSISTENT; in this
|
|
|
|
* case we ignore this, knowing it will be fixed up for us shortly in
|
|
|
|
* dmu_recv_end_sync().
|
|
|
|
*/
|
|
|
|
if (!recv && DS_IS_INCONSISTENT(ds))
|
|
|
|
return (SET_ERROR(EBUSY));
|
|
|
|
|
2015-04-01 16:07:48 +03:00
|
|
|
/*
|
|
|
|
* Skip the check for temporary snapshots or if we have already checked
|
|
|
|
* the counts in dsl_dataset_snapshot_check. This means we really only
|
|
|
|
* check the count here when we're receiving a stream.
|
|
|
|
*/
|
|
|
|
if (cnt != 0 && cr != NULL) {
|
|
|
|
error = dsl_fs_ss_limit_check(ds->ds_dir, cnt,
|
2020-07-12 03:18:02 +03:00
|
|
|
ZFS_PROP_SNAPSHOT_LIMIT, NULL, cr, proc);
|
2015-04-01 16:07:48 +03:00
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
error = dsl_dataset_snapshot_reserve_space(ds, tx);
|
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
2009-08-18 22:43:27 +04:00
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
2018-02-08 19:24:39 +03:00
|
|
|
int
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_snapshot_check(void *arg, dmu_tx_t *tx)
|
2008-12-03 23:09:06 +03:00
|
|
|
{
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_snapshot_arg_t *ddsa = arg;
|
|
|
|
dsl_pool_t *dp = dmu_tx_pool(tx);
|
|
|
|
nvpair_t *pair;
|
|
|
|
int rv = 0;
|
2008-12-03 23:09:06 +03:00
|
|
|
|
2015-04-01 16:07:48 +03:00
|
|
|
/*
|
|
|
|
* Pre-compute how many total new snapshots will be created for each
|
|
|
|
* level in the tree and below. This is needed for validating the
|
|
|
|
* snapshot limit when either taking a recursive snapshot or when
|
|
|
|
* taking multiple snapshots.
|
|
|
|
*
|
|
|
|
* The problem is that the counts are not actually adjusted when
|
|
|
|
* we are checking, only when we finally sync. For a single snapshot,
|
|
|
|
* this is easy, the count will increase by 1 at each node up the tree,
|
|
|
|
* but its more complicated for the recursive/multiple snapshot case.
|
|
|
|
*
|
|
|
|
* The dsl_fs_ss_limit_check function does recursively check the count
|
|
|
|
* at each level up the tree but since it is validating each snapshot
|
|
|
|
* independently we need to be sure that we are validating the complete
|
|
|
|
* count for the entire set of snapshots. We do this by rolling up the
|
|
|
|
* counts for each component of the name into an nvlist and then
|
|
|
|
* checking each of those cases with the aggregated count.
|
|
|
|
*
|
|
|
|
* This approach properly handles not only the recursive snapshot
|
|
|
|
* case (where we get all of those on the ddsa_snaps list) but also
|
|
|
|
* the sibling case (e.g. snapshot a/b and a/c so that we will also
|
|
|
|
* validate the limit on 'a' using a count of 2).
|
|
|
|
*
|
|
|
|
* We validate the snapshot names in the third loop and only report
|
|
|
|
* name errors once.
|
|
|
|
*/
|
|
|
|
if (dmu_tx_is_syncing(tx)) {
|
|
|
|
char *nm;
|
|
|
|
nvlist_t *cnt_track = NULL;
|
|
|
|
cnt_track = fnvlist_alloc();
|
|
|
|
|
|
|
|
nm = kmem_alloc(MAXPATHLEN, KM_SLEEP);
|
|
|
|
|
|
|
|
/* Rollup aggregated counts into the cnt_track list */
|
|
|
|
for (pair = nvlist_next_nvpair(ddsa->ddsa_snaps, NULL);
|
|
|
|
pair != NULL;
|
|
|
|
pair = nvlist_next_nvpair(ddsa->ddsa_snaps, pair)) {
|
|
|
|
char *pdelim;
|
|
|
|
uint64_t val;
|
|
|
|
|
|
|
|
(void) strlcpy(nm, nvpair_name(pair), MAXPATHLEN);
|
|
|
|
pdelim = strchr(nm, '@');
|
|
|
|
if (pdelim == NULL)
|
|
|
|
continue;
|
|
|
|
*pdelim = '\0';
|
|
|
|
|
|
|
|
do {
|
|
|
|
if (nvlist_lookup_uint64(cnt_track, nm,
|
|
|
|
&val) == 0) {
|
|
|
|
/* update existing entry */
|
|
|
|
fnvlist_add_uint64(cnt_track, nm,
|
|
|
|
val + 1);
|
|
|
|
} else {
|
|
|
|
/* add to list */
|
|
|
|
fnvlist_add_uint64(cnt_track, nm, 1);
|
|
|
|
}
|
|
|
|
|
|
|
|
pdelim = strrchr(nm, '/');
|
|
|
|
if (pdelim != NULL)
|
|
|
|
*pdelim = '\0';
|
|
|
|
} while (pdelim != NULL);
|
|
|
|
}
|
|
|
|
|
|
|
|
kmem_free(nm, MAXPATHLEN);
|
|
|
|
|
|
|
|
/* Check aggregated counts at each level */
|
|
|
|
for (pair = nvlist_next_nvpair(cnt_track, NULL);
|
|
|
|
pair != NULL; pair = nvlist_next_nvpair(cnt_track, pair)) {
|
|
|
|
int error = 0;
|
|
|
|
char *name;
|
|
|
|
uint64_t cnt = 0;
|
|
|
|
dsl_dataset_t *ds;
|
|
|
|
|
|
|
|
name = nvpair_name(pair);
|
|
|
|
cnt = fnvpair_value_uint64(pair);
|
|
|
|
ASSERT(cnt > 0);
|
|
|
|
|
|
|
|
error = dsl_dataset_hold(dp, name, FTAG, &ds);
|
|
|
|
if (error == 0) {
|
|
|
|
error = dsl_fs_ss_limit_check(ds->ds_dir, cnt,
|
|
|
|
ZFS_PROP_SNAPSHOT_LIMIT, NULL,
|
2020-07-12 03:18:02 +03:00
|
|
|
ddsa->ddsa_cr, ddsa->ddsa_proc);
|
2015-04-01 16:07:48 +03:00
|
|
|
dsl_dataset_rele(ds, FTAG);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (error != 0) {
|
|
|
|
if (ddsa->ddsa_errors != NULL)
|
|
|
|
fnvlist_add_int32(ddsa->ddsa_errors,
|
|
|
|
name, error);
|
|
|
|
rv = error;
|
|
|
|
/* only report one error for this check */
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
nvlist_free(cnt_track);
|
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
for (pair = nvlist_next_nvpair(ddsa->ddsa_snaps, NULL);
|
|
|
|
pair != NULL; pair = nvlist_next_nvpair(ddsa->ddsa_snaps, pair)) {
|
|
|
|
int error = 0;
|
|
|
|
dsl_dataset_t *ds;
|
2016-07-26 22:08:51 +03:00
|
|
|
char *name, *atp = NULL;
|
2016-06-16 00:28:36 +03:00
|
|
|
char dsname[ZFS_MAX_DATASET_NAME_LEN];
|
2013-09-04 16:00:57 +04:00
|
|
|
|
|
|
|
name = nvpair_name(pair);
|
2016-06-16 00:28:36 +03:00
|
|
|
if (strlen(name) >= ZFS_MAX_DATASET_NAME_LEN)
|
2013-03-08 22:41:28 +04:00
|
|
|
error = SET_ERROR(ENAMETOOLONG);
|
2013-09-04 16:00:57 +04:00
|
|
|
if (error == 0) {
|
|
|
|
atp = strchr(name, '@');
|
|
|
|
if (atp == NULL)
|
2013-03-08 22:41:28 +04:00
|
|
|
error = SET_ERROR(EINVAL);
|
2013-09-04 16:00:57 +04:00
|
|
|
if (error == 0)
|
|
|
|
(void) strlcpy(dsname, name, atp - name + 1);
|
|
|
|
}
|
|
|
|
if (error == 0)
|
|
|
|
error = dsl_dataset_hold(dp, dsname, FTAG, &ds);
|
|
|
|
if (error == 0) {
|
2015-04-01 16:07:48 +03:00
|
|
|
/* passing 0/NULL skips dsl_fs_ss_limit_check */
|
2013-09-04 16:00:57 +04:00
|
|
|
error = dsl_dataset_snapshot_check_impl(ds,
|
2020-07-12 03:18:02 +03:00
|
|
|
atp + 1, tx, B_FALSE, 0, NULL, NULL);
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_rele(ds, FTAG);
|
|
|
|
}
|
2008-12-03 23:09:06 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
if (error != 0) {
|
|
|
|
if (ddsa->ddsa_errors != NULL) {
|
|
|
|
fnvlist_add_int32(ddsa->ddsa_errors,
|
|
|
|
name, error);
|
|
|
|
}
|
|
|
|
rv = error;
|
|
|
|
}
|
|
|
|
}
|
2015-04-01 16:07:48 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
return (rv);
|
2008-12-03 23:09:06 +03:00
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
void
|
|
|
|
dsl_dataset_snapshot_sync_impl(dsl_dataset_t *ds, const char *snapname,
|
|
|
|
dmu_tx_t *tx)
|
2010-05-29 00:45:14 +04:00
|
|
|
{
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_pool_t *dp = ds->ds_dir->dd_pool;
|
|
|
|
dmu_buf_t *dbuf;
|
|
|
|
dsl_dataset_phys_t *dsphys;
|
|
|
|
uint64_t dsobj, crtxg;
|
|
|
|
objset_t *mos = dp->dp_meta_objset;
|
2019-12-05 23:37:00 +03:00
|
|
|
objset_t *os __maybe_unused;
|
2013-09-04 16:00:57 +04:00
|
|
|
|
|
|
|
ASSERT(RRW_WRITE_HELD(&dp->dp_config_rwlock));
|
2010-05-29 00:45:14 +04:00
|
|
|
|
|
|
|
/*
|
2013-09-04 16:00:57 +04:00
|
|
|
* If we are on an old pool, the zil must not be active, in which
|
|
|
|
* case it will be zeroed. Usually zil_suspend() accomplishes this.
|
2010-05-29 00:45:14 +04:00
|
|
|
*/
|
2013-09-04 16:00:57 +04:00
|
|
|
ASSERT(spa_version(dmu_tx_pool(tx)->dp_spa) >= SPA_VERSION_FAST_SNAP ||
|
|
|
|
dmu_objset_from_ds(ds, &os) != 0 ||
|
2022-02-25 16:26:54 +03:00
|
|
|
memcmp(&os->os_phys->os_zil_header, &zero_zil,
|
2013-09-04 16:00:57 +04:00
|
|
|
sizeof (zero_zil)) == 0);
|
2010-05-29 00:45:14 +04:00
|
|
|
|
2016-11-22 02:09:54 +03:00
|
|
|
/* Should not snapshot a dirty dataset. */
|
|
|
|
ASSERT(!txg_list_member(&ds->ds_dir->dd_pool->dp_dirty_datasets,
|
|
|
|
ds, tx->tx_txg));
|
|
|
|
|
2015-04-01 16:07:48 +03:00
|
|
|
dsl_fs_ss_count_adjust(ds->ds_dir, 1, DD_FIELD_SNAPSHOT_COUNT, tx);
|
2010-05-29 00:45:14 +04:00
|
|
|
|
|
|
|
/*
|
2013-09-04 16:00:57 +04:00
|
|
|
* The origin's ds_creation_txg has to be < TXG_INITIAL
|
2008-12-03 23:09:06 +03:00
|
|
|
*/
|
|
|
|
if (strcmp(snapname, ORIGIN_DIR_NAME) == 0)
|
|
|
|
crtxg = 1;
|
|
|
|
else
|
|
|
|
crtxg = tx->tx_txg;
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
dsobj = dmu_object_alloc(mos, DMU_OT_DSL_DATASET, 0,
|
|
|
|
DMU_OT_DSL_DATASET, sizeof (dsl_dataset_phys_t), tx);
|
2013-09-04 16:00:57 +04:00
|
|
|
VERIFY0(dmu_bonus_hold(mos, dsobj, FTAG, &dbuf));
|
2008-11-20 23:01:55 +03:00
|
|
|
dmu_buf_will_dirty(dbuf, tx);
|
|
|
|
dsphys = dbuf->db_data;
|
2022-02-25 16:26:54 +03:00
|
|
|
memset(dsphys, 0, sizeof (dsl_dataset_phys_t));
|
2008-11-20 23:01:55 +03:00
|
|
|
dsphys->ds_dir_obj = ds->ds_dir->dd_object;
|
|
|
|
dsphys->ds_fsid_guid = unique_create();
|
|
|
|
(void) random_get_pseudo_bytes((void*)&dsphys->ds_guid,
|
|
|
|
sizeof (dsphys->ds_guid));
|
2015-04-01 18:14:34 +03:00
|
|
|
dsphys->ds_prev_snap_obj = dsl_dataset_phys(ds)->ds_prev_snap_obj;
|
|
|
|
dsphys->ds_prev_snap_txg = dsl_dataset_phys(ds)->ds_prev_snap_txg;
|
2008-11-20 23:01:55 +03:00
|
|
|
dsphys->ds_next_snap_obj = ds->ds_object;
|
|
|
|
dsphys->ds_num_children = 1;
|
|
|
|
dsphys->ds_creation_time = gethrestime_sec();
|
2008-12-03 23:09:06 +03:00
|
|
|
dsphys->ds_creation_txg = crtxg;
|
2015-04-01 18:14:34 +03:00
|
|
|
dsphys->ds_deadlist_obj = dsl_dataset_phys(ds)->ds_deadlist_obj;
|
|
|
|
dsphys->ds_referenced_bytes = dsl_dataset_phys(ds)->ds_referenced_bytes;
|
|
|
|
dsphys->ds_compressed_bytes = dsl_dataset_phys(ds)->ds_compressed_bytes;
|
|
|
|
dsphys->ds_uncompressed_bytes =
|
|
|
|
dsl_dataset_phys(ds)->ds_uncompressed_bytes;
|
|
|
|
dsphys->ds_flags = dsl_dataset_phys(ds)->ds_flags;
|
2017-01-27 22:43:42 +03:00
|
|
|
rrw_enter(&ds->ds_bp_rwlock, RW_READER, FTAG);
|
2015-04-01 18:14:34 +03:00
|
|
|
dsphys->ds_bp = dsl_dataset_phys(ds)->ds_bp;
|
2017-01-27 22:43:42 +03:00
|
|
|
rrw_exit(&ds->ds_bp_rwlock, FTAG);
|
2008-11-20 23:01:55 +03:00
|
|
|
dmu_buf_rele(dbuf, FTAG);
|
|
|
|
|
2017-11-04 23:25:13 +03:00
|
|
|
for (spa_feature_t f = 0; f < SPA_FEATURES; f++) {
|
2018-10-16 21:15:04 +03:00
|
|
|
if (zfeature_active(f, ds->ds_feature[f])) {
|
|
|
|
dsl_dataset_activate_feature(dsobj, f,
|
|
|
|
ds->ds_feature[f], tx);
|
|
|
|
}
|
2015-07-24 19:53:55 +03:00
|
|
|
}
|
2014-11-03 23:15:08 +03:00
|
|
|
|
2015-04-01 18:14:34 +03:00
|
|
|
ASSERT3U(ds->ds_prev != 0, ==,
|
|
|
|
dsl_dataset_phys(ds)->ds_prev_snap_obj != 0);
|
2008-11-20 23:01:55 +03:00
|
|
|
if (ds->ds_prev) {
|
2008-12-03 23:09:06 +03:00
|
|
|
uint64_t next_clones_obj =
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(ds->ds_prev)->ds_next_clones_obj;
|
|
|
|
ASSERT(dsl_dataset_phys(ds->ds_prev)->ds_next_snap_obj ==
|
2008-11-20 23:01:55 +03:00
|
|
|
ds->ds_object ||
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(ds->ds_prev)->ds_num_children > 1);
|
|
|
|
if (dsl_dataset_phys(ds->ds_prev)->ds_next_snap_obj ==
|
|
|
|
ds->ds_object) {
|
2008-11-20 23:01:55 +03:00
|
|
|
dmu_buf_will_dirty(ds->ds_prev->ds_dbuf, tx);
|
2015-04-01 18:14:34 +03:00
|
|
|
ASSERT3U(dsl_dataset_phys(ds)->ds_prev_snap_txg, ==,
|
|
|
|
dsl_dataset_phys(ds->ds_prev)->ds_creation_txg);
|
|
|
|
dsl_dataset_phys(ds->ds_prev)->ds_next_snap_obj = dsobj;
|
2008-12-03 23:09:06 +03:00
|
|
|
} else if (next_clones_obj != 0) {
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_remove_from_next_clones(ds->ds_prev,
|
2010-05-29 00:45:14 +04:00
|
|
|
dsphys->ds_next_snap_obj, tx);
|
2013-09-04 16:00:57 +04:00
|
|
|
VERIFY0(zap_add_int(mos,
|
2008-12-03 23:09:06 +03:00
|
|
|
next_clones_obj, dsobj, tx));
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If we have a reference-reservation on this dataset, we will
|
|
|
|
* need to increase the amount of refreservation being charged
|
|
|
|
* since our unique space is going to zero.
|
|
|
|
*/
|
|
|
|
if (ds->ds_reserved) {
|
2010-05-29 00:45:14 +04:00
|
|
|
int64_t delta;
|
|
|
|
ASSERT(DS_UNIQUE_IS_ACCURATE(ds));
|
2015-04-01 18:14:34 +03:00
|
|
|
delta = MIN(dsl_dataset_phys(ds)->ds_unique_bytes,
|
|
|
|
ds->ds_reserved);
|
2008-12-03 23:09:06 +03:00
|
|
|
dsl_dir_diduse_space(ds->ds_dir, DD_USED_REFRSRV,
|
2010-05-29 00:45:14 +04:00
|
|
|
delta, 0, 0, tx);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
dmu_buf_will_dirty(ds->ds_dbuf, tx);
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(ds)->ds_deadlist_obj =
|
|
|
|
dsl_deadlist_clone(&ds->ds_deadlist, UINT64_MAX,
|
|
|
|
dsl_dataset_phys(ds)->ds_prev_snap_obj, tx);
|
2010-05-29 00:45:14 +04:00
|
|
|
dsl_deadlist_close(&ds->ds_deadlist);
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_deadlist_open(&ds->ds_deadlist, mos,
|
|
|
|
dsl_dataset_phys(ds)->ds_deadlist_obj);
|
2010-05-29 00:45:14 +04:00
|
|
|
dsl_deadlist_add_key(&ds->ds_deadlist,
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(ds)->ds_prev_snap_txg, tx);
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
dsl_bookmark_snapshotted(ds, tx);
|
2010-05-29 00:45:14 +04:00
|
|
|
|
OpenZFS 7614, 9064 - zfs device evacuation/removal
OpenZFS 7614 - zfs device evacuation/removal
OpenZFS 9064 - remove_mirror should wait for device removal to complete
This project allows top-level vdevs to be removed from the storage pool
with "zpool remove", reducing the total amount of storage in the pool.
This operation copies all allocated regions of the device to be removed
onto other devices, recording the mapping from old to new location.
After the removal is complete, read and free operations to the removed
(now "indirect") vdev must be remapped and performed at the new location
on disk. The indirect mapping table is kept in memory whenever the pool
is loaded, so there is minimal performance overhead when doing operations
on the indirect vdev.
The size of the in-memory mapping table will be reduced when its entries
become "obsolete" because they are no longer used by any block pointers
in the pool. An entry becomes obsolete when all the blocks that use
it are freed. An entry can also become obsolete when all the snapshots
that reference it are deleted, and the block pointers that reference it
have been "remapped" in all filesystems/zvols (and clones). Whenever an
indirect block is written, all the block pointers in it will be "remapped"
to their new (concrete) locations if possible. This process can be
accelerated by using the "zfs remap" command to proactively rewrite all
indirect blocks that reference indirect (removed) vdevs.
Note that when a device is removed, we do not verify the checksum of
the data that is copied. This makes the process much faster, but if it
were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be
possible to copy the wrong data, when we have the correct data on e.g.
the other side of the mirror.
At the moment, only mirrors and simple top-level vdevs can be removed
and no removal is allowed if any of the top-level vdevs are raidz.
Porting Notes:
* Avoid zero-sized kmem_alloc() in vdev_compact_children().
The device evacuation code adds a dependency that
vdev_compact_children() be able to properly empty the vdev_child
array by setting it to NULL and zeroing vdev_children. Under Linux,
kmem_alloc() and related functions return a sentinel pointer rather
than NULL for zero-sized allocations.
* Remove comment regarding "mpt" driver where zfs_remove_max_segment
is initialized to SPA_MAXBLOCKSIZE.
Change zfs_condense_indirect_commit_entry_delay_ticks to
zfs_condense_indirect_commit_entry_delay_ms for consistency with
most other tunables in which delays are specified in ms.
* ZTS changes:
Use set_tunable rather than mdb
Use zpool sync as appropriate
Use sync_pool instead of sync
Kill jobs during test_removal_with_operation to allow unmount/export
Don't add non-disk names such as "mirror" or "raidz" to $DISKS
Use $TEST_BASE_DIR instead of /tmp
Increase HZ from 100 to 1000 which is more common on Linux
removal_multiple_indirection.ksh
Reduce iterations in order to not time out on the code
coverage builders.
removal_resume_export:
Functionally, the test case is correct but there exists a race
where the kernel thread hasn't been fully started yet and is
not visible. Wait for up to 1 second for the removal thread
to be started before giving up on it. Also, increase the
amount of data copied in order that the removal not finish
before the export has a chance to fail.
* MMP compatibility, the concept of concrete versus non-concrete devices
has slightly changed the semantics of vdev_writeable(). Update
mmp_random_leaf_impl() accordingly.
* Updated dbuf_remap() to handle the org.zfsonlinux:large_dnode pool
feature which is not supported by OpenZFS.
* Added support for new vdev removal tracepoints.
* Test cases removal_with_zdb and removal_condense_export have been
intentionally disabled. When run manually they pass as intended,
but when running in the automated test environment they produce
unreliable results on the latest Fedora release.
They may work better once the upstream pool import refectoring is
merged into ZoL at which point they will be re-enabled.
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Alex Reece <alex@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Tim Chase <tim@chase2k.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Garrett D'Amore <garrett@damore.org>
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
OpenZFS-issue: https://www.illumos.org/issues/7614
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f539f1eb
Closes #6900
2016-09-22 19:30:13 +03:00
|
|
|
if (dsl_dataset_remap_deadlist_exists(ds)) {
|
|
|
|
uint64_t remap_deadlist_obj =
|
|
|
|
dsl_dataset_get_remap_deadlist_object(ds);
|
|
|
|
/*
|
|
|
|
* Move the remap_deadlist to the snapshot. The head
|
|
|
|
* will create a new remap deadlist on demand, from
|
|
|
|
* dsl_dataset_block_remapped().
|
|
|
|
*/
|
|
|
|
dsl_dataset_unset_remap_deadlist_object(ds, tx);
|
|
|
|
dsl_deadlist_close(&ds->ds_remap_deadlist);
|
|
|
|
|
|
|
|
dmu_object_zapify(mos, dsobj, DMU_OT_DSL_DATASET, tx);
|
|
|
|
VERIFY0(zap_add(mos, dsobj, DS_FIELD_REMAP_DEADLIST,
|
|
|
|
sizeof (remap_deadlist_obj), 1, &remap_deadlist_obj, tx));
|
|
|
|
}
|
|
|
|
|
2019-02-04 22:24:55 +03:00
|
|
|
/*
|
|
|
|
* Create a ivset guid for this snapshot if the dataset is
|
|
|
|
* encrypted. This may be overridden by a raw receive. A
|
|
|
|
* previous implementation of this code did not have this
|
|
|
|
* field as part of the on-disk format for ZFS encryption
|
|
|
|
* (see errata #4). As part of the remediation for this
|
|
|
|
* issue, we ask the user to enable the bookmark_v2 feature
|
|
|
|
* which is now a dependency of the encryption feature. We
|
|
|
|
* use this as a heuristic to determine when the user has
|
|
|
|
* elected to correct any datasets created with the old code.
|
|
|
|
* As a result, we only do this step if the bookmark_v2
|
|
|
|
* feature is enabled, which limits the number of states a
|
|
|
|
* given pool / dataset can be in with regards to terms of
|
|
|
|
* correcting the issue.
|
|
|
|
*/
|
|
|
|
if (ds->ds_dir->dd_crypto_obj != 0 &&
|
|
|
|
spa_feature_is_enabled(dp->dp_spa, SPA_FEATURE_BOOKMARK_V2)) {
|
|
|
|
uint64_t ivset_guid = unique_create();
|
|
|
|
|
|
|
|
dmu_object_zapify(mos, dsobj, DMU_OT_DSL_DATASET, tx);
|
|
|
|
VERIFY0(zap_add(mos, dsobj, DS_FIELD_IVSET_GUID,
|
|
|
|
sizeof (ivset_guid), 1, &ivset_guid, tx));
|
|
|
|
}
|
|
|
|
|
2015-04-01 18:14:34 +03:00
|
|
|
ASSERT3U(dsl_dataset_phys(ds)->ds_prev_snap_txg, <, tx->tx_txg);
|
|
|
|
dsl_dataset_phys(ds)->ds_prev_snap_obj = dsobj;
|
|
|
|
dsl_dataset_phys(ds)->ds_prev_snap_txg = crtxg;
|
|
|
|
dsl_dataset_phys(ds)->ds_unique_bytes = 0;
|
OpenZFS 7614, 9064 - zfs device evacuation/removal
OpenZFS 7614 - zfs device evacuation/removal
OpenZFS 9064 - remove_mirror should wait for device removal to complete
This project allows top-level vdevs to be removed from the storage pool
with "zpool remove", reducing the total amount of storage in the pool.
This operation copies all allocated regions of the device to be removed
onto other devices, recording the mapping from old to new location.
After the removal is complete, read and free operations to the removed
(now "indirect") vdev must be remapped and performed at the new location
on disk. The indirect mapping table is kept in memory whenever the pool
is loaded, so there is minimal performance overhead when doing operations
on the indirect vdev.
The size of the in-memory mapping table will be reduced when its entries
become "obsolete" because they are no longer used by any block pointers
in the pool. An entry becomes obsolete when all the blocks that use
it are freed. An entry can also become obsolete when all the snapshots
that reference it are deleted, and the block pointers that reference it
have been "remapped" in all filesystems/zvols (and clones). Whenever an
indirect block is written, all the block pointers in it will be "remapped"
to their new (concrete) locations if possible. This process can be
accelerated by using the "zfs remap" command to proactively rewrite all
indirect blocks that reference indirect (removed) vdevs.
Note that when a device is removed, we do not verify the checksum of
the data that is copied. This makes the process much faster, but if it
were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be
possible to copy the wrong data, when we have the correct data on e.g.
the other side of the mirror.
At the moment, only mirrors and simple top-level vdevs can be removed
and no removal is allowed if any of the top-level vdevs are raidz.
Porting Notes:
* Avoid zero-sized kmem_alloc() in vdev_compact_children().
The device evacuation code adds a dependency that
vdev_compact_children() be able to properly empty the vdev_child
array by setting it to NULL and zeroing vdev_children. Under Linux,
kmem_alloc() and related functions return a sentinel pointer rather
than NULL for zero-sized allocations.
* Remove comment regarding "mpt" driver where zfs_remove_max_segment
is initialized to SPA_MAXBLOCKSIZE.
Change zfs_condense_indirect_commit_entry_delay_ticks to
zfs_condense_indirect_commit_entry_delay_ms for consistency with
most other tunables in which delays are specified in ms.
* ZTS changes:
Use set_tunable rather than mdb
Use zpool sync as appropriate
Use sync_pool instead of sync
Kill jobs during test_removal_with_operation to allow unmount/export
Don't add non-disk names such as "mirror" or "raidz" to $DISKS
Use $TEST_BASE_DIR instead of /tmp
Increase HZ from 100 to 1000 which is more common on Linux
removal_multiple_indirection.ksh
Reduce iterations in order to not time out on the code
coverage builders.
removal_resume_export:
Functionally, the test case is correct but there exists a race
where the kernel thread hasn't been fully started yet and is
not visible. Wait for up to 1 second for the removal thread
to be started before giving up on it. Also, increase the
amount of data copied in order that the removal not finish
before the export has a chance to fail.
* MMP compatibility, the concept of concrete versus non-concrete devices
has slightly changed the semantics of vdev_writeable(). Update
mmp_random_leaf_impl() accordingly.
* Updated dbuf_remap() to handle the org.zfsonlinux:large_dnode pool
feature which is not supported by OpenZFS.
* Added support for new vdev removal tracepoints.
* Test cases removal_with_zdb and removal_condense_export have been
intentionally disabled. When run manually they pass as intended,
but when running in the automated test environment they produce
unreliable results on the latest Fedora release.
They may work better once the upstream pool import refectoring is
merged into ZoL at which point they will be re-enabled.
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Alex Reece <alex@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Tim Chase <tim@chase2k.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Garrett D'Amore <garrett@damore.org>
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
OpenZFS-issue: https://www.illumos.org/issues/7614
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f539f1eb
Closes #6900
2016-09-22 19:30:13 +03:00
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
if (spa_version(dp->dp_spa) >= SPA_VERSION_UNIQUE_ACCURATE)
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(ds)->ds_flags |= DS_FLAG_UNIQUE_ACCURATE;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2015-04-01 18:14:34 +03:00
|
|
|
VERIFY0(zap_add(mos, dsl_dataset_phys(ds)->ds_snapnames_zapobj,
|
2013-09-04 16:00:57 +04:00
|
|
|
snapname, 8, 1, &dsobj, tx));
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
if (ds->ds_prev)
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_rele(ds->ds_prev, ds);
|
|
|
|
VERIFY0(dsl_dataset_hold_obj(dp,
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(ds)->ds_prev_snap_obj, ds, &ds->ds_prev));
|
2008-12-03 23:09:06 +03:00
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
dsl_scan_ds_snapshotted(ds, tx);
|
|
|
|
|
2022-08-03 02:45:30 +03:00
|
|
|
dsl_dir_snap_cmtime_update(ds->ds_dir, tx);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2022-08-22 22:36:22 +03:00
|
|
|
if (zfs_snapshot_history_enabled)
|
|
|
|
spa_history_log_internal_ds(ds->ds_prev, "snapshot", tx, " ");
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
2018-02-08 19:24:39 +03:00
|
|
|
void
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_snapshot_sync(void *arg, dmu_tx_t *tx)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_snapshot_arg_t *ddsa = arg;
|
|
|
|
dsl_pool_t *dp = dmu_tx_pool(tx);
|
|
|
|
nvpair_t *pair;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
for (pair = nvlist_next_nvpair(ddsa->ddsa_snaps, NULL);
|
|
|
|
pair != NULL; pair = nvlist_next_nvpair(ddsa->ddsa_snaps, pair)) {
|
|
|
|
dsl_dataset_t *ds;
|
|
|
|
char *name, *atp;
|
2016-06-16 00:28:36 +03:00
|
|
|
char dsname[ZFS_MAX_DATASET_NAME_LEN];
|
2013-09-04 16:00:57 +04:00
|
|
|
|
|
|
|
name = nvpair_name(pair);
|
|
|
|
atp = strchr(name, '@');
|
|
|
|
(void) strlcpy(dsname, name, atp - name + 1);
|
|
|
|
VERIFY0(dsl_dataset_hold(dp, dsname, FTAG, &ds));
|
|
|
|
|
|
|
|
dsl_dataset_snapshot_sync_impl(ds, atp + 1, tx);
|
|
|
|
if (ddsa->ddsa_props != NULL) {
|
|
|
|
dsl_props_set_sync_impl(ds->ds_prev,
|
|
|
|
ZPROP_SRC_LOCAL, ddsa->ddsa_props, tx);
|
|
|
|
}
|
|
|
|
dsl_dataset_rele(ds, FTAG);
|
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
/*
|
|
|
|
* The snapshots must all be in the same pool.
|
|
|
|
* All-or-nothing: if there are any failures, nothing will be modified.
|
|
|
|
*/
|
|
|
|
int
|
|
|
|
dsl_dataset_snapshot(nvlist_t *snaps, nvlist_t *props, nvlist_t *errors)
|
2011-11-17 22:14:36 +04:00
|
|
|
{
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_snapshot_arg_t ddsa;
|
|
|
|
nvpair_t *pair;
|
|
|
|
boolean_t needsuspend;
|
|
|
|
int error;
|
|
|
|
spa_t *spa;
|
|
|
|
char *firstname;
|
|
|
|
nvlist_t *suspended = NULL;
|
2011-11-17 22:14:36 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
pair = nvlist_next_nvpair(snaps, NULL);
|
|
|
|
if (pair == NULL)
|
|
|
|
return (0);
|
|
|
|
firstname = nvpair_name(pair);
|
|
|
|
|
|
|
|
error = spa_open(firstname, &spa, FTAG);
|
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
|
|
|
needsuspend = (spa_version(spa) < SPA_VERSION_FAST_SNAP);
|
|
|
|
spa_close(spa, FTAG);
|
|
|
|
|
|
|
|
if (needsuspend) {
|
|
|
|
suspended = fnvlist_alloc();
|
|
|
|
for (pair = nvlist_next_nvpair(snaps, NULL); pair != NULL;
|
|
|
|
pair = nvlist_next_nvpair(snaps, pair)) {
|
2016-06-16 00:28:36 +03:00
|
|
|
char fsname[ZFS_MAX_DATASET_NAME_LEN];
|
2013-09-04 16:00:57 +04:00
|
|
|
char *snapname = nvpair_name(pair);
|
|
|
|
char *atp;
|
|
|
|
void *cookie;
|
|
|
|
|
|
|
|
atp = strchr(snapname, '@');
|
|
|
|
if (atp == NULL) {
|
2013-03-08 22:41:28 +04:00
|
|
|
error = SET_ERROR(EINVAL);
|
2013-09-04 16:00:57 +04:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
(void) strlcpy(fsname, snapname, atp - snapname + 1);
|
|
|
|
|
|
|
|
error = zil_suspend(fsname, &cookie);
|
|
|
|
if (error != 0)
|
|
|
|
break;
|
|
|
|
fnvlist_add_uint64(suspended, fsname,
|
|
|
|
(uintptr_t)cookie);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
ddsa.ddsa_snaps = snaps;
|
|
|
|
ddsa.ddsa_props = props;
|
|
|
|
ddsa.ddsa_errors = errors;
|
2015-04-01 16:07:48 +03:00
|
|
|
ddsa.ddsa_cr = CRED();
|
2020-07-12 03:18:02 +03:00
|
|
|
ddsa.ddsa_proc = curproc;
|
2013-09-04 16:00:57 +04:00
|
|
|
|
|
|
|
if (error == 0) {
|
|
|
|
error = dsl_sync_task(firstname, dsl_dataset_snapshot_check,
|
|
|
|
dsl_dataset_snapshot_sync, &ddsa,
|
2014-11-03 23:28:43 +03:00
|
|
|
fnvlist_num_pairs(snaps) * 3, ZFS_SPACE_CHECK_NORMAL);
|
2013-09-04 16:00:57 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
if (suspended != NULL) {
|
|
|
|
for (pair = nvlist_next_nvpair(suspended, NULL); pair != NULL;
|
|
|
|
pair = nvlist_next_nvpair(suspended, pair)) {
|
|
|
|
zil_resume((void *)(uintptr_t)
|
|
|
|
fnvpair_value_uint64(pair));
|
|
|
|
}
|
|
|
|
fnvlist_free(suspended);
|
|
|
|
}
|
|
|
|
|
async zvol minor node creation interferes with receive
When we finish a zfs receive, dmu_recv_end_sync() calls
zvol_create_minors(async=TRUE). This kicks off some other threads that
create the minor device nodes (in /dev/zvol/poolname/...). These async
threads call zvol_prefetch_minors_impl() and zvol_create_minor(), which
both call dmu_objset_own(), which puts a "long hold" on the dataset.
Since the zvol minor node creation is asynchronous, this can happen
after the `ZFS_IOC_RECV[_NEW]` ioctl and `zfs receive` process have
completed.
After the first receive ioctl has completed, userland may attempt to do
another receive into the same dataset (e.g. the next incremental
stream). This second receive and the asynchronous minor node creation
can interfere with one another in several different ways, because they
both require exclusive access to the dataset:
1. When the second receive is finishing up, dmu_recv_end_check() does
dsl_dataset_handoff_check(), which can fail with EBUSY if the async
minor node creation already has a "long hold" on this dataset. This
causes the 2nd receive to fail.
2. The async udev rule can fail if zvol_id and/or systemd-udevd try to
open the device while the the second receive's async attempt at minor
node creation owns the dataset (via zvol_prefetch_minors_impl). This
causes the minor node (/dev/zd*) to exist, but the udev-generated
/dev/zvol/... to not exist.
3. The async minor node creation can silently fail with EBUSY if the
first receive's zvol_create_minor() trys to own the dataset while the
second receive's zvol_prefetch_minors_impl already owns the dataset.
To address these problems, this change synchronously creates the minor
node. To avoid the lock ordering problems that the asynchrony was
introduced to fix (see #3681), we create the minor nodes from open
context, with no locks held, rather than from syncing contex as was
originally done.
Implementation notes:
We generally do not need to traverse children or prefetch anything (e.g.
when running the recv, snapshot, create, or clone subcommands of zfs).
We only need recursion when importing/opening a pool and when loading
encryption keys. The existing recursive, asynchronous, prefetching code
is preserved for use in these cases.
Channel programs may need to create zvol minor nodes, when creating a
snapshot of a zvol with the snapdev property set. We figure out what
snapshots are created when running the LUA program in syncing context.
In this case we need to remember what snapshots were created, and then
try to create their minor nodes from open context, after the LUA code
has completed.
There are additional zvol use cases that asynchronously own the dataset,
which can cause similar problems. E.g. changing the volmode or snapdev
properties. These are less problematic because they are not recursive
and don't touch datasets that are not involved in the operation, there
is still potential for interference with subsequent operations. In the
future, these cases should be similarly converted to create the zvol
minor node synchronously from open context.
The async tasks of removing and renaming minors do not own the objset,
so they do not have this problem. However, it may make sense to also
convert these operations to happen synchronously from open context, in
the future.
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-65948
Closes #7863
Closes #9885
2020-02-03 20:33:14 +03:00
|
|
|
if (error == 0) {
|
|
|
|
for (pair = nvlist_next_nvpair(snaps, NULL); pair != NULL;
|
|
|
|
pair = nvlist_next_nvpair(snaps, pair)) {
|
|
|
|
zvol_create_minor(nvpair_name(pair));
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
typedef struct dsl_dataset_snapshot_tmp_arg {
|
|
|
|
const char *ddsta_fsname;
|
|
|
|
const char *ddsta_snapname;
|
|
|
|
minor_t ddsta_cleanup_minor;
|
|
|
|
const char *ddsta_htag;
|
|
|
|
} dsl_dataset_snapshot_tmp_arg_t;
|
|
|
|
|
|
|
|
static int
|
|
|
|
dsl_dataset_snapshot_tmp_check(void *arg, dmu_tx_t *tx)
|
|
|
|
{
|
|
|
|
dsl_dataset_snapshot_tmp_arg_t *ddsta = arg;
|
|
|
|
dsl_pool_t *dp = dmu_tx_pool(tx);
|
|
|
|
dsl_dataset_t *ds;
|
|
|
|
int error;
|
|
|
|
|
|
|
|
error = dsl_dataset_hold(dp, ddsta->ddsta_fsname, FTAG, &ds);
|
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
|
|
|
|
2015-04-01 16:07:48 +03:00
|
|
|
/* NULL cred means no limit check for tmp snapshot */
|
2013-07-27 21:51:50 +04:00
|
|
|
error = dsl_dataset_snapshot_check_impl(ds, ddsta->ddsta_snapname,
|
2020-07-12 03:18:02 +03:00
|
|
|
tx, B_FALSE, 0, NULL, NULL);
|
2013-09-04 16:00:57 +04:00
|
|
|
if (error != 0) {
|
|
|
|
dsl_dataset_rele(ds, FTAG);
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (spa_version(dp->dp_spa) < SPA_VERSION_USERREFS) {
|
|
|
|
dsl_dataset_rele(ds, FTAG);
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(ENOTSUP));
|
2013-09-04 16:00:57 +04:00
|
|
|
}
|
|
|
|
error = dsl_dataset_user_hold_check_one(NULL, ddsta->ddsta_htag,
|
|
|
|
B_TRUE, tx);
|
|
|
|
if (error != 0) {
|
|
|
|
dsl_dataset_rele(ds, FTAG);
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
dsl_dataset_rele(ds, FTAG);
|
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void
|
|
|
|
dsl_dataset_snapshot_tmp_sync(void *arg, dmu_tx_t *tx)
|
|
|
|
{
|
|
|
|
dsl_dataset_snapshot_tmp_arg_t *ddsta = arg;
|
|
|
|
dsl_pool_t *dp = dmu_tx_pool(tx);
|
2016-07-26 22:08:51 +03:00
|
|
|
dsl_dataset_t *ds = NULL;
|
2013-09-04 16:00:57 +04:00
|
|
|
|
|
|
|
VERIFY0(dsl_dataset_hold(dp, ddsta->ddsta_fsname, FTAG, &ds));
|
|
|
|
|
|
|
|
dsl_dataset_snapshot_sync_impl(ds, ddsta->ddsta_snapname, tx);
|
|
|
|
dsl_dataset_user_hold_sync_one(ds->ds_prev, ddsta->ddsta_htag,
|
|
|
|
ddsta->ddsta_cleanup_minor, gethrestime_sec(), tx);
|
|
|
|
dsl_destroy_snapshot_sync_impl(ds->ds_prev, B_TRUE, tx);
|
|
|
|
|
|
|
|
dsl_dataset_rele(ds, FTAG);
|
|
|
|
}
|
|
|
|
|
|
|
|
int
|
|
|
|
dsl_dataset_snapshot_tmp(const char *fsname, const char *snapname,
|
|
|
|
minor_t cleanup_minor, const char *htag)
|
|
|
|
{
|
|
|
|
dsl_dataset_snapshot_tmp_arg_t ddsta;
|
|
|
|
int error;
|
|
|
|
spa_t *spa;
|
|
|
|
boolean_t needsuspend;
|
|
|
|
void *cookie;
|
|
|
|
|
|
|
|
ddsta.ddsta_fsname = fsname;
|
|
|
|
ddsta.ddsta_snapname = snapname;
|
|
|
|
ddsta.ddsta_cleanup_minor = cleanup_minor;
|
|
|
|
ddsta.ddsta_htag = htag;
|
|
|
|
|
|
|
|
error = spa_open(fsname, &spa, FTAG);
|
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
|
|
|
needsuspend = (spa_version(spa) < SPA_VERSION_FAST_SNAP);
|
|
|
|
spa_close(spa, FTAG);
|
|
|
|
|
|
|
|
if (needsuspend) {
|
|
|
|
error = zil_suspend(fsname, &cookie);
|
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
error = dsl_sync_task(fsname, dsl_dataset_snapshot_tmp_check,
|
2014-11-03 23:28:43 +03:00
|
|
|
dsl_dataset_snapshot_tmp_sync, &ddsta, 3, ZFS_SPACE_CHECK_RESERVED);
|
2013-09-04 16:00:57 +04:00
|
|
|
|
|
|
|
if (needsuspend)
|
|
|
|
zil_resume(cookie);
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
void
|
|
|
|
dsl_dataset_sync(dsl_dataset_t *ds, zio_t *zio, dmu_tx_t *tx)
|
|
|
|
{
|
|
|
|
ASSERT(dmu_tx_is_syncing(tx));
|
|
|
|
ASSERT(ds->ds_objset != NULL);
|
2015-04-01 18:14:34 +03:00
|
|
|
ASSERT(dsl_dataset_phys(ds)->ds_next_snap_obj == 0);
|
2013-09-04 16:00:57 +04:00
|
|
|
|
|
|
|
/*
|
|
|
|
* in case we had to change ds_fsid_guid when we opened it,
|
|
|
|
* sync it out now.
|
|
|
|
*/
|
|
|
|
dmu_buf_will_dirty(ds->ds_dbuf, tx);
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(ds)->ds_fsid_guid = ds->ds_fsid_guid;
|
2013-09-04 16:00:57 +04:00
|
|
|
|
2016-01-07 00:22:48 +03:00
|
|
|
if (ds->ds_resume_bytes[tx->tx_txg & TXG_MASK] != 0) {
|
|
|
|
VERIFY0(zap_update(tx->tx_pool->dp_meta_objset,
|
|
|
|
ds->ds_object, DS_FIELD_RESUME_OBJECT, 8, 1,
|
|
|
|
&ds->ds_resume_object[tx->tx_txg & TXG_MASK], tx));
|
|
|
|
VERIFY0(zap_update(tx->tx_pool->dp_meta_objset,
|
|
|
|
ds->ds_object, DS_FIELD_RESUME_OFFSET, 8, 1,
|
|
|
|
&ds->ds_resume_offset[tx->tx_txg & TXG_MASK], tx));
|
|
|
|
VERIFY0(zap_update(tx->tx_pool->dp_meta_objset,
|
|
|
|
ds->ds_object, DS_FIELD_RESUME_BYTES, 8, 1,
|
|
|
|
&ds->ds_resume_bytes[tx->tx_txg & TXG_MASK], tx));
|
|
|
|
ds->ds_resume_object[tx->tx_txg & TXG_MASK] = 0;
|
|
|
|
ds->ds_resume_offset[tx->tx_txg & TXG_MASK] = 0;
|
|
|
|
ds->ds_resume_bytes[tx->tx_txg & TXG_MASK] = 0;
|
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
dmu_objset_sync(ds->ds_objset, zio, tx);
|
|
|
|
}
|
|
|
|
|
2019-07-26 20:54:14 +03:00
|
|
|
/*
|
|
|
|
* Check if the percentage of blocks shared between the clone and the
|
|
|
|
* snapshot (as opposed to those that are clone only) is below a certain
|
|
|
|
* threshold
|
|
|
|
*/
|
2019-08-14 06:16:23 +03:00
|
|
|
static boolean_t
|
2019-07-26 20:54:14 +03:00
|
|
|
dsl_livelist_should_disable(dsl_dataset_t *ds)
|
|
|
|
{
|
|
|
|
uint64_t used, referenced;
|
|
|
|
int percent_shared;
|
|
|
|
|
|
|
|
used = dsl_dir_get_usedds(ds->ds_dir);
|
|
|
|
referenced = dsl_get_referenced(ds);
|
|
|
|
if (referenced == 0)
|
|
|
|
return (B_FALSE);
|
|
|
|
percent_shared = (100 * (referenced - used)) / referenced;
|
|
|
|
if (percent_shared <= zfs_livelist_min_percent_shared)
|
|
|
|
return (B_TRUE);
|
|
|
|
return (B_FALSE);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Check if it is possible to combine two livelist entries into one.
|
|
|
|
* This is the case if the combined number of 'live' blkptrs (ALLOCs that
|
|
|
|
* don't have a matching FREE) is under the maximum sublist size.
|
|
|
|
* We check this by subtracting twice the total number of frees from the total
|
|
|
|
* number of blkptrs. FREEs are counted twice because each FREE blkptr
|
|
|
|
* will cancel out an ALLOC blkptr when the livelist is processed.
|
|
|
|
*/
|
|
|
|
static boolean_t
|
|
|
|
dsl_livelist_should_condense(dsl_deadlist_entry_t *first,
|
|
|
|
dsl_deadlist_entry_t *next)
|
|
|
|
{
|
|
|
|
uint64_t total_free = first->dle_bpobj.bpo_phys->bpo_num_freed +
|
|
|
|
next->dle_bpobj.bpo_phys->bpo_num_freed;
|
|
|
|
uint64_t total_entries = first->dle_bpobj.bpo_phys->bpo_num_blkptrs +
|
|
|
|
next->dle_bpobj.bpo_phys->bpo_num_blkptrs;
|
|
|
|
if ((total_entries - (2 * total_free)) < zfs_livelist_max_entries)
|
|
|
|
return (B_TRUE);
|
|
|
|
return (B_FALSE);
|
|
|
|
}
|
|
|
|
|
|
|
|
typedef struct try_condense_arg {
|
|
|
|
spa_t *spa;
|
|
|
|
dsl_dataset_t *ds;
|
|
|
|
} try_condense_arg_t;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Iterate over the livelist entries, searching for a pair to condense.
|
|
|
|
* A nonzero return value means stop, 0 means keep looking.
|
|
|
|
*/
|
2016-11-22 02:09:54 +03:00
|
|
|
static int
|
2019-07-26 20:54:14 +03:00
|
|
|
dsl_livelist_try_condense(void *arg, dsl_deadlist_entry_t *first)
|
2016-11-22 02:09:54 +03:00
|
|
|
{
|
2019-07-26 20:54:14 +03:00
|
|
|
try_condense_arg_t *tca = arg;
|
|
|
|
spa_t *spa = tca->spa;
|
|
|
|
dsl_dataset_t *ds = tca->ds;
|
|
|
|
dsl_deadlist_t *ll = &ds->ds_dir->dd_livelist;
|
|
|
|
dsl_deadlist_entry_t *next;
|
|
|
|
|
|
|
|
/* The condense thread has not yet been created at import */
|
|
|
|
if (spa->spa_livelist_condense_zthr == NULL)
|
|
|
|
return (1);
|
|
|
|
|
|
|
|
/* A condense is already in progress */
|
|
|
|
if (spa->spa_to_condense.ds != NULL)
|
|
|
|
return (1);
|
|
|
|
|
|
|
|
next = AVL_NEXT(&ll->dl_tree, &first->dle_node);
|
|
|
|
/* The livelist has only one entry - don't condense it */
|
|
|
|
if (next == NULL)
|
|
|
|
return (1);
|
|
|
|
|
|
|
|
/* Next is the newest entry - don't condense it */
|
|
|
|
if (AVL_NEXT(&ll->dl_tree, &next->dle_node) == NULL)
|
|
|
|
return (1);
|
|
|
|
|
|
|
|
/* This pair is not ready to condense but keep looking */
|
|
|
|
if (!dsl_livelist_should_condense(first, next))
|
|
|
|
return (0);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Add a ref to prevent the dataset from being evicted while
|
|
|
|
* the condense zthr or synctask are running. Ref will be
|
|
|
|
* released at the end of the condense synctask
|
|
|
|
*/
|
|
|
|
dmu_buf_add_ref(ds->ds_dbuf, spa);
|
|
|
|
|
|
|
|
spa->spa_to_condense.ds = ds;
|
|
|
|
spa->spa_to_condense.first = first;
|
|
|
|
spa->spa_to_condense.next = next;
|
|
|
|
spa->spa_to_condense.syncing = B_FALSE;
|
|
|
|
spa->spa_to_condense.cancelled = B_FALSE;
|
|
|
|
|
|
|
|
zthr_wakeup(spa->spa_livelist_condense_zthr);
|
|
|
|
return (1);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void
|
|
|
|
dsl_flush_pending_livelist(dsl_dataset_t *ds, dmu_tx_t *tx)
|
|
|
|
{
|
|
|
|
dsl_dir_t *dd = ds->ds_dir;
|
|
|
|
spa_t *spa = ds->ds_dir->dd_pool->dp_spa;
|
|
|
|
dsl_deadlist_entry_t *last = dsl_deadlist_last(&dd->dd_livelist);
|
|
|
|
|
|
|
|
/* Check if we need to add a new sub-livelist */
|
|
|
|
if (last == NULL) {
|
|
|
|
/* The livelist is empty */
|
|
|
|
dsl_deadlist_add_key(&dd->dd_livelist,
|
|
|
|
tx->tx_txg - 1, tx);
|
|
|
|
} else if (spa_sync_pass(spa) == 1) {
|
|
|
|
/*
|
|
|
|
* Check if the newest entry is full. If it is, make a new one.
|
|
|
|
* We only do this once per sync because we could overfill a
|
|
|
|
* sublist in one sync pass and don't want to add another entry
|
|
|
|
* for a txg that is already represented. This ensures that
|
|
|
|
* blkptrs born in the same txg are stored in the same sublist.
|
|
|
|
*/
|
|
|
|
bpobj_t bpobj = last->dle_bpobj;
|
|
|
|
uint64_t all = bpobj.bpo_phys->bpo_num_blkptrs;
|
|
|
|
uint64_t free = bpobj.bpo_phys->bpo_num_freed;
|
|
|
|
uint64_t alloc = all - free;
|
|
|
|
if (alloc > zfs_livelist_max_entries) {
|
|
|
|
dsl_deadlist_add_key(&dd->dd_livelist,
|
|
|
|
tx->tx_txg - 1, tx);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Insert each entry into the on-disk livelist */
|
|
|
|
bplist_iterate(&dd->dd_pending_allocs,
|
|
|
|
dsl_deadlist_insert_alloc_cb, &dd->dd_livelist, tx);
|
|
|
|
bplist_iterate(&dd->dd_pending_frees,
|
|
|
|
dsl_deadlist_insert_free_cb, &dd->dd_livelist, tx);
|
|
|
|
|
|
|
|
/* Attempt to condense every pair of adjacent entries */
|
|
|
|
try_condense_arg_t arg = {
|
|
|
|
.spa = spa,
|
|
|
|
.ds = ds
|
|
|
|
};
|
|
|
|
dsl_deadlist_iterate(&dd->dd_livelist, dsl_livelist_try_condense,
|
|
|
|
&arg);
|
2016-11-22 02:09:54 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
void
|
|
|
|
dsl_dataset_sync_done(dsl_dataset_t *ds, dmu_tx_t *tx)
|
|
|
|
{
|
2017-03-21 04:36:00 +03:00
|
|
|
objset_t *os = ds->ds_objset;
|
2016-11-22 02:09:54 +03:00
|
|
|
|
|
|
|
bplist_iterate(&ds->ds_pending_deadlist,
|
2019-07-26 20:54:14 +03:00
|
|
|
dsl_deadlist_insert_alloc_cb, &ds->ds_deadlist, tx);
|
|
|
|
|
|
|
|
if (dsl_deadlist_is_open(&ds->ds_dir->dd_livelist)) {
|
|
|
|
dsl_flush_pending_livelist(ds, tx);
|
|
|
|
if (dsl_livelist_should_disable(ds)) {
|
|
|
|
dsl_dir_remove_livelist(ds->ds_dir, tx, B_TRUE);
|
|
|
|
}
|
|
|
|
}
|
2016-11-22 02:09:54 +03:00
|
|
|
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
dsl_bookmark_sync_done(ds, tx);
|
|
|
|
|
2021-06-10 19:42:31 +03:00
|
|
|
multilist_destroy(&os->os_synced_dnodes);
|
2017-03-21 04:36:00 +03:00
|
|
|
|
2018-10-03 19:47:11 +03:00
|
|
|
if (os->os_encrypted)
|
|
|
|
os->os_next_write_raw[tx->tx_txg & TXG_MASK] = B_FALSE;
|
|
|
|
else
|
|
|
|
ASSERT0(os->os_next_write_raw[tx->tx_txg & TXG_MASK]);
|
|
|
|
|
2023-02-14 03:37:46 +03:00
|
|
|
for (spa_feature_t f = 0; f < SPA_FEATURES; f++) {
|
|
|
|
if (zfeature_active(f,
|
|
|
|
ds->ds_feature_activation[f])) {
|
|
|
|
if (zfeature_active(f, ds->ds_feature[f]))
|
|
|
|
continue;
|
|
|
|
dsl_dataset_activate_feature(ds->ds_object, f,
|
|
|
|
ds->ds_feature_activation[f], tx);
|
|
|
|
ds->ds_feature[f] = ds->ds_feature_activation[f];
|
|
|
|
}
|
|
|
|
}
|
2023-02-24 04:14:52 +03:00
|
|
|
|
|
|
|
ASSERT(!dmu_objset_is_dirty(os, dmu_tx_get_txg(tx)));
|
2016-11-22 02:09:54 +03:00
|
|
|
}
|
|
|
|
|
2018-02-08 19:16:23 +03:00
|
|
|
int
|
|
|
|
get_clones_stat_impl(dsl_dataset_t *ds, nvlist_t *val)
|
2013-09-04 16:00:57 +04:00
|
|
|
{
|
|
|
|
uint64_t count = 0;
|
|
|
|
objset_t *mos = ds->ds_dir->dd_pool->dp_meta_objset;
|
|
|
|
zap_cursor_t zc;
|
|
|
|
zap_attribute_t za;
|
|
|
|
|
|
|
|
ASSERT(dsl_pool_config_held(ds->ds_dir->dd_pool));
|
2011-11-17 22:14:36 +04:00
|
|
|
|
|
|
|
/*
|
2013-09-04 16:00:57 +04:00
|
|
|
* There may be missing entries in ds_next_clones_obj
|
2011-11-17 22:14:36 +04:00
|
|
|
* due to a bug in a previous version of the code.
|
|
|
|
* Only trust it if it has the right number of entries.
|
|
|
|
*/
|
2015-04-01 18:14:34 +03:00
|
|
|
if (dsl_dataset_phys(ds)->ds_next_clones_obj != 0) {
|
|
|
|
VERIFY0(zap_count(mos, dsl_dataset_phys(ds)->ds_next_clones_obj,
|
2011-11-17 22:14:36 +04:00
|
|
|
&count));
|
|
|
|
}
|
2018-02-08 19:16:23 +03:00
|
|
|
if (count != dsl_dataset_phys(ds)->ds_num_children - 1) {
|
2020-02-27 03:09:17 +03:00
|
|
|
return (SET_ERROR(ENOENT));
|
2018-02-08 19:16:23 +03:00
|
|
|
}
|
2015-04-01 18:14:34 +03:00
|
|
|
for (zap_cursor_init(&zc, mos,
|
|
|
|
dsl_dataset_phys(ds)->ds_next_clones_obj);
|
2011-11-17 22:14:36 +04:00
|
|
|
zap_cursor_retrieve(&zc, &za) == 0;
|
|
|
|
zap_cursor_advance(&zc)) {
|
|
|
|
dsl_dataset_t *clone;
|
2016-06-16 00:28:36 +03:00
|
|
|
char buf[ZFS_MAX_DATASET_NAME_LEN];
|
2013-09-04 16:00:57 +04:00
|
|
|
VERIFY0(dsl_dataset_hold_obj(ds->ds_dir->dd_pool,
|
|
|
|
za.za_first_integer, FTAG, &clone));
|
2011-11-17 22:14:36 +04:00
|
|
|
dsl_dir_name(clone->ds_dir, buf);
|
2013-09-04 16:00:57 +04:00
|
|
|
fnvlist_add_boolean(val, buf);
|
2011-11-17 22:14:36 +04:00
|
|
|
dsl_dataset_rele(clone, FTAG);
|
|
|
|
}
|
|
|
|
zap_cursor_fini(&zc);
|
2018-02-08 19:16:23 +03:00
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
|
|
|
void
|
|
|
|
get_clones_stat(dsl_dataset_t *ds, nvlist_t *nv)
|
|
|
|
{
|
|
|
|
nvlist_t *propval = fnvlist_alloc();
|
2021-01-28 08:31:51 +03:00
|
|
|
nvlist_t *val = fnvlist_alloc();
|
2018-02-08 19:16:23 +03:00
|
|
|
|
|
|
|
if (get_clones_stat_impl(ds, val) == 0) {
|
|
|
|
fnvlist_add_nvlist(propval, ZPROP_VALUE, val);
|
|
|
|
fnvlist_add_nvlist(nv, zfs_prop_to_name(ZFS_PROP_CLONES),
|
|
|
|
propval);
|
|
|
|
}
|
|
|
|
|
2011-11-17 22:14:36 +04:00
|
|
|
nvlist_free(val);
|
|
|
|
nvlist_free(propval);
|
|
|
|
}
|
|
|
|
|
2021-11-09 01:35:05 +03:00
|
|
|
static char *
|
|
|
|
get_receive_resume_token_impl(dsl_dataset_t *ds)
|
2016-01-07 00:22:48 +03:00
|
|
|
{
|
2021-11-09 01:35:05 +03:00
|
|
|
if (!dsl_dataset_has_resume_receive_state(ds))
|
|
|
|
return (NULL);
|
|
|
|
|
2016-01-07 00:22:48 +03:00
|
|
|
dsl_pool_t *dp = ds->ds_dir->dd_pool;
|
2021-11-09 01:35:05 +03:00
|
|
|
char *str;
|
|
|
|
void *packed;
|
|
|
|
uint8_t *compressed;
|
|
|
|
uint64_t val;
|
|
|
|
nvlist_t *token_nv = fnvlist_alloc();
|
|
|
|
size_t packed_size, compressed_size;
|
|
|
|
|
|
|
|
if (zap_lookup(dp->dp_meta_objset, ds->ds_object,
|
|
|
|
DS_FIELD_RESUME_FROMGUID, sizeof (val), 1, &val) == 0) {
|
|
|
|
fnvlist_add_uint64(token_nv, "fromguid", val);
|
|
|
|
}
|
|
|
|
if (zap_lookup(dp->dp_meta_objset, ds->ds_object,
|
|
|
|
DS_FIELD_RESUME_OBJECT, sizeof (val), 1, &val) == 0) {
|
|
|
|
fnvlist_add_uint64(token_nv, "object", val);
|
|
|
|
}
|
|
|
|
if (zap_lookup(dp->dp_meta_objset, ds->ds_object,
|
|
|
|
DS_FIELD_RESUME_OFFSET, sizeof (val), 1, &val) == 0) {
|
|
|
|
fnvlist_add_uint64(token_nv, "offset", val);
|
|
|
|
}
|
|
|
|
if (zap_lookup(dp->dp_meta_objset, ds->ds_object,
|
|
|
|
DS_FIELD_RESUME_BYTES, sizeof (val), 1, &val) == 0) {
|
|
|
|
fnvlist_add_uint64(token_nv, "bytes", val);
|
|
|
|
}
|
|
|
|
if (zap_lookup(dp->dp_meta_objset, ds->ds_object,
|
|
|
|
DS_FIELD_RESUME_TOGUID, sizeof (val), 1, &val) == 0) {
|
|
|
|
fnvlist_add_uint64(token_nv, "toguid", val);
|
|
|
|
}
|
|
|
|
char buf[MAXNAMELEN];
|
|
|
|
if (zap_lookup(dp->dp_meta_objset, ds->ds_object,
|
|
|
|
DS_FIELD_RESUME_TONAME, 1, sizeof (buf), buf) == 0) {
|
|
|
|
fnvlist_add_string(token_nv, "toname", buf);
|
|
|
|
}
|
|
|
|
if (zap_contains(dp->dp_meta_objset, ds->ds_object,
|
|
|
|
DS_FIELD_RESUME_LARGEBLOCK) == 0) {
|
|
|
|
fnvlist_add_boolean(token_nv, "largeblockok");
|
|
|
|
}
|
|
|
|
if (zap_contains(dp->dp_meta_objset, ds->ds_object,
|
|
|
|
DS_FIELD_RESUME_EMBEDOK) == 0) {
|
|
|
|
fnvlist_add_boolean(token_nv, "embedok");
|
|
|
|
}
|
|
|
|
if (zap_contains(dp->dp_meta_objset, ds->ds_object,
|
|
|
|
DS_FIELD_RESUME_COMPRESSOK) == 0) {
|
|
|
|
fnvlist_add_boolean(token_nv, "compressok");
|
|
|
|
}
|
|
|
|
if (zap_contains(dp->dp_meta_objset, ds->ds_object,
|
|
|
|
DS_FIELD_RESUME_RAWOK) == 0) {
|
|
|
|
fnvlist_add_boolean(token_nv, "rawok");
|
|
|
|
}
|
|
|
|
if (dsl_dataset_feature_is_active(ds,
|
|
|
|
SPA_FEATURE_REDACTED_DATASETS)) {
|
2022-02-14 21:04:50 +03:00
|
|
|
uint64_t num_redact_snaps = 0;
|
|
|
|
uint64_t *redact_snaps = NULL;
|
|
|
|
VERIFY3B(dsl_dataset_get_uint64_array_feature(ds,
|
2021-11-09 01:35:05 +03:00
|
|
|
SPA_FEATURE_REDACTED_DATASETS, &num_redact_snaps,
|
2022-02-14 21:04:50 +03:00
|
|
|
&redact_snaps), ==, B_TRUE);
|
2021-11-09 01:35:05 +03:00
|
|
|
fnvlist_add_uint64_array(token_nv, "redact_snaps",
|
|
|
|
redact_snaps, num_redact_snaps);
|
|
|
|
}
|
|
|
|
if (zap_contains(dp->dp_meta_objset, ds->ds_object,
|
|
|
|
DS_FIELD_RESUME_REDACT_BOOKMARK_SNAPS) == 0) {
|
2022-02-14 21:04:50 +03:00
|
|
|
uint64_t num_redact_snaps = 0, int_size = 0;
|
|
|
|
uint64_t *redact_snaps = NULL;
|
2021-11-09 01:35:05 +03:00
|
|
|
VERIFY0(zap_length(dp->dp_meta_objset, ds->ds_object,
|
|
|
|
DS_FIELD_RESUME_REDACT_BOOKMARK_SNAPS, &int_size,
|
|
|
|
&num_redact_snaps));
|
|
|
|
ASSERT3U(int_size, ==, sizeof (uint64_t));
|
2016-01-07 00:22:48 +03:00
|
|
|
|
2021-11-09 01:35:05 +03:00
|
|
|
redact_snaps = kmem_alloc(int_size * num_redact_snaps,
|
|
|
|
KM_SLEEP);
|
|
|
|
VERIFY0(zap_lookup(dp->dp_meta_objset, ds->ds_object,
|
|
|
|
DS_FIELD_RESUME_REDACT_BOOKMARK_SNAPS, int_size,
|
|
|
|
num_redact_snaps, redact_snaps));
|
|
|
|
fnvlist_add_uint64_array(token_nv, "book_redact_snaps",
|
|
|
|
redact_snaps, num_redact_snaps);
|
|
|
|
kmem_free(redact_snaps, int_size * num_redact_snaps);
|
|
|
|
}
|
|
|
|
packed = fnvlist_pack(token_nv, &packed_size);
|
|
|
|
fnvlist_free(token_nv);
|
|
|
|
compressed = kmem_alloc(packed_size, KM_SLEEP);
|
|
|
|
|
|
|
|
compressed_size = gzip_compress(packed, compressed,
|
|
|
|
packed_size, packed_size, 6);
|
|
|
|
|
|
|
|
zio_cksum_t cksum;
|
|
|
|
fletcher_4_native_varsize(compressed, compressed_size, &cksum);
|
|
|
|
|
|
|
|
size_t alloc_size = compressed_size * 2 + 1;
|
|
|
|
str = kmem_alloc(alloc_size, KM_SLEEP);
|
|
|
|
for (int i = 0; i < compressed_size; i++) {
|
|
|
|
size_t offset = i * 2;
|
|
|
|
(void) snprintf(str + offset, alloc_size - offset,
|
|
|
|
"%02x", compressed[i]);
|
|
|
|
}
|
|
|
|
str[compressed_size * 2] = '\0';
|
|
|
|
char *propval = kmem_asprintf("%u-%llx-%llx-%s",
|
|
|
|
ZFS_SEND_RESUME_TOKEN_VERSION,
|
|
|
|
(longlong_t)cksum.zc_word[0],
|
|
|
|
(longlong_t)packed_size, str);
|
|
|
|
kmem_free(packed, packed_size);
|
|
|
|
kmem_free(str, alloc_size);
|
|
|
|
kmem_free(compressed, packed_size);
|
|
|
|
return (propval);
|
2018-02-08 19:16:23 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2021-11-09 01:35:05 +03:00
|
|
|
* Returns a string that represents the receive resume state token. It should
|
|
|
|
* be freed with strfree(). NULL is returned if no resume state is present.
|
2018-02-08 19:16:23 +03:00
|
|
|
*/
|
|
|
|
char *
|
2021-11-09 01:35:05 +03:00
|
|
|
get_receive_resume_token(dsl_dataset_t *ds)
|
2018-02-08 19:16:23 +03:00
|
|
|
{
|
2021-11-09 01:35:05 +03:00
|
|
|
/*
|
|
|
|
* A failed "newfs" (e.g. full) resumable receive leaves
|
|
|
|
* the stats set on this dataset. Check here for the prop.
|
|
|
|
*/
|
|
|
|
char *token = get_receive_resume_token_impl(ds);
|
|
|
|
if (token != NULL)
|
|
|
|
return (token);
|
|
|
|
/*
|
|
|
|
* A failed incremental resumable receive leaves the
|
|
|
|
* stats set on our child named "%recv". Check the child
|
|
|
|
* for the prop.
|
|
|
|
*/
|
|
|
|
/* 6 extra bytes for /%recv */
|
|
|
|
char name[ZFS_MAX_DATASET_NAME_LEN + 6];
|
2018-02-08 19:16:23 +03:00
|
|
|
dsl_dataset_t *recv_ds;
|
2021-11-09 01:35:05 +03:00
|
|
|
dsl_dataset_name(ds, name);
|
|
|
|
if (strlcat(name, "/", sizeof (name)) < sizeof (name) &&
|
|
|
|
strlcat(name, recv_clone_name, sizeof (name)) < sizeof (name) &&
|
|
|
|
dsl_dataset_hold(ds->ds_dir->dd_pool, name, FTAG, &recv_ds) == 0) {
|
|
|
|
token = get_receive_resume_token_impl(recv_ds);
|
2018-02-08 19:16:23 +03:00
|
|
|
dsl_dataset_rele(recv_ds, FTAG);
|
2016-01-07 00:22:48 +03:00
|
|
|
}
|
2021-11-09 01:35:05 +03:00
|
|
|
return (token);
|
2018-02-08 19:16:23 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
uint64_t
|
|
|
|
dsl_get_refratio(dsl_dataset_t *ds)
|
|
|
|
{
|
|
|
|
uint64_t ratio = dsl_dataset_phys(ds)->ds_compressed_bytes == 0 ? 100 :
|
|
|
|
(dsl_dataset_phys(ds)->ds_uncompressed_bytes * 100 /
|
|
|
|
dsl_dataset_phys(ds)->ds_compressed_bytes);
|
|
|
|
return (ratio);
|
|
|
|
}
|
|
|
|
|
|
|
|
uint64_t
|
|
|
|
dsl_get_logicalreferenced(dsl_dataset_t *ds)
|
|
|
|
{
|
|
|
|
return (dsl_dataset_phys(ds)->ds_uncompressed_bytes);
|
|
|
|
}
|
|
|
|
|
|
|
|
uint64_t
|
|
|
|
dsl_get_compressratio(dsl_dataset_t *ds)
|
|
|
|
{
|
|
|
|
if (ds->ds_is_snapshot) {
|
|
|
|
return (dsl_get_refratio(ds));
|
|
|
|
} else {
|
|
|
|
dsl_dir_t *dd = ds->ds_dir;
|
|
|
|
mutex_enter(&dd->dd_lock);
|
|
|
|
uint64_t val = dsl_dir_get_compressratio(dd);
|
|
|
|
mutex_exit(&dd->dd_lock);
|
|
|
|
return (val);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
uint64_t
|
|
|
|
dsl_get_used(dsl_dataset_t *ds)
|
|
|
|
{
|
|
|
|
if (ds->ds_is_snapshot) {
|
|
|
|
return (dsl_dataset_phys(ds)->ds_unique_bytes);
|
|
|
|
} else {
|
|
|
|
dsl_dir_t *dd = ds->ds_dir;
|
|
|
|
mutex_enter(&dd->dd_lock);
|
|
|
|
uint64_t val = dsl_dir_get_used(dd);
|
|
|
|
mutex_exit(&dd->dd_lock);
|
|
|
|
return (val);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
uint64_t
|
|
|
|
dsl_get_creation(dsl_dataset_t *ds)
|
|
|
|
{
|
|
|
|
return (dsl_dataset_phys(ds)->ds_creation_time);
|
|
|
|
}
|
|
|
|
|
|
|
|
uint64_t
|
|
|
|
dsl_get_creationtxg(dsl_dataset_t *ds)
|
|
|
|
{
|
|
|
|
return (dsl_dataset_phys(ds)->ds_creation_txg);
|
|
|
|
}
|
|
|
|
|
|
|
|
uint64_t
|
|
|
|
dsl_get_refquota(dsl_dataset_t *ds)
|
|
|
|
{
|
|
|
|
return (ds->ds_quota);
|
|
|
|
}
|
|
|
|
|
|
|
|
uint64_t
|
|
|
|
dsl_get_refreservation(dsl_dataset_t *ds)
|
|
|
|
{
|
|
|
|
return (ds->ds_reserved);
|
|
|
|
}
|
|
|
|
|
|
|
|
uint64_t
|
|
|
|
dsl_get_guid(dsl_dataset_t *ds)
|
|
|
|
{
|
|
|
|
return (dsl_dataset_phys(ds)->ds_guid);
|
|
|
|
}
|
|
|
|
|
|
|
|
uint64_t
|
|
|
|
dsl_get_unique(dsl_dataset_t *ds)
|
|
|
|
{
|
|
|
|
return (dsl_dataset_phys(ds)->ds_unique_bytes);
|
|
|
|
}
|
|
|
|
|
|
|
|
uint64_t
|
|
|
|
dsl_get_objsetid(dsl_dataset_t *ds)
|
|
|
|
{
|
|
|
|
return (ds->ds_object);
|
|
|
|
}
|
|
|
|
|
|
|
|
uint64_t
|
|
|
|
dsl_get_userrefs(dsl_dataset_t *ds)
|
|
|
|
{
|
|
|
|
return (ds->ds_userrefs);
|
|
|
|
}
|
|
|
|
|
|
|
|
uint64_t
|
|
|
|
dsl_get_defer_destroy(dsl_dataset_t *ds)
|
|
|
|
{
|
|
|
|
return (DS_IS_DEFER_DESTROY(ds) ? 1 : 0);
|
|
|
|
}
|
|
|
|
|
|
|
|
uint64_t
|
|
|
|
dsl_get_referenced(dsl_dataset_t *ds)
|
|
|
|
{
|
|
|
|
return (dsl_dataset_phys(ds)->ds_referenced_bytes);
|
|
|
|
}
|
|
|
|
|
|
|
|
uint64_t
|
|
|
|
dsl_get_numclones(dsl_dataset_t *ds)
|
|
|
|
{
|
|
|
|
ASSERT(ds->ds_is_snapshot);
|
|
|
|
return (dsl_dataset_phys(ds)->ds_num_children - 1);
|
|
|
|
}
|
|
|
|
|
|
|
|
uint64_t
|
|
|
|
dsl_get_inconsistent(dsl_dataset_t *ds)
|
|
|
|
{
|
|
|
|
return ((dsl_dataset_phys(ds)->ds_flags & DS_FLAG_INCONSISTENT) ?
|
|
|
|
1 : 0);
|
|
|
|
}
|
|
|
|
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
uint64_t
|
|
|
|
dsl_get_redacted(dsl_dataset_t *ds)
|
|
|
|
{
|
|
|
|
return (dsl_dataset_feature_is_active(ds,
|
|
|
|
SPA_FEATURE_REDACTED_DATASETS));
|
|
|
|
}
|
|
|
|
|
2018-02-08 19:16:23 +03:00
|
|
|
uint64_t
|
|
|
|
dsl_get_available(dsl_dataset_t *ds)
|
|
|
|
{
|
|
|
|
uint64_t refdbytes = dsl_get_referenced(ds);
|
|
|
|
uint64_t availbytes = dsl_dir_space_available(ds->ds_dir,
|
|
|
|
NULL, 0, TRUE);
|
|
|
|
if (ds->ds_reserved > dsl_dataset_phys(ds)->ds_unique_bytes) {
|
|
|
|
availbytes +=
|
|
|
|
ds->ds_reserved - dsl_dataset_phys(ds)->ds_unique_bytes;
|
|
|
|
}
|
|
|
|
if (ds->ds_quota != 0) {
|
|
|
|
/*
|
|
|
|
* Adjust available bytes according to refquota
|
|
|
|
*/
|
|
|
|
if (refdbytes < ds->ds_quota) {
|
|
|
|
availbytes = MIN(availbytes,
|
|
|
|
ds->ds_quota - refdbytes);
|
|
|
|
} else {
|
|
|
|
availbytes = 0;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
return (availbytes);
|
|
|
|
}
|
|
|
|
|
|
|
|
int
|
|
|
|
dsl_get_written(dsl_dataset_t *ds, uint64_t *written)
|
|
|
|
{
|
|
|
|
dsl_pool_t *dp = ds->ds_dir->dd_pool;
|
|
|
|
dsl_dataset_t *prev;
|
|
|
|
int err = dsl_dataset_hold_obj(dp,
|
|
|
|
dsl_dataset_phys(ds)->ds_prev_snap_obj, FTAG, &prev);
|
|
|
|
if (err == 0) {
|
|
|
|
uint64_t comp, uncomp;
|
|
|
|
err = dsl_dataset_space_written(prev, ds, written,
|
|
|
|
&comp, &uncomp);
|
|
|
|
dsl_dataset_rele(prev, FTAG);
|
|
|
|
}
|
|
|
|
return (err);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* 'snap' should be a buffer of size ZFS_MAX_DATASET_NAME_LEN.
|
|
|
|
*/
|
|
|
|
int
|
|
|
|
dsl_get_prev_snap(dsl_dataset_t *ds, char *snap)
|
|
|
|
{
|
|
|
|
dsl_pool_t *dp = ds->ds_dir->dd_pool;
|
|
|
|
if (ds->ds_prev != NULL && ds->ds_prev != dp->dp_origin_snap) {
|
|
|
|
dsl_dataset_name(ds->ds_prev, snap);
|
|
|
|
return (0);
|
|
|
|
} else {
|
2020-02-27 03:09:17 +03:00
|
|
|
return (SET_ERROR(ENOENT));
|
2018-02-08 19:16:23 +03:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
void
|
|
|
|
dsl_get_redact_snaps(dsl_dataset_t *ds, nvlist_t *propval)
|
|
|
|
{
|
|
|
|
uint64_t nsnaps;
|
|
|
|
uint64_t *snaps;
|
|
|
|
if (dsl_dataset_get_uint64_array_feature(ds,
|
|
|
|
SPA_FEATURE_REDACTED_DATASETS, &nsnaps, &snaps)) {
|
|
|
|
fnvlist_add_uint64_array(propval, ZPROP_VALUE, snaps,
|
|
|
|
nsnaps);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-02-08 19:16:23 +03:00
|
|
|
/*
|
|
|
|
* Returns the mountpoint property and source for the given dataset in the value
|
|
|
|
* and source buffers. The value buffer must be at least as large as MAXPATHLEN
|
|
|
|
* and the source buffer as least as large a ZFS_MAX_DATASET_NAME_LEN.
|
|
|
|
* Returns 0 on success and an error on failure.
|
|
|
|
*/
|
|
|
|
int
|
|
|
|
dsl_get_mountpoint(dsl_dataset_t *ds, const char *dsname, char *value,
|
|
|
|
char *source)
|
|
|
|
{
|
|
|
|
int error;
|
|
|
|
dsl_pool_t *dp = ds->ds_dir->dd_pool;
|
|
|
|
|
2019-09-03 03:56:41 +03:00
|
|
|
/* Retrieve the mountpoint value stored in the zap object */
|
2018-02-08 19:16:23 +03:00
|
|
|
error = dsl_prop_get_ds(ds, zfs_prop_to_name(ZFS_PROP_MOUNTPOINT), 1,
|
|
|
|
ZAP_MAXVALUELEN, value, source);
|
|
|
|
if (error != 0) {
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Process the dsname and source to find the full mountpoint string.
|
|
|
|
* Can be skipped for 'legacy' or 'none'.
|
|
|
|
*/
|
|
|
|
if (value[0] == '/') {
|
|
|
|
char *buf = kmem_alloc(ZAP_MAXVALUELEN, KM_SLEEP);
|
|
|
|
char *root = buf;
|
|
|
|
const char *relpath;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If we inherit the mountpoint, even from a dataset
|
|
|
|
* with a received value, the source will be the path of
|
|
|
|
* the dataset we inherit from. If source is
|
|
|
|
* ZPROP_SOURCE_VAL_RECVD, the received value is not
|
|
|
|
* inherited.
|
|
|
|
*/
|
|
|
|
if (strcmp(source, ZPROP_SOURCE_VAL_RECVD) == 0) {
|
|
|
|
relpath = "";
|
|
|
|
} else {
|
|
|
|
ASSERT0(strncmp(dsname, source, strlen(source)));
|
|
|
|
relpath = dsname + strlen(source);
|
|
|
|
if (relpath[0] == '/')
|
|
|
|
relpath++;
|
|
|
|
}
|
|
|
|
|
|
|
|
spa_altroot(dp->dp_spa, root, ZAP_MAXVALUELEN);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Special case an alternate root of '/'. This will
|
|
|
|
* avoid having multiple leading slashes in the
|
|
|
|
* mountpoint path.
|
|
|
|
*/
|
|
|
|
if (strcmp(root, "/") == 0)
|
|
|
|
root++;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If the mountpoint is '/' then skip over this
|
|
|
|
* if we are obtaining either an alternate root or
|
|
|
|
* an inherited mountpoint.
|
|
|
|
*/
|
|
|
|
char *mnt = value;
|
|
|
|
if (value[1] == '\0' && (root[0] != '\0' ||
|
|
|
|
relpath[0] != '\0'))
|
|
|
|
mnt = value + 1;
|
|
|
|
|
Fix unsafe string operations
Coverity caught unsafe use of `strcpy()` in `ztest_dmu_objset_own()`,
`nfs_init_tmpfile()` and `dump_snapshot()`. It also caught an unsafe use
of `strlcat()` in `nfs_init_tmpfile()`.
Inspired by this, I did an audit of every single usage of `strcpy()` and
`strcat()` in the code. If I could not prove that the usage was safe, I
changed the code to use either `strlcpy()` or `strlcat()`, depending on
which function was originally used. In some cases, `snprintf()` was used
to replace multiple uses of `strcat` because it was cleaner.
Whenever I changed a function, I preferred to use `sizeof(dst)` when the
compiler is able to provide the string size via that. When it could not
because the string was passed by a caller, I checked the entire call
tree of the function to find out how big the buffer was and hard coded
it. Hardcoding is less than ideal, but it is safe unless someone shrinks
the buffer sizes being passed.
Additionally, Coverity reported three more string related issues:
* It caught a case where we do an overlapping memory copy in a call to
`snprintf()`. We fix that via `kmem_strdup()` and `kmem_strfree()`.
* It caught `sizeof (buf)` being used instead of `buflen` in
`zdb_nicenum()`'s call to `zfs_nicenum()`, which is passed to
`snprintf()`. We change that to pass `buflen`.
* It caught a theoretical unterminated string passed to `strcmp()`.
This one is likely a false positive, but we have the information
needed to do this more safely, so we change this to silence the false
positive not just in coverity, but potentially other static analysis
tools too. We switch to `strncmp()`.
* There was a false positive in tests/zfs-tests/cmd/dir_rd_update.c. We
suppress it by switching to `snprintf()` since other static analysis
tools might complain about it too. Interestingly, there is a possible
real bug there too, since it assumes that the passed directory path
ends with '/'. We add a '/' to fix that potential bug.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13913
2022-09-28 02:47:24 +03:00
|
|
|
mnt = kmem_strdup(mnt);
|
|
|
|
|
2018-02-08 19:16:23 +03:00
|
|
|
if (relpath[0] == '\0') {
|
|
|
|
(void) snprintf(value, ZAP_MAXVALUELEN, "%s%s",
|
|
|
|
root, mnt);
|
|
|
|
} else {
|
|
|
|
(void) snprintf(value, ZAP_MAXVALUELEN, "%s%s%s%s",
|
|
|
|
root, mnt, relpath[0] == '@' ? "" : "/",
|
|
|
|
relpath);
|
|
|
|
}
|
|
|
|
kmem_free(buf, ZAP_MAXVALUELEN);
|
Fix unsafe string operations
Coverity caught unsafe use of `strcpy()` in `ztest_dmu_objset_own()`,
`nfs_init_tmpfile()` and `dump_snapshot()`. It also caught an unsafe use
of `strlcat()` in `nfs_init_tmpfile()`.
Inspired by this, I did an audit of every single usage of `strcpy()` and
`strcat()` in the code. If I could not prove that the usage was safe, I
changed the code to use either `strlcpy()` or `strlcat()`, depending on
which function was originally used. In some cases, `snprintf()` was used
to replace multiple uses of `strcat` because it was cleaner.
Whenever I changed a function, I preferred to use `sizeof(dst)` when the
compiler is able to provide the string size via that. When it could not
because the string was passed by a caller, I checked the entire call
tree of the function to find out how big the buffer was and hard coded
it. Hardcoding is less than ideal, but it is safe unless someone shrinks
the buffer sizes being passed.
Additionally, Coverity reported three more string related issues:
* It caught a case where we do an overlapping memory copy in a call to
`snprintf()`. We fix that via `kmem_strdup()` and `kmem_strfree()`.
* It caught `sizeof (buf)` being used instead of `buflen` in
`zdb_nicenum()`'s call to `zfs_nicenum()`, which is passed to
`snprintf()`. We change that to pass `buflen`.
* It caught a theoretical unterminated string passed to `strcmp()`.
This one is likely a false positive, but we have the information
needed to do this more safely, so we change this to silence the false
positive not just in coverity, but potentially other static analysis
tools too. We switch to `strncmp()`.
* There was a false positive in tests/zfs-tests/cmd/dir_rd_update.c. We
suppress it by switching to `snprintf()` since other static analysis
tools might complain about it too. Interestingly, there is a possible
real bug there too, since it assumes that the passed directory path
ends with '/'. We add a '/' to fix that potential bug.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13913
2022-09-28 02:47:24 +03:00
|
|
|
kmem_strfree(mnt);
|
2018-02-08 19:16:23 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
return (0);
|
2016-01-07 00:22:48 +03:00
|
|
|
}
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
void
|
|
|
|
dsl_dataset_stats(dsl_dataset_t *ds, nvlist_t *nv)
|
|
|
|
{
|
2021-11-09 01:35:05 +03:00
|
|
|
dsl_pool_t *dp __maybe_unused = ds->ds_dir->dd_pool;
|
2013-09-04 16:00:57 +04:00
|
|
|
|
|
|
|
ASSERT(dsl_pool_config_held(dp));
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2018-02-08 19:16:23 +03:00
|
|
|
dsl_prop_nvlist_add_uint64(nv, ZFS_PROP_REFRATIO,
|
|
|
|
dsl_get_refratio(ds));
|
2013-02-22 13:23:09 +04:00
|
|
|
dsl_prop_nvlist_add_uint64(nv, ZFS_PROP_LOGICALREFERENCED,
|
2018-02-08 19:16:23 +03:00
|
|
|
dsl_get_logicalreferenced(ds));
|
|
|
|
dsl_prop_nvlist_add_uint64(nv, ZFS_PROP_COMPRESSRATIO,
|
|
|
|
dsl_get_compressratio(ds));
|
|
|
|
dsl_prop_nvlist_add_uint64(nv, ZFS_PROP_USED,
|
|
|
|
dsl_get_used(ds));
|
2013-08-28 15:45:09 +04:00
|
|
|
|
2015-04-02 06:44:32 +03:00
|
|
|
if (ds->ds_is_snapshot) {
|
2013-08-28 15:45:09 +04:00
|
|
|
get_clones_stat(ds, nv);
|
|
|
|
} else {
|
2018-02-08 19:16:23 +03:00
|
|
|
char buf[ZFS_MAX_DATASET_NAME_LEN];
|
|
|
|
if (dsl_get_prev_snap(ds, buf) == 0)
|
|
|
|
dsl_prop_nvlist_add_string(nv, ZFS_PROP_PREV_SNAP,
|
|
|
|
buf);
|
2013-08-28 15:45:09 +04:00
|
|
|
dsl_dir_stats(ds->ds_dir, nv);
|
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
nvlist_t *propval = fnvlist_alloc();
|
|
|
|
dsl_get_redact_snaps(ds, propval);
|
|
|
|
fnvlist_add_nvlist(nv, zfs_prop_to_name(ZFS_PROP_REDACT_SNAPS),
|
|
|
|
propval);
|
|
|
|
nvlist_free(propval);
|
|
|
|
|
2018-02-08 19:16:23 +03:00
|
|
|
dsl_prop_nvlist_add_uint64(nv, ZFS_PROP_AVAILABLE,
|
|
|
|
dsl_get_available(ds));
|
|
|
|
dsl_prop_nvlist_add_uint64(nv, ZFS_PROP_REFERENCED,
|
|
|
|
dsl_get_referenced(ds));
|
2008-11-20 23:01:55 +03:00
|
|
|
dsl_prop_nvlist_add_uint64(nv, ZFS_PROP_CREATION,
|
2018-02-08 19:16:23 +03:00
|
|
|
dsl_get_creation(ds));
|
2008-11-20 23:01:55 +03:00
|
|
|
dsl_prop_nvlist_add_uint64(nv, ZFS_PROP_CREATETXG,
|
2018-02-08 19:16:23 +03:00
|
|
|
dsl_get_creationtxg(ds));
|
2008-11-20 23:01:55 +03:00
|
|
|
dsl_prop_nvlist_add_uint64(nv, ZFS_PROP_REFQUOTA,
|
2018-02-08 19:16:23 +03:00
|
|
|
dsl_get_refquota(ds));
|
2008-11-20 23:01:55 +03:00
|
|
|
dsl_prop_nvlist_add_uint64(nv, ZFS_PROP_REFRESERVATION,
|
2018-02-08 19:16:23 +03:00
|
|
|
dsl_get_refreservation(ds));
|
2008-12-03 23:09:06 +03:00
|
|
|
dsl_prop_nvlist_add_uint64(nv, ZFS_PROP_GUID,
|
2018-02-08 19:16:23 +03:00
|
|
|
dsl_get_guid(ds));
|
2010-05-29 00:45:14 +04:00
|
|
|
dsl_prop_nvlist_add_uint64(nv, ZFS_PROP_UNIQUE,
|
2018-02-08 19:16:23 +03:00
|
|
|
dsl_get_unique(ds));
|
2010-05-29 00:45:14 +04:00
|
|
|
dsl_prop_nvlist_add_uint64(nv, ZFS_PROP_OBJSETID,
|
2018-02-08 19:16:23 +03:00
|
|
|
dsl_get_objsetid(ds));
|
2010-05-29 00:45:14 +04:00
|
|
|
dsl_prop_nvlist_add_uint64(nv, ZFS_PROP_USERREFS,
|
2018-02-08 19:16:23 +03:00
|
|
|
dsl_get_userrefs(ds));
|
2009-08-18 22:43:27 +04:00
|
|
|
dsl_prop_nvlist_add_uint64(nv, ZFS_PROP_DEFER_DESTROY,
|
2018-02-08 19:16:23 +03:00
|
|
|
dsl_get_defer_destroy(ds));
|
2022-08-03 02:45:30 +03:00
|
|
|
dsl_prop_nvlist_add_uint64(nv, ZFS_PROP_SNAPSHOTS_CHANGED,
|
|
|
|
dsl_dir_snap_cmtime(ds->ds_dir).tv_sec);
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decyrpt, and authenticate protected datasets.
Each object set maintains a Merkel tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 20:36:48 +03:00
|
|
|
dsl_dataset_crypt_stats(ds, nv);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2015-04-01 18:14:34 +03:00
|
|
|
if (dsl_dataset_phys(ds)->ds_prev_snap_obj != 0) {
|
2018-02-08 19:16:23 +03:00
|
|
|
uint64_t written;
|
|
|
|
if (dsl_get_written(ds, &written) == 0) {
|
|
|
|
dsl_prop_nvlist_add_uint64(nv, ZFS_PROP_WRITTEN,
|
|
|
|
written);
|
2011-11-17 22:14:36 +04:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2016-01-07 00:22:48 +03:00
|
|
|
if (!dsl_dataset_is_snapshot(ds)) {
|
2021-11-09 01:35:05 +03:00
|
|
|
char *token = get_receive_resume_token(ds);
|
|
|
|
if (token != NULL) {
|
|
|
|
dsl_prop_nvlist_add_string(nv,
|
|
|
|
ZFS_PROP_RECEIVE_RESUME_TOKEN, token);
|
|
|
|
kmem_strfree(token);
|
2016-01-07 00:22:48 +03:00
|
|
|
}
|
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
void
|
|
|
|
dsl_dataset_fast_stat(dsl_dataset_t *ds, dmu_objset_stats_t *stat)
|
|
|
|
{
|
2019-12-05 23:37:00 +03:00
|
|
|
dsl_pool_t *dp __maybe_unused = ds->ds_dir->dd_pool;
|
2013-09-04 16:00:57 +04:00
|
|
|
ASSERT(dsl_pool_config_held(dp));
|
|
|
|
|
2018-02-08 19:16:23 +03:00
|
|
|
stat->dds_creation_txg = dsl_get_creationtxg(ds);
|
|
|
|
stat->dds_inconsistent = dsl_get_inconsistent(ds);
|
|
|
|
stat->dds_guid = dsl_get_guid(ds);
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
stat->dds_redacted = dsl_get_redacted(ds);
|
2013-08-28 15:45:09 +04:00
|
|
|
stat->dds_origin[0] = '\0';
|
2015-04-02 06:44:32 +03:00
|
|
|
if (ds->ds_is_snapshot) {
|
2008-11-20 23:01:55 +03:00
|
|
|
stat->dds_is_snapshot = B_TRUE;
|
2018-02-08 19:16:23 +03:00
|
|
|
stat->dds_num_clones = dsl_get_numclones(ds);
|
2009-01-16 00:59:39 +03:00
|
|
|
} else {
|
|
|
|
stat->dds_is_snapshot = B_FALSE;
|
|
|
|
stat->dds_num_clones = 0;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-08-28 15:45:09 +04:00
|
|
|
if (dsl_dir_is_clone(ds->ds_dir)) {
|
2018-02-08 19:16:23 +03:00
|
|
|
dsl_dir_get_origin(ds->ds_dir, stat->dds_origin);
|
2013-08-28 15:45:09 +04:00
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
uint64_t
|
|
|
|
dsl_dataset_fsid_guid(dsl_dataset_t *ds)
|
|
|
|
{
|
|
|
|
return (ds->ds_fsid_guid);
|
|
|
|
}
|
|
|
|
|
|
|
|
void
|
|
|
|
dsl_dataset_space(dsl_dataset_t *ds,
|
|
|
|
uint64_t *refdbytesp, uint64_t *availbytesp,
|
|
|
|
uint64_t *usedobjsp, uint64_t *availobjsp)
|
|
|
|
{
|
2015-04-01 18:14:34 +03:00
|
|
|
*refdbytesp = dsl_dataset_phys(ds)->ds_referenced_bytes;
|
2008-11-20 23:01:55 +03:00
|
|
|
*availbytesp = dsl_dir_space_available(ds->ds_dir, NULL, 0, TRUE);
|
2015-04-01 18:14:34 +03:00
|
|
|
if (ds->ds_reserved > dsl_dataset_phys(ds)->ds_unique_bytes)
|
|
|
|
*availbytesp +=
|
|
|
|
ds->ds_reserved - dsl_dataset_phys(ds)->ds_unique_bytes;
|
2008-11-20 23:01:55 +03:00
|
|
|
if (ds->ds_quota != 0) {
|
|
|
|
/*
|
|
|
|
* Adjust available bytes according to refquota
|
|
|
|
*/
|
|
|
|
if (*refdbytesp < ds->ds_quota)
|
|
|
|
*availbytesp = MIN(*availbytesp,
|
|
|
|
ds->ds_quota - *refdbytesp);
|
|
|
|
else
|
|
|
|
*availbytesp = 0;
|
|
|
|
}
|
2017-01-27 22:43:42 +03:00
|
|
|
rrw_enter(&ds->ds_bp_rwlock, RW_READER, FTAG);
|
2015-04-01 18:14:34 +03:00
|
|
|
*usedobjsp = BP_GET_FILL(&dsl_dataset_phys(ds)->ds_bp);
|
2017-01-27 22:43:42 +03:00
|
|
|
rrw_exit(&ds->ds_bp_rwlock, FTAG);
|
2008-11-20 23:01:55 +03:00
|
|
|
*availobjsp = DN_MAX_OBJECT - *usedobjsp;
|
|
|
|
}
|
|
|
|
|
|
|
|
boolean_t
|
2013-07-29 22:55:16 +04:00
|
|
|
dsl_dataset_modified_since_snap(dsl_dataset_t *ds, dsl_dataset_t *snap)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2019-12-05 23:37:00 +03:00
|
|
|
dsl_pool_t *dp __maybe_unused = ds->ds_dir->dd_pool;
|
2017-01-27 22:43:42 +03:00
|
|
|
uint64_t birth;
|
|
|
|
|
|
|
|
ASSERT(dsl_pool_config_held(dp));
|
2013-07-29 22:55:16 +04:00
|
|
|
if (snap == NULL)
|
2008-11-20 23:01:55 +03:00
|
|
|
return (B_FALSE);
|
2017-01-27 22:43:42 +03:00
|
|
|
rrw_enter(&ds->ds_bp_rwlock, RW_READER, FTAG);
|
|
|
|
birth = dsl_dataset_get_blkptr(ds)->blk_birth;
|
|
|
|
rrw_exit(&ds->ds_bp_rwlock, FTAG);
|
|
|
|
if (birth > dsl_dataset_phys(snap)->ds_creation_txg) {
|
2013-07-29 22:55:16 +04:00
|
|
|
objset_t *os, *os_snap;
|
2010-08-27 01:24:34 +04:00
|
|
|
/*
|
|
|
|
* It may be that only the ZIL differs, because it was
|
|
|
|
* reset in the head. Don't count that as being
|
|
|
|
* modified.
|
|
|
|
*/
|
|
|
|
if (dmu_objset_from_ds(ds, &os) != 0)
|
|
|
|
return (B_TRUE);
|
2013-07-29 22:55:16 +04:00
|
|
|
if (dmu_objset_from_ds(snap, &os_snap) != 0)
|
2010-08-27 01:24:34 +04:00
|
|
|
return (B_TRUE);
|
2022-02-25 16:26:54 +03:00
|
|
|
return (memcmp(&os->os_phys->os_meta_dnode,
|
2013-07-29 22:55:16 +04:00
|
|
|
&os_snap->os_phys->os_meta_dnode,
|
2010-08-27 01:24:34 +04:00
|
|
|
sizeof (os->os_phys->os_meta_dnode)) != 0);
|
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
return (B_FALSE);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_rename_snapshot_check_impl(dsl_pool_t *dp,
|
|
|
|
dsl_dataset_t *hds, void *arg)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2021-12-12 18:06:44 +03:00
|
|
|
(void) dp;
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_rename_snapshot_arg_t *ddrsa = arg;
|
|
|
|
int error;
|
2008-11-20 23:01:55 +03:00
|
|
|
uint64_t val;
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
error = dsl_dataset_snap_lookup(hds, ddrsa->ddrsa_oldsnapname, &val);
|
|
|
|
if (error != 0) {
|
|
|
|
/* ignore nonexistent snapshots */
|
|
|
|
return (error == ENOENT ? 0 : error);
|
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
/* new name should not exist */
|
|
|
|
error = dsl_dataset_snap_lookup(hds, ddrsa->ddrsa_newsnapname, &val);
|
|
|
|
if (error == 0)
|
2013-03-08 22:41:28 +04:00
|
|
|
error = SET_ERROR(EEXIST);
|
2013-09-04 16:00:57 +04:00
|
|
|
else if (error == ENOENT)
|
|
|
|
error = 0;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
/* dataset name + 1 for the "@" + the new snapshot name must fit */
|
2013-09-04 16:00:57 +04:00
|
|
|
if (dsl_dir_namelen(hds->ds_dir) + 1 +
|
2016-06-16 00:28:36 +03:00
|
|
|
strlen(ddrsa->ddrsa_newsnapname) >= ZFS_MAX_DATASET_NAME_LEN)
|
2013-03-08 22:41:28 +04:00
|
|
|
error = SET_ERROR(ENAMETOOLONG);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
return (error);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
2022-09-02 23:31:19 +03:00
|
|
|
int
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_rename_snapshot_check(void *arg, dmu_tx_t *tx)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_rename_snapshot_arg_t *ddrsa = arg;
|
|
|
|
dsl_pool_t *dp = dmu_tx_pool(tx);
|
2008-11-20 23:01:55 +03:00
|
|
|
dsl_dataset_t *hds;
|
2013-09-04 16:00:57 +04:00
|
|
|
int error;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
error = dsl_dataset_hold(dp, ddrsa->ddrsa_fsname, FTAG, &hds);
|
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
if (ddrsa->ddrsa_recursive) {
|
|
|
|
error = dmu_objset_find_dp(dp, hds->ds_dir->dd_object,
|
|
|
|
dsl_dataset_rename_snapshot_check_impl, ddrsa,
|
|
|
|
DS_FIND_CHILDREN);
|
|
|
|
} else {
|
|
|
|
error = dsl_dataset_rename_snapshot_check_impl(dp, hds, ddrsa);
|
|
|
|
}
|
2008-12-03 23:09:06 +03:00
|
|
|
dsl_dataset_rele(hds, FTAG);
|
2013-09-04 16:00:57 +04:00
|
|
|
return (error);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_rename_snapshot_sync_impl(dsl_pool_t *dp,
|
|
|
|
dsl_dataset_t *hds, void *arg)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_rename_snapshot_arg_t *ddrsa = arg;
|
|
|
|
dsl_dataset_t *ds;
|
|
|
|
uint64_t val;
|
|
|
|
dmu_tx_t *tx = ddrsa->ddrsa_tx;
|
|
|
|
int error;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
error = dsl_dataset_snap_lookup(hds, ddrsa->ddrsa_oldsnapname, &val);
|
|
|
|
ASSERT(error == 0 || error == ENOENT);
|
|
|
|
if (error == ENOENT) {
|
|
|
|
/* ignore nonexistent snapshots */
|
|
|
|
return (0);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
VERIFY0(dsl_dataset_hold_obj(dp, val, FTAG, &ds));
|
|
|
|
|
|
|
|
/* log before we change the name */
|
|
|
|
spa_history_log_internal_ds(ds, "rename", tx,
|
|
|
|
"-> @%s", ddrsa->ddrsa_newsnapname);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2015-04-01 16:07:48 +03:00
|
|
|
VERIFY0(dsl_dataset_snap_remove(hds, ddrsa->ddrsa_oldsnapname, tx,
|
|
|
|
B_FALSE));
|
2013-09-04 16:00:57 +04:00
|
|
|
mutex_enter(&ds->ds_lock);
|
2016-09-29 22:06:14 +03:00
|
|
|
(void) strlcpy(ds->ds_snapname, ddrsa->ddrsa_newsnapname,
|
|
|
|
sizeof (ds->ds_snapname));
|
2013-09-04 16:00:57 +04:00
|
|
|
mutex_exit(&ds->ds_lock);
|
2015-04-01 18:14:34 +03:00
|
|
|
VERIFY0(zap_add(dp->dp_meta_objset,
|
|
|
|
dsl_dataset_phys(hds)->ds_snapnames_zapobj,
|
2013-09-04 16:00:57 +04:00
|
|
|
ds->ds_snapname, 8, 1, &ds->ds_object, tx));
|
2014-03-22 13:07:14 +04:00
|
|
|
zvol_rename_minors(dp->dp_spa, ddrsa->ddrsa_oldsnapname,
|
|
|
|
ddrsa->ddrsa_newsnapname, B_TRUE);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_rele(ds, FTAG);
|
2008-11-20 23:01:55 +03:00
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
2022-09-02 23:31:19 +03:00
|
|
|
void
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_rename_snapshot_sync(void *arg, dmu_tx_t *tx)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_rename_snapshot_arg_t *ddrsa = arg;
|
|
|
|
dsl_pool_t *dp = dmu_tx_pool(tx);
|
2016-07-26 22:08:51 +03:00
|
|
|
dsl_dataset_t *hds = NULL;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
VERIFY0(dsl_dataset_hold(dp, ddrsa->ddrsa_fsname, FTAG, &hds));
|
|
|
|
ddrsa->ddrsa_tx = tx;
|
|
|
|
if (ddrsa->ddrsa_recursive) {
|
|
|
|
VERIFY0(dmu_objset_find_dp(dp, hds->ds_dir->dd_object,
|
|
|
|
dsl_dataset_rename_snapshot_sync_impl, ddrsa,
|
|
|
|
DS_FIND_CHILDREN));
|
|
|
|
} else {
|
|
|
|
VERIFY0(dsl_dataset_rename_snapshot_sync_impl(dp, hds, ddrsa));
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_rele(hds, FTAG);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
int
|
|
|
|
dsl_dataset_rename_snapshot(const char *fsname,
|
|
|
|
const char *oldsnapname, const char *newsnapname, boolean_t recursive)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_rename_snapshot_arg_t ddrsa;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
ddrsa.ddrsa_fsname = fsname;
|
|
|
|
ddrsa.ddrsa_oldsnapname = oldsnapname;
|
|
|
|
ddrsa.ddrsa_newsnapname = newsnapname;
|
|
|
|
ddrsa.ddrsa_recursive = recursive;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2014-03-22 13:07:14 +04:00
|
|
|
return (dsl_sync_task(fsname, dsl_dataset_rename_snapshot_check,
|
2014-11-03 23:28:43 +03:00
|
|
|
dsl_dataset_rename_snapshot_sync, &ddrsa,
|
2014-03-22 13:07:14 +04:00
|
|
|
1, ZFS_SPACE_CHECK_RESERVED));
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
2013-07-27 21:50:07 +04:00
|
|
|
/*
|
|
|
|
* If we're doing an ownership handoff, we need to make sure that there is
|
|
|
|
* only one long hold on the dataset. We're not allowed to change anything here
|
|
|
|
* so we don't permanently release the long hold or regular hold here. We want
|
|
|
|
* to do this only when syncing to avoid the dataset unexpectedly going away
|
2017-01-20 00:56:36 +03:00
|
|
|
* when we release the long hold.
|
2013-07-27 21:50:07 +04:00
|
|
|
*/
|
|
|
|
static int
|
|
|
|
dsl_dataset_handoff_check(dsl_dataset_t *ds, void *owner, dmu_tx_t *tx)
|
|
|
|
{
|
2020-04-01 20:02:06 +03:00
|
|
|
boolean_t held = B_FALSE;
|
2013-07-27 21:50:07 +04:00
|
|
|
|
|
|
|
if (!dmu_tx_is_syncing(tx))
|
|
|
|
return (0);
|
|
|
|
|
2020-04-01 20:02:06 +03:00
|
|
|
dsl_dir_t *dd = ds->ds_dir;
|
|
|
|
mutex_enter(&dd->dd_activity_lock);
|
|
|
|
uint64_t holds = zfs_refcount_count(&ds->ds_longholds) -
|
|
|
|
(owner != NULL ? 1 : 0);
|
|
|
|
/*
|
|
|
|
* The value of dd_activity_waiters can chance as soon as we drop the
|
|
|
|
* lock, but we're fine with that; new waiters coming in or old
|
|
|
|
* waiters leaving doesn't cause problems, since we're going to cancel
|
|
|
|
* waiters later anyway. The goal of this check is to verify that no
|
|
|
|
* non-waiters have long-holds, and all new long-holds will be
|
|
|
|
* prevented because we're holding the pool config as writer.
|
|
|
|
*/
|
|
|
|
if (holds != dd->dd_activity_waiters)
|
|
|
|
held = B_TRUE;
|
|
|
|
mutex_exit(&dd->dd_activity_lock);
|
2013-07-27 21:50:07 +04:00
|
|
|
|
|
|
|
if (held)
|
|
|
|
return (SET_ERROR(EBUSY));
|
|
|
|
|
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
2018-02-08 19:20:33 +03:00
|
|
|
int
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_rollback_check(void *arg, dmu_tx_t *tx)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2013-07-27 21:50:07 +04:00
|
|
|
dsl_dataset_rollback_arg_t *ddra = arg;
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_pool_t *dp = dmu_tx_pool(tx);
|
2008-11-20 23:01:55 +03:00
|
|
|
dsl_dataset_t *ds;
|
2013-09-04 16:00:57 +04:00
|
|
|
int64_t unused_refres_delta;
|
|
|
|
int error;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-07-27 21:50:07 +04:00
|
|
|
error = dsl_dataset_hold(dp, ddra->ddra_fsname, FTAG, &ds);
|
2013-09-04 16:00:57 +04:00
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
2010-05-29 00:45:14 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
/* must not be a snapshot */
|
2015-04-02 06:44:32 +03:00
|
|
|
if (ds->ds_is_snapshot) {
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_rele(ds, FTAG);
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EINVAL));
|
2013-09-04 16:00:57 +04:00
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
/* must have a most recent snapshot */
|
2015-04-01 18:14:34 +03:00
|
|
|
if (dsl_dataset_phys(ds)->ds_prev_snap_txg < TXG_INITIAL) {
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_rele(ds, FTAG);
|
2017-07-27 15:58:52 +03:00
|
|
|
return (SET_ERROR(ESRCH));
|
2013-09-04 16:00:57 +04:00
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2016-11-22 02:09:54 +03:00
|
|
|
/*
|
|
|
|
* No rollback to a snapshot created in the current txg, because
|
|
|
|
* the rollback may dirty the dataset and create blocks that are
|
|
|
|
* not reachable from the rootbp while having a birth txg that
|
|
|
|
* falls into the snapshot's range.
|
|
|
|
*/
|
|
|
|
if (dmu_tx_is_syncing(tx) &&
|
|
|
|
dsl_dataset_phys(ds)->ds_prev_snap_txg >= tx->tx_txg) {
|
|
|
|
dsl_dataset_rele(ds, FTAG);
|
|
|
|
return (SET_ERROR(EAGAIN));
|
|
|
|
}
|
|
|
|
|
2017-03-11 21:26:47 +03:00
|
|
|
/*
|
|
|
|
* If the expected target snapshot is specified, then check that
|
|
|
|
* the latest snapshot is it.
|
|
|
|
*/
|
|
|
|
if (ddra->ddra_tosnap != NULL) {
|
2017-07-27 15:58:52 +03:00
|
|
|
dsl_dataset_t *snapds;
|
|
|
|
|
|
|
|
/* Check if the target snapshot exists at all. */
|
|
|
|
error = dsl_dataset_hold(dp, ddra->ddra_tosnap, FTAG, &snapds);
|
|
|
|
if (error != 0) {
|
|
|
|
/*
|
|
|
|
* ESRCH is used to signal that the target snapshot does
|
|
|
|
* not exist, while ENOENT is used to report that
|
|
|
|
* the rolled back dataset does not exist.
|
|
|
|
* ESRCH is also used to cover other cases where the
|
|
|
|
* target snapshot is not related to the dataset being
|
|
|
|
* rolled back such as being in a different pool.
|
|
|
|
*/
|
|
|
|
if (error == ENOENT || error == EXDEV)
|
|
|
|
error = SET_ERROR(ESRCH);
|
|
|
|
dsl_dataset_rele(ds, FTAG);
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
ASSERT(snapds->ds_is_snapshot);
|
2017-03-11 21:26:47 +03:00
|
|
|
|
2017-07-27 15:58:52 +03:00
|
|
|
/* Check if the snapshot is the latest snapshot indeed. */
|
|
|
|
if (snapds != ds->ds_prev) {
|
|
|
|
/*
|
|
|
|
* Distinguish between the case where the only problem
|
|
|
|
* is intervening snapshots (EEXIST) vs the snapshot
|
|
|
|
* not being a valid target for rollback (ESRCH).
|
|
|
|
*/
|
|
|
|
if (snapds->ds_dir == ds->ds_dir ||
|
|
|
|
(dsl_dir_is_clone(ds->ds_dir) &&
|
|
|
|
dsl_dir_phys(ds->ds_dir)->dd_origin_obj ==
|
|
|
|
snapds->ds_object)) {
|
|
|
|
error = SET_ERROR(EEXIST);
|
|
|
|
} else {
|
|
|
|
error = SET_ERROR(ESRCH);
|
|
|
|
}
|
|
|
|
dsl_dataset_rele(snapds, FTAG);
|
|
|
|
dsl_dataset_rele(ds, FTAG);
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
dsl_dataset_rele(snapds, FTAG);
|
2017-03-11 21:26:47 +03:00
|
|
|
}
|
|
|
|
|
2013-12-12 02:33:41 +04:00
|
|
|
/* must not have any bookmarks after the most recent snapshot */
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
if (dsl_bookmark_latest_txg(ds) >
|
|
|
|
dsl_dataset_phys(ds)->ds_prev_snap_txg) {
|
2017-07-27 15:58:52 +03:00
|
|
|
dsl_dataset_rele(ds, FTAG);
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
return (SET_ERROR(EEXIST));
|
2013-12-12 02:33:41 +04:00
|
|
|
}
|
|
|
|
|
2013-07-27 21:50:07 +04:00
|
|
|
error = dsl_dataset_handoff_check(ds, ddra->ddra_owner, tx);
|
|
|
|
if (error != 0) {
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_rele(ds, FTAG);
|
2013-07-27 21:50:07 +04:00
|
|
|
return (error);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
2010-05-29 00:45:14 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
/*
|
|
|
|
* Check if the snap we are rolling back to uses more than
|
|
|
|
* the refquota.
|
|
|
|
*/
|
|
|
|
if (ds->ds_quota != 0 &&
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(ds->ds_prev)->ds_referenced_bytes > ds->ds_quota) {
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_rele(ds, FTAG);
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EDQUOT));
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
/*
|
|
|
|
* When we do the clone swap, we will temporarily use more space
|
|
|
|
* due to the refreservation (the head will no longer have any
|
|
|
|
* unique space, so the entire amount of the refreservation will need
|
|
|
|
* to be free). We will immediately destroy the clone, freeing
|
|
|
|
* this space, but the freeing happens over many txg's.
|
|
|
|
*/
|
|
|
|
unused_refres_delta = (int64_t)MIN(ds->ds_reserved,
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(ds)->ds_unique_bytes);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
if (unused_refres_delta > 0 &&
|
|
|
|
unused_refres_delta >
|
|
|
|
dsl_dir_space_available(ds->ds_dir, NULL, 0, TRUE)) {
|
|
|
|
dsl_dataset_rele(ds, FTAG);
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(ENOSPC));
|
2013-09-04 16:00:57 +04:00
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_rele(ds, FTAG);
|
|
|
|
return (0);
|
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2018-02-08 19:20:33 +03:00
|
|
|
void
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_rollback_sync(void *arg, dmu_tx_t *tx)
|
|
|
|
{
|
2013-07-27 21:50:07 +04:00
|
|
|
dsl_dataset_rollback_arg_t *ddra = arg;
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_pool_t *dp = dmu_tx_pool(tx);
|
|
|
|
dsl_dataset_t *ds, *clone;
|
|
|
|
uint64_t cloneobj;
|
2016-06-16 00:28:36 +03:00
|
|
|
char namebuf[ZFS_MAX_DATASET_NAME_LEN];
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-07-27 21:50:07 +04:00
|
|
|
VERIFY0(dsl_dataset_hold(dp, ddra->ddra_fsname, FTAG, &ds));
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-08-14 23:42:31 +04:00
|
|
|
dsl_dataset_name(ds->ds_prev, namebuf);
|
|
|
|
fnvlist_add_string(ddra->ddra_result, "target", namebuf);
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
cloneobj = dsl_dataset_create_sync(ds->ds_dir, "%rollback",
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decyrpt, and authenticate protected datasets.
Each object set maintains a Merkel tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 20:36:48 +03:00
|
|
|
ds->ds_prev, DS_CREATE_FLAG_NODIRTY, kcred, NULL, tx);
|
2013-09-04 16:00:57 +04:00
|
|
|
|
|
|
|
VERIFY0(dsl_dataset_hold_obj(dp, cloneobj, FTAG, &clone));
|
|
|
|
|
|
|
|
dsl_dataset_clone_swap_sync_impl(clone, ds, tx);
|
|
|
|
dsl_dataset_zero_zil(ds, tx);
|
|
|
|
|
|
|
|
dsl_destroy_head_sync_impl(clone, tx);
|
|
|
|
|
|
|
|
dsl_dataset_rele(clone, FTAG);
|
|
|
|
dsl_dataset_rele(ds, FTAG);
|
|
|
|
}
|
|
|
|
|
2013-07-27 21:50:07 +04:00
|
|
|
/*
|
2013-08-14 23:42:31 +04:00
|
|
|
* Rolls back the given filesystem or volume to the most recent snapshot.
|
|
|
|
* The name of the most recent snapshot will be returned under key "target"
|
|
|
|
* in the result nvlist.
|
2013-07-27 21:50:07 +04:00
|
|
|
*
|
2013-08-14 23:42:31 +04:00
|
|
|
* If owner != NULL:
|
2013-07-27 21:50:07 +04:00
|
|
|
* - The existing dataset MUST be owned by the specified owner at entry
|
|
|
|
* - Upon return, dataset will still be held by the same owner, whether we
|
|
|
|
* succeed or not.
|
|
|
|
*
|
|
|
|
* This mode is required any time the existing filesystem is mounted. See
|
|
|
|
* notes above zfs_suspend_fs() for further details.
|
|
|
|
*/
|
2013-09-04 16:00:57 +04:00
|
|
|
int
|
2017-03-11 21:26:47 +03:00
|
|
|
dsl_dataset_rollback(const char *fsname, const char *tosnap, void *owner,
|
|
|
|
nvlist_t *result)
|
2013-09-04 16:00:57 +04:00
|
|
|
{
|
2013-07-27 21:50:07 +04:00
|
|
|
dsl_dataset_rollback_arg_t ddra;
|
|
|
|
|
|
|
|
ddra.ddra_fsname = fsname;
|
2017-03-11 21:26:47 +03:00
|
|
|
ddra.ddra_tosnap = tosnap;
|
2013-07-27 21:50:07 +04:00
|
|
|
ddra.ddra_owner = owner;
|
2013-08-14 23:42:31 +04:00
|
|
|
ddra.ddra_result = result;
|
2013-07-27 21:50:07 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
return (dsl_sync_task(fsname, dsl_dataset_rollback_check,
|
2014-11-03 23:28:43 +03:00
|
|
|
dsl_dataset_rollback_sync, &ddra,
|
|
|
|
1, ZFS_SPACE_CHECK_RESERVED));
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
2008-12-03 23:09:06 +03:00
|
|
|
struct promotenode {
|
|
|
|
list_node_t link;
|
|
|
|
dsl_dataset_t *ds;
|
|
|
|
};
|
|
|
|
|
|
|
|
static int snaplist_space(list_t *l, uint64_t mintxg, uint64_t *spacep);
|
2013-09-04 16:00:57 +04:00
|
|
|
static int promote_hold(dsl_dataset_promote_arg_t *ddpa, dsl_pool_t *dp,
|
2022-04-19 21:38:30 +03:00
|
|
|
const void *tag);
|
|
|
|
static void promote_rele(dsl_dataset_promote_arg_t *ddpa, const void *tag);
|
2008-12-03 23:09:06 +03:00
|
|
|
|
2018-02-08 19:16:23 +03:00
|
|
|
int
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_promote_check(void *arg, dmu_tx_t *tx)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_promote_arg_t *ddpa = arg;
|
|
|
|
dsl_pool_t *dp = dmu_tx_pool(tx);
|
|
|
|
dsl_dataset_t *hds;
|
|
|
|
struct promotenode *snap;
|
2008-11-20 23:01:55 +03:00
|
|
|
int err;
|
2010-05-29 00:45:14 +04:00
|
|
|
uint64_t unused;
|
2015-04-01 16:07:48 +03:00
|
|
|
uint64_t ss_mv_cnt;
|
2015-07-02 16:04:17 +03:00
|
|
|
size_t max_snap_len;
|
2018-02-08 19:16:23 +03:00
|
|
|
boolean_t conflicting_snaps;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
err = promote_hold(ddpa, dp, FTAG);
|
|
|
|
if (err != 0)
|
|
|
|
return (err);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
hds = ddpa->ddpa_clone;
|
2015-07-02 16:04:17 +03:00
|
|
|
max_snap_len = MAXNAMELEN - strlen(ddpa->ddpa_clonename) - 1;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2015-04-01 18:14:34 +03:00
|
|
|
if (dsl_dataset_phys(hds)->ds_flags & DS_FLAG_NOPROMOTE) {
|
2013-09-04 16:00:57 +04:00
|
|
|
promote_rele(ddpa, FTAG);
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EXDEV));
|
2013-09-04 16:00:57 +04:00
|
|
|
}
|
|
|
|
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decyrpt, and authenticate protected datasets.
Each object set maintains a Merkel tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 20:36:48 +03:00
|
|
|
snap = list_head(&ddpa->shared_snaps);
|
|
|
|
if (snap == NULL) {
|
|
|
|
err = SET_ERROR(ENOENT);
|
|
|
|
goto out;
|
|
|
|
}
|
2022-10-01 02:59:51 +03:00
|
|
|
dsl_dataset_t *const origin_ds = snap->ds;
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decyrpt, and authenticate protected datasets.
Each object set maintains a Merkel tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 20:36:48 +03:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Encrypted clones share a DSL Crypto Key with their origin's dsl dir.
|
|
|
|
* When doing a promote we must make sure the encryption root for
|
|
|
|
* both the target and the target's origin does not change to avoid
|
|
|
|
* needing to rewrap encryption keys
|
|
|
|
*/
|
|
|
|
err = dsl_dataset_promote_crypt_check(hds->ds_dir, origin_ds->ds_dir);
|
|
|
|
if (err != 0)
|
|
|
|
goto out;
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
/*
|
|
|
|
* Compute and check the amount of space to transfer. Since this is
|
|
|
|
* so expensive, don't do the preliminary check.
|
|
|
|
*/
|
|
|
|
if (!dmu_tx_is_syncing(tx)) {
|
|
|
|
promote_rele(ddpa, FTAG);
|
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
/* compute origin's new unique space */
|
2013-09-04 16:00:57 +04:00
|
|
|
snap = list_tail(&ddpa->clone_snaps);
|
2018-02-20 22:19:42 +03:00
|
|
|
ASSERT(snap != NULL);
|
2015-04-01 18:14:34 +03:00
|
|
|
ASSERT3U(dsl_dataset_phys(snap->ds)->ds_prev_snap_obj, ==,
|
|
|
|
origin_ds->ds_object);
|
2010-05-29 00:45:14 +04:00
|
|
|
dsl_deadlist_space_range(&snap->ds->ds_deadlist,
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(origin_ds)->ds_prev_snap_txg, UINT64_MAX,
|
2013-09-04 16:00:57 +04:00
|
|
|
&ddpa->unique, &unused, &unused);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2008-12-03 23:09:06 +03:00
|
|
|
/*
|
|
|
|
* Walk the snapshots that we are moving
|
|
|
|
*
|
|
|
|
* Compute space to transfer. Consider the incremental changes
|
2013-09-04 16:00:57 +04:00
|
|
|
* to used by each snapshot:
|
2008-12-03 23:09:06 +03:00
|
|
|
* (my used) = (prev's used) + (blocks born) - (blocks killed)
|
|
|
|
* So each snapshot gave birth to:
|
|
|
|
* (blocks born) = (my used) - (prev's used) + (blocks killed)
|
|
|
|
* So a sequence would look like:
|
|
|
|
* (uN - u(N-1) + kN) + ... + (u1 - u0 + k1) + (u0 - 0 + k0)
|
|
|
|
* Which simplifies to:
|
|
|
|
* uN + kN + kN-1 + ... + k1 + k0
|
|
|
|
* Note however, if we stop before we reach the ORIGIN we get:
|
|
|
|
* uN + kN + kN-1 + ... + kM - uM-1
|
|
|
|
*/
|
2018-02-08 19:16:23 +03:00
|
|
|
conflicting_snaps = B_FALSE;
|
2015-04-01 16:07:48 +03:00
|
|
|
ss_mv_cnt = 0;
|
2015-04-01 18:14:34 +03:00
|
|
|
ddpa->used = dsl_dataset_phys(origin_ds)->ds_referenced_bytes;
|
|
|
|
ddpa->comp = dsl_dataset_phys(origin_ds)->ds_compressed_bytes;
|
|
|
|
ddpa->uncomp = dsl_dataset_phys(origin_ds)->ds_uncompressed_bytes;
|
2013-09-04 16:00:57 +04:00
|
|
|
for (snap = list_head(&ddpa->shared_snaps); snap;
|
|
|
|
snap = list_next(&ddpa->shared_snaps, snap)) {
|
2008-11-20 23:01:55 +03:00
|
|
|
uint64_t val, dlused, dlcomp, dluncomp;
|
2008-12-03 23:09:06 +03:00
|
|
|
dsl_dataset_t *ds = snap->ds;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2015-04-01 16:07:48 +03:00
|
|
|
ss_mv_cnt++;
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
/*
|
|
|
|
* If there are long holds, we won't be able to evict
|
|
|
|
* the objset.
|
|
|
|
*/
|
|
|
|
if (dsl_dataset_long_held(ds)) {
|
2013-03-08 22:41:28 +04:00
|
|
|
err = SET_ERROR(EBUSY);
|
2013-09-04 16:00:57 +04:00
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
/* Check that the snapshot name does not conflict */
|
2013-09-04 16:00:57 +04:00
|
|
|
VERIFY0(dsl_dataset_get_snapname(ds));
|
2015-07-02 16:04:17 +03:00
|
|
|
if (strlen(ds->ds_snapname) >= max_snap_len) {
|
|
|
|
err = SET_ERROR(ENAMETOOLONG);
|
|
|
|
goto out;
|
|
|
|
}
|
2008-12-03 23:09:06 +03:00
|
|
|
err = dsl_dataset_snap_lookup(hds, ds->ds_snapname, &val);
|
2010-05-29 00:45:14 +04:00
|
|
|
if (err == 0) {
|
2018-02-08 19:16:23 +03:00
|
|
|
fnvlist_add_boolean(ddpa->err_ds,
|
|
|
|
snap->ds->ds_snapname);
|
|
|
|
conflicting_snaps = B_TRUE;
|
|
|
|
} else if (err != ENOENT) {
|
2010-05-29 00:45:14 +04:00
|
|
|
goto out;
|
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2008-12-03 23:09:06 +03:00
|
|
|
/* The very first snapshot does not have a deadlist */
|
2015-04-01 18:14:34 +03:00
|
|
|
if (dsl_dataset_phys(ds)->ds_prev_snap_obj == 0)
|
2008-12-03 23:09:06 +03:00
|
|
|
continue;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
dsl_deadlist_space(&ds->ds_deadlist,
|
|
|
|
&dlused, &dlcomp, &dluncomp);
|
2013-09-04 16:00:57 +04:00
|
|
|
ddpa->used += dlused;
|
|
|
|
ddpa->comp += dlcomp;
|
|
|
|
ddpa->uncomp += dluncomp;
|
2008-12-03 23:09:06 +03:00
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
/*
|
|
|
|
* Check that bookmarks that are being transferred don't have
|
|
|
|
* name conflicts.
|
|
|
|
*/
|
2022-10-01 02:59:51 +03:00
|
|
|
for (dsl_bookmark_node_t *dbn = avl_first(&origin_ds->ds_bookmarks);
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
dbn != NULL && dbn->dbn_phys.zbm_creation_txg <=
|
|
|
|
dsl_dataset_phys(origin_ds)->ds_creation_txg;
|
2022-10-01 02:59:51 +03:00
|
|
|
dbn = AVL_NEXT(&origin_ds->ds_bookmarks, dbn)) {
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
if (strlen(dbn->dbn_name) >= max_snap_len) {
|
|
|
|
err = SET_ERROR(ENAMETOOLONG);
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
zfs_bookmark_phys_t bm;
|
|
|
|
err = dsl_bookmark_lookup_impl(ddpa->ddpa_clone,
|
|
|
|
dbn->dbn_name, &bm);
|
|
|
|
|
|
|
|
if (err == 0) {
|
|
|
|
fnvlist_add_boolean(ddpa->err_ds, dbn->dbn_name);
|
|
|
|
conflicting_snaps = B_TRUE;
|
|
|
|
} else if (err == ESRCH) {
|
|
|
|
err = 0;
|
Cleanup: Address Clang's static analyzer's unused code complaints
These were categorized as the following:
* Dead assignment 23
* Dead increment 4
* Dead initialization 6
* Dead nested assignment 18
Most of these are harmless, but since actual issues can hide among them,
we correct them.
That said, there were a few return values that were being ignored that
appeared to merit some correction:
* `destroy_callback()` in `cmd/zfs/zfs_main.c` ignored the error from
`destroy_batched()`. We handle it by returning -1 if there is an
error.
* `zfs_do_upgrade()` in `cmd/zfs/zfs_main.c` ignored the error from
`zfs_for_each()`. We handle it by doing a binary OR of the error
value from the subsequent `zfs_for_each()` call to the existing
value. This is how errors are mostly handled inside `zfs_for_each()`.
The error value here is passed to exit from the zfs command, so doing
a binary or on it is better than what we did previously.
* `get_zap_prop()` in `module/zfs/zcp_get.c` ignored the error from
`dsl_prop_get_ds()` when the property is not of type string. We
return an error when it does. There is a small concern that the
`zfs_get_temporary_prop()` call would handle things, but in the case
that it does not, we would be pushing an uninitialized numval onto
the lua stack. It is expected that `dsl_prop_get_ds()` will succeed
anytime that `zfs_get_temporary_prop()` does, so that not giving it a
chance to fix things is not a problem.
* `draid_merge_impl()` in `tests/zfs-tests/cmd/draid.c` used
`nvlist_add_nvlist()` twice in ways in which errors are expected to
be impossible, so we switch to `fnvlist_add_nvlist()`.
A few notable ones did not merit use of the return value, so we
suppressed it with `(void)`:
* `write_free_diffs()` in `lib/libzfs/libzfs_diff.c` ignored the error
value from `describe_free()`. A look through the commit history
revealed that this was intentional.
* `arc_evict_hdr()` in `module/zfs/arc.c` did not need to use the
returned handle from `arc_hdr_realloc()` because it is already
referenced in lists.
* `spa_vdev_detach()` in `module/zfs/spa.c` has a comment explicitly
saying not to use the error from `vdev_label_init()` because whatever
causes the error could be the reason why a detach is being done.
Unfortunately, I am not presently able to analyze the kernel modules
with Clang's static analyzer, so I could have missed some cases of this.
In cases where reports were present in code that is duplicated between
Linux and FreeBSD, I made a conscious effort to fix the FreeBSD version
too.
After this commit is merged, regressions like dee8934 should become
extremely obvious with Clang's static analyzer since a regression would
appear in the results as the only instance of unused code. That assumes
that Coverity does not catch the issue first.
My local branch with fixes from all of my outstanding non-draft pull
requests shows 118 reports from Clang's static anlayzer after this
patch. That is down by 51 from 169.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Cedric Berger <cedric@precidata.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13986
2022-10-14 23:37:54 +03:00
|
|
|
}
|
|
|
|
if (err != 0) {
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-02-08 19:16:23 +03:00
|
|
|
/*
|
|
|
|
* In order to return the full list of conflicting snapshots, we check
|
|
|
|
* whether there was a conflict after traversing all of them.
|
|
|
|
*/
|
|
|
|
if (conflicting_snaps) {
|
|
|
|
err = SET_ERROR(EEXIST);
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
2008-12-03 23:09:06 +03:00
|
|
|
/*
|
|
|
|
* If we are a clone of a clone then we never reached ORIGIN,
|
|
|
|
* so we need to subtract out the clone origin's used space.
|
|
|
|
*/
|
2013-09-04 16:00:57 +04:00
|
|
|
if (ddpa->origin_origin) {
|
2015-04-01 18:14:34 +03:00
|
|
|
ddpa->used -=
|
|
|
|
dsl_dataset_phys(ddpa->origin_origin)->ds_referenced_bytes;
|
|
|
|
ddpa->comp -=
|
|
|
|
dsl_dataset_phys(ddpa->origin_origin)->ds_compressed_bytes;
|
2013-09-04 16:00:57 +04:00
|
|
|
ddpa->uncomp -=
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(ddpa->origin_origin)->
|
|
|
|
ds_uncompressed_bytes;
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
2015-04-01 16:07:48 +03:00
|
|
|
/* Check that there is enough space and limit headroom here */
|
2008-12-03 23:09:06 +03:00
|
|
|
err = dsl_dir_transfer_possible(origin_ds->ds_dir, hds->ds_dir,
|
2020-07-12 03:18:02 +03:00
|
|
|
0, ss_mv_cnt, ddpa->used, ddpa->cr, ddpa->proc);
|
2013-09-04 16:00:57 +04:00
|
|
|
if (err != 0)
|
|
|
|
goto out;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2008-12-03 23:09:06 +03:00
|
|
|
/*
|
|
|
|
* Compute the amounts of space that will be used by snapshots
|
|
|
|
* after the promotion (for both origin and clone). For each,
|
|
|
|
* it is the amount of space that will be on all of their
|
|
|
|
* deadlists (that was not born before their new origin).
|
|
|
|
*/
|
2015-04-01 18:14:34 +03:00
|
|
|
if (dsl_dir_phys(hds->ds_dir)->dd_flags & DD_FLAG_USED_BREAKDOWN) {
|
2008-12-03 23:09:06 +03:00
|
|
|
uint64_t space;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Note, typically this will not be a clone of a clone,
|
2010-05-29 00:45:14 +04:00
|
|
|
* so dd_origin_txg will be < TXG_INITIAL, so
|
|
|
|
* these snaplist_space() -> dsl_deadlist_space_range()
|
2008-12-03 23:09:06 +03:00
|
|
|
* calls will be fast because they do not have to
|
|
|
|
* iterate over all bps.
|
|
|
|
*/
|
2013-09-04 16:00:57 +04:00
|
|
|
snap = list_head(&ddpa->origin_snaps);
|
2016-10-01 01:56:17 +03:00
|
|
|
if (snap == NULL) {
|
|
|
|
err = SET_ERROR(ENOENT);
|
|
|
|
goto out;
|
|
|
|
}
|
2013-09-04 16:00:57 +04:00
|
|
|
err = snaplist_space(&ddpa->shared_snaps,
|
|
|
|
snap->ds->ds_dir->dd_origin_txg, &ddpa->cloneusedsnap);
|
|
|
|
if (err != 0)
|
|
|
|
goto out;
|
2008-12-03 23:09:06 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
err = snaplist_space(&ddpa->clone_snaps,
|
2010-05-29 00:45:14 +04:00
|
|
|
snap->ds->ds_dir->dd_origin_txg, &space);
|
2013-09-04 16:00:57 +04:00
|
|
|
if (err != 0)
|
|
|
|
goto out;
|
|
|
|
ddpa->cloneusedsnap += space;
|
2008-12-03 23:09:06 +03:00
|
|
|
}
|
2015-04-01 18:14:34 +03:00
|
|
|
if (dsl_dir_phys(origin_ds->ds_dir)->dd_flags &
|
|
|
|
DD_FLAG_USED_BREAKDOWN) {
|
2013-09-04 16:00:57 +04:00
|
|
|
err = snaplist_space(&ddpa->origin_snaps,
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(origin_ds)->ds_creation_txg,
|
|
|
|
&ddpa->originusedsnap);
|
2013-09-04 16:00:57 +04:00
|
|
|
if (err != 0)
|
|
|
|
goto out;
|
2008-12-03 23:09:06 +03:00
|
|
|
}
|
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
out:
|
2013-09-04 16:00:57 +04:00
|
|
|
promote_rele(ddpa, FTAG);
|
2010-05-29 00:45:14 +04:00
|
|
|
return (err);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
2018-02-08 19:16:23 +03:00
|
|
|
void
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_promote_sync(void *arg, dmu_tx_t *tx)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_promote_arg_t *ddpa = arg;
|
|
|
|
dsl_pool_t *dp = dmu_tx_pool(tx);
|
|
|
|
dsl_dataset_t *hds;
|
|
|
|
struct promotenode *snap;
|
|
|
|
dsl_dataset_t *origin_ds;
|
2008-12-03 23:09:06 +03:00
|
|
|
dsl_dataset_t *origin_head;
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dir_t *dd;
|
2008-11-20 23:01:55 +03:00
|
|
|
dsl_dir_t *odd = NULL;
|
2008-12-03 23:09:06 +03:00
|
|
|
uint64_t oldnext_obj;
|
|
|
|
int64_t delta;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2019-07-26 20:54:14 +03:00
|
|
|
ASSERT(nvlist_empty(ddpa->err_ds));
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
VERIFY0(promote_hold(ddpa, dp, FTAG));
|
|
|
|
hds = ddpa->ddpa_clone;
|
|
|
|
|
2015-04-01 18:14:34 +03:00
|
|
|
ASSERT0(dsl_dataset_phys(hds)->ds_flags & DS_FLAG_NOPROMOTE);
|
2013-09-04 16:00:57 +04:00
|
|
|
|
|
|
|
snap = list_head(&ddpa->shared_snaps);
|
|
|
|
origin_ds = snap->ds;
|
|
|
|
dd = hds->ds_dir;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
snap = list_head(&ddpa->origin_snaps);
|
2008-12-03 23:09:06 +03:00
|
|
|
origin_head = snap->ds;
|
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
/*
|
|
|
|
* We need to explicitly open odd, since origin_ds's dd will be
|
|
|
|
* changing.
|
|
|
|
*/
|
2013-09-04 16:00:57 +04:00
|
|
|
VERIFY0(dsl_dir_hold_obj(dp, origin_ds->ds_dir->dd_object,
|
2008-11-20 23:01:55 +03:00
|
|
|
NULL, FTAG, &odd));
|
|
|
|
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decyrpt, and authenticate protected datasets.
Each object set maintains a Merkel tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 20:36:48 +03:00
|
|
|
dsl_dataset_promote_crypt_sync(hds->ds_dir, odd, tx);
|
|
|
|
|
2008-12-03 23:09:06 +03:00
|
|
|
/* change origin's next snap */
|
|
|
|
dmu_buf_will_dirty(origin_ds->ds_dbuf, tx);
|
2015-04-01 18:14:34 +03:00
|
|
|
oldnext_obj = dsl_dataset_phys(origin_ds)->ds_next_snap_obj;
|
2013-09-04 16:00:57 +04:00
|
|
|
snap = list_tail(&ddpa->clone_snaps);
|
2015-04-01 18:14:34 +03:00
|
|
|
ASSERT3U(dsl_dataset_phys(snap->ds)->ds_prev_snap_obj, ==,
|
|
|
|
origin_ds->ds_object);
|
|
|
|
dsl_dataset_phys(origin_ds)->ds_next_snap_obj = snap->ds->ds_object;
|
2008-12-03 23:09:06 +03:00
|
|
|
|
|
|
|
/* change the origin's next clone */
|
2015-04-01 18:14:34 +03:00
|
|
|
if (dsl_dataset_phys(origin_ds)->ds_next_clones_obj) {
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_remove_from_next_clones(origin_ds,
|
|
|
|
snap->ds->ds_object, tx);
|
|
|
|
VERIFY0(zap_add_int(dp->dp_meta_objset,
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(origin_ds)->ds_next_clones_obj,
|
2008-12-03 23:09:06 +03:00
|
|
|
oldnext_obj, tx));
|
|
|
|
}
|
|
|
|
|
|
|
|
/* change origin */
|
|
|
|
dmu_buf_will_dirty(dd->dd_dbuf, tx);
|
2015-04-01 18:14:34 +03:00
|
|
|
ASSERT3U(dsl_dir_phys(dd)->dd_origin_obj, ==, origin_ds->ds_object);
|
|
|
|
dsl_dir_phys(dd)->dd_origin_obj = dsl_dir_phys(odd)->dd_origin_obj;
|
2010-05-29 00:45:14 +04:00
|
|
|
dd->dd_origin_txg = origin_head->ds_dir->dd_origin_txg;
|
2008-12-03 23:09:06 +03:00
|
|
|
dmu_buf_will_dirty(odd->dd_dbuf, tx);
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dir_phys(odd)->dd_origin_obj = origin_ds->ds_object;
|
2010-05-29 00:45:14 +04:00
|
|
|
origin_head->ds_dir->dd_origin_txg =
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(origin_ds)->ds_creation_txg;
|
2010-05-29 00:45:14 +04:00
|
|
|
|
|
|
|
/* change dd_clone entries */
|
|
|
|
if (spa_version(dp->dp_spa) >= SPA_VERSION_DIR_CLONES) {
|
2013-09-04 16:00:57 +04:00
|
|
|
VERIFY0(zap_remove_int(dp->dp_meta_objset,
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dir_phys(odd)->dd_clones, hds->ds_object, tx));
|
2013-09-04 16:00:57 +04:00
|
|
|
VERIFY0(zap_add_int(dp->dp_meta_objset,
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dir_phys(ddpa->origin_origin->ds_dir)->dd_clones,
|
2010-05-29 00:45:14 +04:00
|
|
|
hds->ds_object, tx));
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
VERIFY0(zap_remove_int(dp->dp_meta_objset,
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dir_phys(ddpa->origin_origin->ds_dir)->dd_clones,
|
2010-05-29 00:45:14 +04:00
|
|
|
origin_head->ds_object, tx));
|
2015-04-01 18:14:34 +03:00
|
|
|
if (dsl_dir_phys(dd)->dd_clones == 0) {
|
|
|
|
dsl_dir_phys(dd)->dd_clones =
|
|
|
|
zap_create(dp->dp_meta_objset, DMU_OT_DSL_CLONES,
|
|
|
|
DMU_OT_NONE, 0, tx);
|
2010-05-29 00:45:14 +04:00
|
|
|
}
|
2013-09-04 16:00:57 +04:00
|
|
|
VERIFY0(zap_add_int(dp->dp_meta_objset,
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dir_phys(dd)->dd_clones, origin_head->ds_object, tx));
|
2010-05-29 00:45:14 +04:00
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
/*
|
|
|
|
* Move bookmarks to this dir.
|
|
|
|
*/
|
|
|
|
dsl_bookmark_node_t *dbn_next;
|
|
|
|
for (dsl_bookmark_node_t *dbn = avl_first(&origin_head->ds_bookmarks);
|
|
|
|
dbn != NULL && dbn->dbn_phys.zbm_creation_txg <=
|
|
|
|
dsl_dataset_phys(origin_ds)->ds_creation_txg;
|
|
|
|
dbn = dbn_next) {
|
|
|
|
dbn_next = AVL_NEXT(&origin_head->ds_bookmarks, dbn);
|
|
|
|
|
|
|
|
avl_remove(&origin_head->ds_bookmarks, dbn);
|
|
|
|
VERIFY0(zap_remove(dp->dp_meta_objset,
|
|
|
|
origin_head->ds_bookmarks_obj, dbn->dbn_name, tx));
|
|
|
|
|
|
|
|
dsl_bookmark_node_add(hds, dbn, tx);
|
|
|
|
}
|
|
|
|
|
|
|
|
dsl_bookmark_next_changed(hds, origin_ds, tx);
|
|
|
|
|
2008-12-03 23:09:06 +03:00
|
|
|
/* move snapshots to this dir */
|
2013-09-04 16:00:57 +04:00
|
|
|
for (snap = list_head(&ddpa->shared_snaps); snap;
|
|
|
|
snap = list_next(&ddpa->shared_snaps, snap)) {
|
2008-12-03 23:09:06 +03:00
|
|
|
dsl_dataset_t *ds = snap->ds;
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
/*
|
|
|
|
* Property callbacks are registered to a particular
|
|
|
|
* dsl_dir. Since ours is changing, evict the objset
|
|
|
|
* so that they will be unregistered from the old dsl_dir.
|
|
|
|
*/
|
2010-05-29 00:45:14 +04:00
|
|
|
if (ds->ds_objset) {
|
|
|
|
dmu_objset_evict(ds->ds_objset);
|
|
|
|
ds->ds_objset = NULL;
|
2008-12-03 23:09:06 +03:00
|
|
|
}
|
2013-09-04 16:00:57 +04:00
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
/* move snap name entry */
|
2013-09-04 16:00:57 +04:00
|
|
|
VERIFY0(dsl_dataset_get_snapname(ds));
|
|
|
|
VERIFY0(dsl_dataset_snap_remove(origin_head,
|
2015-04-01 16:07:48 +03:00
|
|
|
ds->ds_snapname, tx, B_TRUE));
|
2013-09-04 16:00:57 +04:00
|
|
|
VERIFY0(zap_add(dp->dp_meta_objset,
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(hds)->ds_snapnames_zapobj, ds->ds_snapname,
|
2008-11-20 23:01:55 +03:00
|
|
|
8, 1, &ds->ds_object, tx));
|
2015-04-01 16:07:48 +03:00
|
|
|
dsl_fs_ss_count_adjust(hds->ds_dir, 1,
|
|
|
|
DD_FIELD_SNAPSHOT_COUNT, tx);
|
2010-05-29 00:45:14 +04:00
|
|
|
|
2008-11-20 23:01:55 +03:00
|
|
|
/* change containing dsl_dir */
|
|
|
|
dmu_buf_will_dirty(ds->ds_dbuf, tx);
|
2015-04-01 18:14:34 +03:00
|
|
|
ASSERT3U(dsl_dataset_phys(ds)->ds_dir_obj, ==, odd->dd_object);
|
|
|
|
dsl_dataset_phys(ds)->ds_dir_obj = dd->dd_object;
|
2008-11-20 23:01:55 +03:00
|
|
|
ASSERT3P(ds->ds_dir, ==, odd);
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dir_rele(ds->ds_dir, ds);
|
|
|
|
VERIFY0(dsl_dir_hold_obj(dp, dd->dd_object,
|
2008-11-20 23:01:55 +03:00
|
|
|
NULL, ds, &ds->ds_dir));
|
|
|
|
|
2010-05-29 00:45:14 +04:00
|
|
|
/* move any clone references */
|
2015-04-01 18:14:34 +03:00
|
|
|
if (dsl_dataset_phys(ds)->ds_next_clones_obj &&
|
2010-05-29 00:45:14 +04:00
|
|
|
spa_version(dp->dp_spa) >= SPA_VERSION_DIR_CLONES) {
|
|
|
|
zap_cursor_t zc;
|
|
|
|
zap_attribute_t za;
|
|
|
|
|
|
|
|
for (zap_cursor_init(&zc, dp->dp_meta_objset,
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(ds)->ds_next_clones_obj);
|
2010-05-29 00:45:14 +04:00
|
|
|
zap_cursor_retrieve(&zc, &za) == 0;
|
2013-09-04 16:00:57 +04:00
|
|
|
zap_cursor_advance(&zc)) {
|
|
|
|
dsl_dataset_t *cnds;
|
|
|
|
uint64_t o;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
if (za.za_first_integer == oldnext_obj) {
|
|
|
|
/*
|
|
|
|
* We've already moved the
|
|
|
|
* origin's reference.
|
|
|
|
*/
|
|
|
|
continue;
|
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
VERIFY0(dsl_dataset_hold_obj(dp,
|
|
|
|
za.za_first_integer, FTAG, &cnds));
|
2015-04-01 18:14:34 +03:00
|
|
|
o = dsl_dir_phys(cnds->ds_dir)->
|
|
|
|
dd_head_dataset_obj;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
VERIFY0(zap_remove_int(dp->dp_meta_objset,
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dir_phys(odd)->dd_clones, o, tx));
|
2013-09-04 16:00:57 +04:00
|
|
|
VERIFY0(zap_add_int(dp->dp_meta_objset,
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dir_phys(dd)->dd_clones, o, tx));
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_rele(cnds, FTAG);
|
|
|
|
}
|
|
|
|
zap_cursor_fini(&zc);
|
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
ASSERT(!dsl_prop_hascb(ds));
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2013-09-04 16:00:57 +04:00
|
|
|
* Change space accounting.
|
|
|
|
* Note, pa->*usedsnap and dd_used_breakdown[SNAP] will either
|
|
|
|
* both be valid, or both be 0 (resulting in delta == 0). This
|
|
|
|
* is true for each of {clone,origin} independently.
|
2008-11-20 23:01:55 +03:00
|
|
|
*/
|
2012-01-20 22:58:57 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
delta = ddpa->cloneusedsnap -
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dir_phys(dd)->dd_used_breakdown[DD_USED_SNAP];
|
2013-09-04 16:00:57 +04:00
|
|
|
ASSERT3S(delta, >=, 0);
|
|
|
|
ASSERT3U(ddpa->used, >=, delta);
|
|
|
|
dsl_dir_diduse_space(dd, DD_USED_SNAP, delta, 0, 0, tx);
|
|
|
|
dsl_dir_diduse_space(dd, DD_USED_HEAD,
|
|
|
|
ddpa->used - delta, ddpa->comp, ddpa->uncomp, tx);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
delta = ddpa->originusedsnap -
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dir_phys(odd)->dd_used_breakdown[DD_USED_SNAP];
|
2013-09-04 16:00:57 +04:00
|
|
|
ASSERT3S(delta, <=, 0);
|
|
|
|
ASSERT3U(ddpa->used, >=, -delta);
|
|
|
|
dsl_dir_diduse_space(odd, DD_USED_SNAP, delta, 0, 0, tx);
|
|
|
|
dsl_dir_diduse_space(odd, DD_USED_HEAD,
|
|
|
|
-ddpa->used - delta, -ddpa->comp, -ddpa->uncomp, tx);
|
|
|
|
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(origin_ds)->ds_unique_bytes = ddpa->unique;
|
2013-09-04 16:00:57 +04:00
|
|
|
|
2019-07-26 20:54:14 +03:00
|
|
|
/*
|
|
|
|
* Since livelists are specific to a clone's origin txg, they
|
|
|
|
* are no longer accurate. Destroy the livelist from the clone being
|
|
|
|
* promoted. If the origin dataset is a clone, destroy its livelist
|
|
|
|
* as well.
|
|
|
|
*/
|
|
|
|
dsl_dir_remove_livelist(dd, tx, B_TRUE);
|
zfs promote does not delete livelist of origin
When a clone is promoted, its livelist is no longer accurate, so it is
discarded. If the clone's origin is also a clone (i.e. we are promoting
a clone of a clone), then the origin's livelist is also no longer
accurate, so it should be discarded, but the code doesn't actually do
that.
Consider a pool with:
* Filesystem A
* Clone B, a clone of A
* Clone C, a clone of B
If we promote C, it discards C's livelist. It should discard B's
livelist, but that is not happening. The impact is that when B is
destroyed, we use the livelist to find the blocks to free, but the
livelist is no longer correct so we end up freeing blocks that are still
in use by C. The incorrectly-freed blocks can be reallocated causing
checksum errors. And when C is destroyed it can double-free the
incorrectly-freed blocks.
The problem is that we remove the livelist of `origin_ds->ds_dir`, but
the origin snapshot has already been moved to the promoted dsl_dir. So
this is actually trying to remove the livelist of the promoted dsl_dir,
which was already removed. As explained in a comment in the beginning
of `dsl_dataset_promote_sync()`, we need to use the saved `odd` for the
origin's dsl_dir.
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed by: Sara Hartse <sara.hartse@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10652
2020-07-31 18:59:00 +03:00
|
|
|
dsl_dir_remove_livelist(odd, tx, B_TRUE);
|
2019-07-26 20:54:14 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
/* log history record */
|
2019-09-12 23:28:26 +03:00
|
|
|
spa_history_log_internal_ds(hds, "promote", tx, " ");
|
2013-09-04 16:00:57 +04:00
|
|
|
|
|
|
|
dsl_dir_rele(odd, FTAG);
|
|
|
|
promote_rele(ddpa, FTAG);
|
Improve zpool status output, list all affected datasets
Currently, determining which datasets are affected by corruption is
a manual process.
The primary difficulty in reporting the list of affected snapshots is
that since the error was initially found, the snapshot where the error
originally occurred in, may have been deleted. To solve this issue, we
add the ID of the head dataset of the original snapshot which the error
was detected in, to the stored error report. Then any time a filesystem
is deleted, the errors associated with it are deleted as well. Any time
a clone promote occurs, we modify reports associated with the original
head to refer to the new head. The stored error reports are identified
by this head ID, the birth time of the block which the error occurred
in, as well as some information about the error itself are also stored.
Once this information is stored, we can find the set of datasets
affected by an error by walking back the list of snapshots in the given
head until we find one with the appropriate birth txg, and then traverse
through the snapshots of the clone family, terminating a branch if the
block was replaced in a given snapshot. Then we report this information
back to libzfs, and to the zpool status command, where it is displayed
as follows:
pool: test
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
scan: scrub repaired 0B in 00:00:00 with 800 errors on Fri Dec 3
08:27:57 2021
config:
NAME STATE READ WRITE CKSUM
test ONLINE 0 0 0
sdb ONLINE 0 0 1.58K
errors: Permanent errors have been detected in the following files:
test@1:/test.0.0
/test/test.0.0
/test/1clone/test.0.0
A new feature flag is introduced to mark the presence of this change, as
well as promotion and backwards compatibility logic. This is an updated
version of #9175. Rebase required fixing the tests, updating the ABI of
libzfs, updating the man pages, fixing bugs, fixing the error returns,
and updating the old on-disk error logs to the new format when
activating the feature.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Co-authored-by: TulsiJain <tulsi.jain@delphix.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #9175
Closes #12812
2022-04-26 03:25:42 +03:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Transfer common error blocks from old head to new head.
|
|
|
|
*/
|
|
|
|
if (spa_feature_is_enabled(dp->dp_spa, SPA_FEATURE_HEAD_ERRLOG)) {
|
|
|
|
uint64_t old_head = origin_head->ds_object;
|
|
|
|
uint64_t new_head = hds->ds_object;
|
|
|
|
spa_swap_errlog(dp->dp_spa, new_head, old_head, tx);
|
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
/*
|
|
|
|
* Make a list of dsl_dataset_t's for the snapshots between first_obj
|
|
|
|
* (exclusive) and last_obj (inclusive). The list will be in reverse
|
|
|
|
* order (last_obj will be the list_head()). If first_obj == 0, do all
|
|
|
|
* snapshots back to this dataset's origin.
|
|
|
|
*/
|
2008-11-20 23:01:55 +03:00
|
|
|
static int
|
2013-09-04 16:00:57 +04:00
|
|
|
snaplist_make(dsl_pool_t *dp,
|
2022-04-19 21:38:30 +03:00
|
|
|
uint64_t first_obj, uint64_t last_obj, list_t *l, const void *tag)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2013-09-04 16:00:57 +04:00
|
|
|
uint64_t obj = last_obj;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
list_create(l, sizeof (struct promotenode),
|
|
|
|
offsetof(struct promotenode, link));
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
while (obj != first_obj) {
|
|
|
|
dsl_dataset_t *ds;
|
|
|
|
struct promotenode *snap;
|
|
|
|
int err;
|
2010-05-29 00:45:14 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
err = dsl_dataset_hold_obj(dp, obj, tag, &ds);
|
|
|
|
ASSERT(err != ENOENT);
|
|
|
|
if (err != 0)
|
|
|
|
return (err);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
if (first_obj == 0)
|
2015-04-01 18:14:34 +03:00
|
|
|
first_obj = dsl_dir_phys(ds->ds_dir)->dd_origin_obj;
|
2013-09-04 16:00:57 +04:00
|
|
|
|
2014-11-21 03:09:39 +03:00
|
|
|
snap = kmem_alloc(sizeof (*snap), KM_SLEEP);
|
2013-09-04 16:00:57 +04:00
|
|
|
snap->ds = ds;
|
|
|
|
list_insert_tail(l, snap);
|
2015-04-01 18:14:34 +03:00
|
|
|
obj = dsl_dataset_phys(ds)->ds_prev_snap_obj;
|
2013-09-04 16:00:57 +04:00
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
|
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
static int
|
|
|
|
snaplist_space(list_t *l, uint64_t mintxg, uint64_t *spacep)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2013-09-04 16:00:57 +04:00
|
|
|
struct promotenode *snap;
|
2013-08-28 15:45:09 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
*spacep = 0;
|
|
|
|
for (snap = list_head(l); snap; snap = list_next(l, snap)) {
|
|
|
|
uint64_t used, comp, uncomp;
|
|
|
|
dsl_deadlist_space_range(&snap->ds->ds_deadlist,
|
|
|
|
mintxg, UINT64_MAX, &used, &comp, &uncomp);
|
|
|
|
*spacep += used;
|
2010-05-29 00:45:14 +04:00
|
|
|
}
|
2013-09-04 16:00:57 +04:00
|
|
|
return (0);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
static void
|
2022-04-19 21:38:30 +03:00
|
|
|
snaplist_destroy(list_t *l, const void *tag)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2013-09-04 16:00:57 +04:00
|
|
|
struct promotenode *snap;
|
2010-05-29 00:45:14 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
if (l == NULL || !list_link_active(&l->list_head))
|
|
|
|
return;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
while ((snap = list_tail(l)) != NULL) {
|
|
|
|
list_remove(l, snap);
|
|
|
|
dsl_dataset_rele(snap->ds, tag);
|
|
|
|
kmem_free(snap, sizeof (*snap));
|
|
|
|
}
|
|
|
|
list_destroy(l);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2022-04-19 21:38:30 +03:00
|
|
|
promote_hold(dsl_dataset_promote_arg_t *ddpa, dsl_pool_t *dp, const void *tag)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2013-09-04 16:00:57 +04:00
|
|
|
int error;
|
|
|
|
dsl_dir_t *dd;
|
|
|
|
struct promotenode *snap;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
error = dsl_dataset_hold(dp, ddpa->ddpa_clonename, tag,
|
|
|
|
&ddpa->ddpa_clone);
|
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
|
|
|
dd = ddpa->ddpa_clone->ds_dir;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2015-04-02 06:44:32 +03:00
|
|
|
if (ddpa->ddpa_clone->ds_is_snapshot ||
|
2013-09-04 16:00:57 +04:00
|
|
|
!dsl_dir_is_clone(dd)) {
|
|
|
|
dsl_dataset_rele(ddpa->ddpa_clone, tag);
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EINVAL));
|
2013-09-04 16:00:57 +04:00
|
|
|
}
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2015-04-01 18:14:34 +03:00
|
|
|
error = snaplist_make(dp, 0, dsl_dir_phys(dd)->dd_origin_obj,
|
2013-09-04 16:00:57 +04:00
|
|
|
&ddpa->shared_snaps, tag);
|
|
|
|
if (error != 0)
|
|
|
|
goto out;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
error = snaplist_make(dp, 0, ddpa->ddpa_clone->ds_object,
|
|
|
|
&ddpa->clone_snaps, tag);
|
|
|
|
if (error != 0)
|
|
|
|
goto out;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
snap = list_head(&ddpa->shared_snaps);
|
2015-04-01 18:14:34 +03:00
|
|
|
ASSERT3U(snap->ds->ds_object, ==, dsl_dir_phys(dd)->dd_origin_obj);
|
|
|
|
error = snaplist_make(dp, dsl_dir_phys(dd)->dd_origin_obj,
|
|
|
|
dsl_dir_phys(snap->ds->ds_dir)->dd_head_dataset_obj,
|
2013-09-04 16:00:57 +04:00
|
|
|
&ddpa->origin_snaps, tag);
|
|
|
|
if (error != 0)
|
|
|
|
goto out;
|
2009-02-18 23:51:31 +03:00
|
|
|
|
2015-04-01 18:14:34 +03:00
|
|
|
if (dsl_dir_phys(snap->ds->ds_dir)->dd_origin_obj != 0) {
|
2013-09-04 16:00:57 +04:00
|
|
|
error = dsl_dataset_hold_obj(dp,
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dir_phys(snap->ds->ds_dir)->dd_origin_obj,
|
2013-09-04 16:00:57 +04:00
|
|
|
tag, &ddpa->origin_origin);
|
|
|
|
if (error != 0)
|
|
|
|
goto out;
|
2009-02-18 23:51:31 +03:00
|
|
|
}
|
2013-09-04 16:00:57 +04:00
|
|
|
out:
|
|
|
|
if (error != 0)
|
|
|
|
promote_rele(ddpa, tag);
|
|
|
|
return (error);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
static void
|
2022-04-19 21:38:30 +03:00
|
|
|
promote_rele(dsl_dataset_promote_arg_t *ddpa, const void *tag)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2013-09-04 16:00:57 +04:00
|
|
|
snaplist_destroy(&ddpa->shared_snaps, tag);
|
|
|
|
snaplist_destroy(&ddpa->clone_snaps, tag);
|
|
|
|
snaplist_destroy(&ddpa->origin_snaps, tag);
|
|
|
|
if (ddpa->origin_origin != NULL)
|
|
|
|
dsl_dataset_rele(ddpa->origin_origin, tag);
|
|
|
|
dsl_dataset_rele(ddpa->ddpa_clone, tag);
|
|
|
|
}
|
2010-05-29 00:45:14 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
/*
|
|
|
|
* Promote a clone.
|
|
|
|
*
|
|
|
|
* If it fails due to a conflicting snapshot name, "conflsnap" will be filled
|
2016-06-16 00:28:36 +03:00
|
|
|
* in with the name. (It must be at least ZFS_MAX_DATASET_NAME_LEN bytes long.)
|
2013-09-04 16:00:57 +04:00
|
|
|
*/
|
|
|
|
int
|
|
|
|
dsl_dataset_promote(const char *name, char *conflsnap)
|
|
|
|
{
|
|
|
|
dsl_dataset_promote_arg_t ddpa = { 0 };
|
|
|
|
uint64_t numsnaps;
|
|
|
|
int error;
|
2018-02-08 19:16:23 +03:00
|
|
|
nvpair_t *snap_pair;
|
2013-09-04 16:00:57 +04:00
|
|
|
objset_t *os;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
/*
|
|
|
|
* We will modify space proportional to the number of
|
|
|
|
* snapshots. Compute numsnaps.
|
|
|
|
*/
|
|
|
|
error = dmu_objset_hold(name, FTAG, &os);
|
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
|
|
|
error = zap_count(dmu_objset_pool(os)->dp_meta_objset,
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(dmu_objset_ds(os))->ds_snapnames_zapobj,
|
|
|
|
&numsnaps);
|
2013-09-04 16:00:57 +04:00
|
|
|
dmu_objset_rele(os, FTAG);
|
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
ddpa.ddpa_clonename = name;
|
2018-02-08 19:16:23 +03:00
|
|
|
ddpa.err_ds = fnvlist_alloc();
|
2015-04-01 16:07:48 +03:00
|
|
|
ddpa.cr = CRED();
|
2020-07-12 03:18:02 +03:00
|
|
|
ddpa.proc = curproc;
|
2013-08-28 15:45:09 +04:00
|
|
|
|
2018-02-08 19:16:23 +03:00
|
|
|
error = dsl_sync_task(name, dsl_dataset_promote_check,
|
2014-11-03 23:28:43 +03:00
|
|
|
dsl_dataset_promote_sync, &ddpa,
|
2018-02-08 19:16:23 +03:00
|
|
|
2 + numsnaps, ZFS_SPACE_CHECK_RESERVED);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Return the first conflicting snapshot found.
|
|
|
|
*/
|
|
|
|
snap_pair = nvlist_next_nvpair(ddpa.err_ds, NULL);
|
|
|
|
if (snap_pair != NULL && conflsnap != NULL)
|
2020-06-07 21:42:12 +03:00
|
|
|
(void) strlcpy(conflsnap, nvpair_name(snap_pair),
|
|
|
|
ZFS_MAX_DATASET_NAME_LEN);
|
2018-02-08 19:16:23 +03:00
|
|
|
|
|
|
|
fnvlist_free(ddpa.err_ds);
|
|
|
|
return (error);
|
2008-11-20 23:01:55 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
int
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_clone_swap_check_impl(dsl_dataset_t *clone,
|
2013-07-27 21:50:07 +04:00
|
|
|
dsl_dataset_t *origin_head, boolean_t force, void *owner, dmu_tx_t *tx)
|
2008-11-20 23:01:55 +03:00
|
|
|
{
|
2016-06-09 22:29:09 +03:00
|
|
|
/*
|
|
|
|
* "slack" factor for received datasets with refquota set on them.
|
|
|
|
* See the bottom of this function for details on its use.
|
|
|
|
*/
|
2016-07-27 11:26:38 +03:00
|
|
|
uint64_t refquota_slack = (uint64_t)DMU_MAX_ACCESS *
|
|
|
|
spa_asize_inflation;
|
2013-09-04 16:00:57 +04:00
|
|
|
int64_t unused_refres_delta;
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
/* they should both be heads */
|
2015-04-02 06:44:32 +03:00
|
|
|
if (clone->ds_is_snapshot ||
|
|
|
|
origin_head->ds_is_snapshot)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EINVAL));
|
2010-05-29 00:45:14 +04:00
|
|
|
|
2013-07-29 22:55:16 +04:00
|
|
|
/* if we are not forcing, the branch point should be just before them */
|
|
|
|
if (!force && clone->ds_prev != origin_head->ds_prev)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EINVAL));
|
2008-11-20 23:01:55 +03:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
/* clone should be the clone (unless they are unrelated) */
|
|
|
|
if (clone->ds_prev != NULL &&
|
|
|
|
clone->ds_prev != clone->ds_dir->dd_pool->dp_origin_snap &&
|
2013-07-29 22:55:16 +04:00
|
|
|
origin_head->ds_dir != clone->ds_prev->ds_dir)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EINVAL));
|
2010-05-29 00:45:14 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
/* the clone should be a child of the origin */
|
|
|
|
if (clone->ds_dir->dd_parent != origin_head->ds_dir)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EINVAL));
|
2009-08-18 22:43:27 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
/* origin_head shouldn't be modified unless 'force' */
|
2013-07-29 22:55:16 +04:00
|
|
|
if (!force &&
|
|
|
|
dsl_dataset_modified_since_snap(origin_head, origin_head->ds_prev))
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(ETXTBSY));
|
2010-08-27 01:24:34 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
/* origin_head should have no long holds (e.g. is not mounted) */
|
2013-07-27 21:50:07 +04:00
|
|
|
if (dsl_dataset_handoff_check(origin_head, owner, tx))
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EBUSY));
|
2013-09-04 16:00:57 +04:00
|
|
|
|
|
|
|
/* check amount of any unconsumed refreservation */
|
|
|
|
unused_refres_delta =
|
|
|
|
(int64_t)MIN(origin_head->ds_reserved,
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(origin_head)->ds_unique_bytes) -
|
2013-09-04 16:00:57 +04:00
|
|
|
(int64_t)MIN(origin_head->ds_reserved,
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(clone)->ds_unique_bytes);
|
2013-09-04 16:00:57 +04:00
|
|
|
|
|
|
|
if (unused_refres_delta > 0 &&
|
|
|
|
unused_refres_delta >
|
|
|
|
dsl_dir_space_available(origin_head->ds_dir, NULL, 0, TRUE))
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(ENOSPC));
|
2013-09-04 16:00:57 +04:00
|
|
|
|
2016-06-09 22:29:09 +03:00
|
|
|
/*
|
|
|
|
* The clone can't be too much over the head's refquota.
|
|
|
|
*
|
|
|
|
* To ensure that the entire refquota can be used, we allow one
|
2019-09-03 03:56:41 +03:00
|
|
|
* transaction to exceed the refquota. Therefore, this check
|
2016-06-09 22:29:09 +03:00
|
|
|
* needs to also allow for the space referenced to be more than the
|
|
|
|
* refquota. The maximum amount of space that one transaction can use
|
|
|
|
* on disk is DMU_MAX_ACCESS * spa_asize_inflation. Allowing this
|
|
|
|
* overage ensures that we are able to receive a filesystem that
|
|
|
|
* exceeds the refquota on the source system.
|
|
|
|
*
|
|
|
|
* So that overage is the refquota_slack we use below.
|
|
|
|
*/
|
2013-09-04 16:00:57 +04:00
|
|
|
if (origin_head->ds_quota != 0 &&
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(clone)->ds_referenced_bytes >
|
2016-06-09 22:29:09 +03:00
|
|
|
origin_head->ds_quota + refquota_slack)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EDQUOT));
|
2010-08-27 01:24:34 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
return (0);
|
2010-08-27 01:24:34 +04:00
|
|
|
}
|
|
|
|
|
OpenZFS 7614, 9064 - zfs device evacuation/removal
OpenZFS 7614 - zfs device evacuation/removal
OpenZFS 9064 - remove_mirror should wait for device removal to complete
This project allows top-level vdevs to be removed from the storage pool
with "zpool remove", reducing the total amount of storage in the pool.
This operation copies all allocated regions of the device to be removed
onto other devices, recording the mapping from old to new location.
After the removal is complete, read and free operations to the removed
(now "indirect") vdev must be remapped and performed at the new location
on disk. The indirect mapping table is kept in memory whenever the pool
is loaded, so there is minimal performance overhead when doing operations
on the indirect vdev.
The size of the in-memory mapping table will be reduced when its entries
become "obsolete" because they are no longer used by any block pointers
in the pool. An entry becomes obsolete when all the blocks that use
it are freed. An entry can also become obsolete when all the snapshots
that reference it are deleted, and the block pointers that reference it
have been "remapped" in all filesystems/zvols (and clones). Whenever an
indirect block is written, all the block pointers in it will be "remapped"
to their new (concrete) locations if possible. This process can be
accelerated by using the "zfs remap" command to proactively rewrite all
indirect blocks that reference indirect (removed) vdevs.
Note that when a device is removed, we do not verify the checksum of
the data that is copied. This makes the process much faster, but if it
were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be
possible to copy the wrong data, when we have the correct data on e.g.
the other side of the mirror.
At the moment, only mirrors and simple top-level vdevs can be removed
and no removal is allowed if any of the top-level vdevs are raidz.
Porting Notes:
* Avoid zero-sized kmem_alloc() in vdev_compact_children().
The device evacuation code adds a dependency that
vdev_compact_children() be able to properly empty the vdev_child
array by setting it to NULL and zeroing vdev_children. Under Linux,
kmem_alloc() and related functions return a sentinel pointer rather
than NULL for zero-sized allocations.
* Remove comment regarding "mpt" driver where zfs_remove_max_segment
is initialized to SPA_MAXBLOCKSIZE.
Change zfs_condense_indirect_commit_entry_delay_ticks to
zfs_condense_indirect_commit_entry_delay_ms for consistency with
most other tunables in which delays are specified in ms.
* ZTS changes:
Use set_tunable rather than mdb
Use zpool sync as appropriate
Use sync_pool instead of sync
Kill jobs during test_removal_with_operation to allow unmount/export
Don't add non-disk names such as "mirror" or "raidz" to $DISKS
Use $TEST_BASE_DIR instead of /tmp
Increase HZ from 100 to 1000 which is more common on Linux
removal_multiple_indirection.ksh
Reduce iterations in order to not time out on the code
coverage builders.
removal_resume_export:
Functionally, the test case is correct but there exists a race
where the kernel thread hasn't been fully started yet and is
not visible. Wait for up to 1 second for the removal thread
to be started before giving up on it. Also, increase the
amount of data copied in order that the removal not finish
before the export has a chance to fail.
* MMP compatibility, the concept of concrete versus non-concrete devices
has slightly changed the semantics of vdev_writeable(). Update
mmp_random_leaf_impl() accordingly.
* Updated dbuf_remap() to handle the org.zfsonlinux:large_dnode pool
feature which is not supported by OpenZFS.
* Added support for new vdev removal tracepoints.
* Test cases removal_with_zdb and removal_condense_export have been
intentionally disabled. When run manually they pass as intended,
but when running in the automated test environment they produce
unreliable results on the latest Fedora release.
They may work better once the upstream pool import refectoring is
merged into ZoL at which point they will be re-enabled.
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Alex Reece <alex@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Tim Chase <tim@chase2k.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Garrett D'Amore <garrett@damore.org>
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
OpenZFS-issue: https://www.illumos.org/issues/7614
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f539f1eb
Closes #6900
2016-09-22 19:30:13 +03:00
|
|
|
static void
|
|
|
|
dsl_dataset_swap_remap_deadlists(dsl_dataset_t *clone,
|
|
|
|
dsl_dataset_t *origin, dmu_tx_t *tx)
|
|
|
|
{
|
|
|
|
uint64_t clone_remap_dl_obj, origin_remap_dl_obj;
|
|
|
|
dsl_pool_t *dp = dmu_tx_pool(tx);
|
|
|
|
|
|
|
|
ASSERT(dsl_pool_sync_context(dp));
|
|
|
|
|
|
|
|
clone_remap_dl_obj = dsl_dataset_get_remap_deadlist_object(clone);
|
|
|
|
origin_remap_dl_obj = dsl_dataset_get_remap_deadlist_object(origin);
|
|
|
|
|
|
|
|
if (clone_remap_dl_obj != 0) {
|
|
|
|
dsl_deadlist_close(&clone->ds_remap_deadlist);
|
|
|
|
dsl_dataset_unset_remap_deadlist_object(clone, tx);
|
|
|
|
}
|
|
|
|
if (origin_remap_dl_obj != 0) {
|
|
|
|
dsl_deadlist_close(&origin->ds_remap_deadlist);
|
|
|
|
dsl_dataset_unset_remap_deadlist_object(origin, tx);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (clone_remap_dl_obj != 0) {
|
|
|
|
dsl_dataset_set_remap_deadlist_object(origin,
|
|
|
|
clone_remap_dl_obj, tx);
|
|
|
|
dsl_deadlist_open(&origin->ds_remap_deadlist,
|
|
|
|
dp->dp_meta_objset, clone_remap_dl_obj);
|
|
|
|
}
|
|
|
|
if (origin_remap_dl_obj != 0) {
|
|
|
|
dsl_dataset_set_remap_deadlist_object(clone,
|
|
|
|
origin_remap_dl_obj, tx);
|
|
|
|
dsl_deadlist_open(&clone->ds_remap_deadlist,
|
|
|
|
dp->dp_meta_objset, origin_remap_dl_obj);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2010-08-27 01:24:34 +04:00
|
|
|
void
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_clone_swap_sync_impl(dsl_dataset_t *clone,
|
|
|
|
dsl_dataset_t *origin_head, dmu_tx_t *tx)
|
2010-08-27 01:24:34 +04:00
|
|
|
{
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_pool_t *dp = dmu_tx_pool(tx);
|
|
|
|
int64_t unused_refres_delta;
|
2010-05-29 00:45:14 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
ASSERT(clone->ds_reserved == 0);
|
2016-06-09 22:29:09 +03:00
|
|
|
/*
|
|
|
|
* NOTE: On DEBUG kernels there could be a race between this and
|
|
|
|
* the check function if spa_asize_inflation is adjusted...
|
|
|
|
*/
|
2013-09-04 16:00:57 +04:00
|
|
|
ASSERT(origin_head->ds_quota == 0 ||
|
2016-06-09 22:29:09 +03:00
|
|
|
dsl_dataset_phys(clone)->ds_unique_bytes <= origin_head->ds_quota +
|
|
|
|
DMU_MAX_ACCESS * spa_asize_inflation);
|
2013-07-29 22:55:16 +04:00
|
|
|
ASSERT3P(clone->ds_prev, ==, origin_head->ds_prev);
|
2009-08-18 22:43:27 +04:00
|
|
|
|
2020-04-01 20:02:06 +03:00
|
|
|
dsl_dir_cancel_waiters(origin_head->ds_dir);
|
|
|
|
|
2015-07-24 19:53:55 +03:00
|
|
|
/*
|
|
|
|
* Swap per-dataset feature flags.
|
|
|
|
*/
|
2017-02-08 20:27:48 +03:00
|
|
|
for (spa_feature_t f = 0; f < SPA_FEATURES; f++) {
|
2015-07-24 19:53:55 +03:00
|
|
|
if (!(spa_feature_table[f].fi_flags &
|
|
|
|
ZFEATURE_FLAG_PER_DATASET)) {
|
2018-10-16 21:15:04 +03:00
|
|
|
ASSERT(!dsl_dataset_feature_is_active(clone, f));
|
|
|
|
ASSERT(!dsl_dataset_feature_is_active(origin_head, f));
|
2015-07-24 19:53:55 +03:00
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
2018-10-16 21:15:04 +03:00
|
|
|
boolean_t clone_inuse = dsl_dataset_feature_is_active(clone, f);
|
|
|
|
void *clone_feature = clone->ds_feature[f];
|
|
|
|
boolean_t origin_head_inuse =
|
|
|
|
dsl_dataset_feature_is_active(origin_head, f);
|
|
|
|
void *origin_head_feature = origin_head->ds_feature[f];
|
|
|
|
|
|
|
|
if (clone_inuse)
|
|
|
|
dsl_dataset_deactivate_feature_impl(clone, f, tx);
|
|
|
|
if (origin_head_inuse)
|
|
|
|
dsl_dataset_deactivate_feature_impl(origin_head, f, tx);
|
2015-07-24 19:53:55 +03:00
|
|
|
|
|
|
|
if (clone_inuse) {
|
2018-10-16 21:15:04 +03:00
|
|
|
dsl_dataset_activate_feature(origin_head->ds_object, f,
|
|
|
|
clone_feature, tx);
|
|
|
|
origin_head->ds_feature[f] = clone_feature;
|
2015-07-24 19:53:55 +03:00
|
|
|
}
|
|
|
|
if (origin_head_inuse) {
|
2018-10-16 21:15:04 +03:00
|
|
|
dsl_dataset_activate_feature(clone->ds_object, f,
|
|
|
|
origin_head_feature, tx);
|
|
|
|
clone->ds_feature[f] = origin_head_feature;
|
2015-07-24 19:53:55 +03:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
dmu_buf_will_dirty(clone->ds_dbuf, tx);
|
|
|
|
dmu_buf_will_dirty(origin_head->ds_dbuf, tx);
|
2009-08-18 22:43:27 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
if (clone->ds_objset != NULL) {
|
|
|
|
dmu_objset_evict(clone->ds_objset);
|
|
|
|
clone->ds_objset = NULL;
|
|
|
|
}
|
2009-08-18 22:43:27 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
if (origin_head->ds_objset != NULL) {
|
|
|
|
dmu_objset_evict(origin_head->ds_objset);
|
|
|
|
origin_head->ds_objset = NULL;
|
2009-08-18 22:43:27 +04:00
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
unused_refres_delta =
|
|
|
|
(int64_t)MIN(origin_head->ds_reserved,
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(origin_head)->ds_unique_bytes) -
|
2013-09-04 16:00:57 +04:00
|
|
|
(int64_t)MIN(origin_head->ds_reserved,
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(clone)->ds_unique_bytes);
|
2013-09-04 16:00:57 +04:00
|
|
|
|
|
|
|
/*
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
* Reset origin's unique bytes.
|
2013-09-04 16:00:57 +04:00
|
|
|
*/
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
{
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_t *origin = clone->ds_prev;
|
|
|
|
uint64_t comp, uncomp;
|
|
|
|
|
|
|
|
dmu_buf_will_dirty(origin->ds_dbuf, tx);
|
|
|
|
dsl_deadlist_space_range(&clone->ds_deadlist,
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(origin)->ds_prev_snap_txg, UINT64_MAX,
|
|
|
|
&dsl_dataset_phys(origin)->ds_unique_bytes, &comp, &uncomp);
|
2013-09-04 16:00:57 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
/* swap blkptrs */
|
|
|
|
{
|
2017-01-27 22:43:42 +03:00
|
|
|
rrw_enter(&clone->ds_bp_rwlock, RW_WRITER, FTAG);
|
|
|
|
rrw_enter(&origin_head->ds_bp_rwlock, RW_WRITER, FTAG);
|
2017-11-04 23:25:13 +03:00
|
|
|
blkptr_t tmp;
|
2015-04-01 18:14:34 +03:00
|
|
|
tmp = dsl_dataset_phys(origin_head)->ds_bp;
|
|
|
|
dsl_dataset_phys(origin_head)->ds_bp =
|
|
|
|
dsl_dataset_phys(clone)->ds_bp;
|
|
|
|
dsl_dataset_phys(clone)->ds_bp = tmp;
|
2017-01-27 22:43:42 +03:00
|
|
|
rrw_exit(&origin_head->ds_bp_rwlock, FTAG);
|
|
|
|
rrw_exit(&clone->ds_bp_rwlock, FTAG);
|
2013-09-04 16:00:57 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
/* set dd_*_bytes */
|
|
|
|
{
|
|
|
|
int64_t dused, dcomp, duncomp;
|
|
|
|
uint64_t cdl_used, cdl_comp, cdl_uncomp;
|
|
|
|
uint64_t odl_used, odl_comp, odl_uncomp;
|
|
|
|
|
2015-04-01 18:14:34 +03:00
|
|
|
ASSERT3U(dsl_dir_phys(clone->ds_dir)->
|
2013-09-04 16:00:57 +04:00
|
|
|
dd_used_breakdown[DD_USED_SNAP], ==, 0);
|
|
|
|
|
|
|
|
dsl_deadlist_space(&clone->ds_deadlist,
|
|
|
|
&cdl_used, &cdl_comp, &cdl_uncomp);
|
|
|
|
dsl_deadlist_space(&origin_head->ds_deadlist,
|
|
|
|
&odl_used, &odl_comp, &odl_uncomp);
|
2010-05-29 00:45:14 +04:00
|
|
|
|
2015-04-01 18:14:34 +03:00
|
|
|
dused = dsl_dataset_phys(clone)->ds_referenced_bytes +
|
|
|
|
cdl_used -
|
|
|
|
(dsl_dataset_phys(origin_head)->ds_referenced_bytes +
|
|
|
|
odl_used);
|
|
|
|
dcomp = dsl_dataset_phys(clone)->ds_compressed_bytes +
|
|
|
|
cdl_comp -
|
|
|
|
(dsl_dataset_phys(origin_head)->ds_compressed_bytes +
|
|
|
|
odl_comp);
|
|
|
|
duncomp = dsl_dataset_phys(clone)->ds_uncompressed_bytes +
|
2013-09-04 16:00:57 +04:00
|
|
|
cdl_uncomp -
|
2015-04-01 18:14:34 +03:00
|
|
|
(dsl_dataset_phys(origin_head)->ds_uncompressed_bytes +
|
|
|
|
odl_uncomp);
|
2009-08-18 22:43:27 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dir_diduse_space(origin_head->ds_dir, DD_USED_HEAD,
|
|
|
|
dused, dcomp, duncomp, tx);
|
|
|
|
dsl_dir_diduse_space(clone->ds_dir, DD_USED_HEAD,
|
|
|
|
-dused, -dcomp, -duncomp, tx);
|
2009-08-18 22:43:27 +04:00
|
|
|
|
|
|
|
/*
|
2013-09-04 16:00:57 +04:00
|
|
|
* The difference in the space used by snapshots is the
|
|
|
|
* difference in snapshot space due to the head's
|
|
|
|
* deadlist (since that's the only thing that's
|
|
|
|
* changing that affects the snapused).
|
2009-08-18 22:43:27 +04:00
|
|
|
*/
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_deadlist_space_range(&clone->ds_deadlist,
|
|
|
|
origin_head->ds_dir->dd_origin_txg, UINT64_MAX,
|
|
|
|
&cdl_used, &cdl_comp, &cdl_uncomp);
|
|
|
|
dsl_deadlist_space_range(&origin_head->ds_deadlist,
|
|
|
|
origin_head->ds_dir->dd_origin_txg, UINT64_MAX,
|
|
|
|
&odl_used, &odl_comp, &odl_uncomp);
|
|
|
|
dsl_dir_transfer_space(origin_head->ds_dir, cdl_used - odl_used,
|
|
|
|
DD_USED_HEAD, DD_USED_SNAP, tx);
|
2009-08-18 22:43:27 +04:00
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
/* swap ds_*_bytes */
|
2015-04-01 18:14:34 +03:00
|
|
|
SWITCH64(dsl_dataset_phys(origin_head)->ds_referenced_bytes,
|
|
|
|
dsl_dataset_phys(clone)->ds_referenced_bytes);
|
|
|
|
SWITCH64(dsl_dataset_phys(origin_head)->ds_compressed_bytes,
|
|
|
|
dsl_dataset_phys(clone)->ds_compressed_bytes);
|
|
|
|
SWITCH64(dsl_dataset_phys(origin_head)->ds_uncompressed_bytes,
|
|
|
|
dsl_dataset_phys(clone)->ds_uncompressed_bytes);
|
|
|
|
SWITCH64(dsl_dataset_phys(origin_head)->ds_unique_bytes,
|
|
|
|
dsl_dataset_phys(clone)->ds_unique_bytes);
|
2009-08-18 22:43:27 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
/* apply any parent delta for change in unconsumed refreservation */
|
|
|
|
dsl_dir_diduse_space(origin_head->ds_dir, DD_USED_REFRSRV,
|
|
|
|
unused_refres_delta, 0, 0, tx);
|
2009-08-18 22:43:27 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
/*
|
|
|
|
* Swap deadlists.
|
|
|
|
*/
|
|
|
|
dsl_deadlist_close(&clone->ds_deadlist);
|
|
|
|
dsl_deadlist_close(&origin_head->ds_deadlist);
|
2015-04-01 18:14:34 +03:00
|
|
|
SWITCH64(dsl_dataset_phys(origin_head)->ds_deadlist_obj,
|
|
|
|
dsl_dataset_phys(clone)->ds_deadlist_obj);
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_deadlist_open(&clone->ds_deadlist, dp->dp_meta_objset,
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(clone)->ds_deadlist_obj);
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_deadlist_open(&origin_head->ds_deadlist, dp->dp_meta_objset,
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(origin_head)->ds_deadlist_obj);
|
OpenZFS 7614, 9064 - zfs device evacuation/removal
OpenZFS 7614 - zfs device evacuation/removal
OpenZFS 9064 - remove_mirror should wait for device removal to complete
This project allows top-level vdevs to be removed from the storage pool
with "zpool remove", reducing the total amount of storage in the pool.
This operation copies all allocated regions of the device to be removed
onto other devices, recording the mapping from old to new location.
After the removal is complete, read and free operations to the removed
(now "indirect") vdev must be remapped and performed at the new location
on disk. The indirect mapping table is kept in memory whenever the pool
is loaded, so there is minimal performance overhead when doing operations
on the indirect vdev.
The size of the in-memory mapping table will be reduced when its entries
become "obsolete" because they are no longer used by any block pointers
in the pool. An entry becomes obsolete when all the blocks that use
it are freed. An entry can also become obsolete when all the snapshots
that reference it are deleted, and the block pointers that reference it
have been "remapped" in all filesystems/zvols (and clones). Whenever an
indirect block is written, all the block pointers in it will be "remapped"
to their new (concrete) locations if possible. This process can be
accelerated by using the "zfs remap" command to proactively rewrite all
indirect blocks that reference indirect (removed) vdevs.
Note that when a device is removed, we do not verify the checksum of
the data that is copied. This makes the process much faster, but if it
were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be
possible to copy the wrong data, when we have the correct data on e.g.
the other side of the mirror.
At the moment, only mirrors and simple top-level vdevs can be removed
and no removal is allowed if any of the top-level vdevs are raidz.
Porting Notes:
* Avoid zero-sized kmem_alloc() in vdev_compact_children().
The device evacuation code adds a dependency that
vdev_compact_children() be able to properly empty the vdev_child
array by setting it to NULL and zeroing vdev_children. Under Linux,
kmem_alloc() and related functions return a sentinel pointer rather
than NULL for zero-sized allocations.
* Remove comment regarding "mpt" driver where zfs_remove_max_segment
is initialized to SPA_MAXBLOCKSIZE.
Change zfs_condense_indirect_commit_entry_delay_ticks to
zfs_condense_indirect_commit_entry_delay_ms for consistency with
most other tunables in which delays are specified in ms.
* ZTS changes:
Use set_tunable rather than mdb
Use zpool sync as appropriate
Use sync_pool instead of sync
Kill jobs during test_removal_with_operation to allow unmount/export
Don't add non-disk names such as "mirror" or "raidz" to $DISKS
Use $TEST_BASE_DIR instead of /tmp
Increase HZ from 100 to 1000 which is more common on Linux
removal_multiple_indirection.ksh
Reduce iterations in order to not time out on the code
coverage builders.
removal_resume_export:
Functionally, the test case is correct but there exists a race
where the kernel thread hasn't been fully started yet and is
not visible. Wait for up to 1 second for the removal thread
to be started before giving up on it. Also, increase the
amount of data copied in order that the removal not finish
before the export has a chance to fail.
* MMP compatibility, the concept of concrete versus non-concrete devices
has slightly changed the semantics of vdev_writeable(). Update
mmp_random_leaf_impl() accordingly.
* Updated dbuf_remap() to handle the org.zfsonlinux:large_dnode pool
feature which is not supported by OpenZFS.
* Added support for new vdev removal tracepoints.
* Test cases removal_with_zdb and removal_condense_export have been
intentionally disabled. When run manually they pass as intended,
but when running in the automated test environment they produce
unreliable results on the latest Fedora release.
They may work better once the upstream pool import refectoring is
merged into ZoL at which point they will be re-enabled.
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Alex Reece <alex@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Tim Chase <tim@chase2k.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Garrett D'Amore <garrett@damore.org>
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
OpenZFS-issue: https://www.illumos.org/issues/7614
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f539f1eb
Closes #6900
2016-09-22 19:30:13 +03:00
|
|
|
dsl_dataset_swap_remap_deadlists(clone, origin_head, tx);
|
2009-08-18 22:43:27 +04:00
|
|
|
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
/*
|
|
|
|
* If there is a bookmark at the origin, its "next dataset" is
|
|
|
|
* changing, so we need to reset its FBN.
|
|
|
|
*/
|
|
|
|
dsl_bookmark_next_changed(origin_head, origin_head->ds_prev, tx);
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_scan_ds_clone_swapped(origin_head, clone, tx);
|
2009-08-18 22:43:27 +04:00
|
|
|
|
2019-07-26 20:54:14 +03:00
|
|
|
/*
|
|
|
|
* Destroy any livelists associated with the clone or the origin,
|
|
|
|
* since after the swap the corresponding livelists are no longer
|
|
|
|
* valid.
|
|
|
|
*/
|
|
|
|
dsl_dir_remove_livelist(clone->ds_dir, tx, B_TRUE);
|
|
|
|
dsl_dir_remove_livelist(origin_head->ds_dir, tx, B_TRUE);
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
spa_history_log_internal_ds(clone, "clone swap", tx,
|
|
|
|
"parent=%s", origin_head->ds_dir->dd_myname);
|
2009-08-18 22:43:27 +04:00
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
/*
|
|
|
|
* Given a pool name and a dataset object number in that pool,
|
|
|
|
* return the name of that dataset.
|
|
|
|
*/
|
2010-08-27 01:24:34 +04:00
|
|
|
int
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dsobj_to_dsname(char *pname, uint64_t obj, char *buf)
|
2010-08-27 01:24:34 +04:00
|
|
|
{
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_pool_t *dp;
|
|
|
|
dsl_dataset_t *ds;
|
2010-08-27 01:24:34 +04:00
|
|
|
int error;
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
error = dsl_pool_hold(pname, FTAG, &dp);
|
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
|
|
|
|
|
|
|
error = dsl_dataset_hold_obj(dp, obj, FTAG, &ds);
|
|
|
|
if (error == 0) {
|
|
|
|
dsl_dataset_name(ds, buf);
|
|
|
|
dsl_dataset_rele(ds, FTAG);
|
|
|
|
}
|
|
|
|
dsl_pool_rele(dp, FTAG);
|
2010-08-27 01:24:34 +04:00
|
|
|
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2009-08-18 22:43:27 +04:00
|
|
|
int
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_check_quota(dsl_dataset_t *ds, boolean_t check_quota,
|
|
|
|
uint64_t asize, uint64_t inflight, uint64_t *used, uint64_t *ref_rsrv)
|
2009-08-18 22:43:27 +04:00
|
|
|
{
|
2013-09-04 16:00:57 +04:00
|
|
|
int error = 0;
|
2009-08-18 22:43:27 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
ASSERT3S(asize, >, 0);
|
2009-08-18 22:43:27 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
/*
|
|
|
|
* *ref_rsrv is the portion of asize that will come from any
|
|
|
|
* unconsumed refreservation space.
|
|
|
|
*/
|
|
|
|
*ref_rsrv = 0;
|
2009-08-18 22:43:27 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
mutex_enter(&ds->ds_lock);
|
|
|
|
/*
|
|
|
|
* Make a space adjustment for reserved bytes.
|
|
|
|
*/
|
2015-04-01 18:14:34 +03:00
|
|
|
if (ds->ds_reserved > dsl_dataset_phys(ds)->ds_unique_bytes) {
|
2013-09-04 16:00:57 +04:00
|
|
|
ASSERT3U(*used, >=,
|
2015-04-01 18:14:34 +03:00
|
|
|
ds->ds_reserved - dsl_dataset_phys(ds)->ds_unique_bytes);
|
|
|
|
*used -=
|
|
|
|
(ds->ds_reserved - dsl_dataset_phys(ds)->ds_unique_bytes);
|
2013-09-04 16:00:57 +04:00
|
|
|
*ref_rsrv =
|
|
|
|
asize - MIN(asize, parent_delta(ds, asize + inflight));
|
2009-08-18 22:43:27 +04:00
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
if (!check_quota || ds->ds_quota == 0) {
|
|
|
|
mutex_exit(&ds->ds_lock);
|
|
|
|
return (0);
|
2009-08-18 22:43:27 +04:00
|
|
|
}
|
2013-09-04 16:00:57 +04:00
|
|
|
/*
|
|
|
|
* If they are requesting more space, and our current estimate
|
|
|
|
* is over quota, they get to try again unless the actual
|
|
|
|
* on-disk is over quota and there are no pending changes (which
|
|
|
|
* may free up space for us).
|
|
|
|
*/
|
2015-04-01 18:14:34 +03:00
|
|
|
if (dsl_dataset_phys(ds)->ds_referenced_bytes + inflight >=
|
|
|
|
ds->ds_quota) {
|
2013-09-04 16:00:57 +04:00
|
|
|
if (inflight > 0 ||
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(ds)->ds_referenced_bytes < ds->ds_quota)
|
2013-03-08 22:41:28 +04:00
|
|
|
error = SET_ERROR(ERESTART);
|
2013-09-04 16:00:57 +04:00
|
|
|
else
|
2013-03-08 22:41:28 +04:00
|
|
|
error = SET_ERROR(EDQUOT);
|
2009-08-18 22:43:27 +04:00
|
|
|
}
|
2013-09-04 16:00:57 +04:00
|
|
|
mutex_exit(&ds->ds_lock);
|
2009-08-18 22:43:27 +04:00
|
|
|
|
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
typedef struct dsl_dataset_set_qr_arg {
|
|
|
|
const char *ddsqra_name;
|
|
|
|
zprop_source_t ddsqra_source;
|
|
|
|
uint64_t ddsqra_value;
|
|
|
|
} dsl_dataset_set_qr_arg_t;
|
2009-08-18 22:43:27 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
|
2009-08-18 22:43:27 +04:00
|
|
|
static int
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_set_refquota_check(void *arg, dmu_tx_t *tx)
|
2009-08-18 22:43:27 +04:00
|
|
|
{
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_set_qr_arg_t *ddsqra = arg;
|
|
|
|
dsl_pool_t *dp = dmu_tx_pool(tx);
|
|
|
|
dsl_dataset_t *ds;
|
2009-08-18 22:43:27 +04:00
|
|
|
int error;
|
2013-09-04 16:00:57 +04:00
|
|
|
uint64_t newval;
|
2009-08-18 22:43:27 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
if (spa_version(dp->dp_spa) < SPA_VERSION_REFQUOTA)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(ENOTSUP));
|
2009-08-18 22:43:27 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
error = dsl_dataset_hold(dp, ddsqra->ddsqra_name, FTAG, &ds);
|
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
|
|
|
|
2015-04-02 06:44:32 +03:00
|
|
|
if (ds->ds_is_snapshot) {
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_rele(ds, FTAG);
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EINVAL));
|
2009-08-18 22:43:27 +04:00
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
error = dsl_prop_predict(ds->ds_dir,
|
|
|
|
zfs_prop_to_name(ZFS_PROP_REFQUOTA),
|
|
|
|
ddsqra->ddsqra_source, ddsqra->ddsqra_value, &newval);
|
|
|
|
if (error != 0) {
|
|
|
|
dsl_dataset_rele(ds, FTAG);
|
2009-08-18 22:43:27 +04:00
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
if (newval == 0) {
|
|
|
|
dsl_dataset_rele(ds, FTAG);
|
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
2015-04-01 18:14:34 +03:00
|
|
|
if (newval < dsl_dataset_phys(ds)->ds_referenced_bytes ||
|
2013-09-04 16:00:57 +04:00
|
|
|
newval < ds->ds_reserved) {
|
|
|
|
dsl_dataset_rele(ds, FTAG);
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(ENOSPC));
|
2013-09-04 16:00:57 +04:00
|
|
|
}
|
2009-08-18 22:43:27 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_rele(ds, FTAG);
|
2009-08-18 22:43:27 +04:00
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
static void
|
|
|
|
dsl_dataset_set_refquota_sync(void *arg, dmu_tx_t *tx)
|
2009-08-18 22:43:27 +04:00
|
|
|
{
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_set_qr_arg_t *ddsqra = arg;
|
|
|
|
dsl_pool_t *dp = dmu_tx_pool(tx);
|
2016-07-26 22:08:51 +03:00
|
|
|
dsl_dataset_t *ds = NULL;
|
2013-09-04 16:00:57 +04:00
|
|
|
uint64_t newval;
|
2009-08-18 22:43:27 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
VERIFY0(dsl_dataset_hold(dp, ddsqra->ddsqra_name, FTAG, &ds));
|
2009-08-18 22:43:27 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_prop_set_sync_impl(ds,
|
|
|
|
zfs_prop_to_name(ZFS_PROP_REFQUOTA),
|
|
|
|
ddsqra->ddsqra_source, sizeof (ddsqra->ddsqra_value), 1,
|
|
|
|
&ddsqra->ddsqra_value, tx);
|
2009-08-18 22:43:27 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
VERIFY0(dsl_prop_get_int_ds(ds,
|
|
|
|
zfs_prop_to_name(ZFS_PROP_REFQUOTA), &newval));
|
2009-08-18 22:43:27 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
if (ds->ds_quota != newval) {
|
|
|
|
dmu_buf_will_dirty(ds->ds_dbuf, tx);
|
|
|
|
ds->ds_quota = newval;
|
2009-08-18 22:43:27 +04:00
|
|
|
}
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_rele(ds, FTAG);
|
2009-08-18 22:43:27 +04:00
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
int
|
|
|
|
dsl_dataset_set_refquota(const char *dsname, zprop_source_t source,
|
|
|
|
uint64_t refquota)
|
2009-08-18 22:43:27 +04:00
|
|
|
{
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_set_qr_arg_t ddsqra;
|
2010-05-29 00:45:14 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
ddsqra.ddsqra_name = dsname;
|
|
|
|
ddsqra.ddsqra_source = source;
|
|
|
|
ddsqra.ddsqra_value = refquota;
|
|
|
|
|
|
|
|
return (dsl_sync_task(dsname, dsl_dataset_set_refquota_check,
|
2016-12-17 01:11:29 +03:00
|
|
|
dsl_dataset_set_refquota_sync, &ddsqra, 0,
|
|
|
|
ZFS_SPACE_CHECK_EXTRA_RESERVED));
|
2009-08-18 22:43:27 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_set_refreservation_check(void *arg, dmu_tx_t *tx)
|
2009-08-18 22:43:27 +04:00
|
|
|
{
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_set_qr_arg_t *ddsqra = arg;
|
|
|
|
dsl_pool_t *dp = dmu_tx_pool(tx);
|
2009-08-18 22:43:27 +04:00
|
|
|
dsl_dataset_t *ds;
|
|
|
|
int error;
|
2013-09-04 16:00:57 +04:00
|
|
|
uint64_t newval, unique;
|
2010-05-29 00:45:14 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
if (spa_version(dp->dp_spa) < SPA_VERSION_REFRESERVATION)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(ENOTSUP));
|
2009-08-18 22:43:27 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
error = dsl_dataset_hold(dp, ddsqra->ddsqra_name, FTAG, &ds);
|
|
|
|
if (error != 0)
|
2009-08-18 22:43:27 +04:00
|
|
|
return (error);
|
|
|
|
|
2015-04-02 06:44:32 +03:00
|
|
|
if (ds->ds_is_snapshot) {
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_rele(ds, FTAG);
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EINVAL));
|
2009-08-18 22:43:27 +04:00
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
error = dsl_prop_predict(ds->ds_dir,
|
|
|
|
zfs_prop_to_name(ZFS_PROP_REFRESERVATION),
|
|
|
|
ddsqra->ddsqra_source, ddsqra->ddsqra_value, &newval);
|
|
|
|
if (error != 0) {
|
|
|
|
dsl_dataset_rele(ds, FTAG);
|
2009-08-18 22:43:27 +04:00
|
|
|
return (error);
|
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
/*
|
|
|
|
* If we are doing the preliminary check in open context, the
|
|
|
|
* space estimates may be inaccurate.
|
|
|
|
*/
|
|
|
|
if (!dmu_tx_is_syncing(tx)) {
|
|
|
|
dsl_dataset_rele(ds, FTAG);
|
|
|
|
return (0);
|
2009-08-18 22:43:27 +04:00
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
mutex_enter(&ds->ds_lock);
|
|
|
|
if (!DS_UNIQUE_IS_ACCURATE(ds))
|
|
|
|
dsl_dataset_recalc_head_uniq(ds);
|
2015-04-01 18:14:34 +03:00
|
|
|
unique = dsl_dataset_phys(ds)->ds_unique_bytes;
|
2013-09-04 16:00:57 +04:00
|
|
|
mutex_exit(&ds->ds_lock);
|
2009-08-18 22:43:27 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
if (MAX(unique, newval) > MAX(unique, ds->ds_reserved)) {
|
|
|
|
uint64_t delta = MAX(unique, newval) -
|
|
|
|
MAX(unique, ds->ds_reserved);
|
2009-08-18 22:43:27 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
if (delta >
|
|
|
|
dsl_dir_space_available(ds->ds_dir, NULL, 0, B_TRUE) ||
|
|
|
|
(ds->ds_quota > 0 && newval > ds->ds_quota)) {
|
|
|
|
dsl_dataset_rele(ds, FTAG);
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(ENOSPC));
|
2013-09-04 16:00:57 +04:00
|
|
|
}
|
2009-08-18 22:43:27 +04:00
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_rele(ds, FTAG);
|
|
|
|
return (0);
|
2009-08-18 22:43:27 +04:00
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
void
|
|
|
|
dsl_dataset_set_refreservation_sync_impl(dsl_dataset_t *ds,
|
|
|
|
zprop_source_t source, uint64_t value, dmu_tx_t *tx)
|
2010-05-29 00:45:14 +04:00
|
|
|
{
|
2013-09-04 16:00:57 +04:00
|
|
|
uint64_t newval;
|
|
|
|
uint64_t unique;
|
|
|
|
int64_t delta;
|
2010-05-29 00:45:14 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_prop_set_sync_impl(ds, zfs_prop_to_name(ZFS_PROP_REFRESERVATION),
|
|
|
|
source, sizeof (value), 1, &value, tx);
|
2010-05-29 00:45:14 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
VERIFY0(dsl_prop_get_int_ds(ds,
|
|
|
|
zfs_prop_to_name(ZFS_PROP_REFRESERVATION), &newval));
|
2010-08-27 01:24:34 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
dmu_buf_will_dirty(ds->ds_dbuf, tx);
|
|
|
|
mutex_enter(&ds->ds_dir->dd_lock);
|
|
|
|
mutex_enter(&ds->ds_lock);
|
|
|
|
ASSERT(DS_UNIQUE_IS_ACCURATE(ds));
|
2015-04-01 18:14:34 +03:00
|
|
|
unique = dsl_dataset_phys(ds)->ds_unique_bytes;
|
2013-09-04 16:00:57 +04:00
|
|
|
delta = MAX(0, (int64_t)(newval - unique)) -
|
|
|
|
MAX(0, (int64_t)(ds->ds_reserved - unique));
|
|
|
|
ds->ds_reserved = newval;
|
|
|
|
mutex_exit(&ds->ds_lock);
|
2010-08-27 01:24:34 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dir_diduse_space(ds->ds_dir, DD_USED_REFRSRV, delta, 0, 0, tx);
|
|
|
|
mutex_exit(&ds->ds_dir->dd_lock);
|
2010-05-29 00:45:14 +04:00
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
static void
|
|
|
|
dsl_dataset_set_refreservation_sync(void *arg, dmu_tx_t *tx)
|
2009-08-18 22:43:27 +04:00
|
|
|
{
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_set_qr_arg_t *ddsqra = arg;
|
|
|
|
dsl_pool_t *dp = dmu_tx_pool(tx);
|
2016-07-26 22:08:51 +03:00
|
|
|
dsl_dataset_t *ds = NULL;
|
2009-08-18 22:43:27 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
VERIFY0(dsl_dataset_hold(dp, ddsqra->ddsqra_name, FTAG, &ds));
|
|
|
|
dsl_dataset_set_refreservation_sync_impl(ds,
|
|
|
|
ddsqra->ddsqra_source, ddsqra->ddsqra_value, tx);
|
2009-08-18 22:43:27 +04:00
|
|
|
dsl_dataset_rele(ds, FTAG);
|
|
|
|
}
|
2010-05-29 00:45:14 +04:00
|
|
|
|
|
|
|
int
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_set_refreservation(const char *dsname, zprop_source_t source,
|
|
|
|
uint64_t refreservation)
|
2010-05-29 00:45:14 +04:00
|
|
|
{
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_set_qr_arg_t ddsqra;
|
2010-05-29 00:45:14 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
ddsqra.ddsqra_name = dsname;
|
|
|
|
ddsqra.ddsqra_source = source;
|
|
|
|
ddsqra.ddsqra_value = refreservation;
|
2010-08-26 22:49:16 +04:00
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
return (dsl_sync_task(dsname, dsl_dataset_set_refreservation_check,
|
2016-12-17 01:11:29 +03:00
|
|
|
dsl_dataset_set_refreservation_sync, &ddsqra, 0,
|
|
|
|
ZFS_SPACE_CHECK_EXTRA_RESERVED));
|
2013-09-04 16:00:57 +04:00
|
|
|
}
|
2011-11-17 22:14:36 +04:00
|
|
|
|
Add zstd support to zfs
This PR adds two new compression types, based on ZStandard:
- zstd: A basic ZStandard compression algorithm Available compression.
Levels for zstd are zstd-1 through zstd-19, where the compression
increases with every level, but speed decreases.
- zstd-fast: A faster version of the ZStandard compression algorithm
zstd-fast is basically a "negative" level of zstd. The compression
decreases with every level, but speed increases.
Available compression levels for zstd-fast:
- zstd-fast-1 through zstd-fast-10
- zstd-fast-20 through zstd-fast-100 (in increments of 10)
- zstd-fast-500 and zstd-fast-1000
For more information check the man page.
Implementation details:
Rather than treat each level of zstd as a different algorithm (as was
done historically with gzip), the block pointer `enum zio_compress`
value is simply zstd for all levels, including zstd-fast, since they all
use the same decompression function.
The compress= property (a 64bit unsigned integer) uses the lower 7 bits
to store the compression algorithm (matching the number of bits used in
a block pointer, as the 8th bit was borrowed for embedded block
pointers). The upper bits are used to store the compression level.
It is necessary to be able to determine what compression level was used
when later reading a block back, so the concept used in LZ4, where the
first 32bits of the on-disk value are the size of the compressed data
(since the allocation is rounded up to the nearest ashift), was
extended, and we store the version of ZSTD and the level as well as the
compressed size. This value is returned when decompressing a block, so
that if the block needs to be recompressed (L2ARC, nop-write, etc), that
the same parameters will be used to result in the matching checksum.
All of the internal ZFS code ( `arc_buf_hdr_t`, `objset_t`,
`zio_prop_t`, etc.) uses the separated _compress and _complevel
variables. Only the properties ZAP contains the combined/bit-shifted
value. The combined value is split when the compression_changed_cb()
callback is called, and sets both objset members (os_compress and
os_complevel).
The userspace tools all use the combined/bit-shifted value.
Additional notes:
zdb can now also decode the ZSTD compression header (flag -Z) and
inspect the size, version and compression level saved in that header.
For each record, if it is ZSTD compressed, the parameters of the decoded
compression header get printed.
ZSTD is included with all current tests and new tests are added
as-needed.
Per-dataset feature flags now get activated when the property is set.
If a compression algorithm requires a feature flag, zfs activates the
feature when the property is set, rather than waiting for the first
block to be born. This is currently only used by zstd but can be
extended as needed.
Portions-Sponsored-By: The FreeBSD Foundation
Co-authored-by: Allan Jude <allanjude@freebsd.org>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Co-authored-by: Michael Niewöhner <foss@mniewoehner.de>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes #6247
Closes #9024
Closes #10277
Closes #10278
2020-08-18 20:10:17 +03:00
|
|
|
typedef struct dsl_dataset_set_compression_arg {
|
|
|
|
const char *ddsca_name;
|
|
|
|
zprop_source_t ddsca_source;
|
|
|
|
uint64_t ddsca_value;
|
|
|
|
} dsl_dataset_set_compression_arg_t;
|
|
|
|
|
|
|
|
static int
|
|
|
|
dsl_dataset_set_compression_check(void *arg, dmu_tx_t *tx)
|
|
|
|
{
|
|
|
|
dsl_dataset_set_compression_arg_t *ddsca = arg;
|
|
|
|
dsl_pool_t *dp = dmu_tx_pool(tx);
|
|
|
|
|
|
|
|
uint64_t compval = ZIO_COMPRESS_ALGO(ddsca->ddsca_value);
|
|
|
|
spa_feature_t f = zio_compress_to_feature(compval);
|
|
|
|
|
|
|
|
if (f == SPA_FEATURE_NONE)
|
|
|
|
return (SET_ERROR(EINVAL));
|
|
|
|
|
|
|
|
if (!spa_feature_is_enabled(dp->dp_spa, f))
|
|
|
|
return (SET_ERROR(ENOTSUP));
|
|
|
|
|
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void
|
|
|
|
dsl_dataset_set_compression_sync(void *arg, dmu_tx_t *tx)
|
|
|
|
{
|
|
|
|
dsl_dataset_set_compression_arg_t *ddsca = arg;
|
|
|
|
dsl_pool_t *dp = dmu_tx_pool(tx);
|
|
|
|
dsl_dataset_t *ds = NULL;
|
|
|
|
|
|
|
|
uint64_t compval = ZIO_COMPRESS_ALGO(ddsca->ddsca_value);
|
|
|
|
spa_feature_t f = zio_compress_to_feature(compval);
|
2022-09-14 22:50:03 +03:00
|
|
|
ASSERT3S(f, !=, SPA_FEATURE_NONE);
|
Add zstd support to zfs
This PR adds two new compression types, based on ZStandard:
- zstd: A basic ZStandard compression algorithm Available compression.
Levels for zstd are zstd-1 through zstd-19, where the compression
increases with every level, but speed decreases.
- zstd-fast: A faster version of the ZStandard compression algorithm
zstd-fast is basically a "negative" level of zstd. The compression
decreases with every level, but speed increases.
Available compression levels for zstd-fast:
- zstd-fast-1 through zstd-fast-10
- zstd-fast-20 through zstd-fast-100 (in increments of 10)
- zstd-fast-500 and zstd-fast-1000
For more information check the man page.
Implementation details:
Rather than treat each level of zstd as a different algorithm (as was
done historically with gzip), the block pointer `enum zio_compress`
value is simply zstd for all levels, including zstd-fast, since they all
use the same decompression function.
The compress= property (a 64bit unsigned integer) uses the lower 7 bits
to store the compression algorithm (matching the number of bits used in
a block pointer, as the 8th bit was borrowed for embedded block
pointers). The upper bits are used to store the compression level.
It is necessary to be able to determine what compression level was used
when later reading a block back, so the concept used in LZ4, where the
first 32bits of the on-disk value are the size of the compressed data
(since the allocation is rounded up to the nearest ashift), was
extended, and we store the version of ZSTD and the level as well as the
compressed size. This value is returned when decompressing a block, so
that if the block needs to be recompressed (L2ARC, nop-write, etc), that
the same parameters will be used to result in the matching checksum.
All of the internal ZFS code ( `arc_buf_hdr_t`, `objset_t`,
`zio_prop_t`, etc.) uses the separated _compress and _complevel
variables. Only the properties ZAP contains the combined/bit-shifted
value. The combined value is split when the compression_changed_cb()
callback is called, and sets both objset members (os_compress and
os_complevel).
The userspace tools all use the combined/bit-shifted value.
Additional notes:
zdb can now also decode the ZSTD compression header (flag -Z) and
inspect the size, version and compression level saved in that header.
For each record, if it is ZSTD compressed, the parameters of the decoded
compression header get printed.
ZSTD is included with all current tests and new tests are added
as-needed.
Per-dataset feature flags now get activated when the property is set.
If a compression algorithm requires a feature flag, zfs activates the
feature when the property is set, rather than waiting for the first
block to be born. This is currently only used by zstd but can be
extended as needed.
Portions-Sponsored-By: The FreeBSD Foundation
Co-authored-by: Allan Jude <allanjude@freebsd.org>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Co-authored-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Co-authored-by: Michael Niewöhner <foss@mniewoehner.de>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Signed-off-by: Michael Niewöhner <foss@mniewoehner.de>
Closes #6247
Closes #9024
Closes #10277
Closes #10278
2020-08-18 20:10:17 +03:00
|
|
|
ASSERT3S(spa_feature_table[f].fi_type, ==, ZFEATURE_TYPE_BOOLEAN);
|
|
|
|
|
|
|
|
VERIFY0(dsl_dataset_hold(dp, ddsca->ddsca_name, FTAG, &ds));
|
|
|
|
if (zfeature_active(f, ds->ds_feature[f]) != B_TRUE) {
|
|
|
|
ds->ds_feature_activation[f] = (void *)B_TRUE;
|
|
|
|
dsl_dataset_activate_feature(ds->ds_object, f,
|
|
|
|
ds->ds_feature_activation[f], tx);
|
|
|
|
ds->ds_feature[f] = ds->ds_feature_activation[f];
|
|
|
|
}
|
|
|
|
dsl_dataset_rele(ds, FTAG);
|
|
|
|
}
|
|
|
|
|
|
|
|
int
|
|
|
|
dsl_dataset_set_compression(const char *dsname, zprop_source_t source,
|
|
|
|
uint64_t compression)
|
|
|
|
{
|
|
|
|
dsl_dataset_set_compression_arg_t ddsca;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The sync task is only required for zstd in order to activate
|
|
|
|
* the feature flag when the property is first set.
|
|
|
|
*/
|
|
|
|
if (ZIO_COMPRESS_ALGO(compression) != ZIO_COMPRESS_ZSTD)
|
|
|
|
return (0);
|
|
|
|
|
|
|
|
ddsca.ddsca_name = dsname;
|
|
|
|
ddsca.ddsca_source = source;
|
|
|
|
ddsca.ddsca_value = compression;
|
|
|
|
|
|
|
|
return (dsl_sync_task(dsname, dsl_dataset_set_compression_check,
|
|
|
|
dsl_dataset_set_compression_sync, &ddsca, 0,
|
|
|
|
ZFS_SPACE_CHECK_EXTRA_RESERVED));
|
|
|
|
}
|
|
|
|
|
2011-11-17 22:14:36 +04:00
|
|
|
/*
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
* Return (in *usedp) the amount of space referenced by "new" that was not
|
|
|
|
* referenced at the time the bookmark corresponds to. "New" may be a
|
|
|
|
* snapshot or a head. The bookmark must be before new, in
|
|
|
|
* new's filesystem (or its origin) -- caller verifies this.
|
2011-11-17 22:14:36 +04:00
|
|
|
*
|
|
|
|
* The written space is calculated by considering two components: First, we
|
|
|
|
* ignore any freed space, and calculate the written as new's used space
|
|
|
|
* minus old's used space. Next, we add in the amount of space that was freed
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
* between the two time points, thus reducing new's used space relative to
|
|
|
|
* old's. Specifically, this is the space that was born before
|
|
|
|
* zbm_creation_txg, and freed before new (ie. on new's deadlist or a
|
|
|
|
* previous deadlist).
|
2011-11-17 22:14:36 +04:00
|
|
|
*
|
|
|
|
* space freed [---------------------]
|
|
|
|
* snapshots ---O-------O--------O-------O------
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
* bookmark new
|
|
|
|
*
|
|
|
|
* Note, the bookmark's zbm_*_bytes_refd must be valid, but if the HAS_FBN
|
|
|
|
* flag is not set, we will calculate the freed_before_next based on the
|
|
|
|
* next snapshot's deadlist, rather than using zbm_*_freed_before_next_snap.
|
2011-11-17 22:14:36 +04:00
|
|
|
*/
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
static int
|
|
|
|
dsl_dataset_space_written_impl(zfs_bookmark_phys_t *bmp,
|
|
|
|
dsl_dataset_t *new, uint64_t *usedp, uint64_t *compp, uint64_t *uncompp)
|
2011-11-17 22:14:36 +04:00
|
|
|
{
|
|
|
|
int err = 0;
|
|
|
|
dsl_pool_t *dp = new->ds_dir->dd_pool;
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
ASSERT(dsl_pool_config_held(dp));
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
if (dsl_dataset_is_snapshot(new)) {
|
|
|
|
ASSERT3U(bmp->zbm_creation_txg, <,
|
|
|
|
dsl_dataset_phys(new)->ds_creation_txg);
|
|
|
|
}
|
2013-09-04 16:00:57 +04:00
|
|
|
|
2011-11-17 22:14:36 +04:00
|
|
|
*usedp = 0;
|
2015-04-01 18:14:34 +03:00
|
|
|
*usedp += dsl_dataset_phys(new)->ds_referenced_bytes;
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
*usedp -= bmp->zbm_referenced_bytes_refd;
|
2011-11-17 22:14:36 +04:00
|
|
|
|
|
|
|
*compp = 0;
|
2015-04-01 18:14:34 +03:00
|
|
|
*compp += dsl_dataset_phys(new)->ds_compressed_bytes;
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
*compp -= bmp->zbm_compressed_bytes_refd;
|
2011-11-17 22:14:36 +04:00
|
|
|
|
|
|
|
*uncompp = 0;
|
2015-04-01 18:14:34 +03:00
|
|
|
*uncompp += dsl_dataset_phys(new)->ds_uncompressed_bytes;
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
*uncompp -= bmp->zbm_uncompressed_bytes_refd;
|
2011-11-17 22:14:36 +04:00
|
|
|
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
dsl_dataset_t *snap = new;
|
2011-11-17 22:14:36 +04:00
|
|
|
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
while (dsl_dataset_phys(snap)->ds_prev_snap_txg >
|
|
|
|
bmp->zbm_creation_txg) {
|
|
|
|
uint64_t used, comp, uncomp;
|
2011-11-17 22:14:36 +04:00
|
|
|
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
dsl_deadlist_space_range(&snap->ds_deadlist,
|
|
|
|
0, bmp->zbm_creation_txg,
|
|
|
|
&used, &comp, &uncomp);
|
2011-11-17 22:14:36 +04:00
|
|
|
*usedp += used;
|
|
|
|
*compp += comp;
|
|
|
|
*uncompp += uncomp;
|
|
|
|
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
uint64_t snapobj = dsl_dataset_phys(snap)->ds_prev_snap_obj;
|
2012-12-14 03:24:15 +04:00
|
|
|
if (snap != new)
|
|
|
|
dsl_dataset_rele(snap, FTAG);
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
err = dsl_dataset_hold_obj(dp, snapobj, FTAG, &snap);
|
|
|
|
if (err != 0)
|
2011-11-17 22:14:36 +04:00
|
|
|
break;
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
}
|
2011-11-17 22:14:36 +04:00
|
|
|
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
/*
|
|
|
|
* We might not have the FBN if we are calculating written from
|
|
|
|
* a snapshot (because we didn't know the correct "next" snapshot
|
|
|
|
* until now).
|
|
|
|
*/
|
|
|
|
if (bmp->zbm_flags & ZBM_FLAG_HAS_FBN) {
|
|
|
|
*usedp += bmp->zbm_referenced_freed_before_next_snap;
|
|
|
|
*compp += bmp->zbm_compressed_freed_before_next_snap;
|
|
|
|
*uncompp += bmp->zbm_uncompressed_freed_before_next_snap;
|
|
|
|
} else {
|
|
|
|
ASSERT3U(dsl_dataset_phys(snap)->ds_prev_snap_txg, ==,
|
|
|
|
bmp->zbm_creation_txg);
|
|
|
|
uint64_t used, comp, uncomp;
|
|
|
|
dsl_deadlist_space(&snap->ds_deadlist, &used, &comp, &uncomp);
|
|
|
|
*usedp += used;
|
|
|
|
*compp += comp;
|
|
|
|
*uncompp += uncomp;
|
2011-11-17 22:14:36 +04:00
|
|
|
}
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
if (snap != new)
|
|
|
|
dsl_dataset_rele(snap, FTAG);
|
2011-11-17 22:14:36 +04:00
|
|
|
return (err);
|
|
|
|
}
|
|
|
|
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
/*
|
|
|
|
* Return (in *usedp) the amount of space written in new that was not
|
|
|
|
* present at the time the bookmark corresponds to. New may be a
|
|
|
|
* snapshot or the head. Old must be a bookmark before new, in
|
|
|
|
* new's filesystem (or its origin) -- caller verifies this.
|
|
|
|
*/
|
|
|
|
int
|
|
|
|
dsl_dataset_space_written_bookmark(zfs_bookmark_phys_t *bmp,
|
|
|
|
dsl_dataset_t *new, uint64_t *usedp, uint64_t *compp, uint64_t *uncompp)
|
|
|
|
{
|
|
|
|
if (!(bmp->zbm_flags & ZBM_FLAG_HAS_FBN))
|
|
|
|
return (SET_ERROR(ENOTSUP));
|
|
|
|
return (dsl_dataset_space_written_impl(bmp, new,
|
|
|
|
usedp, compp, uncompp));
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Return (in *usedp) the amount of space written in new that is not
|
|
|
|
* present in oldsnap. New may be a snapshot or the head. Old must be
|
|
|
|
* a snapshot before new, in new's filesystem (or its origin). If not then
|
|
|
|
* fail and return EINVAL.
|
|
|
|
*/
|
|
|
|
int
|
|
|
|
dsl_dataset_space_written(dsl_dataset_t *oldsnap, dsl_dataset_t *new,
|
|
|
|
uint64_t *usedp, uint64_t *compp, uint64_t *uncompp)
|
|
|
|
{
|
|
|
|
if (!dsl_dataset_is_before(new, oldsnap, 0))
|
|
|
|
return (SET_ERROR(EINVAL));
|
|
|
|
|
|
|
|
zfs_bookmark_phys_t zbm = { 0 };
|
|
|
|
dsl_dataset_phys_t *dsp = dsl_dataset_phys(oldsnap);
|
|
|
|
zbm.zbm_guid = dsp->ds_guid;
|
|
|
|
zbm.zbm_creation_txg = dsp->ds_creation_txg;
|
|
|
|
zbm.zbm_creation_time = dsp->ds_creation_time;
|
|
|
|
zbm.zbm_referenced_bytes_refd = dsp->ds_referenced_bytes;
|
|
|
|
zbm.zbm_compressed_bytes_refd = dsp->ds_compressed_bytes;
|
|
|
|
zbm.zbm_uncompressed_bytes_refd = dsp->ds_uncompressed_bytes;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If oldsnap is the origin (or origin's origin, ...) of new,
|
|
|
|
* we can't easily calculate the effective FBN. Therefore,
|
|
|
|
* we do not set ZBM_FLAG_HAS_FBN, so that the _impl will calculate
|
|
|
|
* it relative to the correct "next": the next snapshot towards "new",
|
|
|
|
* rather than the next snapshot in oldsnap's dsl_dir.
|
|
|
|
*/
|
|
|
|
return (dsl_dataset_space_written_impl(&zbm, new,
|
|
|
|
usedp, compp, uncompp));
|
|
|
|
}
|
|
|
|
|
2011-11-17 22:14:36 +04:00
|
|
|
/*
|
|
|
|
* Return (in *usedp) the amount of space that will be reclaimed if firstsnap,
|
|
|
|
* lastsnap, and all snapshots in between are deleted.
|
|
|
|
*
|
|
|
|
* blocks that would be freed [---------------------------]
|
|
|
|
* snapshots ---O-------O--------O-------O--------O
|
|
|
|
* firstsnap lastsnap
|
|
|
|
*
|
|
|
|
* This is the set of blocks that were born after the snap before firstsnap,
|
|
|
|
* (birth > firstsnap->prev_snap_txg) and died before the snap after the
|
|
|
|
* last snap (ie, is on lastsnap->ds_next->ds_deadlist or an earlier deadlist).
|
|
|
|
* We calculate this by iterating over the relevant deadlists (from the snap
|
|
|
|
* after lastsnap, backward to the snap after firstsnap), summing up the
|
|
|
|
* space on the deadlist that was born after the snap before firstsnap.
|
|
|
|
*/
|
|
|
|
int
|
|
|
|
dsl_dataset_space_wouldfree(dsl_dataset_t *firstsnap,
|
|
|
|
dsl_dataset_t *lastsnap,
|
|
|
|
uint64_t *usedp, uint64_t *compp, uint64_t *uncompp)
|
|
|
|
{
|
|
|
|
int err = 0;
|
|
|
|
uint64_t snapobj;
|
|
|
|
dsl_pool_t *dp = firstsnap->ds_dir->dd_pool;
|
|
|
|
|
2015-04-02 06:44:32 +03:00
|
|
|
ASSERT(firstsnap->ds_is_snapshot);
|
|
|
|
ASSERT(lastsnap->ds_is_snapshot);
|
2011-11-17 22:14:36 +04:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Check that the snapshots are in the same dsl_dir, and firstsnap
|
|
|
|
* is before lastsnap.
|
|
|
|
*/
|
|
|
|
if (firstsnap->ds_dir != lastsnap->ds_dir ||
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(firstsnap)->ds_creation_txg >
|
|
|
|
dsl_dataset_phys(lastsnap)->ds_creation_txg)
|
2013-03-08 22:41:28 +04:00
|
|
|
return (SET_ERROR(EINVAL));
|
2011-11-17 22:14:36 +04:00
|
|
|
|
|
|
|
*usedp = *compp = *uncompp = 0;
|
|
|
|
|
2015-04-01 18:14:34 +03:00
|
|
|
snapobj = dsl_dataset_phys(lastsnap)->ds_next_snap_obj;
|
2011-11-17 22:14:36 +04:00
|
|
|
while (snapobj != firstsnap->ds_object) {
|
|
|
|
dsl_dataset_t *ds;
|
|
|
|
uint64_t used, comp, uncomp;
|
|
|
|
|
|
|
|
err = dsl_dataset_hold_obj(dp, snapobj, FTAG, &ds);
|
|
|
|
if (err != 0)
|
|
|
|
break;
|
|
|
|
|
|
|
|
dsl_deadlist_space_range(&ds->ds_deadlist,
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dataset_phys(firstsnap)->ds_prev_snap_txg, UINT64_MAX,
|
2011-11-17 22:14:36 +04:00
|
|
|
&used, &comp, &uncomp);
|
|
|
|
*usedp += used;
|
|
|
|
*compp += comp;
|
|
|
|
*uncompp += uncomp;
|
|
|
|
|
2015-04-01 18:14:34 +03:00
|
|
|
snapobj = dsl_dataset_phys(ds)->ds_prev_snap_obj;
|
2011-11-17 22:14:36 +04:00
|
|
|
ASSERT3U(snapobj, !=, 0);
|
|
|
|
dsl_dataset_rele(ds, FTAG);
|
|
|
|
}
|
|
|
|
return (err);
|
|
|
|
}
|
|
|
|
|
2013-09-04 16:00:57 +04:00
|
|
|
/*
|
|
|
|
* Return TRUE if 'earlier' is an earlier snapshot in 'later's timeline.
|
|
|
|
* For example, they could both be snapshots of the same filesystem, and
|
|
|
|
* 'earlier' is before 'later'. Or 'earlier' could be the origin of
|
|
|
|
* 'later's filesystem. Or 'earlier' could be an older snapshot in the origin's
|
|
|
|
* filesystem. Or 'earlier' could be the origin's origin.
|
2013-12-12 02:33:41 +04:00
|
|
|
*
|
|
|
|
* If non-zero, earlier_txg is used instead of earlier's ds_creation_txg.
|
2013-09-04 16:00:57 +04:00
|
|
|
*/
|
|
|
|
boolean_t
|
2013-12-12 02:33:41 +04:00
|
|
|
dsl_dataset_is_before(dsl_dataset_t *later, dsl_dataset_t *earlier,
|
2017-01-12 20:42:11 +03:00
|
|
|
uint64_t earlier_txg)
|
2013-09-04 16:00:57 +04:00
|
|
|
{
|
|
|
|
dsl_pool_t *dp = later->ds_dir->dd_pool;
|
|
|
|
int error;
|
|
|
|
boolean_t ret;
|
|
|
|
|
|
|
|
ASSERT(dsl_pool_config_held(dp));
|
2015-04-02 06:44:32 +03:00
|
|
|
ASSERT(earlier->ds_is_snapshot || earlier_txg != 0);
|
2013-12-12 02:33:41 +04:00
|
|
|
|
|
|
|
if (earlier_txg == 0)
|
2015-04-01 18:14:34 +03:00
|
|
|
earlier_txg = dsl_dataset_phys(earlier)->ds_creation_txg;
|
2013-09-04 16:00:57 +04:00
|
|
|
|
2015-04-02 06:44:32 +03:00
|
|
|
if (later->ds_is_snapshot &&
|
2015-04-01 18:14:34 +03:00
|
|
|
earlier_txg >= dsl_dataset_phys(later)->ds_creation_txg)
|
2013-09-04 16:00:57 +04:00
|
|
|
return (B_FALSE);
|
|
|
|
|
|
|
|
if (later->ds_dir == earlier->ds_dir)
|
|
|
|
return (B_TRUE);
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
|
|
|
|
/*
|
|
|
|
* We check dd_origin_obj explicitly here rather than using
|
|
|
|
* dsl_dir_is_clone() so that we will return TRUE if "earlier"
|
|
|
|
* is $ORIGIN@$ORIGIN. dsl_dataset_space_written() depends on
|
|
|
|
* this behavior.
|
|
|
|
*/
|
|
|
|
if (dsl_dir_phys(later->ds_dir)->dd_origin_obj == 0)
|
2013-09-04 16:00:57 +04:00
|
|
|
return (B_FALSE);
|
|
|
|
|
2017-11-04 23:25:13 +03:00
|
|
|
dsl_dataset_t *origin;
|
2013-09-04 16:00:57 +04:00
|
|
|
error = dsl_dataset_hold_obj(dp,
|
2015-04-01 18:14:34 +03:00
|
|
|
dsl_dir_phys(later->ds_dir)->dd_origin_obj, FTAG, &origin);
|
2013-09-04 16:00:57 +04:00
|
|
|
if (error != 0)
|
|
|
|
return (B_FALSE);
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
if (dsl_dataset_phys(origin)->ds_creation_txg == earlier_txg &&
|
|
|
|
origin->ds_dir == earlier->ds_dir) {
|
|
|
|
dsl_dataset_rele(origin, FTAG);
|
|
|
|
return (B_TRUE);
|
|
|
|
}
|
2013-12-12 02:33:41 +04:00
|
|
|
ret = dsl_dataset_is_before(origin, earlier, earlier_txg);
|
2013-09-04 16:00:57 +04:00
|
|
|
dsl_dataset_rele(origin, FTAG);
|
|
|
|
return (ret);
|
|
|
|
}
|
|
|
|
|
2013-10-08 21:13:05 +04:00
|
|
|
void
|
|
|
|
dsl_dataset_zapify(dsl_dataset_t *ds, dmu_tx_t *tx)
|
|
|
|
{
|
|
|
|
objset_t *mos = ds->ds_dir->dd_pool->dp_meta_objset;
|
|
|
|
dmu_object_zapify(mos, ds->ds_object, DMU_OT_DSL_DATASET, tx);
|
|
|
|
}
|
|
|
|
|
2016-01-07 00:22:48 +03:00
|
|
|
boolean_t
|
|
|
|
dsl_dataset_is_zapified(dsl_dataset_t *ds)
|
|
|
|
{
|
|
|
|
dmu_object_info_t doi;
|
|
|
|
|
|
|
|
dmu_object_info_from_db(ds->ds_dbuf, &doi);
|
|
|
|
return (doi.doi_type == DMU_OTN_ZAP_METADATA);
|
|
|
|
}
|
|
|
|
|
|
|
|
boolean_t
|
|
|
|
dsl_dataset_has_resume_receive_state(dsl_dataset_t *ds)
|
|
|
|
{
|
|
|
|
return (dsl_dataset_is_zapified(ds) &&
|
|
|
|
zap_contains(ds->ds_dir->dd_pool->dp_meta_objset,
|
|
|
|
ds->ds_object, DS_FIELD_RESUME_TOGUID) == 0);
|
|
|
|
}
|
|
|
|
|
OpenZFS 7614, 9064 - zfs device evacuation/removal
OpenZFS 7614 - zfs device evacuation/removal
OpenZFS 9064 - remove_mirror should wait for device removal to complete
This project allows top-level vdevs to be removed from the storage pool
with "zpool remove", reducing the total amount of storage in the pool.
This operation copies all allocated regions of the device to be removed
onto other devices, recording the mapping from old to new location.
After the removal is complete, read and free operations to the removed
(now "indirect") vdev must be remapped and performed at the new location
on disk. The indirect mapping table is kept in memory whenever the pool
is loaded, so there is minimal performance overhead when doing operations
on the indirect vdev.
The size of the in-memory mapping table will be reduced when its entries
become "obsolete" because they are no longer used by any block pointers
in the pool. An entry becomes obsolete when all the blocks that use
it are freed. An entry can also become obsolete when all the snapshots
that reference it are deleted, and the block pointers that reference it
have been "remapped" in all filesystems/zvols (and clones). Whenever an
indirect block is written, all the block pointers in it will be "remapped"
to their new (concrete) locations if possible. This process can be
accelerated by using the "zfs remap" command to proactively rewrite all
indirect blocks that reference indirect (removed) vdevs.
Note that when a device is removed, we do not verify the checksum of
the data that is copied. This makes the process much faster, but if it
were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be
possible to copy the wrong data, when we have the correct data on e.g.
the other side of the mirror.
At the moment, only mirrors and simple top-level vdevs can be removed
and no removal is allowed if any of the top-level vdevs are raidz.
Porting Notes:
* Avoid zero-sized kmem_alloc() in vdev_compact_children().
The device evacuation code adds a dependency that
vdev_compact_children() be able to properly empty the vdev_child
array by setting it to NULL and zeroing vdev_children. Under Linux,
kmem_alloc() and related functions return a sentinel pointer rather
than NULL for zero-sized allocations.
* Remove comment regarding "mpt" driver where zfs_remove_max_segment
is initialized to SPA_MAXBLOCKSIZE.
Change zfs_condense_indirect_commit_entry_delay_ticks to
zfs_condense_indirect_commit_entry_delay_ms for consistency with
most other tunables in which delays are specified in ms.
* ZTS changes:
Use set_tunable rather than mdb
Use zpool sync as appropriate
Use sync_pool instead of sync
Kill jobs during test_removal_with_operation to allow unmount/export
Don't add non-disk names such as "mirror" or "raidz" to $DISKS
Use $TEST_BASE_DIR instead of /tmp
Increase HZ from 100 to 1000 which is more common on Linux
removal_multiple_indirection.ksh
Reduce iterations in order to not time out on the code
coverage builders.
removal_resume_export:
Functionally, the test case is correct but there exists a race
where the kernel thread hasn't been fully started yet and is
not visible. Wait for up to 1 second for the removal thread
to be started before giving up on it. Also, increase the
amount of data copied in order that the removal not finish
before the export has a chance to fail.
* MMP compatibility, the concept of concrete versus non-concrete devices
has slightly changed the semantics of vdev_writeable(). Update
mmp_random_leaf_impl() accordingly.
* Updated dbuf_remap() to handle the org.zfsonlinux:large_dnode pool
feature which is not supported by OpenZFS.
* Added support for new vdev removal tracepoints.
* Test cases removal_with_zdb and removal_condense_export have been
intentionally disabled. When run manually they pass as intended,
but when running in the automated test environment they produce
unreliable results on the latest Fedora release.
They may work better once the upstream pool import refectoring is
merged into ZoL at which point they will be re-enabled.
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Alex Reece <alex@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Tim Chase <tim@chase2k.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Garrett D'Amore <garrett@damore.org>
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
OpenZFS-issue: https://www.illumos.org/issues/7614
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f539f1eb
Closes #6900
2016-09-22 19:30:13 +03:00
|
|
|
uint64_t
|
|
|
|
dsl_dataset_get_remap_deadlist_object(dsl_dataset_t *ds)
|
|
|
|
{
|
|
|
|
uint64_t remap_deadlist_obj;
|
|
|
|
int err;
|
|
|
|
|
|
|
|
if (!dsl_dataset_is_zapified(ds))
|
|
|
|
return (0);
|
|
|
|
|
|
|
|
err = zap_lookup(ds->ds_dir->dd_pool->dp_meta_objset, ds->ds_object,
|
|
|
|
DS_FIELD_REMAP_DEADLIST, sizeof (remap_deadlist_obj), 1,
|
|
|
|
&remap_deadlist_obj);
|
|
|
|
|
|
|
|
if (err != 0) {
|
|
|
|
VERIFY3S(err, ==, ENOENT);
|
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
|
|
|
ASSERT(remap_deadlist_obj != 0);
|
|
|
|
return (remap_deadlist_obj);
|
|
|
|
}
|
|
|
|
|
|
|
|
boolean_t
|
|
|
|
dsl_dataset_remap_deadlist_exists(dsl_dataset_t *ds)
|
|
|
|
{
|
|
|
|
EQUIV(dsl_deadlist_is_open(&ds->ds_remap_deadlist),
|
|
|
|
dsl_dataset_get_remap_deadlist_object(ds) != 0);
|
|
|
|
return (dsl_deadlist_is_open(&ds->ds_remap_deadlist));
|
|
|
|
}
|
|
|
|
|
|
|
|
static void
|
|
|
|
dsl_dataset_set_remap_deadlist_object(dsl_dataset_t *ds, uint64_t obj,
|
|
|
|
dmu_tx_t *tx)
|
|
|
|
{
|
|
|
|
ASSERT(obj != 0);
|
|
|
|
dsl_dataset_zapify(ds, tx);
|
|
|
|
VERIFY0(zap_add(ds->ds_dir->dd_pool->dp_meta_objset, ds->ds_object,
|
|
|
|
DS_FIELD_REMAP_DEADLIST, sizeof (obj), 1, &obj, tx));
|
|
|
|
}
|
|
|
|
|
|
|
|
static void
|
|
|
|
dsl_dataset_unset_remap_deadlist_object(dsl_dataset_t *ds, dmu_tx_t *tx)
|
|
|
|
{
|
|
|
|
VERIFY0(zap_remove(ds->ds_dir->dd_pool->dp_meta_objset,
|
|
|
|
ds->ds_object, DS_FIELD_REMAP_DEADLIST, tx));
|
|
|
|
}
|
|
|
|
|
|
|
|
void
|
|
|
|
dsl_dataset_destroy_remap_deadlist(dsl_dataset_t *ds, dmu_tx_t *tx)
|
|
|
|
{
|
|
|
|
uint64_t remap_deadlist_object;
|
|
|
|
spa_t *spa = ds->ds_dir->dd_pool->dp_spa;
|
|
|
|
|
|
|
|
ASSERT(dmu_tx_is_syncing(tx));
|
|
|
|
ASSERT(dsl_dataset_remap_deadlist_exists(ds));
|
|
|
|
|
|
|
|
remap_deadlist_object = ds->ds_remap_deadlist.dl_object;
|
|
|
|
dsl_deadlist_close(&ds->ds_remap_deadlist);
|
|
|
|
dsl_deadlist_free(spa_meta_objset(spa), remap_deadlist_object, tx);
|
|
|
|
dsl_dataset_unset_remap_deadlist_object(ds, tx);
|
|
|
|
spa_feature_decr(spa, SPA_FEATURE_OBSOLETE_COUNTS, tx);
|
|
|
|
}
|
|
|
|
|
|
|
|
void
|
|
|
|
dsl_dataset_create_remap_deadlist(dsl_dataset_t *ds, dmu_tx_t *tx)
|
|
|
|
{
|
|
|
|
uint64_t remap_deadlist_obj;
|
|
|
|
spa_t *spa = ds->ds_dir->dd_pool->dp_spa;
|
|
|
|
|
|
|
|
ASSERT(dmu_tx_is_syncing(tx));
|
|
|
|
ASSERT(MUTEX_HELD(&ds->ds_remap_deadlist_lock));
|
|
|
|
/*
|
|
|
|
* Currently we only create remap deadlists when there are indirect
|
|
|
|
* vdevs with referenced mappings.
|
|
|
|
*/
|
|
|
|
ASSERT(spa_feature_is_active(spa, SPA_FEATURE_DEVICE_REMOVAL));
|
|
|
|
|
|
|
|
remap_deadlist_obj = dsl_deadlist_clone(
|
|
|
|
&ds->ds_deadlist, UINT64_MAX,
|
|
|
|
dsl_dataset_phys(ds)->ds_prev_snap_obj, tx);
|
|
|
|
dsl_dataset_set_remap_deadlist_object(ds,
|
|
|
|
remap_deadlist_obj, tx);
|
|
|
|
dsl_deadlist_open(&ds->ds_remap_deadlist, spa_meta_objset(spa),
|
|
|
|
remap_deadlist_obj);
|
|
|
|
spa_feature_incr(spa, SPA_FEATURE_OBSOLETE_COUNTS, tx);
|
|
|
|
}
|
|
|
|
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
void
|
|
|
|
dsl_dataset_activate_redaction(dsl_dataset_t *ds, uint64_t *redact_snaps,
|
|
|
|
uint64_t num_redact_snaps, dmu_tx_t *tx)
|
|
|
|
{
|
|
|
|
uint64_t dsobj = ds->ds_object;
|
|
|
|
struct feature_type_uint64_array_arg *ftuaa =
|
|
|
|
kmem_zalloc(sizeof (*ftuaa), KM_SLEEP);
|
|
|
|
ftuaa->length = (int64_t)num_redact_snaps;
|
|
|
|
if (num_redact_snaps > 0) {
|
|
|
|
ftuaa->array = kmem_alloc(num_redact_snaps * sizeof (uint64_t),
|
|
|
|
KM_SLEEP);
|
2022-02-25 16:26:54 +03:00
|
|
|
memcpy(ftuaa->array, redact_snaps, num_redact_snaps *
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
sizeof (uint64_t));
|
|
|
|
}
|
|
|
|
dsl_dataset_activate_feature(dsobj, SPA_FEATURE_REDACTED_DATASETS,
|
|
|
|
ftuaa, tx);
|
|
|
|
ds->ds_feature[SPA_FEATURE_REDACTED_DATASETS] = ftuaa;
|
|
|
|
}
|
|
|
|
|
Improve zpool status output, list all affected datasets
Currently, determining which datasets are affected by corruption is
a manual process.
The primary difficulty in reporting the list of affected snapshots is
that since the error was initially found, the snapshot where the error
originally occurred in, may have been deleted. To solve this issue, we
add the ID of the head dataset of the original snapshot which the error
was detected in, to the stored error report. Then any time a filesystem
is deleted, the errors associated with it are deleted as well. Any time
a clone promote occurs, we modify reports associated with the original
head to refer to the new head. The stored error reports are identified
by this head ID, the birth time of the block which the error occurred
in, as well as some information about the error itself are also stored.
Once this information is stored, we can find the set of datasets
affected by an error by walking back the list of snapshots in the given
head until we find one with the appropriate birth txg, and then traverse
through the snapshots of the clone family, terminating a branch if the
block was replaced in a given snapshot. Then we report this information
back to libzfs, and to the zpool status command, where it is displayed
as follows:
pool: test
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
scan: scrub repaired 0B in 00:00:00 with 800 errors on Fri Dec 3
08:27:57 2021
config:
NAME STATE READ WRITE CKSUM
test ONLINE 0 0 0
sdb ONLINE 0 0 1.58K
errors: Permanent errors have been detected in the following files:
test@1:/test.0.0
/test/test.0.0
/test/1clone/test.0.0
A new feature flag is introduced to mark the presence of this change, as
well as promotion and backwards compatibility logic. This is an updated
version of #9175. Rebase required fixing the tests, updating the ABI of
libzfs, updating the man pages, fixing bugs, fixing the error returns,
and updating the old on-disk error logs to the new format when
activating the feature.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Co-authored-by: TulsiJain <tulsi.jain@delphix.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #9175
Closes #12812
2022-04-26 03:25:42 +03:00
|
|
|
/*
|
|
|
|
* Find and return (in *oldest_dsobj) the oldest snapshot of the dsobj
|
|
|
|
* dataset whose birth time is >= min_txg.
|
|
|
|
*/
|
|
|
|
int
|
|
|
|
dsl_dataset_oldest_snapshot(spa_t *spa, uint64_t head_ds, uint64_t min_txg,
|
|
|
|
uint64_t *oldest_dsobj)
|
|
|
|
{
|
|
|
|
dsl_dataset_t *ds;
|
|
|
|
dsl_pool_t *dp = spa->spa_dsl_pool;
|
|
|
|
|
|
|
|
int error = dsl_dataset_hold_obj(dp, head_ds, FTAG, &ds);
|
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
|
|
|
|
|
|
|
uint64_t prev_obj = dsl_dataset_phys(ds)->ds_prev_snap_obj;
|
|
|
|
uint64_t prev_obj_txg = dsl_dataset_phys(ds)->ds_prev_snap_txg;
|
|
|
|
|
|
|
|
while (prev_obj != 0 && min_txg < prev_obj_txg) {
|
|
|
|
dsl_dataset_rele(ds, FTAG);
|
|
|
|
if ((error = dsl_dataset_hold_obj(dp, prev_obj,
|
|
|
|
FTAG, &ds)) != 0)
|
|
|
|
return (error);
|
|
|
|
prev_obj_txg = dsl_dataset_phys(ds)->ds_prev_snap_txg;
|
|
|
|
prev_obj = dsl_dataset_phys(ds)->ds_prev_snap_obj;
|
|
|
|
}
|
|
|
|
*oldest_dsobj = ds->ds_object;
|
|
|
|
dsl_dataset_rele(ds, FTAG);
|
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
Cleanup: Specify unsignedness on things that should not be signed
In #13871, zfs_vdev_aggregation_limit_non_rotating and
zfs_vdev_aggregation_limit being signed was pointed out as a possible
reason not to eliminate an unnecessary MAX(unsigned, 0) since the
unsigned value was assigned from them.
There is no reason for these module parameters to be signed and upon
inspection, it was found that there are a number of other module
parameters that are signed, but should not be, so we make them unsigned.
Making them unsigned made it clear that some other variables in the code
should also be unsigned, so we also make those unsigned. This prevents
users from setting negative values that could potentially cause bad
behaviors. It also makes the code slightly easier to understand.
Mostly module parameters that deal with timeouts, limits, bitshifts and
percentages are made unsigned by this. Any that are boolean are left
signed, since whether booleans should be considered signed or unsigned
does not matter.
Making zfs_arc_lotsfree_percent unsigned caused a
`zfs_arc_lotsfree_percent >= 0` check to become redundant, so it was
removed. Removing the check was also necessary to prevent a compiler
error from -Werror=type-limits.
Several end of line comments had to be moved to their own lines because
replacing int with uint_t caused us to exceed the 80 character limit
enforced by cstyle.pl.
The following were kept signed because they are passed to
taskq_create(), which expects signed values and modifying the
OpenSolaris/Illumos DDI is out of scope of this patch:
* metaslab_load_pct
* zfs_sync_taskq_batch_pct
* zfs_zil_clean_taskq_nthr_pct
* zfs_zil_clean_taskq_minalloc
* zfs_zil_clean_taskq_maxalloc
* zfs_arc_prune_task_threads
Also, negative values in those parameters was found to be harmless.
The following were left signed because either negative values make
sense, or more analysis was needed to determine whether negative values
should be disallowed:
* zfs_metaslab_switch_threshold
* zfs_pd_bytes_max
* zfs_livelist_min_percent_shared
zfs_multihost_history was made static to be consistent with other
parameters.
A number of module parameters were marked as signed, but in reality
referenced unsigned variables. upgrade_errlog_limit is one of the
numerous examples. In the case of zfs_vdev_async_read_max_active, it was
already uint32_t, but zdb had an extern int declaration for it.
Interestingly, the documentation in zfs.4 was right for
upgrade_errlog_limit despite the module parameter being wrongly marked,
while the documentation for zfs_vdev_async_read_max_active (and friends)
was wrong. It was also wrong for zstd_abort_size, which was unsigned,
but was documented as signed.
Also, the documentation in zfs.4 incorrectly described the following
parameters as ulong when they were int:
* zfs_arc_meta_adjust_restarts
* zfs_override_estimate_recordsize
They are now uint_t as of this patch and thus the man page has been
updated to describe them as uint.
dbuf_state_index was left alone since it does nothing and perhaps should
be removed in another patch.
If any module parameters were missed, they were not found by `grep -r
'ZFS_MODULE_PARAM' | grep ', INT'`. I did find a few that grep missed,
but only because they were in files that had hits.
This patch intentionally did not attempt to address whether some of
these module parameters should be elevated to 64-bit parameters, because
the length of a long on 32-bit is 32-bit.
Lastly, it was pointed out during review that uint_t is a better match
for these variables than uint32_t because FreeBSD kernel parameter
definitions are designed for uint_t, whose bit width can change in
future memory models. As a result, we change the existing parameters
that are uint32_t to use uint_t.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13875
2022-09-28 02:42:41 +03:00
|
|
|
ZFS_MODULE_PARAM(zfs, zfs_, max_recordsize, UINT, ZMOD_RW,
|
2019-09-06 00:49:49 +03:00
|
|
|
"Max allowed record size");
|
2014-11-03 23:15:08 +03:00
|
|
|
|
2019-09-06 00:49:49 +03:00
|
|
|
ZFS_MODULE_PARAM(zfs, zfs_, allow_redacted_dataset_mount, INT, ZMOD_RW,
|
Implement Redacted Send/Receive
Redacted send/receive allows users to send subsets of their data to
a target system. One possible use case for this feature is to not
transmit sensitive information to a data warehousing, test/dev, or
analytics environment. Another is to save space by not replicating
unimportant data within a given dataset, for example in backup tools
like zrepl.
Redacted send/receive is a three-stage process. First, a clone (or
clones) is made of the snapshot to be sent to the target. In this
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction
snapshot" (or snapshots). Second, the new zfs redact command is used
to create a redaction bookmark. The redaction bookmark stores the
list of blocks in a snapshot that were modified by the redaction
snapshot(s). Finally, the redaction bookmark is passed as a parameter
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive
or unwanted information, and those blocks are not included in the send
stream. When sending from the redaction bookmark, the blocks it
contains are considered as candidate blocks in addition to those
blocks in the destination snapshot that were modified since the
creation_txg of the redaction bookmark. This step is necessary to
allow the target to rehydrate data in the case where some blocks are
accidentally or unnecessarily modified in the redaction snapshot.
The changes to bookmarks to enable fast space estimation involve
adding deadlists to bookmarks. There is also logic to manage the
life cycles of these deadlists.
The new size estimation process operates in cases where previously
an accurate estimate could not be provided. In those cases, a send
is performed where no data blocks are read, reducing the runtime
significantly and providing a byte-accurate size estimate.
Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
2019-06-19 19:48:13 +03:00
|
|
|
"Allow mounting of redacted datasets");
|
|
|
|
|
2022-08-22 22:36:22 +03:00
|
|
|
ZFS_MODULE_PARAM(zfs, zfs_, snapshot_history_enabled, INT, ZMOD_RW,
|
|
|
|
"Include snapshot events in pool history/events");
|
|
|
|
|
2010-08-26 22:49:16 +04:00
|
|
|
EXPORT_SYMBOL(dsl_dataset_hold);
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decyrpt, and authenticate protected datasets.
Each object set maintains a Merkel tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 20:36:48 +03:00
|
|
|
EXPORT_SYMBOL(dsl_dataset_hold_flags);
|
2010-08-26 22:49:16 +04:00
|
|
|
EXPORT_SYMBOL(dsl_dataset_hold_obj);
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decyrpt, and authenticate protected datasets.
Each object set maintains a Merkel tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 20:36:48 +03:00
|
|
|
EXPORT_SYMBOL(dsl_dataset_hold_obj_flags);
|
2010-08-26 22:49:16 +04:00
|
|
|
EXPORT_SYMBOL(dsl_dataset_own);
|
|
|
|
EXPORT_SYMBOL(dsl_dataset_own_obj);
|
|
|
|
EXPORT_SYMBOL(dsl_dataset_name);
|
|
|
|
EXPORT_SYMBOL(dsl_dataset_rele);
|
Native Encryption for ZFS on Linux
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decyrpt, and authenticate protected datasets.
Each object set maintains a Merkel tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #494
Closes #5769
2017-08-14 20:36:48 +03:00
|
|
|
EXPORT_SYMBOL(dsl_dataset_rele_flags);
|
2010-08-26 22:49:16 +04:00
|
|
|
EXPORT_SYMBOL(dsl_dataset_disown);
|
|
|
|
EXPORT_SYMBOL(dsl_dataset_tryown);
|
|
|
|
EXPORT_SYMBOL(dsl_dataset_create_sync);
|
|
|
|
EXPORT_SYMBOL(dsl_dataset_create_sync_dd);
|
|
|
|
EXPORT_SYMBOL(dsl_dataset_snapshot_check);
|
|
|
|
EXPORT_SYMBOL(dsl_dataset_snapshot_sync);
|
|
|
|
EXPORT_SYMBOL(dsl_dataset_promote);
|
|
|
|
EXPORT_SYMBOL(dsl_dataset_user_hold);
|
|
|
|
EXPORT_SYMBOL(dsl_dataset_user_release);
|
|
|
|
EXPORT_SYMBOL(dsl_dataset_get_holds);
|
|
|
|
EXPORT_SYMBOL(dsl_dataset_get_blkptr);
|
|
|
|
EXPORT_SYMBOL(dsl_dataset_get_spa);
|
2013-07-29 22:55:16 +04:00
|
|
|
EXPORT_SYMBOL(dsl_dataset_modified_since_snap);
|
2011-11-17 22:14:36 +04:00
|
|
|
EXPORT_SYMBOL(dsl_dataset_space_written);
|
|
|
|
EXPORT_SYMBOL(dsl_dataset_space_wouldfree);
|
2010-08-26 22:49:16 +04:00
|
|
|
EXPORT_SYMBOL(dsl_dataset_sync);
|
|
|
|
EXPORT_SYMBOL(dsl_dataset_block_born);
|
|
|
|
EXPORT_SYMBOL(dsl_dataset_block_kill);
|
|
|
|
EXPORT_SYMBOL(dsl_dataset_dirty);
|
|
|
|
EXPORT_SYMBOL(dsl_dataset_stats);
|
|
|
|
EXPORT_SYMBOL(dsl_dataset_fast_stat);
|
|
|
|
EXPORT_SYMBOL(dsl_dataset_space);
|
|
|
|
EXPORT_SYMBOL(dsl_dataset_fsid_guid);
|
|
|
|
EXPORT_SYMBOL(dsl_dsobj_to_dsname);
|
|
|
|
EXPORT_SYMBOL(dsl_dataset_check_quota);
|
2013-09-04 16:00:57 +04:00
|
|
|
EXPORT_SYMBOL(dsl_dataset_clone_swap_check_impl);
|
|
|
|
EXPORT_SYMBOL(dsl_dataset_clone_swap_sync_impl);
|