Implement Redacted Send/Receive

Redacted send/receive allows users to send subsets of their data to 
a target system. One possible use case for this feature is to not 
transmit sensitive information to a data warehousing, test/dev, or 
analytics environment. Another is to save space by not replicating 
unimportant data within a given dataset, for example in backup tools 
like zrepl.

Redacted send/receive is a three-stage process. First, a clone (or 
clones) is made of the snapshot to be sent to the target. In this 
clone (or clones), all unnecessary or unwanted data is removed or
modified. This clone is then snapshotted to create the "redaction 
snapshot" (or snapshots). Second, the new zfs redact command is used 
to create a redaction bookmark. The redaction bookmark stores the 
list of blocks in a snapshot that were modified by the redaction 
snapshot(s). Finally, the redaction bookmark is passed as a parameter 
to zfs send. When sending to the snapshot that was redacted, the
redaction bookmark is used to filter out blocks that contain sensitive 
or unwanted information, and those blocks are not included in the send 
stream.  When sending from the redaction bookmark, the blocks it 
contains are considered as candidate blocks in addition to those 
blocks in the destination snapshot that were modified since the 
creation_txg of the redaction bookmark.  This step is necessary to 
allow the target to rehydrate data in the case where some blocks are 
accidentally or unnecessarily modified in the redaction snapshot.

The changes to bookmarks to enable fast space estimation involve 
adding deadlists to bookmarks. There is also logic to manage the 
life cycles of these deadlists.

The new size estimation process operates in cases where previously 
an accurate estimate could not be provided. In those cases, a send 
is performed where no data blocks are read, reducing the runtime 
significantly and providing a byte-accurate size estimate.

Reviewed-by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Chris Williamson <chris.williamson@delphix.com>
Reviewed-by: Pavel Zhakarov <pavel.zakharov@delphix.com>
Reviewed-by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #7958
This commit is contained in:
Paul Dagnelie
2019-06-19 09:48:13 -07:00
committed by Brian Behlendorf
parent c1b5801bb5
commit 30af21b025
103 changed files with 11513 additions and 2668 deletions
+202 -5
View File
@@ -177,7 +177,7 @@
.Cm mount
.Nm
.Cm mount
.Op Fl Olv
.Op Fl Oflv
.Op Fl o Ar options
.Fl a | Ar filesystem
.Nm
@@ -200,11 +200,18 @@
.Ar snapshot
.Nm
.Cm send
.Op Fl LPcenvw
.Op Fl i Ar snapshot Ns | Ns Ar bookmark
.Op Fl DLPcenpvw
.Oo Fl i Ar snapshot Ns | Ns Ar bookmark
.Oc
.Ar filesystem Ns | Ns Ar volume Ns | Ns Ar snapshot
.Nm
.Cm send
.Fl -redact Ar redaction_bookmark
.Op Fl DLPcenpv
.Op Fl i Ar snapshot Ns | Ns Ar bookmark
.Ar snapshot
.Nm
.Cm send
.Op Fl Penv
.Fl t Ar receive_resume_token
.Nm
@@ -227,6 +234,10 @@
.Fl A
.Ar filesystem Ns | Ns Ar volume
.Nm
.Cm redact
.Ar snapshot redaction_bookmark
.Op Ar redaction_snapshot Ns ...
.Nm
.Cm allow
.Ar filesystem Ns | Ns Ar volume
.Nm
@@ -740,6 +751,11 @@ this opaque token can be provided to
.Sy zfs send -t
to resume and complete the
.Sy zfs receive .
.It Sy redact_snaps
For bookmarks, this is the list of snapshot guids the bookmark contains a redaction
list for.
For snapshots, this is the list of snapshot guids the snapshot is redacted with
respect to.
.It Sy referenced
The amount of data that is accessible by this dataset, which may or may not be
shared with other datasets in the pool.
@@ -2454,6 +2470,76 @@ would normally be. Since compression is applied before encryption datasets may
be vulnerable to a CRIME-like attack if applications accessing the data allow
for it. Deduplication with encryption will leak information about which blocks
are equivalent in a dataset and will incur an extra CPU cost per block written.
.Ss Redaction
ZFS has support for a limited version of data subsetting, in the form of
redaction. Using the
.Sy zfs redact
command, a
.Sy redaction bookmark
can be created that stores a list of blocks containing sensitive information. When
provided to
.Sy zfs
.Sy send ,
this causes a
.Sy redacted send
to occur. Redacted sends omit the blocks containing sensitive information,
replacing them with REDACT records. When these send streams are received, a
.Sy redacted dataset
is created. A redacted dataset cannot be mounted by default, since it is
incomplete. It can be used to receive other send streams. In this way datasets
can be used for data backup and replication, with all the benefits that zfs send
and receive have to offer, while protecting sensitive information from being
stored on less-trusted machines or services.
.Pp
For the purposes of redaction, there are two steps to the process. A redact
step, and a send/receive step. First, a redaction bookmark is created. This is
done by providing the
.Sy zfs redact
command with a parent snapshot, a bookmark to be created, and a number of
redaction snapshots. These redaction snapshots must be descendants of the
parent snapshot, and they should modify data that is considered sensitive in
some way. Any blocks of data modified by all of the redaction snapshots will
be listed in the redaction bookmark, because it represents the truly sensitive
information. When it comes to the send step, the send process will not send
the blocks listed in the redaction bookmark, instead replacing them with
REDACT records. When received on the target system, this will create a
redacted dataset, missing the data that corresponds to the blocks in the
redaction bookmark on the sending system. The incremental send streams from
the original parent to the redaction snapshots can then also be received on
the target system, and this will produce a complete snapshot that can be used
normally. Incrementals from one snapshot on the parent filesystem and another
can also be done by sending from the redaction bookmark, rather than the
snapshots themselves.
.Pp
In order to make the purpose of the feature more clear, an example is
provided. Consider a zfs filesystem containing four files. These files
represent information for an online shopping service. One file contains a list
of usernames and passwords, another contains purchase histories, a third
contains click tracking data, and a fourth contains user preferences. The
owner of this data wants to make it available for their development teams to
test against, and their market research teams to do analysis on. The
development teams need information about user preferences and the click
tracking data, while the market research teams need information about purchase
histories and user preferences. Neither needs access to the usernames and
passwords. However, because all of this data is stored in one ZFS filesystem,
it must all be sent and received together. In addition, the owner of the data
wants to take advantage of features like compression, checksumming, and
snapshots, so they do want to continue to use ZFS to store and transmit their
data. Redaction can help them do so. First, they would make two clones of a
snapshot of the data on the source. In one clone, they create the setup they
want their market research team to see; they delete the usernames and
passwords file, and overwrite the click tracking data with dummy
information. In another, they create the setup they want the development teams
to see, by replacing the passwords with fake information and replacing the
purchase histories with randomly generated ones. They would then create a
redaction bookmark on the parent snapshot, using snapshots on the two clones
as redaction snapshots. The parent can then be sent, redacted, to the target
server where the research and development teams have access. Finally,
incremental sends from the parent snapshot to each of the clones can be send
to and received on the target server; these snapshots are identical to the
ones on the source, and are ready to be used, while the parent snapshot on the
target contains none of the username and password data present on the source,
because it was removed by the redacted send operation.
.Sh SUBCOMMANDS
All subcommands that modify state are logged persistently to the pool in their
original form.
@@ -3329,7 +3415,7 @@ Displays all ZFS file systems currently mounted.
.It Xo
.Nm
.Cm mount
.Op Fl Olv
.Op Fl Oflv
.Op Fl o Ar options
.Fl a | Ar filesystem
.Xc
@@ -3370,6 +3456,8 @@ of
this will cause the terminal to interactively block after asking for the key.
.It Fl v
Report mount progress.
.It Fl f
Attempt to force mounting of all filesystems, even those that couldn't normally be mounted (e.g. redacted datasets).
.El
.It Xo
.Nm
@@ -3650,7 +3738,7 @@ You will be able to receive your streams on future versions of ZFS.
.It Xo
.Nm
.Cm send
.Op Fl LPcenvw
.Op Fl DLPRcenpvw
.Op Fl i Ar snapshot Ns | Ns Ar bookmark
.Ar filesystem Ns | Ns Ar volume Ns | Ns Ar snapshot
.Xc
@@ -3775,6 +3863,97 @@ This information includes a per-second report of how much data has been sent.
.It Xo
.Nm
.Cm send
.Fl -redact Ar redaction_bookmark
.Op Fl DLPcenpv
.br
.Op Fl i Ar snapshot Ns | Ns Ar bookmark
.Ar snapshot
.Xc
Generate a redacted send stream.
This send stream contains all blocks from the snapshot being sent that aren't
included in the redaction list contained in the bookmark specified by the
.Fl -redact
(or
.Fl -d
) flag.
The resulting send stream is said to be redacted with respect to the snapshots
the bookmark specified by the
.Fl -redact No flag was created with.
The bookmark must have been created by running
.Sy zfs redact
on the snapshot being sent.
.sp
This feature can be used to allow clones of a filesystem to be made available on
a remote system, in the case where their parent need not (or needs to not) be
usable.
For example, if a filesystem contains sensitive data, and it has clones where
that sensitive data has been secured or replaced with dummy data, redacted sends
can be used to replicate the secured data without replicating the original
sensitive data, while still sharing all possible blocks.
A snapshot that has been redacted with respect to a set of snapshots will
contain all blocks referenced by at least one snapshot in the set, but will
contain none of the blocks referenced by none of the snapshots in the set.
In other words, if all snapshots in the set have modified a given block in the
parent, that block will not be sent; but if one or more snapshots have not
modified a block in the parent, they will still reference the parent's block, so
that block will be sent.
Note that only user data will be redacted.
.sp
When the redacted send stream is received, we will generate a redacted
snapshot.
Due to the nature of redaction, a redacted dataset can only be used in the
following ways:
.sp
1. To receive, as a clone, an incremental send from the original snapshot to one
of the snapshots it was redacted with respect to.
In this case, the stream will produce a valid dataset when received because all
blocks that were redacted in the parent are guaranteed to be present in the
child's send stream.
This use case will produce a normal snapshot, which can be used just like other
snapshots.
.sp
2. To receive an incremental send from the original snapshot to something
redacted with respect to a subset of the set of snapshots the initial snapshot
was redacted with respect to.
In this case, each block that was redacted in the original is still redacted
(redacting with respect to additional snapshots causes less data to be redacted
(because the snapshots define what is permitted, and everything else is
redacted)).
This use case will produce a new redacted snapshot.
.sp
3. To receive an incremental send from a redaction bookmark of the original
snapshot that was created when redacting with respect to a subset of the set of
snapshots the initial snapshot was created with respect to
anything else.
A send stream from such a redaction bookmark will contain all of the blocks
necessary to fill in any redacted data, should it be needed, because the sending
system is aware of what blocks were originally redacted.
This will either produce a normal snapshot or a redacted one, depending on
whether the new send stream is redacted.
.sp
4. To receive an incremental send from a redacted version of the initial
snapshot that is redacted with respect to a subect of the set of snapshots the
initial snapshot was created with respect to.
A send stream from a compatible redacted dataset will contain all of the blocks
necessary to fill in any redacted data.
This will either produce a normal snapshot or a redacted one, depending on
whether the new send stream is redacted.
.sp
5. To receive a full send as a clone of the redacted snapshot.
Since the stream is a full send, it definitionally contains all the data needed
to create a new dataset.
This use case will either produce a normal snapshot or a redacted one, depending
on whether the full send stream was redacted.
.sp
These restrictions are detected and enforced by \fBzfs receive\fR; a
redacted send stream will contain the list of snapshots that the stream is
redacted with respsect to.
These are stored with the redacted snapshot, and are used to detect and
correctly handle the cases above. Note that for technical reasons, raw sends
and redacted sends cannot be combined at this time.
.It Xo
.Nm
.Cm send
.Op Fl Penv
.Fl t
.Ar receive_resume_token
@@ -4091,6 +4270,24 @@ Abort an interrupted
deleting its saved partially received state.
.It Xo
.Nm
.Cm redact
.Ar snapshot redaction_bookmark
.Op Ar redaction_snapshot Ns ...
.Xc
Generate a new redaction bookmark.
In addition to the typical bookmark information, a redaction bookmark contains
the list of redacted blocks and the list of redaction snapshots specified.
The redacted blocks are blocks in the snapshot which are not referenced by any
of the redaction snapshots.
These blocks are found by iterating over the metadata in each redaction snapshot
to determine what has been changed since the target snapshot.
Redaction is designed to support redacted zfs sends; see the entry for
.Sy zfs send
for more information on the purpose of this operation.
If a redact operation fails partway through (due to an error or a system
failure), the redaction can be resumed by rerunning the same command.
.It Xo
.Nm
.Cm allow
.Ar filesystem Ns | Ns Ar volume
.Xc