mirror_zfs/lib/libzfs
Matthew Ahrens c618f87cd2
Add zstream redup command to convert deduplicated send streams
Deduplicated send and receive is deprecated.  To ease migration to the
new dedup-send-less world, this commit adds a `zstream redup` utility to
convert deduplicated send streams to normal streams, so that they can
continue to be received indefinitely.

The new `zstream` command also replaces the functionality of
`zstreamdump`, by way of the `zstream dump` subcommand.  The
`zstreamdump` command is replaced by a shell script which invokes
`zstream dump`.

Under the hood, `zstream redup` builds up a hash table as it reads the
send stream.  The table maps from `<GUID, object, offset> ->
<file_offset>`.

Whenever we see a WRITE record, we add a new entry to the hash table,
which indicates where in the stream file to find the WRITE record for
this block. (The key is `drr_toguid, drr_object, drr_offset`.)
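
As an illustration only, the index described above can be sketched in Python with a dictionary keyed by the three fields.  This is not the C code in this commit: the `BlockKey` and `index_writes` names, the record dictionaries, and the sample file offsets are all hypothetical.

```python
from typing import NamedTuple


class BlockKey(NamedTuple):
    """Hash table key: <GUID, object, offset> (hypothetical helper type)."""
    guid: int
    object: int
    offset: int


def index_writes(records):
    """Map each WRITE record's key to its position in the stream file.

    `records` is an iterable of (file_offset, record) pairs, where each
    record is a plain dict standing in for a parsed stream record.
    """
    table = {}
    for file_offset, rec in records:
        if rec["type"] == "WRITE":
            key = BlockKey(rec["drr_toguid"], rec["drr_object"],
                           rec["drr_offset"])
            table[key] = file_offset
    return table


# Made-up stream positions, for illustration only.
stream = [
    (0, {"type": "BEGIN"}),
    (312, {"type": "WRITE", "drr_toguid": 0xABC, "drr_object": 5,
           "drr_offset": 0}),
    (8816, {"type": "WRITE", "drr_toguid": 0xABC, "drr_object": 5,
            "drr_offset": 8192}),
]
table = index_writes(stream)
```

Only the file offset is stored per block, never the payload, which is what keeps the per-record memory cost small.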

For records other than WRITE_BYREF, we pass them through unchanged
(except for the running checksum, which is recalculated).

For WRITE_BYREF records, we change them to WRITE records.  We find the
referenced WRITE record by looking in the hash table (for the record
with key `drr_refguid, drr_refobject, drr_refoffset`), and then reading
the record header and payload from the specified offset in the stream
file.  This is why the stream cannot be a pipe.  The found WRITE record
replaces the WRITE_BYREF record, with its `drr_toguid`, `drr_object`,
and `drr_offset` fields changed to be the same as the WRITE_BYREF's
(i.e. we are writing the same logical block, but with the data supplied
by the previous WRITE record).
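
A minimal sketch of this conversion, assuming a made-up record layout rather than the real `dmu_replay_record_t` format, might look like the following.  The `HDR` layout, the `pack`, `read_record`, and `redup` helpers, and the numeric record types are hypothetical, and the running checksum is left out entirely.  The point is the seek back into the input, which a pipe could not satisfy.

```python
import io
import struct

# Toy header: type, (toguid, object, offset), (refguid, refobject,
# refoffset), payload length.  This is NOT the real stream format.
HDR = struct.Struct("<BQQQQQQI")
OTHER, WRITE, WRITE_BYREF = 0, 1, 2


def pack(rtype, to=(0, 0, 0), ref=(0, 0, 0), payload=b""):
    """Serialize one toy record (header followed by payload)."""
    return HDR.pack(rtype, *to, *ref, len(payload)) + payload


def read_record(f):
    """Read one toy record at the current position, or None at EOF."""
    raw = f.read(HDR.size)
    if not raw:
        return None
    rtype, g, o, off, rg, ro, roff, plen = HDR.unpack(raw)
    return rtype, (g, o, off), (rg, ro, roff), f.read(plen)


def redup(src, dst):
    """Copy src to dst, turning WRITE_BYREF records into WRITE records."""
    table = {}  # <GUID, object, offset> -> file offset of the WRITE
    while True:
        pos = src.tell()
        rec = read_record(src)
        if rec is None:
            break
        rtype, to, ref, payload = rec
        if rtype == WRITE:
            table[to] = pos
        elif rtype == WRITE_BYREF:
            # Seek back to the referenced WRITE, borrow its payload,
            # then return to where we were.  Needs a seekable input.
            resume = src.tell()
            src.seek(table[ref])
            _, _, _, payload = read_record(src)
            src.seek(resume)
            # Keep this record's own guid/object/offset; drop the ref.
            rtype, ref = WRITE, (0, 0, 0)
        dst.write(pack(rtype, to, ref, payload))


src = io.BytesIO(
    pack(WRITE, to=(1, 5, 0), payload=b"hello")
    + pack(WRITE_BYREF, to=(2, 9, 4096), ref=(1, 5, 0))
)
dst = io.BytesIO()
redup(src, dst)
dst.seek(0)
first = read_record(dst)
second = read_record(dst)
```

In the output, the second record is an ordinary WRITE for block `(2, 9, 4096)` carrying the payload of the earlier WRITE.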

This algorithm requires memory proportional to the number of WRITE
records (same as `zfs send -D`), but the size per WRITE record is
relatively low (40 bytes, vs. 72 for `zfs send -D`).  A 1TB send stream
with 8KB blocks (`recordsize=8k`) would use around 5GB of RAM to
"redup".
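
The estimate can be checked with a few lines of arithmetic.  This sketch assumes 1TB means 2^40 bytes, that every 8KB of stream corresponds to one WRITE record, and that only the quoted per-entry sizes count (no other table overhead).  The `table_bytes` helper is hypothetical.

```python
TIB = 2 ** 40
GIB = 2 ** 30


def table_bytes(stream_bytes, block_bytes, entry_bytes):
    """Approximate table size: one entry per block-sized WRITE record."""
    records = stream_bytes // block_bytes
    return records * entry_bytes


records = TIB // (8 * 1024)                      # 134,217,728 records
redup_mem = table_bytes(TIB, 8 * 1024, 40)       # 40 bytes per entry
send_dedup_mem = table_bytes(TIB, 8 * 1024, 72)  # 72 bytes per entry

print(redup_mem / GIB, send_dedup_mem / GIB)     # 5.0 9.0
```

Under these assumptions the table for `zstream redup` comes to 5GB, against 9GB at 72 bytes per entry.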

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10124
Closes #10156
2020-04-10 10:39:55 -07:00
| Name | Last commit | Date |
| --- | --- | --- |
| os/linux | Move zfs_version_kernel to platform code | 2020-02-12 13:00:19 -08:00 |
| .gitignore | Add a pkgconfig file | 2014-08-28 07:59:43 -07:00 |
| libzfs_changelist.c | zfs_handle used after being closed/freed in change_one callback | 2019-08-28 15:02:58 -07:00 |
| libzfs_config.c | Use zfs_ioctl with zfs_cmd_t in libzfs | 2019-10-23 17:29:43 -07:00 |
| libzfs_core.pc.in | Change http://zfsonlinux.org links to https://zfsonlinux.org | 2020-01-13 16:43:59 -08:00 |
| libzfs_crypto.c | Fix typos in lib/ | 2019-09-02 17:53:27 -07:00 |
| libzfs_dataset.c | Add 'zfs wait' command | 2020-04-01 10:02:06 -07:00 |
| libzfs_diff.c | Don't open zfs control device exclusively | 2020-02-28 14:54:14 -08:00 |
| libzfs_import.c | Persistent L2ARC | 2020-04-10 10:33:35 -07:00 |
| libzfs_iter.c | Use zfs_ioctl with zfs_cmd_t in libzfs | 2019-10-23 17:29:43 -07:00 |
| libzfs_mount.c | Change default to overlay=on | 2020-03-06 09:28:19 -08:00 |
| libzfs_pool.c | libzfs_pool: Remove unused check for ENOTBLK | 2020-04-07 10:04:40 -07:00 |
| libzfs_sendrecv.c | Add zstream redup command to convert deduplicated send streams | 2020-04-10 10:39:55 -07:00 |
| libzfs_status.c | Add missing MMP status code to libzfs_status | 2019-01-03 12:15:46 -08:00 |
| libzfs_util.c | libzfs: Fix bounds checks for float parsing | 2020-03-16 11:56:29 -07:00 |
| libzfs.pc.in | Change http://zfsonlinux.org links to https://zfsonlinux.org | 2020-01-13 16:43:59 -08:00 |
| Makefile.am | Compile cityhash code into libzfs | 2020-03-27 09:11:22 -07:00 |
| THIRDPARTYLICENSE.openssl | Fix typos in lib/ | 2019-09-02 17:53:27 -07:00 |
| THIRDPARTYLICENSE.openssl.descrip | Encryption patch follow-up | 2017-10-11 16:54:48 -04:00 |