Fast Clone Deletion

Deleting a clone requires finding the blocks that are clone-only, not
shared with the snapshot. This was done by traversing the entire block
tree, which results in a large performance penalty for sparsely
written clones.

This new method keeps track of clone blocks in a "Livelist" as they
are modified so that, when it’s time to delete, the clone-specific
blocks are already at hand.

We see performance improvements because now deletion work is
proportional to the number of clone-modified blocks, not the size
of the original dataset.
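
The difference in deletion cost can be illustrated with a toy model.
This is only a sketch under stated assumptions, not the ZFS
implementation: blocks are plain integers, the livelist is an
in-memory list of ("ALLOC" | "FREE", block) entries, and the names
ToyClone, clone_only_by_traversal and clone_only_by_livelist are
hypothetical and do not appear in the commit.

```python
from dataclasses import dataclass, field


@dataclass
class ToyClone:
    """Toy clone; block IDs are plain integers (an assumption of this sketch)."""
    snapshot_blocks: frozenset                     # blocks inherited from the origin snapshot
    live: set = field(default_factory=set)         # blocks the clone currently references
    livelist: list = field(default_factory=list)   # append-only log of (kind, block)

    def __post_init__(self):
        # A fresh clone references exactly the snapshot's blocks.
        self.live = set(self.snapshot_blocks)

    def write(self, old_block, new_block):
        """Copy-on-write: replace old_block with new_block and log both events."""
        self.live.discard(old_block)
        self.live.add(new_block)
        self.livelist.append(("FREE", old_block))
        self.livelist.append(("ALLOC", new_block))


def clone_only_by_traversal(clone):
    """Old approach: visit every live block and test whether it is shared."""
    visited = 0
    result = set()
    for block in clone.live:
        visited += 1
        if block not in clone.snapshot_blocks:
            result.add(block)
    return result, visited


def clone_only_by_livelist(clone):
    """Livelist approach: replay the log; an ALLOC with no later FREE is clone-only."""
    visited = 0
    result = set()
    for kind, block in clone.livelist:
        visited += 1
        if kind == "ALLOC":
            result.add(block)
        else:
            result.discard(block)  # FREE of a shared block is a no-op here
    return result, visited


clone = ToyClone(snapshot_blocks=frozenset(range(100_000)))
clone.write(old_block=7, new_block=100_000)        # overwrite a shared block
clone.write(old_block=100_000, new_block=100_001)  # overwrite the clone's own block

slow, slow_visits = clone_only_by_traversal(clone)
fast, fast_visits = clone_only_by_livelist(clone)
print(slow == fast, slow_visits, fast_visits)
```

Both functions return the same clone-only set, but the traversal visits
every live block while the livelist replay visits only the entries
logged by the clone's own writes.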

Reviewed-by: Sean Eric Fagan <sef@ixsystems.com>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Sara Hartse <sara.hartse@delphix.com>
Closes #8416
Sara Hartse
2019-07-26 10:54:14 -07:00
committed by Brian Behlendorf
parent d274ac5460
commit 37f03da8ba
38 changed files with 2583 additions and 205 deletions
@@ -1909,6 +1909,98 @@ Pattern written to vdev free space by \fBzpool initialize\fR.
Default value: \fB16,045,690,984,833,335,022\fR (0xdeadbeefdeadbeee).
.RE
.sp
.ne 2
.na
\fBzfs_livelist_max_entries\fR (ulong)
.ad
.RS 12n
The threshold size (in block pointers) at which we create a new sub-livelist.
Larger sublists are more costly from a memory perspective but the fewer
sublists there are, the lower the cost of insertion.
.sp
Default value: \fB500,000\fR.
.RE
.sp
.ne 2
.na
\fBzfs_livelist_min_percent_shared\fR (int)
.ad
.RS 12n
If the amount of shared space between a snapshot and its clone drops below
this threshold, the clone turns off the livelist and reverts to the old deletion
method. This is in place because once a clone has been overwritten enough
livelists no longer give us a benefit.
.sp
Default value: \fB75\fR.
.RE
.sp
.ne 2
.na
\fBzfs_livelist_condense_new_alloc\fR (int)
.ad
.RS 12n
Incremented each time an extra ALLOC blkptr is added to a livelist entry while
it is being condensed.
This option is used by the test suite to track race conditions.
.sp
Default value: \fB0\fR.
.RE
.sp
.ne 2
.na
\fBzfs_livelist_condense_sync_cancel\fR (int)
.ad
.RS 12n
Incremented each time livelist condensing is canceled while in
spa_livelist_condense_sync.
This option is used by the test suite to track race conditions.
.sp
Default value: \fB0\fR.
.RE
.sp
.ne 2
.na
\fBzfs_livelist_condense_sync_pause\fR (int)
.ad
.RS 12n
When set, the livelist condense process pauses indefinitely before
executing the synctask - spa_livelist_condense_sync.
This option is used by the test suite to trigger race conditions.
.sp
Default value: \fB0\fR.
.RE
.sp
.ne 2
.na
\fBzfs_livelist_condense_zthr_cancel\fR (int)
.ad
.RS 12n
Incremented each time livelist condensing is canceled while in
spa_livelist_condense_cb.
This option is used by the test suite to track race conditions.
.sp
Default value: \fB0\fR.
.RE
.sp
.ne 2
.na
\fBzfs_livelist_condense_zthr_pause\fR (int)
.ad
.RS 12n
When set, the livelist condense process pauses indefinitely before
executing the open context condensing work in spa_livelist_condense_cb.
This option is used by the test suite to trigger race conditions.
.sp
Default value: \fB0\fR.
.RE
.sp
.ne 2
.na