Improve zfs receive performance by batching writes

For each WRITE record in the stream, `zfs receive` creates a DMU
transaction (`dmu_tx_create()`) and writes this block's data into the
object.  If per-block overheads (as opposed to per-byte overheads)
dominate performance (as is often the case with small recordsize), the
per-dmu-transaction overheads can be significant.  For example, in some
workloads the `receive_writer` thread is 100% on CPU, and more than
half of its CPU time is in these per-tx routines (e.g.
`dmu_tx_hold_write()`, `dmu_tx_assign()`, `dmu_tx_commit()`).

To improve the performance of `zfs receive`, this commit batches WRITE
records that target nearby offsets of the same object and uses one DMU
transaction to write them all.  By default the batch size is 1MB, so
with recordsize=8K it reduces the number of DMU transactions by 128x
(1MB / 8K) for full send streams.  For incremental streams the reduction
depends on how "clumpy" the changed blocks are.
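The batching rule described above can be modeled in a few lines of C to show where the 128x figure comes from. This is a minimal, hypothetical sketch, not the code added by this commit: `write_record_t`, `effective_batch_size()` and `count_batches()` are invented names, and the flush condition (same object, with the record's data ending within one batch window of the batch's first offset) is an assumption. The 32MB cap and the one-block minimum follow the `zfs_recv_write_batch_size` description in the man page change.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one WRITE record from the send stream. */
typedef struct {
	uint64_t object;  /* object number being written */
	uint64_t offset;  /* byte offset within the object */
	uint64_t length;  /* uncompressed length, even for compressed streams */
} write_record_t;

/* 32MB upper bound, as described for zfs_recv_write_batch_size. */
#define	BATCH_SIZE_CAP	(32ULL << 20)

/* Hypothetical helper: clamp the tunable to the cap. */
static uint64_t
effective_batch_size(uint64_t tunable)
{
	return (tunable > BATCH_SIZE_CAP ? BATCH_SIZE_CAP : tunable);
}

/*
 * Count the transactions needed when consecutive records for the same
 * object share one transaction as long as their data ends within
 * batch_size bytes of the batch's first offset (assumed rule).  The first
 * record of each batch is always accepted, so a batch never holds less
 * than one block, even if the tunable is smaller than a block.
 */
static uint64_t
count_batches(const write_record_t *recs, size_t n, uint64_t tunable)
{
	uint64_t batch_size = effective_batch_size(tunable);
	uint64_t ntx = 0;
	size_t i = 0;

	while (i < n) {
		uint64_t obj = recs[i].object;
		uint64_t start = recs[i].offset;

		ntx++;	/* one DMU transaction for this batch */
		i++;
		while (i < n && recs[i].object == obj &&
		    recs[i].offset >= start &&
		    recs[i].offset + recs[i].length <= start + batch_size)
			i++;
	}
	return (ntx);
}
```

For a full stream of 1024 sequential 8K records in one object (8MB of data), `count_batches(recs, 1024, 1ULL << 20)` groups 128 records per batch and needs 8 transactions instead of 1024, which matches the 128x reduction quoted above. A tunable of 0 degrades to one transaction per record rather than rejecting the write.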

This commit improves the performance of `dd if=stream | zfs recv`
from 78,800 blocks/sec to 98,100 blocks/sec (25% improvement).

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10099
Matthew Ahrens
2020-03-16 11:51:56 -07:00
committed by GitHub
parent c94fb10917
commit 7261fc2e81
2 changed files with 182 additions and 51 deletions
@@ -2995,6 +2995,20 @@ must be at least twice the maximum block size in use.
Default value: \fB16,777,216\fR.
.RE
.sp
.ne 2
.na
\fBzfs_recv_write_batch_size\fR (int)
.ad
.RS 12n
The maximum amount of data (in bytes) that \fBzfs receive\fR will write in
one DMU transaction. This is the uncompressed size, even when receiving a
compressed send stream. This setting will not reduce the write size below
a single block. The value is capped at a maximum of 32MB.
.sp
Default value: \fB1MB\fR.
.RE
.sp
.ne 2
.na