mirror_zfs/module
Matthew Ahrens ba67d82142
Improve zfs receive performance with lightweight write
The performance of `zfs receive` can be bottlenecked on the CPU consumed
by the `receive_writer` thread, especially when receiving streams with
small compressed block sizes.  Much of the CPU is spent creating and
destroying dbuf's and arc buf's, one for each `WRITE` record in the send
stream.

This commit introduces the concept of "lightweight writes", which allows
`zfs receive` to write to the DMU by providing an ABD, and instantiating
only a new type of `dbuf_dirty_record_t`.  The dbuf and arc buf for this
"dirty leaf block" are not instantiated.

Because there is no dbuf with the dirty data, this mechanism doesn't
support reading from "lightweight-dirty" blocks (they would see the
on-disk state rather than the dirty data).  Since the dedup-receive code
has been removed, `zfs receive` is write-only, so this works fine.

Because there are no arc bufs for the received data, the received data
is no longer cached in the ARC.

Testing a receive of a stream with average compressed block size of 4KB,
this commit improves performance by 50%, while also reducing CPU usage
by 50% of a CPU.  On a per-block basis, CPU consumed by receive_writer()
and dbuf_evict() is now 1/7th (14%) of what it was.

Baseline: 450MB/s, CPU in receive_writer() 40% + dbuf_evict() 35%
New: 670MB/s, CPU in receive_writer() 17% + dbuf_evict() 0%

The code is also restructured in a few ways:

Added a `dr_dnode` field to the dbuf_dirty_record_t.  This simplifies
some existing code that no longer needs `DB_DNODE_ENTER()` and related
routines.  The new field is needed by the lightweight-type dirty record.

To ensure that the `dr_dnode` field remains valid until the dirty record
is freed, we have to ensure that the `dnode_move()` doesn't relocate the
dnode_t.  To do this we keep a hold on the dnode until it's zio's have
completed.  This is already done by the user-accounting code
(`userquota_updates_task()`), this commit extends that so that it always
keeps the dnode hold until zio completion (see `dnode_rele_task()`).

`dn_dirty_txg` was previously zeroed when the dnode was synced.  This
was not necessary, since its meaning can be "when was this dnode last
dirtied".  This change simplifies the new `dnode_rele_task()` code.

Removed some dead code related to `DRR_WRITE_BYREF` (dedup receive).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11105
2020-12-11 10:26:02 -08:00
..
avl Links in Source Files 2020-09-02 09:42:12 -07:00
icp Introduce CPU_SEQID_UNSTABLE 2020-11-02 11:51:12 -08:00
lua Use known license string for zlua 2020-10-27 09:43:36 -07:00
nvpair Links in Source Files 2020-09-02 09:42:12 -07:00
os FreeBSD: Fix format of vfs.zfs.arc_no_grow_shift 2020-12-10 15:28:56 -08:00
spl Cleanup linux module kbuild files 2020-06-10 09:24:15 -07:00
unicode Throw const on some strings 2020-10-02 17:44:10 -07:00
zcommon FreeBSD: Do zcommon_init sooner to avoid FPU panic 2020-12-09 21:29:00 -08:00
zfs Improve zfs receive performance with lightweight write 2020-12-11 10:26:02 -08:00
zstd Optimize locking checks in mempool allocator 2020-11-02 12:10:07 -08:00
.gitignore Cleanup linux module kbuild files 2020-06-10 09:24:15 -07:00
Kbuild.in Add zstd support to zfs 2020-08-20 10:30:06 -07:00
Makefile.bsd FreeBSD: decouple ZFS_DEBUG from kernel debug settings 2020-11-24 09:16:46 -08:00
Makefile.in Fix Linux modules uninstall 2020-10-08 20:07:10 -07:00