Persistent L2ARC

This commit makes the L2ARC persistent across reboots. We implement
a light-weight persistent L2ARC metadata structure that allows L2ARC
contents to be recovered after a reboot. This significantly eases the
impact a reboot has on read performance on systems with large caches.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Saso Kiselkov <skiselkov@gmail.com>
Co-authored-by: Jorgen Lundman <lundman@lundman.net>
Co-authored-by: George Amanakis <gamanakis@gmail.com>
Ported-by: Yuxuan Shui <yshuiv7@gmail.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #925 
Closes #1823 
Closes #2672 
Closes #3744 
Closes #9582
This commit is contained in:
George Amanakis
2020-04-10 13:33:35 -04:00
committed by GitHub
parent 36a6e2335c
commit 77f6826b83
30 changed files with 3020 additions and 88 deletions
+42 -9
View File
@@ -87,7 +87,7 @@ Default value: \fB10\fR%.
.ad
.RS 12n
Set the size of the dbuf cache, \fBdbuf_cache_max_bytes\fR, to a log2 fraction
of the target arc size.
of the target ARC size.
.sp
Default value: \fB5\fR.
.RE
@@ -99,7 +99,7 @@ Default value: \fB5\fR.
.ad
.RS 12n
Set the size of the dbuf metadata cache, \fBdbuf_metadata_cache_max_bytes\fR,
to a log2 fraction of the target arc size.
to a log2 fraction of the target ARC size.
.sp
Default value: \fB6\fR.
.RE
@@ -179,7 +179,10 @@ Default value: \fB1\fR.
.ad
.RS 12n
How far through the ARC lists to search for L2ARC cacheable content, expressed
as a multiplier of \fBl2arc_write_max\fR
as a multiplier of \fBl2arc_write_max\fR.
ARC persistence across reboots can be achieved with persistent L2ARC by setting
this parameter to \fB0\fR allowing the full length of ARC lists to be searched
for cacheable content.
.sp
Default value: \fB2\fR.
.RE
@@ -203,7 +206,7 @@ Default value: \fB200\fR%.
.ad
.RS 12n
Do not write buffers to L2ARC if they were prefetched but not used by
applications
applications.
.sp
Use \fB1\fR for yes (default) and \fB0\fR to disable.
.RE
@@ -214,7 +217,7 @@ Use \fB1\fR for yes (default) and \fB0\fR to disable.
\fBl2arc_norw\fR (int)
.ad
.RS 12n
No reads during writes
No reads during writes.
.sp
Use \fB1\fR for yes and \fB0\fR for no (default).
.RE
@@ -237,11 +240,41 @@ Default value: \fB8,388,608\fR.
\fBl2arc_write_max\fR (ulong)
.ad
.RS 12n
Max write bytes per interval
Max write bytes per interval.
.sp
Default value: \fB8,388,608\fR.
.RE
.sp
.ne 2
.na
\fBl2arc_rebuild_enabled\fR (int)
.ad
.RS 12n
Rebuild the L2ARC when importing a pool (persistent L2ARC). This can be
disabled if there are problems importing a pool or attaching an L2ARC device
(e.g. the L2ARC device is slow in reading stored log metadata, or the metadata
has become somehow fragmented/unusable).
.sp
Use \fB1\fR for yes (default) and \fB0\fR for no.
.RE
.sp
.ne 2
.na
\fBl2arc_rebuild_blocks_min_l2size\fR (ulong)
.ad
.RS 12n
Min size (in bytes) of an L2ARC device required in order to write log blocks
in it. The log blocks are used upon importing the pool to rebuild
the L2ARC (persistent L2ARC). Rationale: for L2ARC devices less than 1GB, the
amount of data l2arc_evict() evicts is significant compared to the amount of
restored L2ARC data. In this case do not write log blocks in L2ARC in order not
to waste space.
.sp
Default value: \fB1,073,741,824\fR (1GB).
.RE
.sp
.ne 2
.na
@@ -614,7 +647,7 @@ Default value: \fB1\fR.
.ad
.RS 12n
Sets the maximum number of bytes to consume during pool import to the log2
fraction of the target arc size.
fraction of the target ARC size.
.sp
Default value: \fB4\fR.
.RE
@@ -963,7 +996,7 @@ Default value: \fB1\fR.
\fBzfs_arc_min\fR (ulong)
.ad
.RS 12n
Min arc size of ARC in bytes. If set to 0 then arc_c_min will default to
Min size of ARC in bytes. If set to 0 then arc_c_min will default to
consuming the larger of 32M or 1/32 of total system memory.
.sp
Default value: \fB0\fR.
@@ -1088,7 +1121,7 @@ Default value: \fB0\fR.
Percent of pagecache to reclaim arc to
This tunable allows ZFS arc to play more nicely with the kernel's LRU
pagecache. It can guarantee that the arc size won't collapse under scanning
pagecache. It can guarantee that the ARC size won't collapse under scanning
pressure on the pagecache, yet still allows arc to be reclaimed down to
zfs_arc_min if necessary. This value is specified as percent of pagecache
size (as measured by NR_FILE_PAGES) where that percent may exceed 100. This
+11 -5
View File
@@ -212,18 +212,24 @@ If specified multiple times, display counts of each intent log transaction type.
Examine the checkpointed state of the pool.
Note, the on disk format of the pool is not reverted to the checkpointed state.
.It Fl l Ar device
Read the vdev labels from the specified device.
Read the vdev labels and L2ARC header from the specified device.
.Nm Fl l
will return 0 if valid label was found, 1 if error occurred, and 2 if no valid
labels were found. Each unique configuration is displayed only once.
labels were found. The presence of L2ARC header is indicated by a specific
sequence (L2ARC_DEV_HDR_MAGIC). Each unique configuration is displayed only
once.
.It Fl ll Ar device
In addition display label space usage stats.
In addition display label space usage stats. If a valid L2ARC header was found
also display the properties of log blocks used for restoring L2ARC contents
(persistent L2ARC).
.It Fl lll Ar device
Display every configuration, unique or not.
Display every configuration, unique or not. If a valid L2ARC header was found
also display the properties of log entries in log blocks used for restoring
L2ARC contents (persistent L2ARC).
.Pp
If the
.Fl q
option is also specified, don't print the labels.
option is also specified, don't print the labels or the L2ARC header.
.Pp
If the
.Fl u
+4 -1
View File
@@ -48,7 +48,10 @@
.Xc
Removes ZFS label information from the specified
.Ar device .
The
If the
.Ar device
is a cache device, it also removes the L2ARC header
(persistent L2ARC). The
.Ar device
must not be part of an active pool configuration.
.Bl -tag -width Ds
+22 -2
View File
@@ -323,8 +323,28 @@ If a read error is encountered on a cache device, that read I/O is reissued to
the original storage pool device, which might be part of a mirrored or raidz
configuration.
.Pp
The content of the cache devices is considered volatile, as is the case with
other system caches.
The content of the cache devices is persistent across reboots and restored
asynchronously when importing the pool in L2ARC (persistent L2ARC).
This can be disabled by setting
.Sy l2arc_rebuild_enabled = 0 .
For cache devices smaller than 1GB we do not write the metadata structures
required for rebuilding the L2ARC in order not to waste space. This can be
changed with
.Sy l2arc_rebuild_blocks_min_l2size .
The cache device header (512 bytes) is updated even if no metadata structures
are written. Setting
.Sy l2arc_headroom = 0
will result in scanning the full-length ARC lists for cacheable content to be
written in L2ARC (persistent ARC). If a cache device is added with
.Nm zpool Cm add
its label and header will be overwritten and its contents are not going to be
restored in L2ARC, even if the device was previously part of the pool. If a
cache device is onlined with
.Nm zpool Cm online
its contents will be restored in L2ARC. This is useful in case of memory pressure
where the contents of the cache device are not fully restored in L2ARC.
The user can off/online the cache device when there is less memory pressure
in order to fully restore its contents to L2ARC.
.Ss Pool checkpoint
Before starting critical procedures that include destructive actions (e.g
.Nm zfs Cm destroy