Move properties, parameters, events, and concepts around manual sections

The pages moved as follows:
  zpool-features.{5 => 7}
  spl{-module-parameters.5 => .4}
  zfs{-module-parameters.5 => .4}
  zfs-events.5 => into zpool-events.8
  zfsconcepts.{8 => 7}
  zfsprops.{8 => 7}
  zpoolconcepts.{8 => 7}
  zpoolprops.{8 => 7}

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Co-authored-by: Daniel Ebdrup Jensen <debdrup@FreeBSD.org>
Closes #12149
Closes #12212
This commit is contained in:
наб
2021-06-04 22:29:26 +02:00
committed by Brian Behlendorf
parent 0bef46e6d5
commit a444efb6d7
50 changed files with 543 additions and 582 deletions
+206
View File
@@ -0,0 +1,206 @@
.\"
.\" CDDL HEADER START
.\"
.\" The contents of this file are subject to the terms of the
.\" Common Development and Distribution License (the "License").
.\" You may not use this file except in compliance with the License.
.\"
.\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
.\" or http://www.opensolaris.org/os/licensing.
.\" See the License for the specific language governing permissions
.\" and limitations under the License.
.\"
.\" When distributing Covered Code, include this CDDL HEADER in each
.\" file and include the License file at usr/src/OPENSOLARIS.LICENSE.
.\" If applicable, add the following below this CDDL HEADER, with the
.\" fields enclosed by brackets "[]" replaced with your own identifying
.\" information: Portions Copyright [yyyy] [name of copyright owner]
.\"
.\" CDDL HEADER END
.\"
.\" Copyright (c) 2009 Sun Microsystems, Inc. All Rights Reserved.
.\" Copyright 2011 Joshua M. Clulow <josh@sysmgr.org>
.\" Copyright (c) 2011, 2019 by Delphix. All rights reserved.
.\" Copyright (c) 2013 by Saso Kiselkov. All rights reserved.
.\" Copyright (c) 2014, Joyent, Inc. All rights reserved.
.\" Copyright (c) 2014 by Adam Stevko. All rights reserved.
.\" Copyright (c) 2014 Integros [integros.com]
.\" Copyright 2019 Richard Laager. All rights reserved.
.\" Copyright 2018 Nexenta Systems, Inc.
.\" Copyright 2019 Joyent, Inc.
.\"
.Dd June 30, 2019
.Dt ZFSCONCEPTS 7
.Os
.
.Sh NAME
.Nm zfsconcepts
.Nd overview of ZFS concepts
.
.Sh DESCRIPTION
.Ss ZFS File System Hierarchy
A ZFS storage pool is a logical collection of devices that provide space for
datasets.
A storage pool is also the root of the ZFS file system hierarchy.
.Pp
The root of the pool can be accessed as a file system, such as mounting and
unmounting, taking snapshots, and setting properties.
The physical storage characteristics, however, are managed by the
.Xr zpool 8
command.
.Pp
See
.Xr zpool 8
for more information on creating and administering pools.
.Ss Snapshots
A snapshot is a read-only copy of a file system or volume.
Snapshots can be created extremely quickly, and initially consume no additional
space within the pool.
As data within the active dataset changes, the snapshot consumes more data than
would otherwise be shared with the active dataset.
.Pp
Snapshots can have arbitrary names.
Snapshots of volumes can be cloned or rolled back, visibility is determined
by the
.Sy snapdev
property of the parent volume.
.Pp
File system snapshots can be accessed under the
.Pa .zfs/snapshot
directory in the root of the file system.
Snapshots are automatically mounted on demand and may be unmounted at regular
intervals.
The visibility of the
.Pa .zfs
directory can be controlled by the
.Sy snapdir
property.
.Ss Bookmarks
A bookmark is like a snapshot, a read-only copy of a file system or volume.
Bookmarks can be created extremely quickly, compared to snapshots, and they
consume no additional space within the pool.
Bookmarks can also have arbitrary names, much like snapshots.
.Pp
Unlike snapshots, bookmarks can not be accessed through the filesystem in any way.
From a storage standpoint a bookmark just provides a way to reference
when a snapshot was created as a distinct object.
Bookmarks are initially tied to a snapshot, not the filesystem or volume,
and they will survive if the snapshot itself is destroyed.
Since they are very light weight there's little incentive to destroy them.
.Ss Clones
A clone is a writable volume or file system whose initial contents are the same
as another dataset.
As with snapshots, creating a clone is nearly instantaneous, and initially
consumes no additional space.
.Pp
Clones can only be created from a snapshot.
When a snapshot is cloned, it creates an implicit dependency between the parent
and child.
Even though the clone is created somewhere else in the dataset hierarchy, the
original snapshot cannot be destroyed as long as a clone exists.
The
.Sy origin
property exposes this dependency, and the
.Cm destroy
command lists any such dependencies, if they exist.
.Pp
The clone parent-child dependency relationship can be reversed by using the
.Cm promote
subcommand.
This causes the
.Qq origin
file system to become a clone of the specified file system, which makes it
possible to destroy the file system that the clone was created from.
.Ss "Mount Points"
Creating a ZFS file system is a simple operation, so the number of file systems
per system is likely to be numerous.
To cope with this, ZFS automatically manages mounting and unmounting file
systems without the need to edit the
.Pa /etc/fstab
file.
All automatically managed file systems are mounted by ZFS at boot time.
.Pp
By default, file systems are mounted under
.Pa /path ,
where
.Ar path
is the name of the file system in the ZFS namespace.
Directories are created and destroyed as needed.
.Pp
A file system can also have a mount point set in the
.Sy mountpoint
property.
This directory is created as needed, and ZFS automatically mounts the file
system when the
.Nm zfs Cm mount Fl a
command is invoked
.Po without editing
.Pa /etc/fstab
.Pc .
The
.Sy mountpoint
property can be inherited, so if
.Em pool/home
has a mount point of
.Pa /export/stuff ,
then
.Em pool/home/user
automatically inherits a mount point of
.Pa /export/stuff/user .
.Pp
A file system
.Sy mountpoint
property of
.Sy none
prevents the file system from being mounted.
.Pp
If needed, ZFS file systems can also be managed with traditional tools
.Po
.Nm mount ,
.Nm umount ,
.Pa /etc/fstab
.Pc .
If a file system's mount point is set to
.Sy legacy ,
ZFS makes no attempt to manage the file system, and the administrator is
responsible for mounting and unmounting the file system.
Because pools must
be imported before a legacy mount can succeed, administrators should ensure
that legacy mounts are only attempted after the zpool import process
finishes at boot time.
For example, on machines using systemd, the mount option
.Pp
.Nm x-systemd.requires=zfs-import.target
.Pp
will ensure that the zfs-import completes before systemd attempts mounting
the filesystem.
See
.Xr systemd.mount 5
for details.
.Ss Deduplication
Deduplication is the process for removing redundant data at the block level,
reducing the total amount of data stored.
If a file system has the
.Sy dedup
property enabled, duplicate data blocks are removed synchronously.
The result
is that only unique data is stored and common components are shared among files.
.Pp
Deduplicating data is a very resource-intensive operation.
It is generally recommended that you have at least 1.25 GiB of RAM
per 1 TiB of storage when you enable deduplication.
Calculating the exact requirement depends heavily
on the type of data stored in the pool.
.Pp
Enabling deduplication on an improperly-designed system can result in
performance issues (slow IO and administrative operations).
It can potentially lead to problems importing a pool due to memory exhaustion.
Deduplication can consume significant processing power (CPU) and memory as well
as generate additional disk IO.
.Pp
Before creating a pool with deduplication enabled, ensure that you have planned
your hardware requirements appropriately and implemented appropriate recovery
practices, such as regular backups.
Consider using the
.Sy compression
property as a less resource-intensive alternative.
+2045
View File
File diff suppressed because it is too large Load Diff
+842
View File
@@ -0,0 +1,842 @@
.\"
.\" Copyright (c) 2012, 2018 by Delphix. All rights reserved.
.\" Copyright (c) 2013 by Saso Kiselkov. All rights reserved.
.\" Copyright (c) 2014, Joyent, Inc. All rights reserved.
.\" The contents of this file are subject to the terms of the Common Development
.\" and Distribution License (the "License"). You may not use this file except
.\" in compliance with the License. You can obtain a copy of the license at
.\" usr/src/OPENSOLARIS.LICENSE or http://www.opensolaris.org/os/licensing.
.\"
.\" See the License for the specific language governing permissions and
.\" limitations under the License. When distributing Covered Code, include this
.\" CDDL HEADER in each file and include the License file at
.\" usr/src/OPENSOLARIS.LICENSE. If applicable, add the following below this
.\" CDDL HEADER, with the fields enclosed by brackets "[]" replaced with your
.\" own identifying information:
.\" Portions Copyright [yyyy] [name of copyright owner]
.\" Copyright (c) 2019, Klara Inc.
.\" Copyright (c) 2019, Allan Jude
.\" Copyright (c) 2021, Colm Buckley <colm@tuatha.org>
.\"
.Dd May 31, 2021
.Dt ZPOOL-FEATURES 7
.Os
.
.Sh NAME
.Nm zpool-features
.Nd description of ZFS pool features
.
.Sh DESCRIPTION
ZFS pool on-disk format versions are specified via "features" which replace
the old on-disk format numbers (the last supported on-disk format number is 28).
To enable a feature on a pool use the
.Nm zpool Cm upgrade ,
or set the
.Sy feature Ns @ Ns Ar feature-name
property to
.Sy enabled .
Please also see the
.Sx Compatibility feature sets
section for information on how sets of features may be enabled together.
.Pp
The pool format does not affect file system version compatibility or the ability
to send file systems between pools.
.Pp
Since most features can be enabled independently of each other, the on-disk
format of the pool is specified by the set of all features marked as
.Sy active
on the pool.
If the pool was created by another software version
this set may include unsupported features.
.
.Ss Identifying features
Every feature has a GUID of the form
.Ar com.example : Ns Ar feature-name .
The reversed DNS name ensures that the feature's GUID is unique across all ZFS
implementations.
When unsupported features are encountered on a pool they will
be identified by their GUIDs.
Refer to the documentation for the ZFS
implementation that created the pool for information about those features.
.Pp
Each supported feature also has a short name.
By convention a feature's short name is the portion of its GUID which follows the
.Sq \&:
(i.e.
.Ar com.example : Ns Ar feature-name
would have the short name
.Ar feature-name ) ,
however a feature's short name may differ across ZFS implementations if
following the convention would result in name conflicts.
.
.Ss Feature states
Features can be in one of three states:
.Bl -tag -width "disabled"
.It Sy active
This feature's on-disk format changes are in effect on the pool.
Support for this feature is required to import the pool in read-write mode.
If this feature is not read-only compatible,
support is also required to import the pool in read-only mode
.Pq see Sx Read-only compatibility .
.It Sy enabled
An administrator has marked this feature as enabled on the pool, but the
feature's on-disk format changes have not been made yet.
The pool can still be imported by software that does not support this feature,
but changes may be made to the on-disk format at any time
which will move the feature to the
.Sy active
state.
Some features may support returning to the
.Sy enabled
state after becoming
.Sy active .
See feature-specific documentation for details.
.It Sy disabled
This feature's on-disk format changes have not been made and will not be made
unless an administrator moves the feature to the
.Sy enabled
state.
Features cannot be disabled once they have been enabled.
.El
.Pp
The state of supported features is exposed through pool properties of the form
.Sy feature Ns @ Ns Ar short-name .
.
.Ss Read-only compatibility
Some features may make on-disk format changes that do not interfere with other
software's ability to read from the pool.
These features are referred to as
.Dq read-only compatible .
If all unsupported features on a pool are read-only compatible,
the pool can be imported in read-only mode by setting the
.Sy readonly
property during import (see
.Xr zpool-import 8
for details on importing pools).
.
.Ss Unsupported features
For each unsupported feature enabled on an imported pool, a pool property
named
.Sy unsupported Ns @ Ns Ar feature-name
will indicate why the import was allowed despite the unsupported feature.
Possible values for this property are:
.Bl -tag -width "readonly"
.It Sy inactive
The feature is in the
.Sy enabled
state and therefore the pool's on-disk
format is still compatible with software that does not support this feature.
.It Sy readonly
The feature is read-only compatible and the pool has been imported in
read-only mode.
.El
.
.Ss Feature dependencies
Some features depend on other features being enabled in order to function.
Enabling a feature will automatically enable any features it depends on.
.
.Ss Compatibility feature sets
It is sometimes necessary for a pool to maintain compatibility with a
specific on-disk format, by enabling and disabling particular features.
The
.Sy compatibility
feature facilitates this by allowing feature sets to be read from text files.
When set to
.Sy off
(the default), compatibility feature sets are disabled
(i.e. all features are enabled); when set to
.Sy legacy ,
no features are enabled.
When set to a comma-separated list of filenames
(each filename may either be an absolute path, or relative to
.Pa /etc/zfs/compatibility.d
or
.Pa /usr/share/zfs/compatibility.d ) ,
the lists of requested features are read from those files,
separated by whitespace and/or commas.
Only features present in all files are enabled.
.Pp
Simple sanity checks are applied to the files:
they must be between 1B and 16kB in size, and must end with a newline character.
.Pp
The requested features are applied when a pool is created using
.Nm zpool Cm create Fl o Sy compatibility Ns = Ns Ar …
and controls which features are enabled when using
.Nm zpool Cm upgrade .
.Nm zpool Cm status
will not show a warning about disabled features which are not part
of the requested feature set.
.Pp
The special value
.Sy legacy
prevents any features from being enabled, either via
.Nm zpool Cm upgrade
or
.Nm zpool Cm set Sy feature Ns @ Ns Ar feature-name Ns = Ns Sy enabled .
This setting also prevents pools from being upgraded to newer on-disk versions.
This is a safety measure to prevent new features from being
accidentally enabled, breaking compatibility.
.Pp
By convention, compatibility files in
.Pa /usr/share/zfs/compatibility.d
are provided by the distribution, and include feature sets
supported by important versions of popular distributions, and feature
sets commonly supported at the start of each year.
Compatibility files in
.Pa /etc/zfs/compatibility.d ,
if present, will take precedence over files with the same name in
.Pa /usr/share/zfs/compatibility.d .
.Pp
If an unrecognized feature is found in these files, an error message will
be shown.
If the unrecognized feature is in a file in
.Pa /etc/zfs/compatibility.d ,
this is treated as an error and processing will stop.
If the unrecognized feature is under
.Pa /usr/share/zfs/compatibility.d ,
this is treated as a warning and processing will continue.
This difference is to allow distributions to include features
which might not be recognized by the currently-installed binaries.
.Pp
Compatibility files may include comments:
any text from
.Sq #
to the end of the line is ignored.
.Pp
.Sy Example :
.Bd -literal -compact -offset 4n
.No example# Nm cat Pa /usr/share/zfs/compatibility.d/grub2
# Features which are supported by GRUB2
async_destroy
bookmarks
embedded_data
empty_bpobj
enabled_txg
extensible_dataset
filesystem_limits
hole_birth
large_blocks
lz4_compress
spacemap_histogram
.No example# Nm zpool Cm create Fl o Sy compatibility Ns = Ns Ar grub2 Ar bootpool Ar vdev
.Ed
.Pp
See
.Xr zpool-create 8
and
.Xr zpool-upgrade 8
for more information on how these commands are affected by feature sets.
.
.de feature
.It Sy \\$2
.Bl -tag -compact -width "READ-ONLY COMPATIBLE"
.It GUID
.Sy \\$1:\\$2
.if !"\\$4"" \{\
.It DEPENDENCIES
\fB\\$4\fP\c
.if !"\\$5"" , \fB\\$5\fP\c
.if !"\\$6"" , \fB\\$6\fP\c
.if !"\\$7"" , \fB\\$7\fP\c
.if !"\\$8"" , \fB\\$8\fP\c
.if !"\\$9"" , \fB\\$9\fP\c
.\}
.It READ-ONLY COMPATIBLE
\\$3
.El
.Pp
..
.
.ds instant-never \
.No This feature becomes Sy active No as soon as it is enabled \
and will never return to being Sy enabled .
.
.ds remount-upgrade \
.No Each filesystem will be upgraded automatically when remounted, \
or when a new file is created under that filesystem. \
The upgrade can also be triggered on filesystems via \
Nm zfs Cm set Sy version Ns = Ns Sy current Ar fs . \
No The upgrade process runs in the background and may take a while to complete \
for filesystems containing large amounts of files.
.
.de checksum-spiel
When the
.Sy \\$1
feature is set to
.Sy enabled ,
the administrator can turn on the
.Sy \\$1
checksum on any dataset using
.Nm zfs Cm set Sy checksum Ns = Ns Sy \\$1 Ar dset
.Po see Xr zfs-set 8 Pc .
This feature becomes
.Sy active
once a
.Sy checksum
property has been set to
.Sy \\$1 ,
and will return to being
.Sy enabled
once all filesystems that have ever had their checksum set to
.Sy \\$1
are destroyed.
..
.
.Sh FEATURES
The following features are supported on this system:
.Bl -tag -width Ds
.feature org.zfsonlinux allocation_classes yes
This feature enables support for separate allocation classes.
.Pp
This feature becomes
.Sy active
when a dedicated allocation class vdev (dedup or special) is created with the
.Nm zpool Cm create No or Nm zpool Cm add No commands .
With device removal, it can be returned to the
.Sy enabled
state if all the dedicated allocation class vdevs are removed.
.
.feature com.delphix async_destroy yes
Destroying a file system requires traversing all of its data in order to
return its used space to the pool.
Without
.Sy async_destroy ,
the file system is not fully removed until all space has been reclaimed.
If the destroy operation is interrupted by a reboot or power outage,
the next attempt to open the pool will need to complete the destroy
operation synchronously.
.Pp
When
.Sy async_destroy
is enabled, the file system's data will be reclaimed by a background process,
allowing the destroy operation to complete
without traversing the entire file system.
The background process is able to resume
interrupted destroys after the pool has been opened, eliminating the need
to finish interrupted destroys as part of the open operation.
The amount of space remaining to be reclaimed by the background process
is available through the
.Sy freeing
property.
.Pp
This feature is only
.Sy active
while
.Sy freeing
is non-zero.
.
.feature com.delphix bookmarks yes extensible_dataset
This feature enables use of the
.Nm zfs Cm bookmark
command.
.Pp
This feature is
.Sy active
while any bookmarks exist in the pool.
All bookmarks in the pool can be listed by running
.Nm zfs Cm list Fl t Sy bookmark Fl r Ar poolname .
.
.feature com.datto bookmark_v2 no bookmark extensible_dataset
This feature enables the creation and management of larger bookmarks which are
needed for other features in ZFS.
.Pp
This feature becomes
.Sy active
when a v2 bookmark is created and will be returned to the
.Sy enabled
state when all v2 bookmarks are destroyed.
.
.feature com.delphix bookmark_written no bookmark extensible_dataset bookmark_v2
This feature enables additional bookmark accounting fields, enabling the
.Sy written Ns # Ns Ar bookmark
property (space written since a bookmark) and estimates of
send stream sizes for incrementals from bookmarks.
.Pp
This feature becomes
.Sy active
when a bookmark is created and will be
returned to the
.Sy enabled
state when all bookmarks with these fields are destroyed.
.
.feature org.openzfs device_rebuild yes
This feature enables the ability for the
.Nm zpool Cm attach
and
.Nm zpool Cm replace
commands to perform sequential reconstruction
(instead of healing reconstruction) when resilvering.
.Pp
Sequential reconstruction resilvers a device in LBA order without immediately
verifying the checksums.
Once complete, a scrub is started, which then verifies the checksums.
This approach allows full redundancy to be restored to the pool
in the minimum amount of time.
This two-phase approach will take longer than a healing resilver
when the time to verify the checksums is included.
However, unless there is additional pool damage,
no checksum errors should be reported by the scrub.
This feature is incompatible with raidz configurations.
.
This feature becomes
.Sy active
while a sequential resilver is in progress, and returns to
.Sy enabled
when the resilver completes.
.
.feature com.delphix device_removal no
This feature enables the
.Nm zpool Cm remove
command to remove top-level vdevs,
evacuating them to reduce the total size of the pool.
.Pp
This feature becomes
.Sy active
when the
.Nm zpool Cm remove
command is used
on a top-level vdev, and will never return to being
.Sy enabled .
.
.feature org.openzfs draid no
This feature enables use of the
.Sy draid
vdev type.
dRAID is a variant of raidz which provides integrated distributed
hot spares that allow faster resilvering while retaining the benefits of raidz.
Data, parity, and spare space are organized in redundancy groups
and distributed evenly over all of the devices.
.Pp
This feature becomes
.Sy active
when creating a pool which uses the
.Sy draid
vdev type, or when adding a new
.Sy draid
vdev to an existing pool.
.
.feature org.illumos edonr no extensible_dataset
This feature enables the use of the Edon-R hash algorithm for checksum,
including for nopwrite (if compression is also enabled, an overwrite of
a block whose checksum matches the data being written will be ignored).
In an abundance of caution, Edon-R requires verification when used with
dedup:
.Nm zfs Cm set Sy dedup Ns = Ns Sy edonr , Ns Sy verify
.Po see Xr zfs-set 8 Pc .
.Pp
Edon-R is a very high-performance hash algorithm that was part
of the NIST SHA-3 competition.
It provides extremely high hash performance (over 350% faster than SHA-256),
but was not selected because of its unsuitability
as a general purpose secure hash algorithm.
This implementation utilizes the new salted checksumming functionality
in ZFS, which means that the checksum is pre-seeded with a secret
256-bit random key (stored on the pool) before being fed the data block
to be checksummed.
Thus the produced checksums are unique to a given pool,
preventing hash collision attacks on systems with dedup.
.Pp
.checksum-spiel edonr
.Pp
.Fx does not support the Sy edonr No feature.
.
.feature com.delphix embedded_data no
This feature improves the performance and compression ratio of
highly-compressible blocks.
Blocks whose contents can compress to 112 bytes
or smaller can take advantage of this feature.
.Pp
When this feature is enabled, the contents of highly-compressible blocks are
stored in the block "pointer" itself (a misnomer in this case, as it contains
the compressed data, rather than a pointer to its location on disk).
Thus the space of the block (one sector, typically 512B or 4kB) is saved,
and no additional I/O is needed to read and write the data block.
.
\*[instant-never]
.
.feature com.delphix empty_bpobj yes
This feature increases the performance of creating and using a large
number of snapshots of a single filesystem or volume, and also reduces
the disk space required.
.Pp
When there are many snapshots, each snapshot uses many Block Pointer
Objects (bpobjs) to track blocks associated with that snapshot.
However, in common use cases, most of these bpobjs are empty.
This feature allows us to create each bpobj on-demand,
thus eliminating the empty bpobjs.
.Pp
This feature is
.Sy active
while there are any filesystems, volumes,
or snapshots which were created after enabling this feature.
.
.feature com.delphix enabled_txg yes
Once this feature is enabled, ZFS records the transaction group number
in which new features are enabled.
This has no user-visible impact, but other features may depend on this feature.
.Pp
This feature becomes
.Sy active
as soon as it is enabled and will
never return to being
.Sy enabled .
.
.feature com.datto encryption no bookmark_v2 extensible_dataset
This feature enables the creation and management of natively encrypted datasets.
.Pp
This feature becomes
.Sy active
when an encrypted dataset is created and will be returned to the
.Sy enabled
state when all datasets that use this feature are destroyed.
.
.feature com.delphix extensible_dataset no
This feature allows more flexible use of internal ZFS data structures,
and exists for other features to depend on.
.Pp
This feature will be
.Sy active
when the first dependent feature uses it, and will be returned to the
.Sy enabled
state when all datasets that use this feature are destroyed.
.
.feature com.joyent filesystem_limits yes extensible_dataset
This feature enables filesystem and snapshot limits.
These limits can be used to control how many filesystems and/or snapshots
can be created at the point in the tree on which the limits are set.
.Pp
This feature is
.Sy active
once either of the limit properties has been set on a dataset.
Once activated the feature is never deactivated.
.
.feature com.delphix hole_birth no enabled_txg
This feature has/had bugs, the result of which is that, if you do a
.Nm zfs Cm send Fl i
.Pq or Fl R , No since it uses Fl i
from an affected dataset, the receiving party will not see any checksum
or other errors, but the resulting destination snapshot
will not match the source.
Its use by
.Nm zfs Cm send Fl i
has been disabled by default
.Pq see Sy send_holes_without_birth_time No in Xr zfs 4 .
.Pp
This feature improves performance of incremental sends
.Pq Nm zfs Cm send Fl i
and receives for objects with many holes.
The most common case of hole-filled objects is zvols.
.Pp
An incremental send stream from snapshot
.Sy A No to snapshot Sy B
contains information about every block that changed between
.Sy A No and Sy B .
Blocks which did not change between those snapshots can be
identified and omitted from the stream using a piece of metadata called
the "block birth time", but birth times are not recorded for holes
(blocks filled only with zeroes).
Since holes created after
.Sy A No cannot be distinguished from holes created before Sy A ,
information about every hole in the entire filesystem or zvol
is included in the send stream.
.Pp
For workloads where holes are rare this is not a problem.
However, when incrementally replicating filesystems or zvols with many holes
(for example a zvol formatted with another filesystem) a lot of time will
be spent sending and receiving unnecessary information about holes that
already exist on the receiving side.
.Pp
Once the
.Sy hole_birth
feature has been enabled the block birth times
of all new holes will be recorded.
Incremental sends between snapshots created after this feature is enabled
will use this new metadata to avoid sending information about holes that
already exist on the receiving side.
.Pp
\*[instant-never]
.
.feature org.open-zfs large_blocks no extensible_dataset
This feature allows the record size on a dataset to be set larger than 128kB.
.Pp
This feature becomes
.Sy active
once a dataset contains a file with a block size larger than 128kB,
and will return to being
.Sy enabled
once all filesystems that have ever had their recordsize larger than 128kB
are destroyed.
.
.feature org.zfsonlinux large_dnode no extensible_dataset
This feature allows the size of dnodes in a dataset to be set larger than 512B.
.
This feature becomes
.Sy active
once a dataset contains an object with a dnode larger than 512B,
which occurs as a result of setting the
.Sy dnodesize
dataset property to a value other than
.Sy legacy .
The feature will return to being
.Sy enabled
once all filesystems that have ever contained a dnode larger than 512B
are destroyed.
Large dnodes allow more data to be stored in the bonus buffer,
thus potentially improving performance by avoiding the use of spill blocks.
.
.feature com.delphix livelist yes
This feature allows clones to be deleted faster than the traditional method
when a large number of random/sparse writes have been made to the clone.
All blocks allocated and freed after a clone is created are tracked by the
the clone's livelist which is referenced during the deletion of the clone.
The feature is activated when a clone is created and remains
.Sy active
until all clones have been destroyed.
.
.feature com.delphix log_spacemap yes com.delphix:spacemap_v2
This feature improves performance for heavily-fragmented pools,
especially when workloads are heavy in random-writes.
It does so by logging all the metaslab changes on a single spacemap every TXG
instead of scattering multiple writes to all the metaslab spacemaps.
.Pp
\*[instant-never]
.
.feature org.illumos lz4_compress no
.Sy lz4
is a high-performance real-time compression algorithm that
features significantly faster compression and decompression as well as a
higher compression ratio than the older
.Sy lzjb
compression.
Typically,
.Sy lz4
compression is approximately 50% faster on compressible data and 200% faster
on incompressible data than
.Sy lzjb .
It is also approximately 80% faster on decompression,
while giving approximately a 10% better compression ratio.
.Pp
When the
.Sy lz4_compress
feature is set to
.Sy enabled ,
the administrator can turn on
.Sy lz4
compression on any dataset on the pool using the
.Xr zfs-set 8
command.
All newly written metadata will be compressed with the
.Sy lz4
algorithm.
.Pp
\*[instant-never]
.
.feature com.joyent multi_vdev_crash_dump no
This feature allows a dump device to be configured with a pool comprised
of multiple vdevs.
Those vdevs may be arranged in any mirrored or raidz configuration.
.Pp
When the
.Sy multi_vdev_crash_dump
feature is set to
.Sy enabled ,
the administrator can use
.Xr dumpadm 1M
to configure a dump device on a pool comprised of multiple vdevs.
.Pp
Under
.Fx
and Linux this feature is unused, but registered for compatibility.
New pools created on these systems will have the feature
.Sy enabled
but will never transition to
.Sy active ,
as this functionality is not required for crash dump support.
Existing pools where this feature is
.Sy active
can be imported.
.
.feature com.delphix obsolete_counts yes device_removal
This feature is an enhancement of
.Sy device_removal ,
which will over time reduce the memory used to track removed devices.
When indirect blocks are freed or remapped,
we note that their part of the indirect mapping is "obsolete" no longer needed.
.Pp
This feature becomes
.Sy active
when the
.Nm zpool Cm remove
command is used on a top-level vdev, and will never return to being
.Sy enabled .
.
.feature org.zfsonlinux project_quota yes extensible_dataset
This feature allows administrators to account the spaces and objects usage
information against the project identifier (ID).
.Pp
The project ID is an object-based attribute.
When upgrading an existing filesystem,
objects without a project ID will be assigned a zero project ID.
When this feature is enabled, newly created objects inherit
their parent directories' project ID if the parent's inherit flag is set
.Pq via Nm chattr Sy [+-]P No or Nm zfs Cm project Fl s Ns | Ns Fl C .
Otherwise, the new object's project ID will be zero.
An object's project ID can be changed at any time by the owner
(or privileged user) via
.Nm chattr Fl p Ar prjid
or
.Nm zfs Cm project Fl p Ar prjid .
.Pp
This feature will become
.Sy active
as soon as it is enabled and will never return to being
.Sy disabled .
\*[remount-upgrade]
.
.feature com.delphix redaction_bookmarks no bookmarks extensible_dataset
This feature enables the use of redacted
.Nm zfs Cm send Ns s ,
which create redaction bookmarks storing the list of blocks
redacted by the send that created them.
For more information about redacted sends, see
.Xr zfs-send 8 .
.
.feature com.delphix redacted_datasets no extensible_dataset
This feature enables the receiving of redacted
.Nm zfs Cm send Ns
streams. which create redacted datasets when received.
These datasets are missing some of their blocks,
and so cannot be safely mounted, and their contents cannot be safely read.
For more information about redacted receives, see
.Xr zfs-send 8 .
.
.feature com.datto resilver_defer yes
This feature allows ZFS to postpone new resilvers if an existing one is already
in progress.
Without this feature, any new resilvers will cause the currently
running one to be immediately restarted from the beginning.
.Pp
This feature becomes
.Sy active
once a resilver has been deferred, and returns to being
.Sy enabled
when the deferred resilver begins.
.
.feature org.illumos sha512 no extensible_dataset
This feature enables the use of the SHA-512/256 truncated hash algorithm
(FIPS 180-4) for checksum and dedup.
The native 64-bit arithmetic of SHA-512 provides an approximate 50%
performance boost over SHA-256 on 64-bit hardware
and is thus a good minimum-change replacement candidate
for systems where hash performance is important,
but these systems cannot for whatever reason utilize the faster
.Sy skein No and Sy edonr
algorithms.
.Pp
.checksum-spiel sha512
.
.feature org.illumos skein no extensible_dataset
This feature enables the use of the Skein hash algorithm for checksum and dedup.
Skein is a high-performance secure hash algorithm that was a
finalist in the NIST SHA-3 competition.
It provides a very high security margin and high performance on 64-bit hardware
(80% faster than SHA-256).
This implementation also utilizes the new salted checksumming
functionality in ZFS, which means that the checksum is pre-seeded with a
secret 256-bit random key (stored on the pool) before being fed the data
block to be checksummed.
Thus the produced checksums are unique to a given pool,
preventing hash collision attacks on systems with dedup.
.Pp
.checksum-spiel skein
.
.feature com.delphix spacemap_histogram yes
This features allows ZFS to maintain more information about how free space
is organized within the pool.
If this feature is
.Sy enabled ,
it will be activated when a new space map object is created, or
an existing space map is upgraded to the new format,
and never returns back to being
.Sy enabled .
.
.feature com.delphix spacemap_v2 yes
This feature enables the use of the new space map encoding which
consists of two words (instead of one) whenever it is advantageous.
The new encoding allows space maps to represent large regions of
space more efficiently on-disk while also increasing their maximum
addressable offset.
.Pp
This feature becomes
.Sy active
once it is
.Sy enabled ,
and never returns back to being
.Sy enabled .
.
.feature org.zfsonlinux userobj_accounting yes extensible_dataset
This feature allows administrators to account the object usage information
by user and group.
.Pp
\*[instant-never]
\*[remount-upgrade]
.
.feature com.delphix zpool_checkpoint yes
This feature enables the
.Nm zpool Cm checkpoint
command that can checkpoint the state of the pool
at the time it was issued and later rewind back to it or discard it.
.Pp
This feature becomes
.Sy active
when the
.Nm zpool Cm checkpoint
command is used to checkpoint the pool.
The feature will only return back to being
.Sy enabled
when the pool is rewound or the checkpoint has been discarded.
.
.feature org.freebsd zstd_compress no extensible_dataset
.Sy zstd
is a high-performance compression algorithm that features a
combination of high compression ratios and high speed.
Compared to
.Sy gzip ,
.Sy zstd
offers slightly better compression at much higher speeds.
Compared to
.Sy lz4 ,
.Sy zstd
offers much better compression while being only modestly slower.
Typically,
.Sy zstd
compression speed ranges from 250 to 500 MB/s per thread
and decompression speed is over 1 GB/s per thread.
.Pp
When the
.Sy zstd
feature is set to
.Sy enabled ,
the administrator can turn on
.Sy zstd
compression of any dataset using
.Nm zfs Cm set Sy compress Ns = Ns Sy zstd Ar dset
.Po see Xr zfs-set 8 Pc .
This feature becomes
.Sy active
once a
.Sy compress
property has been set to
.Sy zstd ,
and will return to being
.Sy enabled
once all filesystems that have ever had their
.Sy compress
property set to
.Sy zstd
are destroyed.
.El
.
.Sh SEE ALSO
.Xr zpool 8
+512
View File
@@ -0,0 +1,512 @@
.\"
.\" CDDL HEADER START
.\"
.\" The contents of this file are subject to the terms of the
.\" Common Development and Distribution License (the "License").
.\" You may not use this file except in compliance with the License.
.\"
.\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
.\" or http://www.opensolaris.org/os/licensing.
.\" See the License for the specific language governing permissions
.\" and limitations under the License.
.\"
.\" When distributing Covered Code, include this CDDL HEADER in each
.\" file and include the License file at usr/src/OPENSOLARIS.LICENSE.
.\" If applicable, add the following below this CDDL HEADER, with the
.\" fields enclosed by brackets "[]" replaced with your own identifying
.\" information: Portions Copyright [yyyy] [name of copyright owner]
.\"
.\" CDDL HEADER END
.\"
.\" Copyright (c) 2007, Sun Microsystems, Inc. All Rights Reserved.
.\" Copyright (c) 2012, 2018 by Delphix. All rights reserved.
.\" Copyright (c) 2012 Cyril Plisko. All Rights Reserved.
.\" Copyright (c) 2017 Datto Inc.
.\" Copyright (c) 2018 George Melikov. All Rights Reserved.
.\" Copyright 2017 Nexenta Systems, Inc.
.\" Copyright (c) 2017 Open-E, Inc. All Rights Reserved.
.\"
.Dd June 2, 2021
.Dt ZPOOLCONCEPTS 7
.Os
.
.Sh NAME
.Nm zpoolconcepts
.Nd overview of ZFS storage pools
.
.Sh DESCRIPTION
.Ss Virtual Devices (vdevs)
A "virtual device" describes a single device or a collection of devices
organized according to certain performance and fault characteristics.
The following virtual devices are supported:
.Bl -tag -width "special"
.It Sy disk
A block device, typically located under
.Pa /dev .
ZFS can use individual slices or partitions, though the recommended mode of
operation is to use whole disks.
A disk can be specified by a full path, or it can be a shorthand name
.Po the relative portion of the path under
.Pa /dev
.Pc .
A whole disk can be specified by omitting the slice or partition designation.
For example,
.Pa sda
is equivalent to
.Pa /dev/sda .
When given a whole disk, ZFS automatically labels the disk, if necessary.
.It Sy file
A regular file.
The use of files as a backing store is strongly discouraged.
It is designed primarily for experimental purposes, as the fault tolerance of a
file is only as good as the file system on which it resides.
A file must be specified by a full path.
.It Sy mirror
A mirror of two or more devices.
Data is replicated in an identical fashion across all components of a mirror.
A mirror with
.Em N No disks of size Em X No can hold Em X No bytes and can withstand Em N-1
devices failing without losing data.
.It Sy raidz , raidz1 , raidz2 , raidz3
A variation on RAID-5 that allows for better distribution of parity and
eliminates the RAID-5
.Qq write hole
.Pq in which data and parity become inconsistent after a power loss .
Data and parity is striped across all disks within a raidz group.
.Pp
A raidz group can have single, double, or triple parity, meaning that the
raidz group can sustain one, two, or three failures, respectively, without
losing any data.
The
.Sy raidz1
vdev type specifies a single-parity raidz group; the
.Sy raidz2
vdev type specifies a double-parity raidz group; and the
.Sy raidz3
vdev type specifies a triple-parity raidz group.
The
.Sy raidz
vdev type is an alias for
.Sy raidz1 .
.Pp
A raidz group with
.Em N No disks of size Em X No with Em P No parity disks can hold approximately
.Em (N-P)*X No bytes and can withstand Em P No devices failing without losing data.
The minimum number of devices in a raidz group is one more than the number of
parity disks.
The recommended number is between 3 and 9 to help increase performance.
.It Sy draid , draid1 , draid2 , draid3
A variant of raidz that provides integrated distributed hot spares which
allows for faster resilvering while retaining the benefits of raidz.
A dRAID vdev is constructed from multiple internal raidz groups, each with
.Em D No data devices and Em P No parity devices.
These groups are distributed over all of the children in order to fully
utilize the available disk performance.
.Pp
Unlike raidz, dRAID uses a fixed stripe width (padding as necessary with
zeros) to allow fully sequential resilvering.
This fixed stripe width significantly effects both usable capacity and IOPS.
For example, with the default
.Em D=8 No and Em 4kB No disk sectors the minimum allocation size is Em 32kB .
If using compression, this relatively large allocation size can reduce the
effective compression ratio.
When using ZFS volumes and dRAID, the default of the
.Sy volblocksize
property is increased to account for the allocation size.
If a dRAID pool will hold a significant amount of small blocks, it is
recommended to also add a mirrored
.Sy special
vdev to store those blocks.
.Pp
In regards to I/O, performance is similar to raidz since for any read all
.Em D No data disks must be accessed.
Delivered random IOPS can be reasonably approximated as
.Sy floor((N-S)/(D+P))*single_drive_IOPS .
.Pp
Like raidzm a dRAID can have single-, double-, or triple-parity.
The
.Sy draid1 ,
.Sy draid2 ,
and
.Sy draid3
types can be used to specify the parity level.
The
.Sy draid
vdev type is an alias for
.Sy draid1 .
.Pp
A dRAID with
.Em N No disks of size Em X , D No data disks per redundancy group, Em P
.No parity level, and Em S No distributed hot spares can hold approximately
.Em (N-S)*(D/(D+P))*X No bytes and can withstand Em P
devices failing without losing data.
.It Sy draid Ns Oo Ar parity Oc Ns Oo Sy \&: Ns Ar data Ns Sy d Oc Ns Oo Sy \&: Ns Ar children Ns Sy c Oc Ns Oo Sy \&: Ns Ar spares Ns Sy s Oc
A non-default dRAID configuration can be specified by appending one or more
of the following optional arguments to the
.Sy draid
keyword:
.Bl -tag -compact -width "children"
.It Ar parity
The parity level (1-3).
.It Ar data
The number of data devices per redundancy group.
In general, a smaller value of
.Em D No will increase IOPS, improve the compression ratio,
and speed up resilvering at the expense of total usable capacity.
Defaults to
.Em 8 , No unless Em N-P-S No is less than Em 8 .
.It Ar children
The expected number of children.
Useful as a cross-check when listing a large number of devices.
An error is returned when the provided number of children differs.
.It Ar spares
The number of distributed hot spares.
Defaults to zero.
.El
.It Sy spare
A pseudo-vdev which keeps track of available hot spares for a pool.
For more information, see the
.Sx Hot Spares
section.
.It Sy log
A separate intent log device.
If more than one log device is specified, then writes are load-balanced between
devices.
Log devices can be mirrored.
However, raidz vdev types are not supported for the intent log.
For more information, see the
.Sx Intent Log
section.
.It Sy dedup
A device dedicated solely for deduplication tables.
The redundancy of this device should match the redundancy of the other normal
devices in the pool.
If more than one dedup device is specified, then
allocations are load-balanced between those devices.
.It Sy special
A device dedicated solely for allocating various kinds of internal metadata,
and optionally small file blocks.
The redundancy of this device should match the redundancy of the other normal
devices in the pool.
If more than one special device is specified, then
allocations are load-balanced between those devices.
.Pp
For more information on special allocations, see the
.Sx Special Allocation Class
section.
.It Sy cache
A device used to cache storage pool data.
A cache device cannot be configured as a mirror or raidz group.
For more information, see the
.Sx Cache Devices
section.
.El
.Pp
Virtual devices cannot be nested, so a mirror or raidz virtual device can only
contain files or disks.
Mirrors of mirrors
.Pq or other combinations
are not allowed.
.Pp
A pool can have any number of virtual devices at the top of the configuration
.Po known as
.Qq root vdevs
.Pc .
Data is dynamically distributed across all top-level devices to balance data
among devices.
As new virtual devices are added, ZFS automatically places data on the newly
available devices.
.Pp
Virtual devices are specified one at a time on the command line,
separated by whitespace.
Keywords like
.Sy mirror No and Sy raidz
are used to distinguish where a group ends and another begins.
For example, the following creates a pool with two root vdevs,
each a mirror of two disks:
.Dl # Nm zpool Cm create Ar mypool Sy mirror Ar sda sdb Sy mirror Ar sdc sdd
.
.Ss Device Failure and Recovery
ZFS supports a rich set of mechanisms for handling device failure and data
corruption.
All metadata and data is checksummed, and ZFS automatically repairs bad data
from a good copy when corruption is detected.
.Pp
In order to take advantage of these features, a pool must make use of some form
of redundancy, using either mirrored or raidz groups.
While ZFS supports running in a non-redundant configuration, where each root
vdev is simply a disk or file, this is strongly discouraged.
A single case of bit corruption can render some or all of your data unavailable.
.Pp
A pool's health status is described by one of three states:
.Sy online , degraded , No or Sy faulted .
An online pool has all devices operating normally.
A degraded pool is one in which one or more devices have failed, but the data is
still available due to a redundant configuration.
A faulted pool has corrupted metadata, or one or more faulted devices, and
insufficient replicas to continue functioning.
.Pp
The health of the top-level vdev, such as a mirror or raidz device,
is potentially impacted by the state of its associated vdevs,
or component devices.
A top-level vdev or component device is in one of the following states:
.Bl -tag -width "DEGRADED"
.It Sy DEGRADED
One or more top-level vdevs is in the degraded state because one or more
component devices are offline.
Sufficient replicas exist to continue functioning.
.Pp
One or more component devices is in the degraded or faulted state, but
sufficient replicas exist to continue functioning.
The underlying conditions are as follows:
.Bl -bullet -compact
.It
The number of checksum errors exceeds acceptable levels and the device is
degraded as an indication that something may be wrong.
ZFS continues to use the device as necessary.
.It
The number of I/O errors exceeds acceptable levels.
The device could not be marked as faulted because there are insufficient
replicas to continue functioning.
.El
.It Sy FAULTED
One or more top-level vdevs is in the faulted state because one or more
component devices are offline.
Insufficient replicas exist to continue functioning.
.Pp
One or more component devices is in the faulted state, and insufficient
replicas exist to continue functioning.
The underlying conditions are as follows:
.Bl -bullet -compact
.It
The device could be opened, but the contents did not match expected values.
.It
The number of I/O errors exceeds acceptable levels and the device is faulted to
prevent further use of the device.
.El
.It Sy OFFLINE
The device was explicitly taken offline by the
.Nm zpool Cm offline
command.
.It Sy ONLINE
The device is online and functioning.
.It Sy REMOVED
The device was physically removed while the system was running.
Device removal detection is hardware-dependent and may not be supported on all
platforms.
.It Sy UNAVAIL
The device could not be opened.
If a pool is imported when a device was unavailable, then the device will be
identified by a unique identifier instead of its path since the path was never
correct in the first place.
.El
.Pp
Checksum errors represent events where a disk returned data that was expected
to be correct, but was not.
In other words, these are instances of silent data corruption.
The checksum errors are reported in
.Nm zpool Cm status
and
.Nm zpool Cm events .
When a block is stored redundantly, a damaged block may be reconstructed
(e.g. from raidz parity or a mirrored copy).
In this case, ZFS reports the checksum error against the disks that contained
damaged data.
If a block is unable to be reconstructed (e.g. due to 3 disks being damaged
in a raidz2 group), it is not possible to determine which disks were silently
corrupted.
In this case, checksum errors are reported for all disks on which the block
is stored.
.Pp
If a device is removed and later re-attached to the system,
ZFS attempts online the device automatically.
Device attachment detection is hardware-dependent
and might not be supported on all platforms.
.
.Ss Hot Spares
ZFS allows devices to be associated with pools as
.Qq hot spares .
These devices are not actively used in the pool, but when an active device
fails, it is automatically replaced by a hot spare.
To create a pool with hot spares, specify a
.Sy spare
vdev with any number of devices.
For example,
.Dl # Nm zpool Cm create Ar pool Sy mirror Ar sda sdb Sy spare Ar sdc sdd
.Pp
Spares can be shared across multiple pools, and can be added with the
.Nm zpool Cm add
command and removed with the
.Nm zpool Cm remove
command.
Once a spare replacement is initiated, a new
.Sy spare
vdev is created within the configuration that will remain there until the
original device is replaced.
At this point, the hot spare becomes available again if another device fails.
.Pp
If a pool has a shared spare that is currently being used, the pool can not be
exported since other pools may use this shared spare, which may lead to
potential data corruption.
.Pp
Shared spares add some risk.
If the pools are imported on different hosts,
and both pools suffer a device failure at the same time,
both could attempt to use the spare at the same time.
This may not be detected, resulting in data corruption.
.Pp
An in-progress spare replacement can be cancelled by detaching the hot spare.
If the original faulted device is detached, then the hot spare assumes its
place in the configuration, and is removed from the spare list of all active
pools.
.Pp
The
.Sy draid
vdev type provides distributed hot spares.
These hot spares are named after the dRAID vdev they're a part of
.Po Sy draid1 Ns - Ns Ar 2 Ns - Ns Ar 3 No specifies spare Ar 3 No of vdev Ar 2 ,
.No which is a single parity dRAID Pc
and may only be used by that dRAID vdev.
Otherwise, they behave the same as normal hot spares.
.Pp
Spares cannot replace log devices.
.
.Ss Intent Log
The ZFS Intent Log (ZIL) satisfies POSIX requirements for synchronous
transactions.
For instance, databases often require their transactions to be on stable storage
devices when returning from a system call.
NFS and other applications can also use
.Xr fsync 2
to ensure data stability.
By default, the intent log is allocated from blocks within the main pool.
However, it might be possible to get better performance using separate intent
log devices such as NVRAM or a dedicated disk.
For example:
.Dl # Nm zpool Cm create Ar pool sda sdb Sy log Ar sdc
.Pp
Multiple log devices can also be specified, and they can be mirrored.
See the
.Sx EXAMPLES
section for an example of mirroring multiple log devices.
.Pp
Log devices can be added, replaced, attached, detached and removed.
In addition, log devices are imported and exported as part of the pool
that contains them.
Mirrored devices can be removed by specifying the top-level mirror vdev.
.
.Ss Cache Devices
Devices can be added to a storage pool as
.Qq cache devices .
These devices provide an additional layer of caching between main memory and
disk.
For read-heavy workloads, where the working set size is much larger than what
can be cached in main memory, using cache devices allows much more of this
working set to be served from low latency media.
Using cache devices provides the greatest performance improvement for random
read-workloads of mostly static content.
.Pp
To create a pool with cache devices, specify a
.Sy cache
vdev with any number of devices.
For example:
.Dl # Nm zpool Cm create Ar pool sda sdb Sy cache Ar sdc sdd
.Pp
Cache devices cannot be mirrored or part of a raidz configuration.
If a read error is encountered on a cache device, that read I/O is reissued to
the original storage pool device, which might be part of a mirrored or raidz
configuration.
.Pp
The content of the cache devices is persistent across reboots and restored
asynchronously when importing the pool in L2ARC (persistent L2ARC).
This can be disabled by setting
.Sy l2arc_rebuild_enabled Ns = Ns Sy 0 .
For cache devices smaller than
.Em 1GB ,
we do not write the metadata structures
required for rebuilding the L2ARC in order not to waste space.
This can be changed with
.Sy l2arc_rebuild_blocks_min_l2size .
The cache device header
.Pq Em 512B
is updated even if no metadata structures are written.
Setting
.Sy l2arc_headroom Ns = Ns Sy 0
will result in scanning the full-length ARC lists for cacheable content to be
written in L2ARC (persistent ARC).
If a cache device is added with
.Nm zpool Cm add
its label and header will be overwritten and its contents are not going to be
restored in L2ARC, even if the device was previously part of the pool.
If a cache device is onlined with
.Nm zpool Cm online
its contents will be restored in L2ARC.
This is useful in case of memory pressure
where the contents of the cache device are not fully restored in L2ARC.
The user can off- and online the cache device when there is less memory pressure
in order to fully restore its contents to L2ARC.
.
.Ss Pool checkpoint
Before starting critical procedures that include destructive actions
.Pq like Nm zfs Cm destroy ,
an administrator can checkpoint the pool's state and in the case of a
mistake or failure, rewind the entire pool back to the checkpoint.
Otherwise, the checkpoint can be discarded when the procedure has completed
successfully.
.Pp
A pool checkpoint can be thought of as a pool-wide snapshot and should be used
with care as it contains every part of the pool's state, from properties to vdev
configuration.
Thus, certain operations are not allowed while a pool has a checkpoint.
Specifically, vdev removal/attach/detach, mirror splitting, and
changing the pool's GUID.
Adding a new vdev is supported, but in the case of a rewind it will have to be
added again.
Finally, users of this feature should keep in mind that scrubs in a pool that
has a checkpoint do not repair checkpointed data.
.Pp
To create a checkpoint for a pool:
.Dl # Nm zpool Cm checkpoint Ar pool
.Pp
To later rewind to its checkpointed state, you need to first export it and
then rewind it during import:
.Dl # Nm zpool Cm export Ar pool
.Dl # Nm zpool Cm import Fl -rewind-to-checkpoint Ar pool
.Pp
To discard the checkpoint from a pool:
.Dl # Nm zpool Cm checkpoint Fl d Ar pool
.Pp
Dataset reservations (controlled by the
.Sy reservation No and Sy refreservation
properties) may be unenforceable while a checkpoint exists, because the
checkpoint is allowed to consume the dataset's reservation.
Finally, data that is part of the checkpoint but has been freed in the
current state of the pool won't be scanned during a scrub.
.
.Ss Special Allocation Class
Allocations in the special class are dedicated to specific block types.
By default this includes all metadata, the indirect blocks of user data, and
any deduplication tables.
The class can also be provisioned to accept small file blocks.
.Pp
A pool must always have at least one normal
.Pq non- Ns Sy dedup Ns /- Ns Sy special
vdev before
other devices can be assigned to the special class.
If the
.Sy special
class becomes full, then allocations intended for it
will spill back into the normal class.
.Pp
Deduplication tables can be excluded from the special class by unsetting the
.Sy zfs_ddt_data_is_special
ZFS module parameter.
.Pp
Inclusion of small file blocks in the special class is opt-in.
Each dataset can control the size of small file blocks allowed
in the special class by setting the
.Sy special_small_blocks
property to nonzero.
See
.Xr zfsprops 7
for more info on this property.
+412
View File
@@ -0,0 +1,412 @@
.\"
.\" CDDL HEADER START
.\"
.\" The contents of this file are subject to the terms of the
.\" Common Development and Distribution License (the "License").
.\" You may not use this file except in compliance with the License.
.\"
.\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
.\" or http://www.opensolaris.org/os/licensing.
.\" See the License for the specific language governing permissions
.\" and limitations under the License.
.\"
.\" When distributing Covered Code, include this CDDL HEADER in each
.\" file and include the License file at usr/src/OPENSOLARIS.LICENSE.
.\" If applicable, add the following below this CDDL HEADER, with the
.\" fields enclosed by brackets "[]" replaced with your own identifying
.\" information: Portions Copyright [yyyy] [name of copyright owner]
.\"
.\" CDDL HEADER END
.\"
.\" Copyright (c) 2007, Sun Microsystems, Inc. All Rights Reserved.
.\" Copyright (c) 2012, 2018 by Delphix. All rights reserved.
.\" Copyright (c) 2012 Cyril Plisko. All Rights Reserved.
.\" Copyright (c) 2017 Datto Inc.
.\" Copyright (c) 2018 George Melikov. All Rights Reserved.
.\" Copyright 2017 Nexenta Systems, Inc.
.\" Copyright (c) 2017 Open-E, Inc. All Rights Reserved.
.\" Copyright (c) 2021, Colm Buckley <colm@tuatha.org>
.\"
.Dd May 27, 2021
.Dt ZPOOLPROPS 7
.Os
.
.Sh NAME
.Nm zpoolprops
.Nd properties of ZFS storage pools
.
.Sh DESCRIPTION
Each pool has several properties associated with it.
Some properties are read-only statistics while others are configurable and
change the behavior of the pool.
.Pp
The following are read-only properties:
.Bl -tag -width "unsupported@guid"
.It Cm allocated
Amount of storage used within the pool.
See
.Sy fragmentation
and
.Sy free
for more information.
.It Sy capacity
Percentage of pool space used.
This property can also be referred to by its shortened column name,
.Sy cap .
.It Sy expandsize
Amount of uninitialized space within the pool or device that can be used to
increase the total capacity of the pool.
On whole-disk vdevs, this is the space beyond the end of the GPT
typically occurring when a LUN is dynamically expanded
or a disk replaced with a larger one.
On partition vdevs, this is the space appended to the partition after it was
added to the pool most likely by resizing it in-place.
The space can be claimed for the pool by bringing it online with
.Sy autoexpand=on
or using
.Nm zpool Cm online Fl e .
.It Sy fragmentation
The amount of fragmentation in the pool.
As the amount of space
.Sy allocated
increases, it becomes more difficult to locate
.Sy free
space.
This may result in lower write performance compared to pools with more
unfragmented free space.
.It Sy free
The amount of free space available in the pool.
By contrast, the
.Xr zfs 8
.Sy available
property describes how much new data can be written to ZFS filesystems/volumes.
The zpool
.Sy free
property is not generally useful for this purpose, and can be substantially more than the zfs
.Sy available
space.
This discrepancy is due to several factors, including raidz parity;
zfs reservation, quota, refreservation, and refquota properties; and space set aside by
.Sy spa_slop_shift
(see
.Xr zfs 4
for more information).
.It Sy freeing
After a file system or snapshot is destroyed, the space it was using is
returned to the pool asynchronously.
.Sy freeing
is the amount of space remaining to be reclaimed.
Over time
.Sy freeing
will decrease while
.Sy free
increases.
.It Sy health
The current health of the pool.
Health can be one of
.Sy ONLINE , DEGRADED , FAULTED , OFFLINE, REMOVED , UNAVAIL .
.It Sy guid
A unique identifier for the pool.
.It Sy load_guid
A unique identifier for the pool.
Unlike the
.Sy guid
property, this identifier is generated every time we load the pool (i.e. does
not persist across imports/exports) and never changes while the pool is loaded
(even if a
.Sy reguid
operation takes place).
.It Sy size
Total size of the storage pool.
.It Sy unsupported@ Ns Em guid
Information about unsupported features that are enabled on the pool.
See
.Xr zpool-features 7
for details.
.El
.Pp
The space usage properties report actual physical space available to the
storage pool.
The physical space can be different from the total amount of space that any
contained datasets can actually use.
The amount of space used in a raidz configuration depends on the characteristics
of the data being written.
In addition, ZFS reserves some space for internal accounting that the
.Xr zfs 8
command takes into account, but the
.Nm
command does not.
For non-full pools of a reasonable size, these effects should be invisible.
For small pools, or pools that are close to being completely full, these
discrepancies may become more noticeable.
.Pp
The following property can be set at creation time and import time:
.Bl -tag -width Ds
.It Sy altroot
Alternate root directory.
If set, this directory is prepended to any mount points within the pool.
This can be used when examining an unknown pool where the mount points cannot be
trusted, or in an alternate boot environment, where the typical paths are not
valid.
.Sy altroot
is not a persistent property.
It is valid only while the system is up.
Setting
.Sy altroot
defaults to using
.Sy cachefile Ns = Ns Sy none ,
though this may be overridden using an explicit setting.
.El
.Pp
The following property can be set only at import time:
.Bl -tag -width Ds
.It Sy readonly Ns = Ns Sy on Ns | Ns Sy off
If set to
.Sy on ,
the pool will be imported in read-only mode.
This property can also be referred to by its shortened column name,
.Sy rdonly .
.El
.Pp
The following properties can be set at creation time and import time, and later
changed with the
.Nm zpool Cm set
command:
.Bl -tag -width Ds
.It Sy ashift Ns = Ns Sy ashift
Pool sector size exponent, to the power of
.Sy 2
(internally referred to as
.Sy ashift ) .
Values from 9 to 16, inclusive, are valid; also, the
value 0 (the default) means to auto-detect using the kernel's block
layer and a ZFS internal exception list.
I/O operations will be aligned to the specified size boundaries.
Additionally, the minimum (disk)
write size will be set to the specified size, so this represents a
space vs. performance trade-off.
For optimal performance, the pool sector size should be greater than
or equal to the sector size of the underlying disks.
The typical case for setting this property is when
performance is important and the underlying disks use 4KiB sectors but
report 512B sectors to the OS (for compatibility reasons); in that
case, set
.Sy ashift Ns = Ns Sy 12
(which is
.Sy 1<<12 No = Sy 4096 ) .
When set, this property is
used as the default hint value in subsequent vdev operations (add,
attach and replace).
Changing this value will not modify any existing
vdev, not even on disk replacement; however it can be used, for
instance, to replace a dying 512B sectors disk with a newer 4KiB
sectors device: this will probably result in bad performance but at the
same time could prevent loss of data.
.It Sy autoexpand Ns = Ns Sy on Ns | Ns Sy off
Controls automatic pool expansion when the underlying LUN is grown.
If set to
.Sy on ,
the pool will be resized according to the size of the expanded device.
If the device is part of a mirror or raidz then all devices within that
mirror/raidz group must be expanded before the new space is made available to
the pool.
The default behavior is
.Sy off .
This property can also be referred to by its shortened column name,
.Sy expand .
.It Sy autoreplace Ns = Ns Sy on Ns | Ns Sy off
Controls automatic device replacement.
If set to
.Sy off ,
device replacement must be initiated by the administrator by using the
.Nm zpool Cm replace
command.
If set to
.Sy on ,
any new device, found in the same physical location as a device that previously
belonged to the pool, is automatically formatted and replaced.
The default behavior is
.Sy off .
This property can also be referred to by its shortened column name,
.Sy replace .
Autoreplace can also be used with virtual disks (like device
mapper) provided that you use the /dev/disk/by-vdev paths setup by
vdev_id.conf.
See the
.Xr vdev_id 8
manual page for more details.
Autoreplace and autoonline require the ZFS Event Daemon be configured and
running.
See the
.Xr zed 8
manual page for more details.
.It Sy autotrim Ns = Ns Sy on Ns | Ns Sy off
When set to
.Sy on
space which has been recently freed, and is no longer allocated by the pool,
will be periodically trimmed.
This allows block device vdevs which support
BLKDISCARD, such as SSDs, or file vdevs on which the underlying file system
supports hole-punching, to reclaim unused blocks.
The default value for this property is
.Sy off .
.Pp
Automatic TRIM does not immediately reclaim blocks after a free.
Instead, it will optimistically delay allowing smaller ranges to be aggregated
into a few larger ones.
These can then be issued more efficiently to the storage.
TRIM on L2ARC devices is enabled by setting
.Sy l2arc_trim_ahead > 0 .
.Pp
Be aware that automatic trimming of recently freed data blocks can put
significant stress on the underlying storage devices.
This will vary depending of how well the specific device handles these commands.
For lower-end devices it is often possible to achieve most of the benefits
of automatic trimming by running an on-demand (manual) TRIM periodically
using the
.Nm zpool Cm trim
command.
.It Sy bootfs Ns = Ns Sy (unset) Ns | Ns Ar pool Ns Op / Ns Ar dataset
Identifies the default bootable dataset for the root pool.
This property is expected to be set mainly by the installation and upgrade programs.
Not all Linux distribution boot processes use the bootfs property.
.It Sy cachefile Ns = Ns Ar path Ns | Ns Sy none
Controls the location of where the pool configuration is cached.
Discovering all pools on system startup requires a cached copy of the
configuration data that is stored on the root file system.
All pools in this cache are automatically imported when the system boots.
Some environments, such as install and clustering, need to cache this
information in a different location so that pools are not automatically
imported.
Setting this property caches the pool configuration in a different location that
can later be imported with
.Nm zpool Cm import Fl c .
Setting it to the value
.Sy none
creates a temporary pool that is never cached, and the
.Qq
.Pq empty string
uses the default location.
.Pp
Multiple pools can share the same cache file.
Because the kernel destroys and recreates this file when pools are added and
removed, care should be taken when attempting to access this file.
When the last pool using a
.Sy cachefile
is exported or destroyed, the file will be empty.
.It Sy comment Ns = Ns Ar text
A text string consisting of printable ASCII characters that will be stored
such that it is available even if the pool becomes faulted.
An administrator can provide additional information about a pool using this
property.
.It Sy compatibility Ns = Ns Sy off Ns | Ns Sy legacy Ns | Ns Ar file Ns Oo , Ns Ar file Oc Ns …
Specifies that the pool maintain compatibility with specific feature sets.
When set to
.Sy off
(or unset) compatibility is disabled (all features may be enabled); when set to
.Sy legacy Ns
no features may be enabled.
When set to a comma-separated list of filenames
(each filename may either be an absolute path, or relative to
.Pa /etc/zfs/compatibility.d
or
.Pa /usr/share/zfs/compatibility.d )
the lists of requested features are read from those files, separated by
whitespace and/or commas.
Only features present in all files may be enabled.
.Pp
See
.Xr zpool-features 7 ,
.Xr zpool-create 8
and
.Xr zpool-upgrade 8
for more information on the operation of compatibility feature sets.
.It Sy dedupditto Ns = Ns Ar number
This property is deprecated and no longer has any effect.
.It Sy delegation Ns = Ns Sy on Ns | Ns Sy off
Controls whether a non-privileged user is granted access based on the dataset
permissions defined on the dataset.
See
.Xr zfs 8
for more information on ZFS delegated administration.
.It Sy failmode Ns = Ns Sy wait Ns | Ns Sy continue Ns | Ns Sy panic
Controls the system behavior in the event of catastrophic pool failure.
This condition is typically a result of a loss of connectivity to the underlying
storage device(s) or a failure of all devices within the pool.
The behavior of such an event is determined as follows:
.Bl -tag -width "continue"
.It Sy wait
Blocks all I/O access until the device connectivity is recovered and the errors
are cleared.
This is the default behavior.
.It Sy continue
Returns
.Er EIO
to any new write I/O requests but allows reads to any of the remaining healthy
devices.
Any write requests that have yet to be committed to disk would be blocked.
.It Sy panic
Prints out a message to the console and generates a system crash dump.
.El
.It Sy feature@ Ns Ar feature_name Ns = Ns Sy enabled
The value of this property is the current state of
.Ar feature_name .
The only valid value when setting this property is
.Sy enabled
which moves
.Ar feature_name
to the enabled state.
See
.Xr zpool-features 7
for details on feature states.
.It Sy listsnapshots Ns = Ns Sy on Ns | Ns Sy off
Controls whether information about snapshots associated with this pool is
output when
.Nm zfs Cm list
is run without the
.Fl t
option.
The default value is
.Sy off .
This property can also be referred to by its shortened name,
.Sy listsnaps .
.It Sy multihost Ns = Ns Sy on Ns | Ns Sy off
Controls whether a pool activity check should be performed during
.Nm zpool Cm import .
When a pool is determined to be active it cannot be imported, even with the
.Fl f
option.
This property is intended to be used in failover configurations
where multiple hosts have access to a pool on shared storage.
.Pp
Multihost provides protection on import only.
It does not protect against an
individual device being used in multiple pools, regardless of the type of vdev.
See the discussion under
.Nm zpool Cm create .
.Pp
When this property is on, periodic writes to storage occur to show the pool is
in use.
See
.Sy zfs_multihost_interval
in the
.Xr zfs 4
manual page.
In order to enable this property each host must set a unique hostid.
See
.Xr genhostid 1
.Xr zgenhostid 8
.Xr spl 4
for additional details.
The default value is
.Sy off .
.It Sy version Ns = Ns Ar version
The current on-disk version of the pool.
This can be increased, but never decreased.
The preferred method of updating pools is with the
.Nm zpool Cm upgrade
command, though this property can be used when a specific version is needed for
backwards compatibility.
Once feature flags are enabled on a pool this property will no longer have a
value.
.El