'\" te
.\" Copyright (c) 2013 by Delphix. All rights reserved.
.\" Copyright (c) 2013 by Saso Kiselkov. All rights reserved.
.\" Copyright (c) 2014, Joyent, Inc. All rights reserved.
.\" The contents of this file are subject to the terms of the Common Development
.\" and Distribution License (the "License"). You may not use this file except
.\" in compliance with the License. You can obtain a copy of the license at
.\" usr/src/OPENSOLARIS.LICENSE or http://www.opensolaris.org/os/licensing.
.\"
.\" See the License for the specific language governing permissions and
.\" limitations under the License. When distributing Covered Code, include this
.\" CDDL HEADER in each file and include the License file at
.\" usr/src/OPENSOLARIS.LICENSE. If applicable, add the following below this
.\" CDDL HEADER, with the fields enclosed by brackets "[]" replaced with your
.\" own identifying information:
.\" Portions Copyright [yyyy] [name of copyright owner]
.TH ZPOOL-FEATURES 5 "Aug 27, 2013"
.SH NAME
zpool\-features \- ZFS pool feature descriptions
.SH DESCRIPTION
.sp
.LP
ZFS pool on\-disk format versions are specified via "features" which replace
the old on\-disk format numbers (the last supported on\-disk format number is
28). To enable a feature on a pool use the \fBupgrade\fR subcommand of the
\fBzpool\fR(8) command, or set the \fBfeature@\fR\fIfeature_name\fR property
to \fBenabled\fR.
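.sp
.LP
For example, assuming a hypothetical pool named \fItank\fR (the pool name is
an illustration only), either of the following commands could be used. The
first enables every feature supported by the running software; the second
enables only \fBasync_destroy\fR:
.sp
.in +2
.nf
# \fBzpool upgrade tank\fR
# \fBzpool set feature@async_destroy=enabled tank\fR
.fi
.in -2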
.sp
.LP
The pool format does not affect file system version compatibility or the ability
to send file systems between pools.
.sp
.LP
Since most features can be enabled independently of each other the on\-disk
format of the pool is specified by the set of all features marked as
\fBactive\fR on the pool. If the pool was created by another software version
this set may include unsupported features.
.SS "Identifying features"
.sp
.LP
Every feature has a guid of the form \fIcom.example:feature_name\fR. The reverse
DNS name ensures that the feature's guid is unique across all ZFS
implementations. When unsupported features are encountered on a pool they will
be identified by their guids. Refer to the documentation for the ZFS
implementation that created the pool for information about those features.
.sp
.LP
Each supported feature also has a short name. By convention a feature's short
name is the portion of its guid which follows the ':' (e.g.
\fIcom.example:feature_name\fR would have the short name \fIfeature_name\fR),
however a feature's short name may differ across ZFS implementations if
following the convention would result in name conflicts.
.SS "Feature states"
.sp
.LP
Features can be in one of three states:
.sp
.ne 2
.na
\fB\fBactive\fR\fR
.ad
.RS 12n
This feature's on\-disk format changes are in effect on the pool. Support for
this feature is required to import the pool in read\-write mode. If this
feature is not read\-only compatible, support is also required to import the pool
in read\-only mode (see "Read\-only compatibility").
.RE

.sp
.ne 2
.na
\fB\fBenabled\fR\fR
.ad
.RS 12n
An administrator has marked this feature as enabled on the pool, but the
feature's on\-disk format changes have not been made yet. The pool can still be
imported by software that does not support this feature, but changes may be made
to the on\-disk format at any time which will move the feature to the
\fBactive\fR state. Some features may support returning to the \fBenabled\fR
state after becoming \fBactive\fR. See feature\-specific documentation for
details.
.RE

.sp
.ne 2
.na
\fB\fBdisabled\fR\fR
.ad
.RS 12n
This feature's on\-disk format changes have not been made and will not be made
unless an administrator moves the feature to the \fBenabled\fR state. Features
cannot be disabled once they have been enabled.
.RE

.sp
.LP
The state of supported features is exposed through pool properties of the form
\fIfeature@short_name\fR.
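.sp
.LP
As an illustration, the state of a single feature, or of all features, could
be queried as follows (the pool name \fItank\fR is hypothetical):
.sp
.in +2
.nf
# \fBzpool get feature@async_destroy tank\fR
# \fBzpool get all tank | grep feature@\fR
.fi
.in -2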
.SS "Read\-only compatibility"
.sp
.LP
Some features may make on\-disk format changes that do not interfere with other
software's ability to read from the pool. These features are referred to as
"read\-only compatible". If all unsupported features on a pool are read\-only
compatible, the pool can be imported in read\-only mode by setting the
\fBreadonly\fR property during import (see \fBzpool\fR(8) for details on
importing pools).
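.sp
.LP
A minimal sketch of such an import, assuming a hypothetical pool named
\fItank\fR:
.sp
.in +2
.nf
# \fBzpool import \-o readonly=on tank\fR
.fi
.in -2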
.SS "Unsupported features"
.sp
.LP
For each unsupported feature enabled on an imported pool a pool property
named \fIunsupported@feature_guid\fR will indicate why the import was allowed
despite the unsupported feature. Possible values for this property are:

.sp
.ne 2
.na
\fB\fBinactive\fR\fR
.ad
.RS 12n
The feature is in the \fBenabled\fR state and therefore the pool's on\-disk
format is still compatible with software that does not support this feature.
.RE

.sp
.ne 2
.na
\fB\fBreadonly\fR\fR
.ad
.RS 12n
The feature is read\-only compatible and the pool has been imported in
read\-only mode.
.RE

.SS "Feature dependencies"
.sp
.LP
Some features depend on other features being enabled in order to function
properly. Enabling a feature will automatically enable any features it
depends on.
.SH FEATURES
.sp
.LP
The following features are supported on this system:
.sp
.ne 2
.na
\fB\fBasync_destroy\fR\fR
.ad
.RS 4n
.TS
l l .
GUID	com.delphix:async_destroy
READ\-ONLY COMPATIBLE	yes
DEPENDENCIES	none
.TE

Destroying a file system requires traversing all of its data in order to
return its used space to the pool. Without \fBasync_destroy\fR the file system
is not fully removed until all space has been reclaimed. If the destroy
operation is interrupted by a reboot or power outage the next attempt to open
the pool will need to complete the destroy operation synchronously.

When \fBasync_destroy\fR is enabled the file system's data will be reclaimed
by a background process, allowing the destroy operation to complete without
traversing the entire file system. The background process is able to resume
interrupted destroys after the pool has been opened, eliminating the need
to finish interrupted destroys as part of the open operation. The amount
of space remaining to be reclaimed by the background process is available
through the \fBfreeing\fR property.

This feature is only \fBactive\fR while \fBfreeing\fR is non\-zero.
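.sp
The space still to be reclaimed can be checked with a command such as the
following, where \fItank\fR is a hypothetical pool name:
.sp
.in +2
.nf
# \fBzpool get freeing tank\fR
.fi
.in -2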
.RE

.sp
.ne 2
.na
\fB\fBempty_bpobj\fR\fR
.ad
.RS 4n
.TS
l l .
GUID	com.delphix:empty_bpobj
READ\-ONLY COMPATIBLE	yes
DEPENDENCIES	none
.TE

This feature increases the performance of creating and using a large
number of snapshots of a single filesystem or volume, and also reduces
the disk space required.

When there are many snapshots, each snapshot uses many Block Pointer
Objects (bpobjs) to track blocks associated with that snapshot.
However, in common use cases, most of these bpobjs are empty. This
feature allows each bpobj to be created on demand, thus eliminating the
empty bpobjs.

This feature is \fBactive\fR while there are any filesystems, volumes,
or snapshots which were created after enabling this feature.
.RE

.sp
.ne 2
.na
\fB\fBfilesystem_limits\fR\fR
.ad
.RS 4n
.TS
l l .
GUID	com.joyent:filesystem_limits
READ\-ONLY COMPATIBLE	yes
DEPENDENCIES	extensible_dataset
.TE

This feature enables filesystem and snapshot limits. These limits can be used
to control how many filesystems and/or snapshots can be created at the point in
the tree on which the limits are set.

This feature is \fBactive\fR once either of the limit properties has been
set on a dataset. Once activated the feature is never deactivated.
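.sp
For instance, limits could be set as sketched below using the
\fBfilesystem_limit\fR and \fBsnapshot_limit\fR dataset properties. The
dataset name \fItank/home\fR and the values shown are hypothetical:
.sp
.in +2
.nf
# \fBzfs set filesystem_limit=100 tank/home\fR
# \fBzfs set snapshot_limit=50 tank/home\fR
.fi
.in -2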
.RE

.sp
.ne 2
.na
\fB\fBlz4_compress\fR\fR
.ad
.RS 4n
.TS
l l .
GUID	org.illumos:lz4_compress
READ\-ONLY COMPATIBLE	no
DEPENDENCIES	none
.TE

\fBlz4\fR is a high-performance real-time compression algorithm that
features significantly faster compression and decompression as well as a
higher compression ratio than the older \fBlzjb\fR compression.
Typically, \fBlz4\fR compression is approximately 50% faster on
compressible data and 200% faster on incompressible data than
\fBlzjb\fR. It is also approximately 80% faster on decompression, while
giving approximately 10% better compression ratio.

When the \fBlz4_compress\fR feature is set to \fBenabled\fR, the
administrator can turn on \fBlz4\fR compression on any dataset on the
pool using the \fBzfs\fR(8) command. Please note that doing so will
immediately activate the \fBlz4_compress\fR feature on the underlying
pool. Also, all newly written metadata
will be compressed with the \fBlz4\fR algorithm. Since this feature is not
read\-only compatible, this operation will render the pool unimportable
on systems without support for the \fBlz4_compress\fR feature. Booting
off of \fBlz4\fR-compressed root pools is supported.
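.sp
As a sketch, \fBlz4\fR compression might be turned on for a hypothetical
dataset named \fItank/data\fR with:
.sp
.in +2
.nf
# \fBzfs set compression=lz4 tank/data\fR
.fi
.in -2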
.sp
This feature becomes \fBactive\fR as soon as it is enabled and will
never return to being \fBenabled\fR.
.RE

.sp
.ne 2
.na
\fB\fBspacemap_histogram\fR\fR
.ad
.RS 4n
.TS
l l .
GUID	com.delphix:spacemap_histogram
READ\-ONLY COMPATIBLE	yes
DEPENDENCIES	none
.TE

This feature allows ZFS to maintain more information about how free space
is organized within the pool. If this feature is \fBenabled\fR, ZFS will
set this feature to \fBactive\fR when a new space map object is created or
an existing space map is upgraded to the new format. Once the feature is
\fBactive\fR, it will remain in that state until the pool is destroyed.
.RE

.sp
.ne 2
.na
\fB\fBextensible_dataset\fR\fR
.ad
.RS 4n
.TS
l l .
GUID	com.delphix:extensible_dataset
READ\-ONLY COMPATIBLE	no
DEPENDENCIES	none
.TE

This feature allows more flexible use of internal ZFS data structures,
and exists for other features to depend on.

This feature will be \fBactive\fR when the first dependent feature uses it,
and will be returned to the \fBenabled\fR state when all datasets that use
this feature are destroyed.
.RE

.sp
.ne 2
.na
\fB\fBbookmarks\fR\fR
.ad
.RS 4n
.TS
l l .
GUID	com.delphix:bookmarks
READ\-ONLY COMPATIBLE	yes
DEPENDENCIES	extensible_dataset
.TE

This feature enables use of the \fBzfs bookmark\fR subcommand.

This feature is \fBactive\fR while any bookmarks exist in the pool.
All bookmarks in the pool can be listed by running
\fBzfs list \-t bookmark \-r \fIpoolname\fR\fR.
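.sp
For illustration, a bookmark could be created from an existing snapshot as
shown below. The dataset, snapshot, and bookmark names are hypothetical:
.sp
.in +2
.nf
# \fBzfs bookmark tank/fs@monday tank/fs#monday\fR
.fi
.in -2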
.RE

.sp
.ne 2
.na
\fB\fBenabled_txg\fR\fR
.ad
.RS 4n
.TS
l l .
GUID	com.delphix:enabled_txg
READ\-ONLY COMPATIBLE	yes
DEPENDENCIES	none
.TE

Once this feature is enabled ZFS records the transaction group number
in which new features are enabled. This has no user-visible impact,
but other features may depend on this feature.

This feature becomes \fBactive\fR as soon as it is enabled and will
never return to being \fBenabled\fR.
.RE

.sp
.ne 2
.na
\fB\fBhole_birth\fR\fR
.ad
.RS 4n
.TS
l l .
GUID	com.delphix:hole_birth
READ\-ONLY COMPATIBLE	no
DEPENDENCIES	enabled_txg
.TE

This feature improves performance of incremental sends ("zfs send \-i")
and receives for objects with many holes. The most common case of
hole-filled objects is zvols.

An incremental send stream from snapshot \fBA\fR to snapshot \fBB\fR
contains information about every block that changed between \fBA\fR and
\fBB\fR. Blocks which did not change between those snapshots can be
identified and omitted from the stream using a piece of metadata called
the 'block birth time', but birth times are not recorded for holes (blocks
filled only with zeroes). Since holes created after \fBA\fR cannot be
distinguished from holes created before \fBA\fR, information about every
hole in the entire filesystem or zvol is included in the send stream.

For workloads where holes are rare this is not a problem. However, when
incrementally replicating filesystems or zvols with many holes (for
example a zvol formatted with another filesystem) a lot of time will
be spent sending and receiving unnecessary information about holes that
already exist on the receiving side.

Once the \fBhole_birth\fR feature has been enabled the block birth times
of all new holes will be recorded. Incremental sends between snapshots
created after this feature is enabled will use this new metadata to avoid
sending information about holes that already exist on the receiving side.

This feature becomes \fBactive\fR as soon as it is enabled and will
never return to being \fBenabled\fR.
.RE

.sp
.ne 2
.na
\fB\fBembedded_data\fR\fR
.ad
.RS 4n
.TS
l l .
GUID	com.delphix:embedded_data
READ\-ONLY COMPATIBLE	no
DEPENDENCIES	none
.TE

This feature improves the performance and compression ratio of
highly-compressible blocks. Blocks whose contents can compress to 112 bytes
or smaller can take advantage of this feature.

When this feature is enabled, the contents of highly-compressible blocks are
stored in the block "pointer" itself (a misnomer in this case, as it contains
the compressed data, rather than a pointer to its location on disk). Thus
the space of the block (one sector, typically 512 bytes or 4KB) is saved,
and no additional I/O is needed to read and write the data block.

This feature becomes \fBactive\fR as soon as it is enabled and will
never return to being \fBenabled\fR.
.RE

.sp
.ne 2
.na
\fB\fBlarge_blocks\fR\fR
.ad
.RS 4n
.TS
l l .
GUID	org.open-zfs:large_blocks
READ\-ONLY COMPATIBLE	no
DEPENDENCIES	extensible_dataset
.TE

The \fBlarge_blocks\fR feature allows the record size on a dataset to be
set larger than 128KB.

This feature becomes \fBactive\fR once a \fBrecordsize\fR property has been
set larger than 128KB, and will return to being \fBenabled\fR once all
filesystems that have ever had their recordsize larger than 128KB are destroyed.
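.sp
As a sketch, a larger record size might be set on a hypothetical dataset
named \fItank/media\fR once the feature is enabled:
.sp
.in +2
.nf
# \fBzfs set recordsize=1M tank/media\fR
.fi
.in -2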
.RE

.SH "SEE ALSO"
\fBzpool\fR(8)