mirror of
https://git.proxmox.com/git/mirror_zfs.git
synced 2026-05-22 10:37:35 +03:00
Illumos 5027 - zfs large block support
5027 zfs large block support Reviewed by: Alek Pinchuk <pinchuk.alek@gmail.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com> Reviewed by: Richard Elling <richard.elling@richardelling.com> Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/5027 https://github.com/illumos/illumos-gate/commit/b515258 Porting Notes: * Included in this patch is a tiny ISP2() cleanup in zio_init() from Illumos 5255. * Unlike the upstream Illumos commit this patch does not impose an arbitrary 128K block size limit on volumes. Volumes, like filesystems, are limited by the zfs_max_recordsize=1M module option. * By default the maximum record size is limited to 1M by the module option zfs_max_recordsize. This value may be safely increased up to 16M which is the largest block size supported by the on-disk format. At the moment, 1M blocks clearly offer a significant performance improvement but the benefits of going beyond this for the majority of workloads are less clear. * The illumos version of this patch increased DMU_MAX_ACCESS to 32M. This was determined not to be large enough when using 16M blocks because the zfs_make_xattrdir() function will fail (EFBIG) when assigning a TX. This was immediately observed under Linux because all newly created files must have a security xattr created and that was failing. Therefore, we've set DMU_MAX_ACCESS to 64M. * On 32-bit platforms a hard limit of 1M is set for blocks due to the limited virtual address space. We should be able to relax this one the ABD patches are merged. Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #354
This commit is contained in:
committed by
Brian Behlendorf
parent
3df293404a
commit
f1512ee61e
@@ -945,6 +945,24 @@ Largest data block to write to zil
|
||||
Default value: \fB32,768\fR.
|
||||
.RE
|
||||
|
||||
.sp
|
||||
.ne 2
|
||||
.na
|
||||
\fBzfs_max_recordsize\fR (int)
|
||||
.ad
|
||||
.RS 12n
|
||||
We currently support block sizes from 512 bytes to 16MB. The benefits of
|
||||
larger blocks, and thus larger IO, need to be weighed against the cost of
|
||||
COWing a giant block to modify one byte. Additionally, very large blocks
|
||||
can have an impact on i/o latency, and also potentially on the memory
|
||||
allocator. Therefore, we do not allow the recordsize to be set larger than
|
||||
zfs_max_recordsize (default 1MB). Larger blocks can be created by changing
|
||||
this tunable, and pools with larger blocks can always be imported and used,
|
||||
regardless of this setting.
|
||||
.sp
|
||||
Default value: \fB1,048,576\fR.
|
||||
.RE
|
||||
|
||||
.sp
|
||||
.ne 2
|
||||
.na
|
||||
|
||||
@@ -411,5 +411,26 @@ never return to being \fBenabled\fR.
|
||||
|
||||
.RE
|
||||
|
||||
.sp
|
||||
.ne 2
|
||||
.na
|
||||
\fB\fBlarge_blocks\fR\fR
|
||||
.ad
|
||||
.RS 4n
|
||||
.TS
|
||||
l l .
|
||||
GUID org.open-zfs:large_block
|
||||
READ\-ONLY COMPATIBLE no
|
||||
DEPENDENCIES extensible_dataset
|
||||
.TE
|
||||
|
||||
The \fBlarge_block\fR feature allows the record size on a dataset to be
|
||||
set larger than 128KB.
|
||||
|
||||
This feature becomes \fBactive\fR once a \fBrecordsize\fR property has been
|
||||
set larger than 128KB, and will return to being \fBenabled\fR once all
|
||||
filesystems that have ever had their recordsize larger than 128KB are destroyed.
|
||||
.RE
|
||||
|
||||
.SH "SEE ALSO"
|
||||
\fBzpool\fR(8)
|
||||
|
||||
+36
-5
@@ -174,12 +174,12 @@ zfs \- configures ZFS file systems
|
||||
|
||||
.LP
|
||||
.nf
|
||||
\fBzfs\fR \fBsend\fR [\fB-DnPpRve\fR] [\fB-\fR[\fBiI\fR] \fIsnapshot\fR] \fIsnapshot\fR
|
||||
\fBzfs\fR \fBsend\fR [\fB-DnPpRveL\fR] [\fB-\fR[\fBiI\fR] \fIsnapshot\fR] \fIsnapshot\fR
|
||||
.fi
|
||||
|
||||
.LP
|
||||
.nf
|
||||
\fBzfs\fR \fBsend\fR [\fB-e\fR] [\fB-i \fIsnapshot\fR|\fIbookmark\fR]\fR \fIfilesystem\fR|\fIvolume\fR|\fIsnapshot\fR
|
||||
\fBzfs\fR \fBsend\fR [\fB-eL\fR] [\fB-i \fIsnapshot\fR|\fIbookmark\fR]\fR \fIfilesystem\fR|\fIvolume\fR|\fIsnapshot\fR
|
||||
.fi
|
||||
|
||||
.LP
|
||||
@@ -2706,7 +2706,7 @@ See \fBzpool-features\fR(5) for details on ZFS feature flags and the
|
||||
.sp
|
||||
.ne 2
|
||||
.na
|
||||
\fBzfs send\fR [\fB-DnPpRve\fR] [\fB-\fR[\fBiI\fR] \fIsnapshot\fR] \fIsnapshot\fR
|
||||
\fBzfs send\fR [\fB-DnPpRveL\fR] [\fB-\fR[\fBiI\fR] \fIsnapshot\fR] \fIsnapshot\fR
|
||||
.ad
|
||||
.sp .6
|
||||
.RS 4n
|
||||
@@ -2759,6 +2759,22 @@ If the \fB-i\fR or \fB-I\fR flags are used in conjunction with the \fB-R\fR flag
|
||||
Generate a deduplicated stream. Blocks which would have been sent multiple times in the send stream will only be sent once. The receiving system must also support this feature to receive a deduplicated stream. This flag can be used regardless of the dataset's dedup property, but performance will be much better if the filesystem uses a dedup-capable checksum (eg. sha256).
|
||||
.RE
|
||||
|
||||
.sp
|
||||
.ne 2
|
||||
.mk
|
||||
.na
|
||||
\fB\fB-L\fR\fR
|
||||
.ad
|
||||
.sp .6
|
||||
.RS 4n
|
||||
Generate a stream which may contain blocks larger than 128KB. This flag
|
||||
has no effect if the \fBlarge_blocks\fR pool feature is disabled, or if
|
||||
the \fRrecordsize\fR property of this filesystem has never been set above
|
||||
128KB. The receiving system must have the \fBlarge_blocks\fR pool feature
|
||||
enabled as well. See \fBzpool-features\fR(5) for details on ZFS feature
|
||||
flags and the \fBlarge_blocks\fR feature.
|
||||
.RE
|
||||
|
||||
.sp
|
||||
.ne 2
|
||||
.mk
|
||||
@@ -2828,7 +2844,7 @@ The format of the stream is committed. You will be able to receive your streams
|
||||
.sp
|
||||
.ne 2
|
||||
.na
|
||||
\fBzfs send\fR [\fB-e\fR] [\fB-i\fR \fIsnapshot\fR|\fIbookmark\fR] \fIfilesystem\fR|\fIvolume\fR|\fIsnapshot\fR
|
||||
\fBzfs send\fR [\fB-eL\fR] [\fB-i\fR \fIsnapshot\fR|\fIbookmark\fR] \fIfilesystem\fR|\fIvolume\fR|\fIsnapshot\fR
|
||||
.ad
|
||||
.sp .6
|
||||
.RS 4n
|
||||
@@ -2856,6 +2872,22 @@ be the origin snapshot, or an earlier snapshot in the origin's filesystem,
|
||||
or the origin's origin, etc.
|
||||
.RE
|
||||
|
||||
.sp
|
||||
.ne 2
|
||||
.mk
|
||||
.na
|
||||
\fB\fB-L\fR\fR
|
||||
.ad
|
||||
.sp .6
|
||||
.RS 4n
|
||||
Generate a stream which may contain blocks larger than 128KB. This flag
|
||||
has no effect if the \fBlarge_blocks\fR pool feature is disabled, or if
|
||||
the \fRrecordsize\fR property of this filesystem has never been set above
|
||||
128KB. The receiving system must have the \fBlarge_blocks\fR pool feature
|
||||
enabled as well. See \fBzpool-features\fR(5) for details on ZFS feature
|
||||
flags and the \fBlarge_blocks\fR feature.
|
||||
.RE
|
||||
|
||||
.sp
|
||||
.ne 2
|
||||
.mk
|
||||
@@ -2909,7 +2941,6 @@ The \fB-d\fR and \fB-e\fR options cause the file system name of the target snaps
|
||||
Discard the first element of the sent snapshot's file system name, using the remaining elements to determine the name of the target file system for the new snapshot as described in the paragraph above.
|
||||
.RE
|
||||
|
||||
|
||||
.sp
|
||||
.ne 2
|
||||
.na
|
||||
|
||||
Reference in New Issue
Block a user