mirror of https://git.proxmox.com/git/mirror_zfs.git
synced 2026-05-24 11:18:52 +03:00
Fletcher4 implementation using avx512f instruction set
The algorithm runs 8 parallel sums, consuming 8x uint32_t elements per loop
iteration. The size alignment of the main fletcher4 methods is adjusted
accordingly. The new implementation is called 'avx512f'.

Note: the byteswap method can be implemented more efficiently once avx512bw
hardware becomes available. Currently, it is ~2x slower than the native method.

The table shows results of the full (native) fletcher4 calculation for
different buffer sizes:

fletcher4   4KB     16KB    64KB    128KB   256KB   1MB     16MB
--------------------------------------------------------------------
[scalar]    1213    1228    1231    1231    1225    1200    1160
[sse2]      2374    2442    2459    2456    2462    2250    2220
[avx2]      4288    4753    4871    4893    4900    4050    3882
[avx512f]   5975    8445    9196    9221    9262    6307    5620

Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4952
Committed by: Brian Behlendorf
Parent: 32ffaa3de5
Commit: 70b258fc96
@@ -883,14 +883,14 @@ Default value: \fB67,108,864\fR.
 Select a fletcher 4 implementation.
 .sp
 Supported selectors are: \fBfastest\fR, \fBscalar\fR, \fBsse2\fR, \fBssse3\fR,
-and \fBavx2\fR. All of the selectors except \fBfastest\fR and \fBscalar\fR
-require instruction set extensions to be available and will only appear if ZFS
-detects that they are present at runtime. If multiple implementations of
-fletcher 4 are available, the \fBfastest\fR will be chosen using a micro
-benchmark. Selecting \fBscalar\fR results in the original CPU based calculation
-being used. Selecting any option other than \fBfastest\fR and \fBscalar\fR
-results in vector instructions from the respective CPU instruction set being
-used.
+\fBavx2\fR, and \fBavx512f\fR.
+All of the selectors except \fBfastest\fR and \fBscalar\fR require instruction
+set extensions to be available and will only appear if ZFS detects that they are
+present at runtime. If multiple implementations of fletcher 4 are available,
+the \fBfastest\fR will be chosen using a micro benchmark. Selecting \fBscalar\fR
+results in the original, CPU based calculation, being used. Selecting any option
+other than \fBfastest\fR and \fBscalar\fR results in vector instructions from
+the respective CPU instruction set being used.
 .sp
 Default value: \fBfastest\fR.
 .RE