Add AVX512BW variant of fletcher

It is much faster than AVX512F when byteswapping on Skylake-SP
and newer, as we can do the byteswap in a single vshufb instead
of many instructions.

Reviewed by: Gvozden Neskovic <neskovic@gmail.com>
Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Romain Dolbeau <romain.dolbeau@atos.net>
Closes #9517
This commit is contained in:
Romain Dolbeau
2019-10-30 20:26:14 +01:00
committed by Brian Behlendorf
parent bae11ba8dc
commit 0b2a642351
4 changed files with 57 additions and 1 deletions
+1 -1
View File
@@ -1507,7 +1507,7 @@ Default value: \fB20\fR% of \fBzfs_dirty_data_max\fR.
Select a fletcher 4 implementation.
.sp
Supported selectors are: \fBfastest\fR, \fBscalar\fR, \fBsse2\fR, \fBssse3\fR,
\fBavx2\fR, \fBavx512f\fR, and \fBaarch64_neon\fR.
\fBavx2\fR, \fBavx512f\fR, \fBavx512bw\fR, and \fBaarch64_neon\fR.
All of the selectors except \fBfastest\fR and \fBscalar\fR require instruction
set extensions to be available and will only appear if ZFS detects that they are
present at runtime. If multiple implementations of fletcher 4 are available,