Implementation of SSE optimized Fletcher-4

Builds on 1eeb4562 (Implementation of AVX2 optimized Fletcher-4).
This commit adds another implementation of the Fletcher-4 algorithm.
It is automatically selected at module load if it benchmarks higher
than all other available implementations.

The module benchmark was also amended to measure the performance of
both the byteswapped and the non-byteswapped versions of Fletcher-4.
The average performance of the two is used to select the fastest
implementation available on the host system.
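
As a rough illustration of the selection rule described above, the sketch
below shows a plain scalar Fletcher-4 (four 64-bit accumulators updated per
32-bit word, with an optional byteswap) and a helper that picks the
implementation with the highest average of its native and byteswapped
benchmark figures. All names here (fletcher4_t, swap32, fletcher4_scalar,
bench_result_t, pick_fastest) are hypothetical and chosen for this example;
they are not taken from the commit, and the real module code may be
structured differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Running Fletcher-4 state: four 64-bit accumulators. */
typedef struct {
	uint64_t a, b, c, d;
} fletcher4_t;

/* Hypothetical helper: reverse the byte order of a 32-bit word. */
static uint32_t
swap32(uint32_t w)
{
	return ((w >> 24) | ((w >> 8) & 0x0000ff00u) |
	    ((w << 8) & 0x00ff0000u) | (w << 24));
}

/*
 * Scalar Fletcher-4 over an array of 32-bit words. When byteswap is
 * non-zero, each word is byte-reversed before it is accumulated.
 */
static void
fletcher4_scalar(const uint32_t *words, size_t nwords, int byteswap,
    fletcher4_t *out)
{
	uint64_t a = 0, b = 0, c = 0, d = 0;

	for (size_t i = 0; i < nwords; i++) {
		uint32_t w = byteswap ? swap32(words[i]) : words[i];

		a += w;
		b += a;
		c += b;
		d += c;
	}
	out->a = a;
	out->b = b;
	out->c = c;
	out->d = d;
}

/* Hypothetical benchmark record: one entry per implementation. */
typedef struct {
	const char *name;
	uint64_t native_bw;	/* throughput of the native variant */
	uint64_t byteswap_bw;	/* throughput of the byteswapped variant */
} bench_result_t;

/*
 * Return the name of the entry with the highest average of the two
 * throughput figures, or NULL if the list is empty.
 */
static const char *
pick_fastest(const bench_result_t *results, size_t n)
{
	const char *best = NULL;
	uint64_t best_avg = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t avg =
		    (results[i].native_bw + results[i].byteswap_bw) / 2;

		if (best == NULL || avg > best_avg) {
			best = results[i].name;
			best_avg = avg;
		}
	}
	return (best);
}
```

Averaging matters when an implementation is quick on native data but slow
on byteswapped data: with made-up figures of 2000/1200 for one candidate
and 1900/1800 for another, the second one wins despite its lower native
number.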

Adds two new values to the existing zcommon module parameter:
-  zfs_fletcher_4_impl (str)
    "sse2"    - new SSE2 implementation if available
    "ssse3"   - new SSSE3 implementation if available

Signed-off-by: Tyler J. Stachecki <stachecki.tyler@gmail.com>
Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4789
This commit is contained in:
Tyler J. Stachecki
2016-06-23 23:32:40 -04:00
committed by Brian Behlendorf
parent dfbc86309f
commit 35a76a0366
6 changed files with 243 additions and 5 deletions
+9 -5
@@ -838,11 +838,15 @@ Default value: \fB67,108,864\fR.
 .RS 12n
 Select a fletcher 4 implementation.
 .sp
-Supported selectors are: \fBfastest\fR, \fBscalar\fR, and \fBavx2\fR when
-AVX2 is supported by the processor. If multiple implementations of fletcher 4
-are available the \fBfastest\fR will be chosen using a micro benchmark.
-Selecting \fBscalar\fR results in the original CPU based calculation being
-used, \fBavx2\fR uses the AVX2 vector instructions to compute a fletcher 4.
+Supported selectors are: \fBfastest\fR, \fBscalar\fR, \fBsse2\fR, \fBssse3\fR,
+and \fBavx2\fR. All of the selectors except \fBfastest\fR and \fBscalar\fR
+require instruction set extensions to be available and will only appear if ZFS
+detects that they are present at runtime. If multiple implementations of
+fletcher 4 are available, the \fBfastest\fR will be chosen using a micro
+benchmark. Selecting \fBscalar\fR results in the original CPU based calculation
+being used. Selecting any option other than \fBfastest\fR and \fBscalar\fR
+results in vector instructions from the respective CPU instruction set being
+used.
 .sp
 Default value: \fBfastest\fR.
 .RE