Implementation of SSE optimized Fletcher-4

Builds off of 1eeb4562 (Implementation of AVX2 optimized Fletcher-4).

This commit adds another implementation of the Fletcher-4 algorithm. It is automatically selected at module load if it benchmarks higher than all other available implementations.

The module benchmark was also amended to analyze the performance of the byteswapped version of Fletcher-4, as well as the non-byteswapped version. The average performance of the two is used to select the fastest implementation available on the host system.

Adds a pair of fields to an existing zcommon module parameter:
- zfs_fletcher_4_impl (str)
    "sse2"  - new SSE2 implementation if available
    "ssse3" - new SSSE3 implementation if available

Signed-off-by: Tyler J. Stachecki <stachecki.tyler@gmail.com>
Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4789
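
The selection rule described in the commit message (pick the implementation with the best average of native and byteswapped throughput) might look roughly like the sketch below. The struct `f4_bench` and the function `f4_pick_fastest` are hypothetical names chosen for illustration and are not taken from the ZFS source. The units of the throughput fields are likewise an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-implementation benchmark record; not the ZFS structure. */
struct f4_bench {
	const char *name;	/* selector name, e.g. "scalar" or "sse2" */
	uint64_t native_bw;	/* measured throughput, native byte order */
	uint64_t byteswap_bw;	/* measured throughput, byteswapped variant */
};

/*
 * Return the index of the entry with the highest average throughput.
 * With exactly two samples per entry, comparing sums is equivalent to
 * comparing averages. Ties keep the earlier entry; n == 0 returns 0.
 */
static size_t
f4_pick_fastest(const struct f4_bench *b, size_t n)
{
	size_t best = 0;

	for (size_t i = 1; i < n; i++) {
		if (b[i].native_bw + b[i].byteswap_bw >
		    b[best].native_bw + b[best].byteswap_bw)
			best = i;
	}
	return (best);
}
```

Averaging the two measurements means an implementation that is fast only in the native case does not win over one that is consistently fast in both.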
2026-05-22 02:27:36 +03:00 · 2016-06-23 23:32:40 -04:00
parent dfbc86309f
commit 35a76a0366
6 changed files with 243 additions and 5 deletions
@@ -838,11 +838,15 @@ Default value: \fB67,108,864\fR.
 .RS 12n
 Select a fletcher 4 implementation.
 .sp
-Supported selectors are: \fBfastest\fR, \fBscalar\fR, and \fBavx2\fR when
-AVX2 is supported by the processor.  If multiple implementations of fletcher 4
-are available the \fBfastest\fR will be chosen using a micro benchmark.
-Selecting \fBscalar\fR results in the original CPU based calculation being
-used, \fBavx2\fR uses the AVX2 vector instructions to compute a fletcher 4.
+Supported selectors are: \fBfastest\fR, \fBscalar\fR, \fBsse2\fR, \fBssse3\fR,
+and \fBavx2\fR. All of the selectors except \fBfastest\fR and \fBscalar\fR
+require instruction set extensions to be available and will only appear if ZFS
+detects that they are present at runtime. If multiple implementations of
+fletcher 4 are available, the \fBfastest\fR will be chosen using a micro
+benchmark. Selecting \fBscalar\fR results in the original CPU based calculation
+being used. Selecting any option other than \fBfastest\fR and \fBscalar\fR
+results in vector instructions from the respective CPU instruction set being
+used.
 .sp
 Default value: \fBfastest\fR.
 .RE