Implementation of SSE optimized Fletcher-4

Builds off of 1eeb4562 (Implementation of AVX2 optimized Fletcher-4)
This commit adds another implementation of the Fletcher-4 algorithm.
It is automatically selected at module load if it benchmarks higher
than all other available implementations.

The module benchmark was also amended to analyze the performance of
the byteswap-ed version of Fletcher-4, as well as the non-byteswaped
version. The average performance of the two is used to select the
the fastest implementation available on the host system.

Adds a pair of fields to an existing zcommon module parameter:
-  zfs_fletcher_4_impl (str)
    "sse2"    - new SSE2 implementation if available
    "ssse3"   - new SSSE3 implementation if available

Signed-off-by: Tyler J. Stachecki <stachecki.tyler@gmail.com>
Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4789
This commit is contained in:
Tyler J. Stachecki
2016-06-23 23:32:40 -04:00
committed by Brian Behlendorf
parent dfbc86309f
commit 35a76a0366
6 changed files with 243 additions and 5 deletions
+8
View File
@@ -61,6 +61,14 @@ typedef struct fletcher_4_func {
const char *name;
} fletcher_4_ops_t;
#if defined(HAVE_SSE2)
extern const fletcher_4_ops_t fletcher_4_sse2_ops;
#endif
#if defined(HAVE_SSE2) && defined(HAVE_SSSE3)
extern const fletcher_4_ops_t fletcher_4_ssse3_ops;
#endif
#if defined(HAVE_AVX) && defined(HAVE_AVX2)
extern const fletcher_4_ops_t fletcher_4_avx2_ops;
#endif