Add AVX512BW variant of fletcher

It is much faster than AVX512F when byteswapping on Skylake-SP
and newer, as the byteswap can be done with a single vpshufb
instead of many instructions.
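To make the single-shuffle claim concrete, here is a minimal plain-C sketch of how one byte shuffle can byteswap a whole 512-bit register of 32-bit words. It models VPSHUFB semantics (four independent 128-bit lanes, where a control byte with bit 7 set yields 0 and otherwise selects a source byte from the same lane) rather than calling the real `_mm512_shuffle_epi8` intrinsic, so it runs on any CPU. All function names here (`pshufb512_model`, `bswap32_ctrl`, `bswap32_scalar`, `check_bswap_via_shuffle`) are hypothetical and are not taken from the ZFS sources.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Plain-C model (hypothetical helper) of VPSHUFB on a 512-bit register:
 * four independent 128-bit lanes. For each output byte, a control byte
 * with bit 7 set yields 0; otherwise its low 4 bits select a source
 * byte from the same 128-bit lane.
 */
static void pshufb512_model(uint8_t out[64], const uint8_t src[64],
    const uint8_t ctrl[64])
{
	for (int i = 0; i < 64; i++) {
		int lane = i & ~15;	/* first byte of this 128-bit lane */
		out[i] = (ctrl[i] & 0x80) ? 0 : src[lane + (ctrl[i] & 0x0F)];
	}
}

/*
 * Control bytes that reverse the 4 bytes of every 32-bit word:
 * 3,2,1,0, 7,6,5,4, 11,10,9,8, 15,14,13,12, repeated in each lane.
 */
static void bswap32_ctrl(uint8_t ctrl[64])
{
	for (int i = 0; i < 64; i++) {
		ctrl[i] = (uint8_t)((i & 12) + 3 - (i & 3));
		/* Every index must stay inside its own 128-bit lane. */
		assert((ctrl[i] & 0x80) == 0 && ctrl[i] < 16);
	}
}

/* Reference byteswap built from shifts and masks. */
static uint32_t bswap32_scalar(uint32_t x)
{
	return ((x >> 24) | ((x >> 8) & 0x0000FF00u) |
	    ((x << 8) & 0x00FF0000u) | (x << 24));
}

/*
 * Byteswap 16 words with one modelled shuffle and compare against the
 * scalar reference. Returns 1 if every word matches, 0 otherwise.
 */
static int check_bswap_via_shuffle(void)
{
	uint32_t words[16], swapped[16];
	uint8_t src[64], dst[64], ctrl[64];

	for (int i = 0; i < 16; i++)
		words[i] = 0x01020304u * (uint32_t)(i + 1);

	memcpy(src, words, sizeof (src));
	bswap32_ctrl(ctrl);
	pshufb512_model(dst, src, ctrl);	/* the "single vpshufb" */
	memcpy(swapped, dst, sizeof (swapped));

	for (int i = 0; i < 16; i++) {
		if (swapped[i] != bswap32_scalar(words[i]))
			return (0);
	}
	return (1);
}
```

Reversing the bytes of each 4-byte group in memory yields the byteswapped value on both little- and big-endian hosts, so the check does not depend on host byte order. Without a byte-granular shuffle, the same result has to be assembled from several shifts, masks and ORs per vector, which is the cost the commit message refers to.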

Reviewed-by: Gvozden Neskovic <neskovic@gmail.com>
Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Romain Dolbeau <romain.dolbeau@atos.net>
Closes #9517
Romain Dolbeau
2019-10-30 20:26:14 +01:00
committed by Brian Behlendorf
parent bae11ba8dc
commit 0b2a642351
4 changed files with 57 additions and 1 deletion
@@ -143,6 +143,10 @@ extern const fletcher_4_ops_t fletcher_4_avx2_ops;
extern const fletcher_4_ops_t fletcher_4_avx512f_ops;
#endif
#if defined(__x86_64) && defined(HAVE_AVX512BW)
extern const fletcher_4_ops_t fletcher_4_avx512bw_ops;
#endif
#if defined(__aarch64__)
extern const fletcher_4_ops_t fletcher_4_aarch64_neon_ops;
#endif
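Every `fletcher_4_*_ops` table declared above, including the new `fletcher_4_avx512bw_ops`, must produce the same Fletcher-4 checksum: four 64-bit running sums over the buffer's 32-bit words. As a point of comparison, here is a minimal scalar sketch of that computation with an optional per-word byteswap, the case the AVX512BW variant speeds up. The type `f4_sketch_t` and the functions `f4_bswap32` and `fletcher4_scalar` are hypothetical names for this illustration; they are not the ZFS types or entry points, and the sketch omits the lane splitting and final combination that the SIMD implementations perform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical checksum state for this sketch (not ZFS's zio_cksum_t). */
typedef struct {
	uint64_t a, b, c, d;
} f4_sketch_t;

static uint32_t f4_bswap32(uint32_t x)
{
	return ((x >> 24) | ((x >> 8) & 0x0000FF00u) |
	    ((x << 8) & 0x00FF0000u) | (x << 24));
}

/*
 * Scalar Fletcher-4 over n 32-bit words. When byteswap is nonzero,
 * each word is byteswapped before it is accumulated, as needed when
 * the data was written with the opposite byte order.
 */
static f4_sketch_t fletcher4_scalar(const uint32_t *words, size_t n,
    int byteswap)
{
	f4_sketch_t s = { 0, 0, 0, 0 };

	assert(words != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		uint64_t w = byteswap ? f4_bswap32(words[i]) : words[i];
		s.a += w;	/* sum of words */
		s.b += s.a;	/* sum of running sums */
		s.c += s.b;
		s.d += s.c;
	}
	return (s);
}
```

For the words {1, 2, 3} the sums end at a = 6, b = 10, c = 15, d = 21, and feeding {0x01000000, 0x02000000, 0x03000000} with `byteswap` set gives the same result. In a vectorized version, the per-word `f4_bswap32` is what a single byte shuffle can replace for a full register of words at once.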