Add superscalar fletcher4

This is the Fletcher4 algorithm implemented in pure C, using multiple counters and the same algorithms as the SSE/NEON and AVX2 implementations. This allows for faster execution on cores with strong superscalar capabilities but weak SIMD capabilities.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Romain Dolbeau <romain.dolbeau@atos.net>
Closes #5317
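
To illustrate the multiple-counter idea in the commit message, here is a minimal sketch with two streams. It is not the code from this commit: the function names `fletcher4_scalar` and `fletcher4_two_streams` are hypothetical, and the sketch assumes the input is an even number of 32-bit words. The recombination coefficients come from expanding the Fletcher4 recurrence separately for even and odd word positions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference version: a single dependent chain of four 64-bit sums. */
void
fletcher4_scalar(const uint32_t *w, size_t nwords, uint64_t out[4])
{
	uint64_t a = 0, b = 0, c = 0, d = 0;

	for (size_t i = 0; i < nwords; i++) {
		a += w[i];
		b += a;
		c += b;
		d += c;
	}
	out[0] = a; out[1] = b; out[2] = c; out[3] = d;
}

/*
 * Two-stream version (hypothetical sketch). Even-indexed words feed
 * stream 0 and odd-indexed words feed stream 1. The two chains do not
 * depend on each other, so a superscalar core can overlap them.
 */
void
fletcher4_two_streams(const uint32_t *w, size_t nwords, uint64_t out[4])
{
	uint64_t a0 = 0, b0 = 0, c0 = 0, d0 = 0;
	uint64_t a1 = 0, b1 = 0, c1 = 0, d1 = 0;

	assert(nwords % 2 == 0);	/* assumption of this sketch */

	for (size_t i = 0; i < nwords; i += 2) {
		a0 += w[i];     b0 += a0; c0 += b0; d0 += c0;
		a1 += w[i + 1]; b1 += a1; c1 += b1; d1 += c1;
	}

	/* Combine the per-stream sums into the four checksum words. */
	out[0] = a0 + a1;
	out[1] = 2 * b0 + 2 * b1 - a1;
	out[2] = 4 * c0 - b0 + 4 * c1 - 3 * b1;
	out[3] = 8 * d0 - 4 * c0 + 8 * d1 - 8 * c1 + b1;
}
```

All arithmetic is unsigned 64-bit, so the subtractions in the combine step wrap modulo 2^64 and still agree with the single-chain result. Whether two streams beat one depends on the target core, so any speedup should be measured rather than assumed.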
2026-05-27 04:32:16 +03:00 · 2016-11-04 18:53:03 +01:00
parent ace1eae84c
commit 7f3194932d
8 changed files with 405 additions and 2 deletions
@@ -2,7 +2,7 @@
 * Implement fast Fletcher4 with SSE2,SSSE3 instructions. (x86)
 *
 * Use the 128-bit SSE2/SSSE3 SIMD instructions and registers to compute
- * Fletcher4 in four incremental 64-bit parallel accumulator streams,
+ * Fletcher4 in two incremental 64-bit parallel accumulator streams,
 * and then combine the streams to form the final four checksum words.
 * This implementation is a derivative of the AVX SIMD implementation by
 * James Guilford and Jinshan Xiong from Intel (see zfs_fletcher_intel.c).