Implementation of AVX2 optimized Fletcher-4

New functionality:
- Preserves existing scalar implementation.
- Adds AVX2 optimized Fletcher-4 computation.
- Fastest routines selected on module load (benchmark).
- Test case for Fletcher-4 added to ztest.

New zcommon module parameters:
- zfs_fletcher_4_impl (str): selects the implementation to use.
    "fastest" - use the fastest version available
    "cycle"   - cycle through all available implementations (for ztest)
    "scalar"  - use the original version
    "avx2"    - new AVX2 implementation if available

Performance comparison (Intel i7 CPU, 1MB data buffers):
- Scalar:  4216 MB/s
- AVX2:   14499 MB/s

See contents of `/sys/module/zcommon/parameters/zfs_fletcher_4_impl` to
get the list of supported values. If an implementation is not supported
on the system, it will not be shown. The currently selected option is
enclosed in `[]`.

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4330
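For readers unfamiliar with the checksum, the scalar computation that the commit message calls the "original version" can be sketched roughly as follows. This is a minimal illustration and not the code from this commit: the type `fletcher4_sum_t` and the function `fletcher4_scalar_sketch` are hypothetical names, and the sketch assumes the input is an array of 32-bit words in native byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this sketch: the four 64-bit running sums. */
typedef struct {
	uint64_t a, b, c, d;
} fletcher4_sum_t;

/*
 * Scalar Fletcher-4 over nwords 32-bit words (native byte order assumed).
 * Each word is added to a; b accumulates a, c accumulates b, and d
 * accumulates c. All sums wrap modulo 2^64.
 */
static fletcher4_sum_t
fletcher4_scalar_sketch(const uint32_t *words, size_t nwords)
{
	fletcher4_sum_t s = { 0, 0, 0, 0 };

	for (size_t i = 0; i < nwords; i++) {
		s.a += words[i];
		s.b += s.a;
		s.c += s.b;
		s.d += s.c;
	}
	return (s);
}
```

For the four words 1, 2, 3, 4 this yields a = 10, b = 20, c = 35, d = 56. Each sum depends on the previous one within an iteration, so a loop like this cannot be vectorized by simply processing adjacent words in parallel. How the AVX2 routine in this commit arranges its lanes is not shown in this excerpt.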
2026-05-30 02:34:14 +03:00 · 2015-12-09 15:34:16 -08:00
parent 8fbbc6b4cf
commit 1eeb4562a7
12 changed files with 589 additions and 70 deletions
@@ -830,6 +830,23 @@ Start syncing out a transaction group if there is at least this much dirty data.
 Default value: \fB67,108,864\fR.
 .RE

+.sp
+.ne 2
+.na
+\fBzfs_fletcher_4_impl\fR (string)
+.ad
+.RS 12n
+Select a fletcher 4 implementation.
+.sp
+Supported selectors are: \fBfastest\fR, \fBscalar\fR, and \fBavx2\fR when
+AVX2 is supported by the processor.  If multiple implementations of fletcher 4
+are available, the \fBfastest\fR will be chosen using a micro benchmark.
+Selecting \fBscalar\fR uses the original CPU-based calculation, while
+\fBavx2\fR uses the AVX2 vector instructions to compute a fletcher 4.
+.sp
+Default value: \fBfastest\fR.
+.RE
+
 .sp
 .ne 2
 .na