Fletcher4 implementation using avx512f instruction set

The algorithm runs 8 parallel sums, consuming 8 uint32_t elements per
loop iteration. The size alignment of the main fletcher4 methods is
adjusted accordingly. The new implementation is called 'avx512f'.
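
As an illustration of the 8-lane scheme, here is a minimal portable C
sketch, not the code from this commit. Lane i sums words i, i + 8,
i + 16, ... on its own, and the lanes are folded into the scalar result
at the end. The names (f4_sums, f4_scalar, f4_lanes) are hypothetical,
and the folding coefficients were derived for this sketch from the
scalar recurrence, so treat them as an assumption rather than the
commit's exact arithmetic.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 8 /* uint32_t words consumed per loop iteration */

/* Hypothetical result type for this sketch. */
typedef struct { uint64_t a, b, c, d; } f4_sums;

/* Scalar reference: a single chain of four running sums. */
static f4_sums f4_scalar(const uint32_t *w, size_t n)
{
    f4_sums s = {0, 0, 0, 0};
    for (size_t k = 0; k < n; k++) {
        s.a += w[k];
        s.b += s.a;
        s.c += s.b;
        s.d += s.c;
    }
    return s;
}

/*
 * Eight independent lanes, emulated with plain arrays.
 * n must be a multiple of LANES, which is why the buffer size
 * has to be aligned to the lane count.
 */
static f4_sums f4_lanes(const uint32_t *w, size_t n)
{
    uint64_t a[LANES] = {0}, b[LANES] = {0}, c[LANES] = {0}, d[LANES] = {0};
    f4_sums s = {0, 0, 0, 0};
    const int64_t N = LANES;

    for (size_t k = 0; k + LANES <= n; k += LANES) {
        for (int i = 0; i < LANES; i++) {
            a[i] += w[k + i];
            b[i] += a[i];
            c[i] += b[i];
            d[i] += c[i];
        }
    }

    /*
     * Fold the lanes. Coefficients are small exact integers computed
     * in signed arithmetic; the sums themselves wrap modulo 2^64.
     */
    for (int64_t i = 0; i < N; i++) {
        uint64_t c_b = (uint64_t)(N * (N + 2 * i - 1) / 2);
        uint64_t c_a = (uint64_t)(i * (i - 1) / 2);
        uint64_t d_c = (uint64_t)(N * N * (N + i - 1));
        uint64_t d_b = (uint64_t)((N * N * N + 3 * N * N * (i - 1) +
            N * (3 * i * i - 6 * i + 2)) / 6);
        uint64_t d_a = (uint64_t)(i * (i - 1) * (i - 2) / 6);

        s.a += a[i];
        s.b += (uint64_t)N * b[i] - (uint64_t)i * a[i];
        s.c += (uint64_t)(N * N) * c[i] - c_b * b[i] + c_a * a[i];
        s.d += (uint64_t)(N * N * N) * d[i] - d_c * c[i] + d_b * b[i] -
            d_a * a[i];
    }
    return s;
}
```

For any buffer whose length is a multiple of 8 words, f4_lanes() should
return the same four sums as f4_scalar(). The inner loop over i is the
part a 512-bit register would perform in one step.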

Note: the byteswap method can be implemented more efficiently once
avx512bw hardware becomes available. Currently, it is about 2x slower
than the native method.
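
One plausible reading of this note: avx512f by itself has no byte
shuffle on 512-bit registers (that comes with avx512bw), so each word
has to be byte-swapped with shifts, masks and ORs before it is summed.
The scalar sketch below shows that extra per-word work. It is an
illustration under that assumption, not the code from this commit, and
the names (bswap32_shift_mask, f4_bswap_sums, f4_byteswap_scalar) are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-swap one 32-bit word using only masks, shifts and ORs. */
static uint32_t bswap32_shift_mask(uint32_t x)
{
    return ((x & 0x000000ffu) << 24) |
           ((x & 0x0000ff00u) << 8)  |
           ((x & 0x00ff0000u) >> 8)  |
           ((x & 0xff000000u) >> 24);
}

/* Hypothetical result type for this sketch. */
typedef struct { uint64_t a, b, c, d; } f4_bswap_sums;

/* Scalar byteswap variant: swap each word, then run the usual sums. */
static f4_bswap_sums f4_byteswap_scalar(const uint32_t *w, size_t n)
{
    f4_bswap_sums s = {0, 0, 0, 0};
    for (size_t k = 0; k < n; k++) {
        s.a += bswap32_shift_mask(w[k]);
        s.b += s.a;
        s.c += s.b;
        s.d += s.c;
    }
    return s;
}
```

With a byte shuffle available, the four mask-and-shift steps collapse
into a single permutation per register, which is where the expected
speedup would come from.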

The table shows results of the full (native) fletcher4 calculation for different buffer sizes:

fletcher4   4KB     16KB    64KB    128KB   256KB   1MB     16MB
--------------------------------------------------------------------
[scalar]    1213    1228    1231    1231    1225    1200    1160
[sse2]      2374    2442    2459    2456    2462    2250    2220
[avx2]      4288    4753    4871    4893    4900    4050    3882
[avx512f]   5975    8445    9196    9221    9262    6307    5620

Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4952

Author: Gvozden Neskovic
Date: 2016-07-06 13:42:04 +02:00
Committed by: Brian Behlendorf
Parent: 32ffaa3de5
Commit: 70b258fc96

6 changed files with 182 additions and 10 deletions

lib/libzpool/Makefile.am (+1)

@@ -24,6 +24,7 @@ KERNEL_C = \
 	zfs_fletcher.c \
 	zfs_fletcher_intel.c \
 	zfs_fletcher_sse.c \
+	zfs_fletcher_avx512.c \
 	zfs_namecheck.c \
 	zfs_prop.c \
 	zfs_uio.c \