Implementation of AVX2 optimized Fletcher-4

New functionality:
- Preserves existing scalar implementation.
- Adds AVX2 optimized Fletcher-4 computation.
- Selects the fastest routine at module load, based on a benchmark.
- Adds a Fletcher-4 test case to ztest.
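
For reference, scalar Fletcher-4 keeps four 64-bit running sums over the
input taken as 32-bit words. The sketch below is only an illustration of
that recurrence, not the code in this commit; the names `cksum4_t` and
`fletcher4_sketch` are hypothetical, and it assumes native byte order and
a buffer length given in whole 32-bit words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 256-bit checksum: four 64-bit words. */
typedef struct cksum4 {
	uint64_t word[4];
} cksum4_t;

/*
 * Illustrative scalar Fletcher-4: each 32-bit input word is added to the
 * first accumulator, and each accumulator is added to the next one.
 * All sums wrap modulo 2^64.
 */
static void
fletcher4_sketch(const uint32_t *buf, size_t nwords, cksum4_t *out)
{
	uint64_t a = 0, b = 0, c = 0, d = 0;

	assert(buf != NULL || nwords == 0);
	assert(out != NULL);

	for (size_t i = 0; i < nwords; i++) {
		a += buf[i];
		b += a;
		c += b;
		d += c;
	}

	out->word[0] = a;
	out->word[1] = b;
	out->word[2] = c;
	out->word[3] = d;
}
```

The loop-carried dependency between the four sums is what a vectorized
version has to work around, typically by running several independent
lanes and combining them at the end.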

New zcommon module parameters:
- zfs_fletcher_4_impl (str): selects the implementation to use.
    "fastest" - use the fastest version available
    "cycle"   - cycle through all available implementations (for ztest)
    "scalar"  - use the original version
    "avx2"    - new AVX2 implementation if available

Performance comparison (Intel i7 CPU, 1MB data buffers):
- Scalar:  4216 MB/s
- AVX2:   14499 MB/s

Read `/sys/module/zcommon/parameters/zfs_fletcher_4_impl` to get the
list of supported values. An implementation that is not supported on
the system is not listed. The currently selected option is enclosed
in `[]`.
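
A tool that reads the parameter can pick out the active implementation
by looking for the bracketed entry. The helper below is a minimal sketch
with a hypothetical name (`selected_impl_sketch`); it assumes the file
content has already been read into a string and that exactly one entry is
bracketed. The sample line in the comment is an assumed layout, not
output captured from a real system.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the text between the first '[' and the following ']' into out.
 * Returns 0 on success, -1 if no bracketed entry is found or out is
 * too small. Example input (assumed layout): "fastest scalar [avx2]".
 */
static int
selected_impl_sketch(const char *line, char *out, size_t outlen)
{
	const char *open, *close;
	size_t len;

	assert(line != NULL && out != NULL);

	open = strchr(line, '[');
	if (open == NULL)
		return (-1);

	close = strchr(open + 1, ']');
	if (close == NULL)
		return (-1);

	len = (size_t)(close - open - 1);
	if (len + 1 > outlen)
		return (-1);

	memcpy(out, open + 1, len);
	out[len] = '\0';
	return (0);
}
```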

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4330
Jinshan Xiong
2015-12-09 15:34:16 -08:00
committed by Brian Behlendorf
parent 8fbbc6b4cf
commit 1eeb4562a7
12 changed files with 589 additions and 70 deletions
+1 -32
@@ -35,6 +35,7 @@
#include <sys/sysmacros.h>
#include <sys/types.h>
#include <sys/fs/zfs.h>
#include <sys/spa_checksum.h>
#ifdef __cplusplus
extern "C" {
@@ -142,12 +143,6 @@ typedef struct dva {
uint64_t dva_word[2];
} dva_t;
/*
* Each block has a 256-bit checksum -- strong enough for cryptographic hashes.
*/
typedef struct zio_cksum {
uint64_t zc_word[4];
} zio_cksum_t;
/*
* Each block is described by its DVAs, time of birth, checksum, etc.
@@ -440,35 +435,9 @@ _NOTE(CONSTCOND) } while (0)
DVA_EQUAL(&(bp1)->blk_dva[1], &(bp2)->blk_dva[1]) && \
DVA_EQUAL(&(bp1)->blk_dva[2], &(bp2)->blk_dva[2]))
#define ZIO_CHECKSUM_EQUAL(zc1, zc2) \
(0 == (((zc1).zc_word[0] - (zc2).zc_word[0]) | \
((zc1).zc_word[1] - (zc2).zc_word[1]) | \
((zc1).zc_word[2] - (zc2).zc_word[2]) | \
((zc1).zc_word[3] - (zc2).zc_word[3])))
#define ZIO_CHECKSUM_IS_ZERO(zc) \
(0 == ((zc)->zc_word[0] | (zc)->zc_word[1] | \
(zc)->zc_word[2] | (zc)->zc_word[3]))
#define ZIO_CHECKSUM_BSWAP(zcp) \
{ \
(zcp)->zc_word[0] = BSWAP_64((zcp)->zc_word[0]); \
(zcp)->zc_word[1] = BSWAP_64((zcp)->zc_word[1]); \
(zcp)->zc_word[2] = BSWAP_64((zcp)->zc_word[2]); \
(zcp)->zc_word[3] = BSWAP_64((zcp)->zc_word[3]); \
}
#define DVA_IS_VALID(dva) (DVA_GET_ASIZE(dva) != 0)
#define ZIO_SET_CHECKSUM(zcp, w0, w1, w2, w3) \
{ \
(zcp)->zc_word[0] = w0; \
(zcp)->zc_word[1] = w1; \
(zcp)->zc_word[2] = w2; \
(zcp)->zc_word[3] = w3; \
}
#define BP_IDENTITY(bp) (ASSERT(!BP_IS_EMBEDDED(bp)), &(bp)->blk_dva[0])
#define BP_IS_GANG(bp) \
(BP_IS_EMBEDDED(bp) ? B_FALSE : DVA_GET_GANG(BP_IDENTITY(bp)))