Implementation of AVX2 optimized Fletcher-4

New functionality:
- Preserves existing scalar implementation.
- Adds AVX2 optimized Fletcher-4 computation.
- Fastest routines selected on module load (benchmark).
- Test case for Fletcher-4 added to ztest.

New zcommon module parameters:
-  zfs_fletcher_4_impl (str): selects the implementation to use.
    "fastest" - use the fastest version available
    "cycle"   - cycle trough all available impl for ztest
    "scalar"  - use the original version
    "avx2"    - new AVX2 implementation if available

Performance comparison (Intel i7 CPU, 1MB data buffers):
- Scalar:  4216 MB/s
- AVX2:   14499 MB/s

See contents of `/sys/module/zcommon/parameters/zfs_fletcher_4_impl`
to get list of supported values. If an implementation is not supported
on the system, it will not be shown. Currently selected option is
enclosed in `[]`.

Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4330
This commit is contained in:
Jinshan Xiong
2015-12-09 15:34:16 -08:00
committed by Brian Behlendorf
parent 8fbbc6b4cf
commit 1eeb4562a7
12 changed files with 589 additions and 70 deletions
+44
View File
@@ -123,6 +123,7 @@
#include <ctype.h>
#include <math.h>
#include <sys/fs/zfs.h>
#include <zfs_fletcher.h>
#include <libnvpair.h>
#ifdef __GLIBC__
#include <execinfo.h> /* for backtrace() */
@@ -327,6 +328,7 @@ ztest_func_t ztest_vdev_aux_add_remove;
ztest_func_t ztest_split_pool;
ztest_func_t ztest_reguid;
ztest_func_t ztest_spa_upgrade;
ztest_func_t ztest_fletcher;
uint64_t zopt_always = 0ULL * NANOSEC; /* all the time */
uint64_t zopt_incessant = 1ULL * NANOSEC / 10; /* every 1/10 second */
@@ -372,6 +374,7 @@ ztest_info_t ztest_info[] = {
ZTI_INIT(ztest_vdev_LUN_growth, 1, &zopt_rarely),
ZTI_INIT(ztest_vdev_add_remove, 1, &ztest_opts.zo_vdevtime),
ZTI_INIT(ztest_vdev_aux_add_remove, 1, &ztest_opts.zo_vdevtime),
ZTI_INIT(ztest_fletcher, 1, &zopt_rarely),
};
#define ZTEST_FUNCS (sizeof (ztest_info) / sizeof (ztest_info_t))
@@ -5496,6 +5499,47 @@ ztest_spa_rename(ztest_ds_t *zd, uint64_t id)
(void) rw_unlock(&ztest_name_lock);
}
void
ztest_fletcher(ztest_ds_t *zd, uint64_t id)
{
hrtime_t end = gethrtime() + NANOSEC;
while (gethrtime() <= end) {
int run_count = 100;
void *buf;
uint32_t size;
int *ptr;
int i;
zio_cksum_t zc_ref;
zio_cksum_t zc_ref_byteswap;
size = ztest_random_blocksize();
buf = umem_alloc(size, UMEM_NOFAIL);
for (i = 0, ptr = buf; i < size / sizeof (*ptr); i++, ptr++)
*ptr = ztest_random(UINT_MAX);
VERIFY0(fletcher_4_impl_set("scalar"));
fletcher_4_native(buf, size, &zc_ref);
fletcher_4_byteswap(buf, size, &zc_ref_byteswap);
VERIFY0(fletcher_4_impl_set("cycle"));
while (run_count-- > 0) {
zio_cksum_t zc;
zio_cksum_t zc_byteswap;
fletcher_4_byteswap(buf, size, &zc_byteswap);
fletcher_4_native(buf, size, &zc);
VERIFY0(bcmp(&zc, &zc_ref, sizeof (zc)));
VERIFY0(bcmp(&zc_byteswap, &zc_ref_byteswap,
sizeof (zc_byteswap)));
}
umem_free(buf, size);
}
}
static int
ztest_check_path(char *path)
{