mirror_zfs/include/sys/vdev_raidz_impl.h

411 lines
11 KiB
C
Raw Permalink Normal View History

SIMD implementation of vdev_raidz generate and reconstruct routines This is a new implementation of RAIDZ1/2/3 routines using x86_64 scalar, SSE, and AVX2 instruction sets. Included are 3 parity generation routines (P, PQ, and PQR) and 7 reconstruction routines, for all RAIDZ level. On module load, a quick benchmark of supported routines will select the fastest for each operation and they will be used at runtime. Original implementation is still present and can be selected via module parameter. Patch contains: - specialized gen/rec routines for all RAIDZ levels, - new scalar raidz implementation (unrolled), - two x86_64 SIMD implementations (SSE and AVX2 instructions sets), - fastest routines selected on module load (benchmark). - cmd/raidz_test - verify and benchmark all implementations - added raidz_test to the ZFS Test Suite New zfs module parameters: - zfs_vdev_raidz_impl (str): selects the implementation to use. On module load, the parameter will only accept first 3 options, and the other implementations can be set once module is finished loading. Possible values for this option are: "fastest" - use the fastest math available "original" - use the original raidz code "scalar" - new scalar impl "sse" - new SSE impl if available "avx2" - new AVX2 impl if available See contents of `/sys/module/zfs/parameters/zfs_vdev_raidz_impl` to get the list of supported values. If an implementation is not supported on the system, it will not be shown. Currently selected option is enclosed in `[]`. Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4328
2016-04-25 11:04:31 +03:00
/*
* CDDL HEADER START
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License (the "License").
* You may not use this file except in compliance with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or https://opensource.org/licenses/CDDL-1.0.
SIMD implementation of vdev_raidz generate and reconstruct routines This is a new implementation of RAIDZ1/2/3 routines using x86_64 scalar, SSE, and AVX2 instruction sets. Included are 3 parity generation routines (P, PQ, and PQR) and 7 reconstruction routines, for all RAIDZ level. On module load, a quick benchmark of supported routines will select the fastest for each operation and they will be used at runtime. Original implementation is still present and can be selected via module parameter. Patch contains: - specialized gen/rec routines for all RAIDZ levels, - new scalar raidz implementation (unrolled), - two x86_64 SIMD implementations (SSE and AVX2 instructions sets), - fastest routines selected on module load (benchmark). - cmd/raidz_test - verify and benchmark all implementations - added raidz_test to the ZFS Test Suite New zfs module parameters: - zfs_vdev_raidz_impl (str): selects the implementation to use. On module load, the parameter will only accept first 3 options, and the other implementations can be set once module is finished loading. Possible values for this option are: "fastest" - use the fastest math available "original" - use the original raidz code "scalar" - new scalar impl "sse" - new SSE impl if available "avx2" - new AVX2 impl if available See contents of `/sys/module/zfs/parameters/zfs_vdev_raidz_impl` to get the list of supported values. If an implementation is not supported on the system, it will not be shown. Currently selected option is enclosed in `[]`. Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4328
2016-04-25 11:04:31 +03:00
* See the License for the specific language governing permissions
* and limitations under the License.
*
* When distributing Covered Code, include this CDDL HEADER in each
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
* If applicable, add the following below this CDDL HEADER, with the
* fields enclosed by brackets "[]" replaced with your own identifying
* information: Portions Copyright [yyyy] [name of copyright owner]
*
* CDDL HEADER END
*/
/*
* Copyright (C) 2016 Gvozden Nešković. All rights reserved.
*/
#ifndef _VDEV_RAIDZ_H
#define _VDEV_RAIDZ_H
#include <sys/types.h>
#include <sys/debug.h>
#include <sys/kstat.h>
#include <sys/abd.h>
Distributed Spare (dRAID) Feature This patch adds a new top-level vdev type called dRAID, which stands for Distributed parity RAID. This pool configuration allows all dRAID vdevs to participate when rebuilding to a distributed hot spare device. This can substantially reduce the total time required to restore full parity to pool with a failed device. A dRAID pool can be created using the new top-level `draid` type. Like `raidz`, the desired redundancy is specified after the type: `draid[1,2,3]`. No additional information is required to create the pool and reasonable default values will be chosen based on the number of child vdevs in the dRAID vdev. zpool create <pool> draid[1,2,3] <vdevs...> Unlike raidz, additional optional dRAID configuration values can be provided as part of the draid type as colon separated values. This allows administrators to fully specify a layout for either performance or capacity reasons. The supported options include: zpool create <pool> \ draid[<parity>][:<data>d][:<children>c][:<spares>s] \ <vdevs...> - draid[parity] - Parity level (default 1) - draid[:<data>d] - Data devices per group (default 8) - draid[:<children>c] - Expected number of child vdevs - draid[:<spares>s] - Distributed hot spares (default 0) Abbreviated example `zpool status` output for a 68 disk dRAID pool with two distributed spares using special allocation classes. ``` pool: tank state: ONLINE config: NAME STATE READ WRITE CKSUM slag7 ONLINE 0 0 0 draid2:8d:68c:2s-0 ONLINE 0 0 0 L0 ONLINE 0 0 0 L1 ONLINE 0 0 0 ... U25 ONLINE 0 0 0 U26 ONLINE 0 0 0 spare-53 ONLINE 0 0 0 U27 ONLINE 0 0 0 draid2-0-0 ONLINE 0 0 0 U28 ONLINE 0 0 0 U29 ONLINE 0 0 0 ... U42 ONLINE 0 0 0 U43 ONLINE 0 0 0 special mirror-1 ONLINE 0 0 0 L5 ONLINE 0 0 0 U5 ONLINE 0 0 0 mirror-2 ONLINE 0 0 0 L6 ONLINE 0 0 0 U6 ONLINE 0 0 0 spares draid2-0-0 INUSE currently in use draid2-0-1 AVAIL ``` When adding test coverage for the new dRAID vdev type the following options were added to the ztest command. These options are leverages by zloop.sh to test a wide range of dRAID configurations. -K draid|raidz|random - kind of RAID to test -D <value> - dRAID data drives per group -S <value> - dRAID distributed hot spares -R <value> - RAID parity (raidz or dRAID) The zpool_create, zpool_import, redundancy, replacement and fault test groups have all been updated provide test coverage for the dRAID feature. Co-authored-by: Isaac Huang <he.huang@intel.com> Co-authored-by: Mark Maybee <mmaybee@cray.com> Co-authored-by: Don Brady <don.brady@delphix.com> Co-authored-by: Matthew Ahrens <mahrens@delphix.com> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mmaybee@cray.com> Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10102
2020-11-14 00:51:51 +03:00
#include <sys/vdev_impl.h>
RAID-Z expansion feature This feature allows disks to be added one at a time to a RAID-Z group, expanding its capacity incrementally. This feature is especially useful for small pools (typically with only one RAID-Z group), where there isn't sufficient hardware to add capacity by adding a whole new RAID-Z group (typically doubling the number of disks). == Initiating expansion == A new device (disk) can be attached to an existing RAIDZ vdev, by running `zpool attach POOL raidzP-N NEW_DEVICE`, e.g. `zpool attach tank raidz2-0 sda`. The new device will become part of the RAIDZ group. A "raidz expansion" will be initiated, and the new device will contribute additional space to the RAIDZ group once the expansion completes. The `feature@raidz_expansion` on-disk feature flag must be `enabled` to initiate an expansion, and it remains `active` for the life of the pool. In other words, pools with expanded RAIDZ vdevs can not be imported by older releases of the ZFS software. == During expansion == The expansion entails reading all allocated space from existing disks in the RAIDZ group, and rewriting it to the new disks in the RAIDZ group (including the newly added device). The expansion progress can be monitored with `zpool status`. Data redundancy is maintained during (and after) the expansion. If a disk fails while the expansion is in progress, the expansion pauses until the health of the RAIDZ vdev is restored (e.g. by replacing the failed disk and waiting for reconstruction to complete). The pool remains accessible during expansion. Following a reboot or export/import, the expansion resumes where it left off. == After expansion == When the expansion completes, the additional space is available for use, and is reflected in the `available` zfs property (as seen in `zfs list`, `df`, etc). Expansion does not change the number of failures that can be tolerated without data loss (e.g. a RAIDZ2 is still a RAIDZ2 even after expansion). A RAIDZ vdev can be expanded multiple times. After the expansion completes, old blocks remain with their old data-to-parity ratio (e.g. 5-wide RAIDZ2, has 3 data to 2 parity), but distributed among the larger set of disks. New blocks will be written with the new data-to-parity ratio (e.g. a 5-wide RAIDZ2 which has been expanded once to 6-wide, has 4 data to 2 parity). However, the RAIDZ vdev's "assumed parity ratio" does not change, so slightly less space than is expected may be reported for newly-written blocks, according to `zfs list`, `df`, `ls -s`, and similar tools. Sponsored-by: The FreeBSD Foundation Sponsored-by: iXsystems, Inc. Sponsored-by: vStack Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Authored-by: Matthew Ahrens <mahrens@delphix.com> Contributions-by: Fedor Uporov <fuporov.vstack@gmail.com> Contributions-by: Stuart Maybee <stuart.maybee@comcast.net> Contributions-by: Thorsten Behrens <tbehrens@outlook.com> Contributions-by: Fmstrat <nospam@nowsci.com> Contributions-by: Don Brady <dev.fs.zfs@gmail.com> Signed-off-by: Don Brady <dev.fs.zfs@gmail.com> Closes #15022
2023-11-08 21:19:41 +03:00
#include <sys/abd_impl.h>
#include <sys/zfs_rlock.h>
SIMD implementation of vdev_raidz generate and reconstruct routines This is a new implementation of RAIDZ1/2/3 routines using x86_64 scalar, SSE, and AVX2 instruction sets. Included are 3 parity generation routines (P, PQ, and PQR) and 7 reconstruction routines, for all RAIDZ level. On module load, a quick benchmark of supported routines will select the fastest for each operation and they will be used at runtime. Original implementation is still present and can be selected via module parameter. Patch contains: - specialized gen/rec routines for all RAIDZ levels, - new scalar raidz implementation (unrolled), - two x86_64 SIMD implementations (SSE and AVX2 instructions sets), - fastest routines selected on module load (benchmark). - cmd/raidz_test - verify and benchmark all implementations - added raidz_test to the ZFS Test Suite New zfs module parameters: - zfs_vdev_raidz_impl (str): selects the implementation to use. On module load, the parameter will only accept first 3 options, and the other implementations can be set once module is finished loading. Possible values for this option are: "fastest" - use the fastest math available "original" - use the original raidz code "scalar" - new scalar impl "sse" - new SSE impl if available "avx2" - new AVX2 impl if available See contents of `/sys/module/zfs/parameters/zfs_vdev_raidz_impl` to get the list of supported values. If an implementation is not supported on the system, it will not be shown. Currently selected option is enclosed in `[]`. Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4328
2016-04-25 11:04:31 +03:00
#ifdef __cplusplus
extern "C" {
#endif
#define CODE_P (0U)
#define CODE_Q (1U)
#define CODE_R (2U)
#define PARITY_P (1U)
#define PARITY_PQ (2U)
#define PARITY_PQR (3U)
#define TARGET_X (0U)
#define TARGET_Y (1U)
#define TARGET_Z (2U)
/*
* Parity generation methods indexes
*/
enum raidz_math_gen_op {
RAIDZ_GEN_P = 0,
RAIDZ_GEN_PQ,
RAIDZ_GEN_PQR,
RAIDZ_GEN_NUM = 3
};
/*
* Data reconstruction methods indexes
*/
enum raidz_rec_op {
RAIDZ_REC_P = 0,
RAIDZ_REC_Q,
RAIDZ_REC_R,
RAIDZ_REC_PQ,
RAIDZ_REC_PR,
RAIDZ_REC_QR,
RAIDZ_REC_PQR,
RAIDZ_REC_NUM = 7
};
extern const char *const raidz_gen_name[RAIDZ_GEN_NUM];
extern const char *const raidz_rec_name[RAIDZ_REC_NUM];
SIMD implementation of vdev_raidz generate and reconstruct routines This is a new implementation of RAIDZ1/2/3 routines using x86_64 scalar, SSE, and AVX2 instruction sets. Included are 3 parity generation routines (P, PQ, and PQR) and 7 reconstruction routines, for all RAIDZ level. On module load, a quick benchmark of supported routines will select the fastest for each operation and they will be used at runtime. Original implementation is still present and can be selected via module parameter. Patch contains: - specialized gen/rec routines for all RAIDZ levels, - new scalar raidz implementation (unrolled), - two x86_64 SIMD implementations (SSE and AVX2 instructions sets), - fastest routines selected on module load (benchmark). - cmd/raidz_test - verify and benchmark all implementations - added raidz_test to the ZFS Test Suite New zfs module parameters: - zfs_vdev_raidz_impl (str): selects the implementation to use. On module load, the parameter will only accept first 3 options, and the other implementations can be set once module is finished loading. Possible values for this option are: "fastest" - use the fastest math available "original" - use the original raidz code "scalar" - new scalar impl "sse" - new SSE impl if available "avx2" - new AVX2 impl if available See contents of `/sys/module/zfs/parameters/zfs_vdev_raidz_impl` to get the list of supported values. If an implementation is not supported on the system, it will not be shown. Currently selected option is enclosed in `[]`. Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4328
2016-04-25 11:04:31 +03:00
/*
* Methods used to define raidz implementation
*
* @raidz_gen_f Parity generation function
* @par1 pointer to raidz_map
* @raidz_rec_f Data reconstruction function
* @par1 pointer to raidz_map
* @par2 array of reconstruction targets
* @will_work_f Function returns TRUE if impl. is supported on the system
* @init_impl_f Function is called once on init
* @fini_impl_f Function is called once on fini
*/
typedef void (*raidz_gen_f)(void *);
typedef int (*raidz_rec_f)(void *, const int *);
typedef boolean_t (*will_work_f)(void);
typedef void (*init_impl_f)(void);
typedef void (*fini_impl_f)(void);
#define RAIDZ_IMPL_NAME_MAX (20)
SIMD implementation of vdev_raidz generate and reconstruct routines This is a new implementation of RAIDZ1/2/3 routines using x86_64 scalar, SSE, and AVX2 instruction sets. Included are 3 parity generation routines (P, PQ, and PQR) and 7 reconstruction routines, for all RAIDZ level. On module load, a quick benchmark of supported routines will select the fastest for each operation and they will be used at runtime. Original implementation is still present and can be selected via module parameter. Patch contains: - specialized gen/rec routines for all RAIDZ levels, - new scalar raidz implementation (unrolled), - two x86_64 SIMD implementations (SSE and AVX2 instructions sets), - fastest routines selected on module load (benchmark). - cmd/raidz_test - verify and benchmark all implementations - added raidz_test to the ZFS Test Suite New zfs module parameters: - zfs_vdev_raidz_impl (str): selects the implementation to use. On module load, the parameter will only accept first 3 options, and the other implementations can be set once module is finished loading. Possible values for this option are: "fastest" - use the fastest math available "original" - use the original raidz code "scalar" - new scalar impl "sse" - new SSE impl if available "avx2" - new AVX2 impl if available See contents of `/sys/module/zfs/parameters/zfs_vdev_raidz_impl` to get the list of supported values. If an implementation is not supported on the system, it will not be shown. Currently selected option is enclosed in `[]`. Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4328
2016-04-25 11:04:31 +03:00
typedef struct raidz_impl_ops {
init_impl_f init;
fini_impl_f fini;
raidz_gen_f gen[RAIDZ_GEN_NUM]; /* Parity generate functions */
raidz_rec_f rec[RAIDZ_REC_NUM]; /* Data reconstruction functions */
will_work_f is_supported; /* Support check function */
char name[RAIDZ_IMPL_NAME_MAX]; /* Name of the implementation */
SIMD implementation of vdev_raidz generate and reconstruct routines This is a new implementation of RAIDZ1/2/3 routines using x86_64 scalar, SSE, and AVX2 instruction sets. Included are 3 parity generation routines (P, PQ, and PQR) and 7 reconstruction routines, for all RAIDZ level. On module load, a quick benchmark of supported routines will select the fastest for each operation and they will be used at runtime. Original implementation is still present and can be selected via module parameter. Patch contains: - specialized gen/rec routines for all RAIDZ levels, - new scalar raidz implementation (unrolled), - two x86_64 SIMD implementations (SSE and AVX2 instructions sets), - fastest routines selected on module load (benchmark). - cmd/raidz_test - verify and benchmark all implementations - added raidz_test to the ZFS Test Suite New zfs module parameters: - zfs_vdev_raidz_impl (str): selects the implementation to use. On module load, the parameter will only accept first 3 options, and the other implementations can be set once module is finished loading. Possible values for this option are: "fastest" - use the fastest math available "original" - use the original raidz code "scalar" - new scalar impl "sse" - new SSE impl if available "avx2" - new AVX2 impl if available See contents of `/sys/module/zfs/parameters/zfs_vdev_raidz_impl` to get the list of supported values. If an implementation is not supported on the system, it will not be shown. Currently selected option is enclosed in `[]`. Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4328
2016-04-25 11:04:31 +03:00
} raidz_impl_ops_t;
RAID-Z expansion feature This feature allows disks to be added one at a time to a RAID-Z group, expanding its capacity incrementally. This feature is especially useful for small pools (typically with only one RAID-Z group), where there isn't sufficient hardware to add capacity by adding a whole new RAID-Z group (typically doubling the number of disks). == Initiating expansion == A new device (disk) can be attached to an existing RAIDZ vdev, by running `zpool attach POOL raidzP-N NEW_DEVICE`, e.g. `zpool attach tank raidz2-0 sda`. The new device will become part of the RAIDZ group. A "raidz expansion" will be initiated, and the new device will contribute additional space to the RAIDZ group once the expansion completes. The `feature@raidz_expansion` on-disk feature flag must be `enabled` to initiate an expansion, and it remains `active` for the life of the pool. In other words, pools with expanded RAIDZ vdevs can not be imported by older releases of the ZFS software. == During expansion == The expansion entails reading all allocated space from existing disks in the RAIDZ group, and rewriting it to the new disks in the RAIDZ group (including the newly added device). The expansion progress can be monitored with `zpool status`. Data redundancy is maintained during (and after) the expansion. If a disk fails while the expansion is in progress, the expansion pauses until the health of the RAIDZ vdev is restored (e.g. by replacing the failed disk and waiting for reconstruction to complete). The pool remains accessible during expansion. Following a reboot or export/import, the expansion resumes where it left off. == After expansion == When the expansion completes, the additional space is available for use, and is reflected in the `available` zfs property (as seen in `zfs list`, `df`, etc). Expansion does not change the number of failures that can be tolerated without data loss (e.g. a RAIDZ2 is still a RAIDZ2 even after expansion). A RAIDZ vdev can be expanded multiple times. After the expansion completes, old blocks remain with their old data-to-parity ratio (e.g. 5-wide RAIDZ2, has 3 data to 2 parity), but distributed among the larger set of disks. New blocks will be written with the new data-to-parity ratio (e.g. a 5-wide RAIDZ2 which has been expanded once to 6-wide, has 4 data to 2 parity). However, the RAIDZ vdev's "assumed parity ratio" does not change, so slightly less space than is expected may be reported for newly-written blocks, according to `zfs list`, `df`, `ls -s`, and similar tools. Sponsored-by: The FreeBSD Foundation Sponsored-by: iXsystems, Inc. Sponsored-by: vStack Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Authored-by: Matthew Ahrens <mahrens@delphix.com> Contributions-by: Fedor Uporov <fuporov.vstack@gmail.com> Contributions-by: Stuart Maybee <stuart.maybee@comcast.net> Contributions-by: Thorsten Behrens <tbehrens@outlook.com> Contributions-by: Fmstrat <nospam@nowsci.com> Contributions-by: Don Brady <dev.fs.zfs@gmail.com> Signed-off-by: Don Brady <dev.fs.zfs@gmail.com> Closes #15022
2023-11-08 21:19:41 +03:00
SIMD implementation of vdev_raidz generate and reconstruct routines This is a new implementation of RAIDZ1/2/3 routines using x86_64 scalar, SSE, and AVX2 instruction sets. Included are 3 parity generation routines (P, PQ, and PQR) and 7 reconstruction routines, for all RAIDZ level. On module load, a quick benchmark of supported routines will select the fastest for each operation and they will be used at runtime. Original implementation is still present and can be selected via module parameter. Patch contains: - specialized gen/rec routines for all RAIDZ levels, - new scalar raidz implementation (unrolled), - two x86_64 SIMD implementations (SSE and AVX2 instructions sets), - fastest routines selected on module load (benchmark). - cmd/raidz_test - verify and benchmark all implementations - added raidz_test to the ZFS Test Suite New zfs module parameters: - zfs_vdev_raidz_impl (str): selects the implementation to use. On module load, the parameter will only accept first 3 options, and the other implementations can be set once module is finished loading. Possible values for this option are: "fastest" - use the fastest math available "original" - use the original raidz code "scalar" - new scalar impl "sse" - new SSE impl if available "avx2" - new AVX2 impl if available See contents of `/sys/module/zfs/parameters/zfs_vdev_raidz_impl` to get the list of supported values. If an implementation is not supported on the system, it will not be shown. Currently selected option is enclosed in `[]`. Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4328
2016-04-25 11:04:31 +03:00
typedef struct raidz_col {
RAID-Z expansion feature This feature allows disks to be added one at a time to a RAID-Z group, expanding its capacity incrementally. This feature is especially useful for small pools (typically with only one RAID-Z group), where there isn't sufficient hardware to add capacity by adding a whole new RAID-Z group (typically doubling the number of disks). == Initiating expansion == A new device (disk) can be attached to an existing RAIDZ vdev, by running `zpool attach POOL raidzP-N NEW_DEVICE`, e.g. `zpool attach tank raidz2-0 sda`. The new device will become part of the RAIDZ group. A "raidz expansion" will be initiated, and the new device will contribute additional space to the RAIDZ group once the expansion completes. The `feature@raidz_expansion` on-disk feature flag must be `enabled` to initiate an expansion, and it remains `active` for the life of the pool. In other words, pools with expanded RAIDZ vdevs can not be imported by older releases of the ZFS software. == During expansion == The expansion entails reading all allocated space from existing disks in the RAIDZ group, and rewriting it to the new disks in the RAIDZ group (including the newly added device). The expansion progress can be monitored with `zpool status`. Data redundancy is maintained during (and after) the expansion. If a disk fails while the expansion is in progress, the expansion pauses until the health of the RAIDZ vdev is restored (e.g. by replacing the failed disk and waiting for reconstruction to complete). The pool remains accessible during expansion. Following a reboot or export/import, the expansion resumes where it left off. == After expansion == When the expansion completes, the additional space is available for use, and is reflected in the `available` zfs property (as seen in `zfs list`, `df`, etc). Expansion does not change the number of failures that can be tolerated without data loss (e.g. a RAIDZ2 is still a RAIDZ2 even after expansion). A RAIDZ vdev can be expanded multiple times. After the expansion completes, old blocks remain with their old data-to-parity ratio (e.g. 5-wide RAIDZ2, has 3 data to 2 parity), but distributed among the larger set of disks. New blocks will be written with the new data-to-parity ratio (e.g. a 5-wide RAIDZ2 which has been expanded once to 6-wide, has 4 data to 2 parity). However, the RAIDZ vdev's "assumed parity ratio" does not change, so slightly less space than is expected may be reported for newly-written blocks, according to `zfs list`, `df`, `ls -s`, and similar tools. Sponsored-by: The FreeBSD Foundation Sponsored-by: iXsystems, Inc. Sponsored-by: vStack Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Authored-by: Matthew Ahrens <mahrens@delphix.com> Contributions-by: Fedor Uporov <fuporov.vstack@gmail.com> Contributions-by: Stuart Maybee <stuart.maybee@comcast.net> Contributions-by: Thorsten Behrens <tbehrens@outlook.com> Contributions-by: Fmstrat <nospam@nowsci.com> Contributions-by: Don Brady <dev.fs.zfs@gmail.com> Signed-off-by: Don Brady <dev.fs.zfs@gmail.com> Closes #15022
2023-11-08 21:19:41 +03:00
int rc_devidx; /* child device index for I/O */
uint32_t rc_size; /* I/O size */
uint64_t rc_offset; /* device offset */
abd_t rc_abdstruct; /* rc_abd probably points here */
abd_t *rc_abd; /* I/O data */
Clean up RAIDZ/DRAID ereport code The RAIDZ and DRAID code is responsible for reporting checksum errors on their child vdevs. Checksum errors represent events where a disk returned data or parity that should have been correct, but was not. In other words, these are instances of silent data corruption. The checksum errors show up in the vdev stats (and thus `zpool status`'s CKSUM column), and in the event log (`zpool events`). Note, this is in contrast with the more common "noisy" errors where a disk goes offline, in which case ZFS knows that the disk is bad and doesn't try to read it, or the device returns an error on the requested read or write operation. RAIDZ/DRAID generate checksum errors via three code paths: 1. When RAIDZ/DRAID reconstructs a damaged block, checksum errors are reported on any children whose data was not used during the reconstruction. This is handled in `raidz_reconstruct()`. This is the most common type of RAIDZ/DRAID checksum error. 2. When RAIDZ/DRAID is not able to reconstruct a damaged block, that means that the data has been lost. The zio fails and an error is returned to the consumer (e.g. the read(2) system call). This would happen if, for example, three different disks in a RAIDZ2 group are silently damaged. Since the damage is silent, it isn't possible to know which three disks are damaged, so a checksum error is reported against every child that returned data or parity for this read. (For DRAID, typically only one "group" of children is involved in each io.) This case is handled in `vdev_raidz_cksum_finish()`. This is the next most common type of RAIDZ/DRAID checksum error. 3. If RAIDZ/DRAID is not able to reconstruct a damaged block (like in case 2), but there happens to be additional copies of this block due to "ditto blocks" (i.e. multiple DVA's in this blkptr_t), and one of those copies is good, then RAIDZ/DRAID compares each sector of the data or parity that it retrieved with the good data from the other DVA, and if they differ then it reports a checksum error on this child. This differs from case 2 in that the checksum error is reported on only the subset of children that actually have bad data or parity. This case happens very rarely, since normally only metadata has ditto blocks. If the silent damage is extensive, there will be many instances of case 2, and the pool will likely be unrecoverable. The code for handling case 3 is considerably more complicated than the other cases, for two reasons: 1. It needs to run after the main raidz read logic has completed. The data RAIDZ read needs to be preserved until after the alternate DVA has been read, which necessitates refcounts and callbacks managed by the non-raidz-specific zio layer. 2. It's nontrivial to map the sections of data read by RAIDZ to the correct data. For example, the correct data does not include the parity information, so the parity must be recalculated based on the correct data, and then compared to the parity that was read from the RAIDZ children. Due to the complexity of case 3, the rareness of hitting it, and the minimal benefit it provides above case 2, this commit removes the code for case 3. These types of errors will now be handled the same as case 2, i.e. the checksum error will be reported against all children that returned data or parity. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #11735
2021-03-20 02:22:10 +03:00
abd_t *rc_orig_data; /* pre-reconstruction */
SIMD implementation of vdev_raidz generate and reconstruct routines This is a new implementation of RAIDZ1/2/3 routines using x86_64 scalar, SSE, and AVX2 instruction sets. Included are 3 parity generation routines (P, PQ, and PQR) and 7 reconstruction routines, for all RAIDZ level. On module load, a quick benchmark of supported routines will select the fastest for each operation and they will be used at runtime. Original implementation is still present and can be selected via module parameter. Patch contains: - specialized gen/rec routines for all RAIDZ levels, - new scalar raidz implementation (unrolled), - two x86_64 SIMD implementations (SSE and AVX2 instructions sets), - fastest routines selected on module load (benchmark). - cmd/raidz_test - verify and benchmark all implementations - added raidz_test to the ZFS Test Suite New zfs module parameters: - zfs_vdev_raidz_impl (str): selects the implementation to use. On module load, the parameter will only accept first 3 options, and the other implementations can be set once module is finished loading. Possible values for this option are: "fastest" - use the fastest math available "original" - use the original raidz code "scalar" - new scalar impl "sse" - new SSE impl if available "avx2" - new AVX2 impl if available See contents of `/sys/module/zfs/parameters/zfs_vdev_raidz_impl` to get the list of supported values. If an implementation is not supported on the system, it will not be shown. Currently selected option is enclosed in `[]`. Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4328
2016-04-25 11:04:31 +03:00
int rc_error; /* I/O error for this device */
RAID-Z expansion feature This feature allows disks to be added one at a time to a RAID-Z group, expanding its capacity incrementally. This feature is especially useful for small pools (typically with only one RAID-Z group), where there isn't sufficient hardware to add capacity by adding a whole new RAID-Z group (typically doubling the number of disks). == Initiating expansion == A new device (disk) can be attached to an existing RAIDZ vdev, by running `zpool attach POOL raidzP-N NEW_DEVICE`, e.g. `zpool attach tank raidz2-0 sda`. The new device will become part of the RAIDZ group. A "raidz expansion" will be initiated, and the new device will contribute additional space to the RAIDZ group once the expansion completes. The `feature@raidz_expansion` on-disk feature flag must be `enabled` to initiate an expansion, and it remains `active` for the life of the pool. In other words, pools with expanded RAIDZ vdevs can not be imported by older releases of the ZFS software. == During expansion == The expansion entails reading all allocated space from existing disks in the RAIDZ group, and rewriting it to the new disks in the RAIDZ group (including the newly added device). The expansion progress can be monitored with `zpool status`. Data redundancy is maintained during (and after) the expansion. If a disk fails while the expansion is in progress, the expansion pauses until the health of the RAIDZ vdev is restored (e.g. by replacing the failed disk and waiting for reconstruction to complete). The pool remains accessible during expansion. Following a reboot or export/import, the expansion resumes where it left off. == After expansion == When the expansion completes, the additional space is available for use, and is reflected in the `available` zfs property (as seen in `zfs list`, `df`, etc). Expansion does not change the number of failures that can be tolerated without data loss (e.g. a RAIDZ2 is still a RAIDZ2 even after expansion). A RAIDZ vdev can be expanded multiple times. After the expansion completes, old blocks remain with their old data-to-parity ratio (e.g. 5-wide RAIDZ2, has 3 data to 2 parity), but distributed among the larger set of disks. New blocks will be written with the new data-to-parity ratio (e.g. a 5-wide RAIDZ2 which has been expanded once to 6-wide, has 4 data to 2 parity). However, the RAIDZ vdev's "assumed parity ratio" does not change, so slightly less space than is expected may be reported for newly-written blocks, according to `zfs list`, `df`, `ls -s`, and similar tools. Sponsored-by: The FreeBSD Foundation Sponsored-by: iXsystems, Inc. Sponsored-by: vStack Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Authored-by: Matthew Ahrens <mahrens@delphix.com> Contributions-by: Fedor Uporov <fuporov.vstack@gmail.com> Contributions-by: Stuart Maybee <stuart.maybee@comcast.net> Contributions-by: Thorsten Behrens <tbehrens@outlook.com> Contributions-by: Fmstrat <nospam@nowsci.com> Contributions-by: Don Brady <dev.fs.zfs@gmail.com> Signed-off-by: Don Brady <dev.fs.zfs@gmail.com> Closes #15022
2023-11-08 21:19:41 +03:00
uint8_t rc_tried:1; /* Did we attempt this I/O column? */
uint8_t rc_skipped:1; /* Did we skip this I/O column? */
uint8_t rc_need_orig_restore:1; /* need to restore from orig_data? */
uint8_t rc_force_repair:1; /* Write good data to this column */
uint8_t rc_allow_repair:1; /* Allow repair I/O to this column */
int rc_shadow_devidx; /* for double write during expansion */
int rc_shadow_error; /* for double write during expansion */
uint64_t rc_shadow_offset; /* for double write during expansion */
SIMD implementation of vdev_raidz generate and reconstruct routines This is a new implementation of RAIDZ1/2/3 routines using x86_64 scalar, SSE, and AVX2 instruction sets. Included are 3 parity generation routines (P, PQ, and PQR) and 7 reconstruction routines, for all RAIDZ level. On module load, a quick benchmark of supported routines will select the fastest for each operation and they will be used at runtime. Original implementation is still present and can be selected via module parameter. Patch contains: - specialized gen/rec routines for all RAIDZ levels, - new scalar raidz implementation (unrolled), - two x86_64 SIMD implementations (SSE and AVX2 instructions sets), - fastest routines selected on module load (benchmark). - cmd/raidz_test - verify and benchmark all implementations - added raidz_test to the ZFS Test Suite New zfs module parameters: - zfs_vdev_raidz_impl (str): selects the implementation to use. On module load, the parameter will only accept first 3 options, and the other implementations can be set once module is finished loading. Possible values for this option are: "fastest" - use the fastest math available "original" - use the original raidz code "scalar" - new scalar impl "sse" - new SSE impl if available "avx2" - new AVX2 impl if available See contents of `/sys/module/zfs/parameters/zfs_vdev_raidz_impl` to get the list of supported values. If an implementation is not supported on the system, it will not be shown. Currently selected option is enclosed in `[]`. Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4328
2016-04-25 11:04:31 +03:00
} raidz_col_t;
Distributed Spare (dRAID) Feature This patch adds a new top-level vdev type called dRAID, which stands for Distributed parity RAID. This pool configuration allows all dRAID vdevs to participate when rebuilding to a distributed hot spare device. This can substantially reduce the total time required to restore full parity to pool with a failed device. A dRAID pool can be created using the new top-level `draid` type. Like `raidz`, the desired redundancy is specified after the type: `draid[1,2,3]`. No additional information is required to create the pool and reasonable default values will be chosen based on the number of child vdevs in the dRAID vdev. zpool create <pool> draid[1,2,3] <vdevs...> Unlike raidz, additional optional dRAID configuration values can be provided as part of the draid type as colon separated values. This allows administrators to fully specify a layout for either performance or capacity reasons. The supported options include: zpool create <pool> \ draid[<parity>][:<data>d][:<children>c][:<spares>s] \ <vdevs...> - draid[parity] - Parity level (default 1) - draid[:<data>d] - Data devices per group (default 8) - draid[:<children>c] - Expected number of child vdevs - draid[:<spares>s] - Distributed hot spares (default 0) Abbreviated example `zpool status` output for a 68 disk dRAID pool with two distributed spares using special allocation classes. ``` pool: tank state: ONLINE config: NAME STATE READ WRITE CKSUM slag7 ONLINE 0 0 0 draid2:8d:68c:2s-0 ONLINE 0 0 0 L0 ONLINE 0 0 0 L1 ONLINE 0 0 0 ... U25 ONLINE 0 0 0 U26 ONLINE 0 0 0 spare-53 ONLINE 0 0 0 U27 ONLINE 0 0 0 draid2-0-0 ONLINE 0 0 0 U28 ONLINE 0 0 0 U29 ONLINE 0 0 0 ... U42 ONLINE 0 0 0 U43 ONLINE 0 0 0 special mirror-1 ONLINE 0 0 0 L5 ONLINE 0 0 0 U5 ONLINE 0 0 0 mirror-2 ONLINE 0 0 0 L6 ONLINE 0 0 0 U6 ONLINE 0 0 0 spares draid2-0-0 INUSE currently in use draid2-0-1 AVAIL ``` When adding test coverage for the new dRAID vdev type the following options were added to the ztest command. These options are leverages by zloop.sh to test a wide range of dRAID configurations. -K draid|raidz|random - kind of RAID to test -D <value> - dRAID data drives per group -S <value> - dRAID distributed hot spares -R <value> - RAID parity (raidz or dRAID) The zpool_create, zpool_import, redundancy, replacement and fault test groups have all been updated provide test coverage for the dRAID feature. Co-authored-by: Isaac Huang <he.huang@intel.com> Co-authored-by: Mark Maybee <mmaybee@cray.com> Co-authored-by: Don Brady <don.brady@delphix.com> Co-authored-by: Matthew Ahrens <mahrens@delphix.com> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mmaybee@cray.com> Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10102
2020-11-14 00:51:51 +03:00
typedef struct raidz_row {
RAID-Z expansion feature This feature allows disks to be added one at a time to a RAID-Z group, expanding its capacity incrementally. This feature is especially useful for small pools (typically with only one RAID-Z group), where there isn't sufficient hardware to add capacity by adding a whole new RAID-Z group (typically doubling the number of disks). == Initiating expansion == A new device (disk) can be attached to an existing RAIDZ vdev, by running `zpool attach POOL raidzP-N NEW_DEVICE`, e.g. `zpool attach tank raidz2-0 sda`. The new device will become part of the RAIDZ group. A "raidz expansion" will be initiated, and the new device will contribute additional space to the RAIDZ group once the expansion completes. The `feature@raidz_expansion` on-disk feature flag must be `enabled` to initiate an expansion, and it remains `active` for the life of the pool. In other words, pools with expanded RAIDZ vdevs can not be imported by older releases of the ZFS software. == During expansion == The expansion entails reading all allocated space from existing disks in the RAIDZ group, and rewriting it to the new disks in the RAIDZ group (including the newly added device). The expansion progress can be monitored with `zpool status`. Data redundancy is maintained during (and after) the expansion. If a disk fails while the expansion is in progress, the expansion pauses until the health of the RAIDZ vdev is restored (e.g. by replacing the failed disk and waiting for reconstruction to complete). The pool remains accessible during expansion. Following a reboot or export/import, the expansion resumes where it left off. == After expansion == When the expansion completes, the additional space is available for use, and is reflected in the `available` zfs property (as seen in `zfs list`, `df`, etc). Expansion does not change the number of failures that can be tolerated without data loss (e.g. a RAIDZ2 is still a RAIDZ2 even after expansion). A RAIDZ vdev can be expanded multiple times. After the expansion completes, old blocks remain with their old data-to-parity ratio (e.g. 5-wide RAIDZ2, has 3 data to 2 parity), but distributed among the larger set of disks. New blocks will be written with the new data-to-parity ratio (e.g. a 5-wide RAIDZ2 which has been expanded once to 6-wide, has 4 data to 2 parity). However, the RAIDZ vdev's "assumed parity ratio" does not change, so slightly less space than is expected may be reported for newly-written blocks, according to `zfs list`, `df`, `ls -s`, and similar tools. Sponsored-by: The FreeBSD Foundation Sponsored-by: iXsystems, Inc. Sponsored-by: vStack Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Authored-by: Matthew Ahrens <mahrens@delphix.com> Contributions-by: Fedor Uporov <fuporov.vstack@gmail.com> Contributions-by: Stuart Maybee <stuart.maybee@comcast.net> Contributions-by: Thorsten Behrens <tbehrens@outlook.com> Contributions-by: Fmstrat <nospam@nowsci.com> Contributions-by: Don Brady <dev.fs.zfs@gmail.com> Signed-off-by: Don Brady <dev.fs.zfs@gmail.com> Closes #15022
2023-11-08 21:19:41 +03:00
int rr_cols; /* Regular column count */
int rr_scols; /* Count including skipped columns */
int rr_bigcols; /* Remainder data column count */
int rr_missingdata; /* Count of missing data devices */
int rr_missingparity; /* Count of missing parity devices */
int rr_firstdatacol; /* First data column/parity count */
Distributed Spare (dRAID) Feature This patch adds a new top-level vdev type called dRAID, which stands for Distributed parity RAID. This pool configuration allows all dRAID vdevs to participate when rebuilding to a distributed hot spare device. This can substantially reduce the total time required to restore full parity to pool with a failed device. A dRAID pool can be created using the new top-level `draid` type. Like `raidz`, the desired redundancy is specified after the type: `draid[1,2,3]`. No additional information is required to create the pool and reasonable default values will be chosen based on the number of child vdevs in the dRAID vdev. zpool create <pool> draid[1,2,3] <vdevs...> Unlike raidz, additional optional dRAID configuration values can be provided as part of the draid type as colon separated values. This allows administrators to fully specify a layout for either performance or capacity reasons. The supported options include: zpool create <pool> \ draid[<parity>][:<data>d][:<children>c][:<spares>s] \ <vdevs...> - draid[parity] - Parity level (default 1) - draid[:<data>d] - Data devices per group (default 8) - draid[:<children>c] - Expected number of child vdevs - draid[:<spares>s] - Distributed hot spares (default 0) Abbreviated example `zpool status` output for a 68 disk dRAID pool with two distributed spares using special allocation classes. ``` pool: tank state: ONLINE config: NAME STATE READ WRITE CKSUM slag7 ONLINE 0 0 0 draid2:8d:68c:2s-0 ONLINE 0 0 0 L0 ONLINE 0 0 0 L1 ONLINE 0 0 0 ... U25 ONLINE 0 0 0 U26 ONLINE 0 0 0 spare-53 ONLINE 0 0 0 U27 ONLINE 0 0 0 draid2-0-0 ONLINE 0 0 0 U28 ONLINE 0 0 0 U29 ONLINE 0 0 0 ... U42 ONLINE 0 0 0 U43 ONLINE 0 0 0 special mirror-1 ONLINE 0 0 0 L5 ONLINE 0 0 0 U5 ONLINE 0 0 0 mirror-2 ONLINE 0 0 0 L6 ONLINE 0 0 0 U6 ONLINE 0 0 0 spares draid2-0-0 INUSE currently in use draid2-0-1 AVAIL ``` When adding test coverage for the new dRAID vdev type the following options were added to the ztest command. These options are leverages by zloop.sh to test a wide range of dRAID configurations. -K draid|raidz|random - kind of RAID to test -D <value> - dRAID data drives per group -S <value> - dRAID distributed hot spares -R <value> - RAID parity (raidz or dRAID) The zpool_create, zpool_import, redundancy, replacement and fault test groups have all been updated provide test coverage for the dRAID feature. Co-authored-by: Isaac Huang <he.huang@intel.com> Co-authored-by: Mark Maybee <mmaybee@cray.com> Co-authored-by: Don Brady <don.brady@delphix.com> Co-authored-by: Matthew Ahrens <mahrens@delphix.com> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mmaybee@cray.com> Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10102
2020-11-14 00:51:51 +03:00
abd_t *rr_abd_empty; /* dRAID empty sector buffer */
int rr_nempty; /* empty sectors included in parity */
#ifdef ZFS_DEBUG
uint64_t rr_offset; /* Logical offset for *_io_verify() */
uint64_t rr_size; /* Physical size for *_io_verify() */
#endif
raidz_col_t rr_col[]; /* Flexible array of I/O columns */
Distributed Spare (dRAID) Feature This patch adds a new top-level vdev type called dRAID, which stands for Distributed parity RAID. This pool configuration allows all dRAID vdevs to participate when rebuilding to a distributed hot spare device. This can substantially reduce the total time required to restore full parity to pool with a failed device. A dRAID pool can be created using the new top-level `draid` type. Like `raidz`, the desired redundancy is specified after the type: `draid[1,2,3]`. No additional information is required to create the pool and reasonable default values will be chosen based on the number of child vdevs in the dRAID vdev. zpool create <pool> draid[1,2,3] <vdevs...> Unlike raidz, additional optional dRAID configuration values can be provided as part of the draid type as colon separated values. This allows administrators to fully specify a layout for either performance or capacity reasons. The supported options include: zpool create <pool> \ draid[<parity>][:<data>d][:<children>c][:<spares>s] \ <vdevs...> - draid[parity] - Parity level (default 1) - draid[:<data>d] - Data devices per group (default 8) - draid[:<children>c] - Expected number of child vdevs - draid[:<spares>s] - Distributed hot spares (default 0) Abbreviated example `zpool status` output for a 68 disk dRAID pool with two distributed spares using special allocation classes. ``` pool: tank state: ONLINE config: NAME STATE READ WRITE CKSUM slag7 ONLINE 0 0 0 draid2:8d:68c:2s-0 ONLINE 0 0 0 L0 ONLINE 0 0 0 L1 ONLINE 0 0 0 ... U25 ONLINE 0 0 0 U26 ONLINE 0 0 0 spare-53 ONLINE 0 0 0 U27 ONLINE 0 0 0 draid2-0-0 ONLINE 0 0 0 U28 ONLINE 0 0 0 U29 ONLINE 0 0 0 ... U42 ONLINE 0 0 0 U43 ONLINE 0 0 0 special mirror-1 ONLINE 0 0 0 L5 ONLINE 0 0 0 U5 ONLINE 0 0 0 mirror-2 ONLINE 0 0 0 L6 ONLINE 0 0 0 U6 ONLINE 0 0 0 spares draid2-0-0 INUSE currently in use draid2-0-1 AVAIL ``` When adding test coverage for the new dRAID vdev type the following options were added to the ztest command. These options are leverages by zloop.sh to test a wide range of dRAID configurations. -K draid|raidz|random - kind of RAID to test -D <value> - dRAID data drives per group -S <value> - dRAID distributed hot spares -R <value> - RAID parity (raidz or dRAID) The zpool_create, zpool_import, redundancy, replacement and fault test groups have all been updated provide test coverage for the dRAID feature. Co-authored-by: Isaac Huang <he.huang@intel.com> Co-authored-by: Mark Maybee <mmaybee@cray.com> Co-authored-by: Don Brady <don.brady@delphix.com> Co-authored-by: Matthew Ahrens <mahrens@delphix.com> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mmaybee@cray.com> Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10102
2020-11-14 00:51:51 +03:00
} raidz_row_t;
SIMD implementation of vdev_raidz generate and reconstruct routines This is a new implementation of RAIDZ1/2/3 routines using x86_64 scalar, SSE, and AVX2 instruction sets. Included are 3 parity generation routines (P, PQ, and PQR) and 7 reconstruction routines, for all RAIDZ level. On module load, a quick benchmark of supported routines will select the fastest for each operation and they will be used at runtime. Original implementation is still present and can be selected via module parameter. Patch contains: - specialized gen/rec routines for all RAIDZ levels, - new scalar raidz implementation (unrolled), - two x86_64 SIMD implementations (SSE and AVX2 instructions sets), - fastest routines selected on module load (benchmark). - cmd/raidz_test - verify and benchmark all implementations - added raidz_test to the ZFS Test Suite New zfs module parameters: - zfs_vdev_raidz_impl (str): selects the implementation to use. On module load, the parameter will only accept first 3 options, and the other implementations can be set once module is finished loading. Possible values for this option are: "fastest" - use the fastest math available "original" - use the original raidz code "scalar" - new scalar impl "sse" - new SSE impl if available "avx2" - new AVX2 impl if available See contents of `/sys/module/zfs/parameters/zfs_vdev_raidz_impl` to get the list of supported values. If an implementation is not supported on the system, it will not be shown. Currently selected option is enclosed in `[]`. Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4328
2016-04-25 11:04:31 +03:00
typedef struct raidz_map {
Distributed Spare (dRAID) Feature This patch adds a new top-level vdev type called dRAID, which stands for Distributed parity RAID. This pool configuration allows all dRAID vdevs to participate when rebuilding to a distributed hot spare device. This can substantially reduce the total time required to restore full parity to pool with a failed device. A dRAID pool can be created using the new top-level `draid` type. Like `raidz`, the desired redundancy is specified after the type: `draid[1,2,3]`. No additional information is required to create the pool and reasonable default values will be chosen based on the number of child vdevs in the dRAID vdev. zpool create <pool> draid[1,2,3] <vdevs...> Unlike raidz, additional optional dRAID configuration values can be provided as part of the draid type as colon separated values. This allows administrators to fully specify a layout for either performance or capacity reasons. The supported options include: zpool create <pool> \ draid[<parity>][:<data>d][:<children>c][:<spares>s] \ <vdevs...> - draid[parity] - Parity level (default 1) - draid[:<data>d] - Data devices per group (default 8) - draid[:<children>c] - Expected number of child vdevs - draid[:<spares>s] - Distributed hot spares (default 0) Abbreviated example `zpool status` output for a 68 disk dRAID pool with two distributed spares using special allocation classes. ``` pool: tank state: ONLINE config: NAME STATE READ WRITE CKSUM slag7 ONLINE 0 0 0 draid2:8d:68c:2s-0 ONLINE 0 0 0 L0 ONLINE 0 0 0 L1 ONLINE 0 0 0 ... U25 ONLINE 0 0 0 U26 ONLINE 0 0 0 spare-53 ONLINE 0 0 0 U27 ONLINE 0 0 0 draid2-0-0 ONLINE 0 0 0 U28 ONLINE 0 0 0 U29 ONLINE 0 0 0 ... U42 ONLINE 0 0 0 U43 ONLINE 0 0 0 special mirror-1 ONLINE 0 0 0 L5 ONLINE 0 0 0 U5 ONLINE 0 0 0 mirror-2 ONLINE 0 0 0 L6 ONLINE 0 0 0 U6 ONLINE 0 0 0 spares draid2-0-0 INUSE currently in use draid2-0-1 AVAIL ``` When adding test coverage for the new dRAID vdev type the following options were added to the ztest command. These options are leverages by zloop.sh to test a wide range of dRAID configurations. -K draid|raidz|random - kind of RAID to test -D <value> - dRAID data drives per group -S <value> - dRAID distributed hot spares -R <value> - RAID parity (raidz or dRAID) The zpool_create, zpool_import, redundancy, replacement and fault test groups have all been updated provide test coverage for the dRAID feature. Co-authored-by: Isaac Huang <he.huang@intel.com> Co-authored-by: Mark Maybee <mmaybee@cray.com> Co-authored-by: Don Brady <don.brady@delphix.com> Co-authored-by: Matthew Ahrens <mahrens@delphix.com> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mmaybee@cray.com> Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10102
2020-11-14 00:51:51 +03:00
boolean_t rm_ecksuminjected; /* checksum error was injected */
int rm_nrows; /* Regular row count */
int rm_nskip; /* RAIDZ sectors skipped for padding */
int rm_skipstart; /* Column index of padding start */
RAID-Z expansion feature This feature allows disks to be added one at a time to a RAID-Z group, expanding its capacity incrementally. This feature is especially useful for small pools (typically with only one RAID-Z group), where there isn't sufficient hardware to add capacity by adding a whole new RAID-Z group (typically doubling the number of disks). == Initiating expansion == A new device (disk) can be attached to an existing RAIDZ vdev, by running `zpool attach POOL raidzP-N NEW_DEVICE`, e.g. `zpool attach tank raidz2-0 sda`. The new device will become part of the RAIDZ group. A "raidz expansion" will be initiated, and the new device will contribute additional space to the RAIDZ group once the expansion completes. The `feature@raidz_expansion` on-disk feature flag must be `enabled` to initiate an expansion, and it remains `active` for the life of the pool. In other words, pools with expanded RAIDZ vdevs can not be imported by older releases of the ZFS software. == During expansion == The expansion entails reading all allocated space from existing disks in the RAIDZ group, and rewriting it to the new disks in the RAIDZ group (including the newly added device). The expansion progress can be monitored with `zpool status`. Data redundancy is maintained during (and after) the expansion. If a disk fails while the expansion is in progress, the expansion pauses until the health of the RAIDZ vdev is restored (e.g. by replacing the failed disk and waiting for reconstruction to complete). The pool remains accessible during expansion. Following a reboot or export/import, the expansion resumes where it left off. == After expansion == When the expansion completes, the additional space is available for use, and is reflected in the `available` zfs property (as seen in `zfs list`, `df`, etc). Expansion does not change the number of failures that can be tolerated without data loss (e.g. a RAIDZ2 is still a RAIDZ2 even after expansion). A RAIDZ vdev can be expanded multiple times. After the expansion completes, old blocks remain with their old data-to-parity ratio (e.g. 5-wide RAIDZ2, has 3 data to 2 parity), but distributed among the larger set of disks. New blocks will be written with the new data-to-parity ratio (e.g. a 5-wide RAIDZ2 which has been expanded once to 6-wide, has 4 data to 2 parity). However, the RAIDZ vdev's "assumed parity ratio" does not change, so slightly less space than is expected may be reported for newly-written blocks, according to `zfs list`, `df`, `ls -s`, and similar tools. Sponsored-by: The FreeBSD Foundation Sponsored-by: iXsystems, Inc. Sponsored-by: vStack Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Authored-by: Matthew Ahrens <mahrens@delphix.com> Contributions-by: Fedor Uporov <fuporov.vstack@gmail.com> Contributions-by: Stuart Maybee <stuart.maybee@comcast.net> Contributions-by: Thorsten Behrens <tbehrens@outlook.com> Contributions-by: Fmstrat <nospam@nowsci.com> Contributions-by: Don Brady <dev.fs.zfs@gmail.com> Signed-off-by: Don Brady <dev.fs.zfs@gmail.com> Closes #15022
2023-11-08 21:19:41 +03:00
int rm_original_width; /* pre-expansion width of raidz vdev */
int rm_nphys_cols; /* num entries in rm_phys_col[] */
zfs_locked_range_t *rm_lr;
Linux 5.0 compat: SIMD compatibility Restore the SIMD optimization for 4.19.38 LTS, 4.14.120 LTS, and 5.0 and newer kernels. This is accomplished by leveraging the fact that by definition dedicated kernel threads never need to concern themselves with saving and restoring the user FPU state. Therefore, they may use the FPU as long as we can guarantee user tasks always restore their FPU state before context switching back to user space. For the 5.0 and 5.1 kernels disabling preemption and local interrupts is sufficient to allow the FPU to be used. All non-kernel threads will restore the preserved user FPU state. For 5.2 and latter kernels the user FPU state restoration will be skipped if the kernel determines the registers have not changed. Therefore, for these kernels we need to perform the additional step of saving and restoring the FPU registers. Invalidating the per-cpu global tracking the FPU state would force a restore but that functionality is private to the core x86 FPU implementation and unavailable. In practice, restricting SIMD to kernel threads is not a major restriction for ZFS. The vast majority of SIMD operations are already performed by the IO pipeline. The remaining cases are relatively infrequent and can be handled by the generic code without significant impact. The two most noteworthy cases are: 1) Decrypting the wrapping key for an encrypted dataset, i.e. `zfs load-key`. All other encryption and decryption operations will use the SIMD optimized implementations. 2) Generating the payload checksums for a `zfs send` stream. In order to avoid making any changes to the higher layers of ZFS all of the `*_get_ops()` functions were updated to take in to consideration the calling context. This allows for the fastest implementation to be used as appropriate (see kfpu_allowed()). The only other notable instance of SIMD operations being used outside a kernel thread was at module load time. This code was moved in to a taskq in order to accommodate the new kernel thread restriction. Finally, a few other modifications were made in order to further harden this code and facilitate testing. They include updating each implementations operations structure to be declared as a constant. And allowing "cycle" to be set when selecting the preferred ops in the kernel as well as user space. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #8754 Closes #8793 Closes #8965
2019-07-12 19:31:20 +03:00
const raidz_impl_ops_t *rm_ops; /* RAIDZ math operations */
RAID-Z expansion feature This feature allows disks to be added one at a time to a RAID-Z group, expanding its capacity incrementally. This feature is especially useful for small pools (typically with only one RAID-Z group), where there isn't sufficient hardware to add capacity by adding a whole new RAID-Z group (typically doubling the number of disks). == Initiating expansion == A new device (disk) can be attached to an existing RAIDZ vdev, by running `zpool attach POOL raidzP-N NEW_DEVICE`, e.g. `zpool attach tank raidz2-0 sda`. The new device will become part of the RAIDZ group. A "raidz expansion" will be initiated, and the new device will contribute additional space to the RAIDZ group once the expansion completes. The `feature@raidz_expansion` on-disk feature flag must be `enabled` to initiate an expansion, and it remains `active` for the life of the pool. In other words, pools with expanded RAIDZ vdevs can not be imported by older releases of the ZFS software. == During expansion == The expansion entails reading all allocated space from existing disks in the RAIDZ group, and rewriting it to the new disks in the RAIDZ group (including the newly added device). The expansion progress can be monitored with `zpool status`. Data redundancy is maintained during (and after) the expansion. If a disk fails while the expansion is in progress, the expansion pauses until the health of the RAIDZ vdev is restored (e.g. by replacing the failed disk and waiting for reconstruction to complete). The pool remains accessible during expansion. Following a reboot or export/import, the expansion resumes where it left off. == After expansion == When the expansion completes, the additional space is available for use, and is reflected in the `available` zfs property (as seen in `zfs list`, `df`, etc). Expansion does not change the number of failures that can be tolerated without data loss (e.g. a RAIDZ2 is still a RAIDZ2 even after expansion). A RAIDZ vdev can be expanded multiple times. After the expansion completes, old blocks remain with their old data-to-parity ratio (e.g. 5-wide RAIDZ2, has 3 data to 2 parity), but distributed among the larger set of disks. New blocks will be written with the new data-to-parity ratio (e.g. a 5-wide RAIDZ2 which has been expanded once to 6-wide, has 4 data to 2 parity). However, the RAIDZ vdev's "assumed parity ratio" does not change, so slightly less space than is expected may be reported for newly-written blocks, according to `zfs list`, `df`, `ls -s`, and similar tools. Sponsored-by: The FreeBSD Foundation Sponsored-by: iXsystems, Inc. Sponsored-by: vStack Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Authored-by: Matthew Ahrens <mahrens@delphix.com> Contributions-by: Fedor Uporov <fuporov.vstack@gmail.com> Contributions-by: Stuart Maybee <stuart.maybee@comcast.net> Contributions-by: Thorsten Behrens <tbehrens@outlook.com> Contributions-by: Fmstrat <nospam@nowsci.com> Contributions-by: Don Brady <dev.fs.zfs@gmail.com> Signed-off-by: Don Brady <dev.fs.zfs@gmail.com> Closes #15022
2023-11-08 21:19:41 +03:00
raidz_col_t *rm_phys_col; /* if non-NULL, read i/o aggregation */
raidz_row_t *rm_row[]; /* flexible array of rows */
SIMD implementation of vdev_raidz generate and reconstruct routines This is a new implementation of RAIDZ1/2/3 routines using x86_64 scalar, SSE, and AVX2 instruction sets. Included are 3 parity generation routines (P, PQ, and PQR) and 7 reconstruction routines, for all RAIDZ level. On module load, a quick benchmark of supported routines will select the fastest for each operation and they will be used at runtime. Original implementation is still present and can be selected via module parameter. Patch contains: - specialized gen/rec routines for all RAIDZ levels, - new scalar raidz implementation (unrolled), - two x86_64 SIMD implementations (SSE and AVX2 instructions sets), - fastest routines selected on module load (benchmark). - cmd/raidz_test - verify and benchmark all implementations - added raidz_test to the ZFS Test Suite New zfs module parameters: - zfs_vdev_raidz_impl (str): selects the implementation to use. On module load, the parameter will only accept first 3 options, and the other implementations can be set once module is finished loading. Possible values for this option are: "fastest" - use the fastest math available "original" - use the original raidz code "scalar" - new scalar impl "sse" - new SSE impl if available "avx2" - new AVX2 impl if available See contents of `/sys/module/zfs/parameters/zfs_vdev_raidz_impl` to get the list of supported values. If an implementation is not supported on the system, it will not be shown. Currently selected option is enclosed in `[]`. Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4328
2016-04-25 11:04:31 +03:00
} raidz_map_t;
RAID-Z expansion feature This feature allows disks to be added one at a time to a RAID-Z group, expanding its capacity incrementally. This feature is especially useful for small pools (typically with only one RAID-Z group), where there isn't sufficient hardware to add capacity by adding a whole new RAID-Z group (typically doubling the number of disks). == Initiating expansion == A new device (disk) can be attached to an existing RAIDZ vdev, by running `zpool attach POOL raidzP-N NEW_DEVICE`, e.g. `zpool attach tank raidz2-0 sda`. The new device will become part of the RAIDZ group. A "raidz expansion" will be initiated, and the new device will contribute additional space to the RAIDZ group once the expansion completes. The `feature@raidz_expansion` on-disk feature flag must be `enabled` to initiate an expansion, and it remains `active` for the life of the pool. In other words, pools with expanded RAIDZ vdevs can not be imported by older releases of the ZFS software. == During expansion == The expansion entails reading all allocated space from existing disks in the RAIDZ group, and rewriting it to the new disks in the RAIDZ group (including the newly added device). The expansion progress can be monitored with `zpool status`. Data redundancy is maintained during (and after) the expansion. If a disk fails while the expansion is in progress, the expansion pauses until the health of the RAIDZ vdev is restored (e.g. by replacing the failed disk and waiting for reconstruction to complete). The pool remains accessible during expansion. Following a reboot or export/import, the expansion resumes where it left off. == After expansion == When the expansion completes, the additional space is available for use, and is reflected in the `available` zfs property (as seen in `zfs list`, `df`, etc). Expansion does not change the number of failures that can be tolerated without data loss (e.g. a RAIDZ2 is still a RAIDZ2 even after expansion). A RAIDZ vdev can be expanded multiple times. After the expansion completes, old blocks remain with their old data-to-parity ratio (e.g. 5-wide RAIDZ2, has 3 data to 2 parity), but distributed among the larger set of disks. New blocks will be written with the new data-to-parity ratio (e.g. a 5-wide RAIDZ2 which has been expanded once to 6-wide, has 4 data to 2 parity). However, the RAIDZ vdev's "assumed parity ratio" does not change, so slightly less space than is expected may be reported for newly-written blocks, according to `zfs list`, `df`, `ls -s`, and similar tools. Sponsored-by: The FreeBSD Foundation Sponsored-by: iXsystems, Inc. Sponsored-by: vStack Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Authored-by: Matthew Ahrens <mahrens@delphix.com> Contributions-by: Fedor Uporov <fuporov.vstack@gmail.com> Contributions-by: Stuart Maybee <stuart.maybee@comcast.net> Contributions-by: Thorsten Behrens <tbehrens@outlook.com> Contributions-by: Fmstrat <nospam@nowsci.com> Contributions-by: Don Brady <dev.fs.zfs@gmail.com> Signed-off-by: Don Brady <dev.fs.zfs@gmail.com> Closes #15022
2023-11-08 21:19:41 +03:00
/*
* Nodes in vdev_raidz_t:vd_expand_txgs.
* Blocks with physical birth time of re_txg or later have the specified
* logical width (until the next node).
*/
typedef struct reflow_node {
uint64_t re_txg;
uint64_t re_logical_width;
avl_node_t re_link;
} reflow_node_t;
Distributed Spare (dRAID) Feature This patch adds a new top-level vdev type called dRAID, which stands for Distributed parity RAID. This pool configuration allows all dRAID vdevs to participate when rebuilding to a distributed hot spare device. This can substantially reduce the total time required to restore full parity to pool with a failed device. A dRAID pool can be created using the new top-level `draid` type. Like `raidz`, the desired redundancy is specified after the type: `draid[1,2,3]`. No additional information is required to create the pool and reasonable default values will be chosen based on the number of child vdevs in the dRAID vdev. zpool create <pool> draid[1,2,3] <vdevs...> Unlike raidz, additional optional dRAID configuration values can be provided as part of the draid type as colon separated values. This allows administrators to fully specify a layout for either performance or capacity reasons. The supported options include: zpool create <pool> \ draid[<parity>][:<data>d][:<children>c][:<spares>s] \ <vdevs...> - draid[parity] - Parity level (default 1) - draid[:<data>d] - Data devices per group (default 8) - draid[:<children>c] - Expected number of child vdevs - draid[:<spares>s] - Distributed hot spares (default 0) Abbreviated example `zpool status` output for a 68 disk dRAID pool with two distributed spares using special allocation classes. ``` pool: tank state: ONLINE config: NAME STATE READ WRITE CKSUM slag7 ONLINE 0 0 0 draid2:8d:68c:2s-0 ONLINE 0 0 0 L0 ONLINE 0 0 0 L1 ONLINE 0 0 0 ... U25 ONLINE 0 0 0 U26 ONLINE 0 0 0 spare-53 ONLINE 0 0 0 U27 ONLINE 0 0 0 draid2-0-0 ONLINE 0 0 0 U28 ONLINE 0 0 0 U29 ONLINE 0 0 0 ... U42 ONLINE 0 0 0 U43 ONLINE 0 0 0 special mirror-1 ONLINE 0 0 0 L5 ONLINE 0 0 0 U5 ONLINE 0 0 0 mirror-2 ONLINE 0 0 0 L6 ONLINE 0 0 0 U6 ONLINE 0 0 0 spares draid2-0-0 INUSE currently in use draid2-0-1 AVAIL ``` When adding test coverage for the new dRAID vdev type the following options were added to the ztest command. These options are leverages by zloop.sh to test a wide range of dRAID configurations. -K draid|raidz|random - kind of RAID to test -D <value> - dRAID data drives per group -S <value> - dRAID distributed hot spares -R <value> - RAID parity (raidz or dRAID) The zpool_create, zpool_import, redundancy, replacement and fault test groups have all been updated provide test coverage for the dRAID feature. Co-authored-by: Isaac Huang <he.huang@intel.com> Co-authored-by: Mark Maybee <mmaybee@cray.com> Co-authored-by: Don Brady <don.brady@delphix.com> Co-authored-by: Matthew Ahrens <mahrens@delphix.com> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mmaybee@cray.com> Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10102
2020-11-14 00:51:51 +03:00
#define RAIDZ_ORIGINAL_IMPL (INT_MAX)
extern const raidz_impl_ops_t vdev_raidz_scalar_impl;
extern boolean_t raidz_will_scalar_work(void);
#if defined(__x86_64) && defined(HAVE_SSE2) /* only x86_64 for now */
extern const raidz_impl_ops_t vdev_raidz_sse2_impl;
#endif
#if defined(__x86_64) && defined(HAVE_SSSE3) /* only x86_64 for now */
extern const raidz_impl_ops_t vdev_raidz_ssse3_impl;
#endif
#if defined(__x86_64) && defined(HAVE_AVX2) /* only x86_64 for now */
extern const raidz_impl_ops_t vdev_raidz_avx2_impl;
#endif
#if defined(__x86_64) && defined(HAVE_AVX512F) /* only x86_64 for now */
extern const raidz_impl_ops_t vdev_raidz_avx512f_impl;
#endif
#if defined(__x86_64) && defined(HAVE_AVX512BW) /* only x86_64 for now */
extern const raidz_impl_ops_t vdev_raidz_avx512bw_impl;
#endif
Add parity generation/rebuild using 128-bits NEON for Aarch64 This re-use the framework established for SSE2, SSSE3 and AVX2. However, GCC is using FP registers on Aarch64, so unlike SSE/AVX2 we can't rely on the registers being left alone between ASM statements. So instead, the NEON code uses C variables and GCC extended ASM syntax. Note that since the kernel explicitly disable vector registers, they have to be locally re-enabled explicitly. As we use the variable's number to define the symbolic name, and GCC won't allow duplicate symbolic names, numbers have to be unique. Even when the code is not going to be used (e.g. the case for 4 registers when using the macro with only 2). Only the actually used variables should be declared, otherwise the build will fails in debug mode. This requires the replacement of the XOR(X,X) syntax by a new ZERO(X) macro, which does the same thing but without repeating the argument. And perhaps someday there will be a machine where there is a more efficient way to zero a register than XOR with itself. This affects scalar, SSE2, SSSE3 and AVX2 as they need the new macro. It's possible to write faster implementations (different scheduling, different unrolling, interleaving NEON and scalar, ...) for various cores, but this one has the advantage of fitting in the current state of the code, and thus is likely easier to review/check/merge. The only difference between aarch64-neon and aarch64-neonx2 is that aarch64-neonx2 unroll some functions some more. Reviewed-by: Gvozden Neskovic <neskovic@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Romain Dolbeau <romain.dolbeau@atos.net> Closes #4801
2016-10-03 19:44:00 +03:00
#if defined(__aarch64__)
extern const raidz_impl_ops_t vdev_raidz_aarch64_neon_impl;
extern const raidz_impl_ops_t vdev_raidz_aarch64_neonx2_impl;
#endif
#if defined(__powerpc__)
extern const raidz_impl_ops_t vdev_raidz_powerpc_altivec_impl;
#endif
SIMD implementation of vdev_raidz generate and reconstruct routines This is a new implementation of RAIDZ1/2/3 routines using x86_64 scalar, SSE, and AVX2 instruction sets. Included are 3 parity generation routines (P, PQ, and PQR) and 7 reconstruction routines, for all RAIDZ level. On module load, a quick benchmark of supported routines will select the fastest for each operation and they will be used at runtime. Original implementation is still present and can be selected via module parameter. Patch contains: - specialized gen/rec routines for all RAIDZ levels, - new scalar raidz implementation (unrolled), - two x86_64 SIMD implementations (SSE and AVX2 instructions sets), - fastest routines selected on module load (benchmark). - cmd/raidz_test - verify and benchmark all implementations - added raidz_test to the ZFS Test Suite New zfs module parameters: - zfs_vdev_raidz_impl (str): selects the implementation to use. On module load, the parameter will only accept first 3 options, and the other implementations can be set once module is finished loading. Possible values for this option are: "fastest" - use the fastest math available "original" - use the original raidz code "scalar" - new scalar impl "sse" - new SSE impl if available "avx2" - new AVX2 impl if available See contents of `/sys/module/zfs/parameters/zfs_vdev_raidz_impl` to get the list of supported values. If an implementation is not supported on the system, it will not be shown. Currently selected option is enclosed in `[]`. Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4328
2016-04-25 11:04:31 +03:00
/*
* Commonly used raidz_map helpers
*
* raidz_parity Returns parity of the RAIDZ block
* raidz_ncols Returns number of columns the block spans
Distributed Spare (dRAID) Feature This patch adds a new top-level vdev type called dRAID, which stands for Distributed parity RAID. This pool configuration allows all dRAID vdevs to participate when rebuilding to a distributed hot spare device. This can substantially reduce the total time required to restore full parity to pool with a failed device. A dRAID pool can be created using the new top-level `draid` type. Like `raidz`, the desired redundancy is specified after the type: `draid[1,2,3]`. No additional information is required to create the pool and reasonable default values will be chosen based on the number of child vdevs in the dRAID vdev. zpool create <pool> draid[1,2,3] <vdevs...> Unlike raidz, additional optional dRAID configuration values can be provided as part of the draid type as colon separated values. This allows administrators to fully specify a layout for either performance or capacity reasons. The supported options include: zpool create <pool> \ draid[<parity>][:<data>d][:<children>c][:<spares>s] \ <vdevs...> - draid[parity] - Parity level (default 1) - draid[:<data>d] - Data devices per group (default 8) - draid[:<children>c] - Expected number of child vdevs - draid[:<spares>s] - Distributed hot spares (default 0) Abbreviated example `zpool status` output for a 68 disk dRAID pool with two distributed spares using special allocation classes. ``` pool: tank state: ONLINE config: NAME STATE READ WRITE CKSUM slag7 ONLINE 0 0 0 draid2:8d:68c:2s-0 ONLINE 0 0 0 L0 ONLINE 0 0 0 L1 ONLINE 0 0 0 ... U25 ONLINE 0 0 0 U26 ONLINE 0 0 0 spare-53 ONLINE 0 0 0 U27 ONLINE 0 0 0 draid2-0-0 ONLINE 0 0 0 U28 ONLINE 0 0 0 U29 ONLINE 0 0 0 ... U42 ONLINE 0 0 0 U43 ONLINE 0 0 0 special mirror-1 ONLINE 0 0 0 L5 ONLINE 0 0 0 U5 ONLINE 0 0 0 mirror-2 ONLINE 0 0 0 L6 ONLINE 0 0 0 U6 ONLINE 0 0 0 spares draid2-0-0 INUSE currently in use draid2-0-1 AVAIL ``` When adding test coverage for the new dRAID vdev type the following options were added to the ztest command. These options are leverages by zloop.sh to test a wide range of dRAID configurations. -K draid|raidz|random - kind of RAID to test -D <value> - dRAID data drives per group -S <value> - dRAID distributed hot spares -R <value> - RAID parity (raidz or dRAID) The zpool_create, zpool_import, redundancy, replacement and fault test groups have all been updated provide test coverage for the dRAID feature. Co-authored-by: Isaac Huang <he.huang@intel.com> Co-authored-by: Mark Maybee <mmaybee@cray.com> Co-authored-by: Don Brady <don.brady@delphix.com> Co-authored-by: Matthew Ahrens <mahrens@delphix.com> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mmaybee@cray.com> Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10102
2020-11-14 00:51:51 +03:00
* Note, all rows have the same number of columns.
* raidz_nbigcols Returns number of big columns
SIMD implementation of vdev_raidz generate and reconstruct routines This is a new implementation of RAIDZ1/2/3 routines using x86_64 scalar, SSE, and AVX2 instruction sets. Included are 3 parity generation routines (P, PQ, and PQR) and 7 reconstruction routines, for all RAIDZ level. On module load, a quick benchmark of supported routines will select the fastest for each operation and they will be used at runtime. Original implementation is still present and can be selected via module parameter. Patch contains: - specialized gen/rec routines for all RAIDZ levels, - new scalar raidz implementation (unrolled), - two x86_64 SIMD implementations (SSE and AVX2 instructions sets), - fastest routines selected on module load (benchmark). - cmd/raidz_test - verify and benchmark all implementations - added raidz_test to the ZFS Test Suite New zfs module parameters: - zfs_vdev_raidz_impl (str): selects the implementation to use. On module load, the parameter will only accept first 3 options, and the other implementations can be set once module is finished loading. Possible values for this option are: "fastest" - use the fastest math available "original" - use the original raidz code "scalar" - new scalar impl "sse" - new SSE impl if available "avx2" - new AVX2 impl if available See contents of `/sys/module/zfs/parameters/zfs_vdev_raidz_impl` to get the list of supported values. If an implementation is not supported on the system, it will not be shown. Currently selected option is enclosed in `[]`. Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4328
2016-04-25 11:04:31 +03:00
* raidz_col_p Returns pointer to a column
* raidz_col_size Returns size of a column
* raidz_big_size Returns size of big columns
* raidz_short_size Returns size of short columns
*/
Distributed Spare (dRAID) Feature This patch adds a new top-level vdev type called dRAID, which stands for Distributed parity RAID. This pool configuration allows all dRAID vdevs to participate when rebuilding to a distributed hot spare device. This can substantially reduce the total time required to restore full parity to pool with a failed device. A dRAID pool can be created using the new top-level `draid` type. Like `raidz`, the desired redundancy is specified after the type: `draid[1,2,3]`. No additional information is required to create the pool and reasonable default values will be chosen based on the number of child vdevs in the dRAID vdev. zpool create <pool> draid[1,2,3] <vdevs...> Unlike raidz, additional optional dRAID configuration values can be provided as part of the draid type as colon separated values. This allows administrators to fully specify a layout for either performance or capacity reasons. The supported options include: zpool create <pool> \ draid[<parity>][:<data>d][:<children>c][:<spares>s] \ <vdevs...> - draid[parity] - Parity level (default 1) - draid[:<data>d] - Data devices per group (default 8) - draid[:<children>c] - Expected number of child vdevs - draid[:<spares>s] - Distributed hot spares (default 0) Abbreviated example `zpool status` output for a 68 disk dRAID pool with two distributed spares using special allocation classes. ``` pool: tank state: ONLINE config: NAME STATE READ WRITE CKSUM slag7 ONLINE 0 0 0 draid2:8d:68c:2s-0 ONLINE 0 0 0 L0 ONLINE 0 0 0 L1 ONLINE 0 0 0 ... U25 ONLINE 0 0 0 U26 ONLINE 0 0 0 spare-53 ONLINE 0 0 0 U27 ONLINE 0 0 0 draid2-0-0 ONLINE 0 0 0 U28 ONLINE 0 0 0 U29 ONLINE 0 0 0 ... U42 ONLINE 0 0 0 U43 ONLINE 0 0 0 special mirror-1 ONLINE 0 0 0 L5 ONLINE 0 0 0 U5 ONLINE 0 0 0 mirror-2 ONLINE 0 0 0 L6 ONLINE 0 0 0 U6 ONLINE 0 0 0 spares draid2-0-0 INUSE currently in use draid2-0-1 AVAIL ``` When adding test coverage for the new dRAID vdev type the following options were added to the ztest command. These options are leverages by zloop.sh to test a wide range of dRAID configurations. -K draid|raidz|random - kind of RAID to test -D <value> - dRAID data drives per group -S <value> - dRAID distributed hot spares -R <value> - RAID parity (raidz or dRAID) The zpool_create, zpool_import, redundancy, replacement and fault test groups have all been updated provide test coverage for the dRAID feature. Co-authored-by: Isaac Huang <he.huang@intel.com> Co-authored-by: Mark Maybee <mmaybee@cray.com> Co-authored-by: Don Brady <don.brady@delphix.com> Co-authored-by: Matthew Ahrens <mahrens@delphix.com> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mmaybee@cray.com> Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10102
2020-11-14 00:51:51 +03:00
#define raidz_parity(rm) ((rm)->rm_row[0]->rr_firstdatacol)
#define raidz_ncols(rm) ((rm)->rm_row[0]->rr_cols)
SIMD implementation of vdev_raidz generate and reconstruct routines This is a new implementation of RAIDZ1/2/3 routines using x86_64 scalar, SSE, and AVX2 instruction sets. Included are 3 parity generation routines (P, PQ, and PQR) and 7 reconstruction routines, for all RAIDZ level. On module load, a quick benchmark of supported routines will select the fastest for each operation and they will be used at runtime. Original implementation is still present and can be selected via module parameter. Patch contains: - specialized gen/rec routines for all RAIDZ levels, - new scalar raidz implementation (unrolled), - two x86_64 SIMD implementations (SSE and AVX2 instructions sets), - fastest routines selected on module load (benchmark). - cmd/raidz_test - verify and benchmark all implementations - added raidz_test to the ZFS Test Suite New zfs module parameters: - zfs_vdev_raidz_impl (str): selects the implementation to use. On module load, the parameter will only accept first 3 options, and the other implementations can be set once module is finished loading. Possible values for this option are: "fastest" - use the fastest math available "original" - use the original raidz code "scalar" - new scalar impl "sse" - new SSE impl if available "avx2" - new AVX2 impl if available See contents of `/sys/module/zfs/parameters/zfs_vdev_raidz_impl` to get the list of supported values. If an implementation is not supported on the system, it will not be shown. Currently selected option is enclosed in `[]`. Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4328
2016-04-25 11:04:31 +03:00
#define raidz_nbigcols(rm) ((rm)->rm_bigcols)
#define raidz_col_p(rm, c) ((rm)->rm_col + (c))
#define raidz_col_size(rm, c) ((rm)->rm_col[c].rc_size)
#define raidz_big_size(rm) (raidz_col_size(rm, CODE_P))
#define raidz_short_size(rm) (raidz_col_size(rm, raidz_ncols(rm)-1))
/*
* Macro defines an RAIDZ parity generation method
*
* @code parity the function produce
* @impl name of the implementation
*/
#define _RAIDZ_GEN_WRAP(code, impl) \
SIMD implementation of vdev_raidz generate and reconstruct routines This is a new implementation of RAIDZ1/2/3 routines using x86_64 scalar, SSE, and AVX2 instruction sets. Included are 3 parity generation routines (P, PQ, and PQR) and 7 reconstruction routines, for all RAIDZ level. On module load, a quick benchmark of supported routines will select the fastest for each operation and they will be used at runtime. Original implementation is still present and can be selected via module parameter. Patch contains: - specialized gen/rec routines for all RAIDZ levels, - new scalar raidz implementation (unrolled), - two x86_64 SIMD implementations (SSE and AVX2 instructions sets), - fastest routines selected on module load (benchmark). - cmd/raidz_test - verify and benchmark all implementations - added raidz_test to the ZFS Test Suite New zfs module parameters: - zfs_vdev_raidz_impl (str): selects the implementation to use. On module load, the parameter will only accept first 3 options, and the other implementations can be set once module is finished loading. Possible values for this option are: "fastest" - use the fastest math available "original" - use the original raidz code "scalar" - new scalar impl "sse" - new SSE impl if available "avx2" - new AVX2 impl if available See contents of `/sys/module/zfs/parameters/zfs_vdev_raidz_impl` to get the list of supported values. If an implementation is not supported on the system, it will not be shown. Currently selected option is enclosed in `[]`. Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4328
2016-04-25 11:04:31 +03:00
static void \
Distributed Spare (dRAID) Feature This patch adds a new top-level vdev type called dRAID, which stands for Distributed parity RAID. This pool configuration allows all dRAID vdevs to participate when rebuilding to a distributed hot spare device. This can substantially reduce the total time required to restore full parity to pool with a failed device. A dRAID pool can be created using the new top-level `draid` type. Like `raidz`, the desired redundancy is specified after the type: `draid[1,2,3]`. No additional information is required to create the pool and reasonable default values will be chosen based on the number of child vdevs in the dRAID vdev. zpool create <pool> draid[1,2,3] <vdevs...> Unlike raidz, additional optional dRAID configuration values can be provided as part of the draid type as colon separated values. This allows administrators to fully specify a layout for either performance or capacity reasons. The supported options include: zpool create <pool> \ draid[<parity>][:<data>d][:<children>c][:<spares>s] \ <vdevs...> - draid[parity] - Parity level (default 1) - draid[:<data>d] - Data devices per group (default 8) - draid[:<children>c] - Expected number of child vdevs - draid[:<spares>s] - Distributed hot spares (default 0) Abbreviated example `zpool status` output for a 68 disk dRAID pool with two distributed spares using special allocation classes. ``` pool: tank state: ONLINE config: NAME STATE READ WRITE CKSUM slag7 ONLINE 0 0 0 draid2:8d:68c:2s-0 ONLINE 0 0 0 L0 ONLINE 0 0 0 L1 ONLINE 0 0 0 ... U25 ONLINE 0 0 0 U26 ONLINE 0 0 0 spare-53 ONLINE 0 0 0 U27 ONLINE 0 0 0 draid2-0-0 ONLINE 0 0 0 U28 ONLINE 0 0 0 U29 ONLINE 0 0 0 ... U42 ONLINE 0 0 0 U43 ONLINE 0 0 0 special mirror-1 ONLINE 0 0 0 L5 ONLINE 0 0 0 U5 ONLINE 0 0 0 mirror-2 ONLINE 0 0 0 L6 ONLINE 0 0 0 U6 ONLINE 0 0 0 spares draid2-0-0 INUSE currently in use draid2-0-1 AVAIL ``` When adding test coverage for the new dRAID vdev type the following options were added to the ztest command. These options are leverages by zloop.sh to test a wide range of dRAID configurations. -K draid|raidz|random - kind of RAID to test -D <value> - dRAID data drives per group -S <value> - dRAID distributed hot spares -R <value> - RAID parity (raidz or dRAID) The zpool_create, zpool_import, redundancy, replacement and fault test groups have all been updated provide test coverage for the dRAID feature. Co-authored-by: Isaac Huang <he.huang@intel.com> Co-authored-by: Mark Maybee <mmaybee@cray.com> Co-authored-by: Don Brady <don.brady@delphix.com> Co-authored-by: Matthew Ahrens <mahrens@delphix.com> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mmaybee@cray.com> Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10102
2020-11-14 00:51:51 +03:00
impl ## _gen_ ## code(void *rrp) \
SIMD implementation of vdev_raidz generate and reconstruct routines This is a new implementation of RAIDZ1/2/3 routines using x86_64 scalar, SSE, and AVX2 instruction sets. Included are 3 parity generation routines (P, PQ, and PQR) and 7 reconstruction routines, for all RAIDZ level. On module load, a quick benchmark of supported routines will select the fastest for each operation and they will be used at runtime. Original implementation is still present and can be selected via module parameter. Patch contains: - specialized gen/rec routines for all RAIDZ levels, - new scalar raidz implementation (unrolled), - two x86_64 SIMD implementations (SSE and AVX2 instructions sets), - fastest routines selected on module load (benchmark). - cmd/raidz_test - verify and benchmark all implementations - added raidz_test to the ZFS Test Suite New zfs module parameters: - zfs_vdev_raidz_impl (str): selects the implementation to use. On module load, the parameter will only accept first 3 options, and the other implementations can be set once module is finished loading. Possible values for this option are: "fastest" - use the fastest math available "original" - use the original raidz code "scalar" - new scalar impl "sse" - new SSE impl if available "avx2" - new AVX2 impl if available See contents of `/sys/module/zfs/parameters/zfs_vdev_raidz_impl` to get the list of supported values. If an implementation is not supported on the system, it will not be shown. Currently selected option is enclosed in `[]`. Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4328
2016-04-25 11:04:31 +03:00
{ \
Distributed Spare (dRAID) Feature This patch adds a new top-level vdev type called dRAID, which stands for Distributed parity RAID. This pool configuration allows all dRAID vdevs to participate when rebuilding to a distributed hot spare device. This can substantially reduce the total time required to restore full parity to pool with a failed device. A dRAID pool can be created using the new top-level `draid` type. Like `raidz`, the desired redundancy is specified after the type: `draid[1,2,3]`. No additional information is required to create the pool and reasonable default values will be chosen based on the number of child vdevs in the dRAID vdev. zpool create <pool> draid[1,2,3] <vdevs...> Unlike raidz, additional optional dRAID configuration values can be provided as part of the draid type as colon separated values. This allows administrators to fully specify a layout for either performance or capacity reasons. The supported options include: zpool create <pool> \ draid[<parity>][:<data>d][:<children>c][:<spares>s] \ <vdevs...> - draid[parity] - Parity level (default 1) - draid[:<data>d] - Data devices per group (default 8) - draid[:<children>c] - Expected number of child vdevs - draid[:<spares>s] - Distributed hot spares (default 0) Abbreviated example `zpool status` output for a 68 disk dRAID pool with two distributed spares using special allocation classes. ``` pool: tank state: ONLINE config: NAME STATE READ WRITE CKSUM slag7 ONLINE 0 0 0 draid2:8d:68c:2s-0 ONLINE 0 0 0 L0 ONLINE 0 0 0 L1 ONLINE 0 0 0 ... U25 ONLINE 0 0 0 U26 ONLINE 0 0 0 spare-53 ONLINE 0 0 0 U27 ONLINE 0 0 0 draid2-0-0 ONLINE 0 0 0 U28 ONLINE 0 0 0 U29 ONLINE 0 0 0 ... U42 ONLINE 0 0 0 U43 ONLINE 0 0 0 special mirror-1 ONLINE 0 0 0 L5 ONLINE 0 0 0 U5 ONLINE 0 0 0 mirror-2 ONLINE 0 0 0 L6 ONLINE 0 0 0 U6 ONLINE 0 0 0 spares draid2-0-0 INUSE currently in use draid2-0-1 AVAIL ``` When adding test coverage for the new dRAID vdev type the following options were added to the ztest command. These options are leverages by zloop.sh to test a wide range of dRAID configurations. -K draid|raidz|random - kind of RAID to test -D <value> - dRAID data drives per group -S <value> - dRAID distributed hot spares -R <value> - RAID parity (raidz or dRAID) The zpool_create, zpool_import, redundancy, replacement and fault test groups have all been updated provide test coverage for the dRAID feature. Co-authored-by: Isaac Huang <he.huang@intel.com> Co-authored-by: Mark Maybee <mmaybee@cray.com> Co-authored-by: Don Brady <don.brady@delphix.com> Co-authored-by: Matthew Ahrens <mahrens@delphix.com> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mmaybee@cray.com> Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10102
2020-11-14 00:51:51 +03:00
raidz_row_t *rr = (raidz_row_t *)rrp; \
raidz_generate_## code ## _impl(rr); \
SIMD implementation of vdev_raidz generate and reconstruct routines This is a new implementation of RAIDZ1/2/3 routines using x86_64 scalar, SSE, and AVX2 instruction sets. Included are 3 parity generation routines (P, PQ, and PQR) and 7 reconstruction routines, for all RAIDZ level. On module load, a quick benchmark of supported routines will select the fastest for each operation and they will be used at runtime. Original implementation is still present and can be selected via module parameter. Patch contains: - specialized gen/rec routines for all RAIDZ levels, - new scalar raidz implementation (unrolled), - two x86_64 SIMD implementations (SSE and AVX2 instructions sets), - fastest routines selected on module load (benchmark). - cmd/raidz_test - verify and benchmark all implementations - added raidz_test to the ZFS Test Suite New zfs module parameters: - zfs_vdev_raidz_impl (str): selects the implementation to use. On module load, the parameter will only accept first 3 options, and the other implementations can be set once module is finished loading. Possible values for this option are: "fastest" - use the fastest math available "original" - use the original raidz code "scalar" - new scalar impl "sse" - new SSE impl if available "avx2" - new AVX2 impl if available See contents of `/sys/module/zfs/parameters/zfs_vdev_raidz_impl` to get the list of supported values. If an implementation is not supported on the system, it will not be shown. Currently selected option is enclosed in `[]`. Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4328
2016-04-25 11:04:31 +03:00
}
/*
* Macro defines an RAIDZ data reconstruction method
*
* @code parity the function produce
* @impl name of the implementation
*/
#define _RAIDZ_REC_WRAP(code, impl) \
static int \
Distributed Spare (dRAID) Feature This patch adds a new top-level vdev type called dRAID, which stands for Distributed parity RAID. This pool configuration allows all dRAID vdevs to participate when rebuilding to a distributed hot spare device. This can substantially reduce the total time required to restore full parity to pool with a failed device. A dRAID pool can be created using the new top-level `draid` type. Like `raidz`, the desired redundancy is specified after the type: `draid[1,2,3]`. No additional information is required to create the pool and reasonable default values will be chosen based on the number of child vdevs in the dRAID vdev. zpool create <pool> draid[1,2,3] <vdevs...> Unlike raidz, additional optional dRAID configuration values can be provided as part of the draid type as colon separated values. This allows administrators to fully specify a layout for either performance or capacity reasons. The supported options include: zpool create <pool> \ draid[<parity>][:<data>d][:<children>c][:<spares>s] \ <vdevs...> - draid[parity] - Parity level (default 1) - draid[:<data>d] - Data devices per group (default 8) - draid[:<children>c] - Expected number of child vdevs - draid[:<spares>s] - Distributed hot spares (default 0) Abbreviated example `zpool status` output for a 68 disk dRAID pool with two distributed spares using special allocation classes. ``` pool: tank state: ONLINE config: NAME STATE READ WRITE CKSUM slag7 ONLINE 0 0 0 draid2:8d:68c:2s-0 ONLINE 0 0 0 L0 ONLINE 0 0 0 L1 ONLINE 0 0 0 ... U25 ONLINE 0 0 0 U26 ONLINE 0 0 0 spare-53 ONLINE 0 0 0 U27 ONLINE 0 0 0 draid2-0-0 ONLINE 0 0 0 U28 ONLINE 0 0 0 U29 ONLINE 0 0 0 ... U42 ONLINE 0 0 0 U43 ONLINE 0 0 0 special mirror-1 ONLINE 0 0 0 L5 ONLINE 0 0 0 U5 ONLINE 0 0 0 mirror-2 ONLINE 0 0 0 L6 ONLINE 0 0 0 U6 ONLINE 0 0 0 spares draid2-0-0 INUSE currently in use draid2-0-1 AVAIL ``` When adding test coverage for the new dRAID vdev type the following options were added to the ztest command. These options are leverages by zloop.sh to test a wide range of dRAID configurations. -K draid|raidz|random - kind of RAID to test -D <value> - dRAID data drives per group -S <value> - dRAID distributed hot spares -R <value> - RAID parity (raidz or dRAID) The zpool_create, zpool_import, redundancy, replacement and fault test groups have all been updated provide test coverage for the dRAID feature. Co-authored-by: Isaac Huang <he.huang@intel.com> Co-authored-by: Mark Maybee <mmaybee@cray.com> Co-authored-by: Don Brady <don.brady@delphix.com> Co-authored-by: Matthew Ahrens <mahrens@delphix.com> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mmaybee@cray.com> Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10102
2020-11-14 00:51:51 +03:00
impl ## _rec_ ## code(void *rrp, const int *tgtidx) \
SIMD implementation of vdev_raidz generate and reconstruct routines This is a new implementation of RAIDZ1/2/3 routines using x86_64 scalar, SSE, and AVX2 instruction sets. Included are 3 parity generation routines (P, PQ, and PQR) and 7 reconstruction routines, for all RAIDZ level. On module load, a quick benchmark of supported routines will select the fastest for each operation and they will be used at runtime. Original implementation is still present and can be selected via module parameter. Patch contains: - specialized gen/rec routines for all RAIDZ levels, - new scalar raidz implementation (unrolled), - two x86_64 SIMD implementations (SSE and AVX2 instructions sets), - fastest routines selected on module load (benchmark). - cmd/raidz_test - verify and benchmark all implementations - added raidz_test to the ZFS Test Suite New zfs module parameters: - zfs_vdev_raidz_impl (str): selects the implementation to use. On module load, the parameter will only accept first 3 options, and the other implementations can be set once module is finished loading. Possible values for this option are: "fastest" - use the fastest math available "original" - use the original raidz code "scalar" - new scalar impl "sse" - new SSE impl if available "avx2" - new AVX2 impl if available See contents of `/sys/module/zfs/parameters/zfs_vdev_raidz_impl` to get the list of supported values. If an implementation is not supported on the system, it will not be shown. Currently selected option is enclosed in `[]`. Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4328
2016-04-25 11:04:31 +03:00
{ \
Distributed Spare (dRAID) Feature This patch adds a new top-level vdev type called dRAID, which stands for Distributed parity RAID. This pool configuration allows all dRAID vdevs to participate when rebuilding to a distributed hot spare device. This can substantially reduce the total time required to restore full parity to pool with a failed device. A dRAID pool can be created using the new top-level `draid` type. Like `raidz`, the desired redundancy is specified after the type: `draid[1,2,3]`. No additional information is required to create the pool and reasonable default values will be chosen based on the number of child vdevs in the dRAID vdev. zpool create <pool> draid[1,2,3] <vdevs...> Unlike raidz, additional optional dRAID configuration values can be provided as part of the draid type as colon separated values. This allows administrators to fully specify a layout for either performance or capacity reasons. The supported options include: zpool create <pool> \ draid[<parity>][:<data>d][:<children>c][:<spares>s] \ <vdevs...> - draid[parity] - Parity level (default 1) - draid[:<data>d] - Data devices per group (default 8) - draid[:<children>c] - Expected number of child vdevs - draid[:<spares>s] - Distributed hot spares (default 0) Abbreviated example `zpool status` output for a 68 disk dRAID pool with two distributed spares using special allocation classes. ``` pool: tank state: ONLINE config: NAME STATE READ WRITE CKSUM slag7 ONLINE 0 0 0 draid2:8d:68c:2s-0 ONLINE 0 0 0 L0 ONLINE 0 0 0 L1 ONLINE 0 0 0 ... U25 ONLINE 0 0 0 U26 ONLINE 0 0 0 spare-53 ONLINE 0 0 0 U27 ONLINE 0 0 0 draid2-0-0 ONLINE 0 0 0 U28 ONLINE 0 0 0 U29 ONLINE 0 0 0 ... U42 ONLINE 0 0 0 U43 ONLINE 0 0 0 special mirror-1 ONLINE 0 0 0 L5 ONLINE 0 0 0 U5 ONLINE 0 0 0 mirror-2 ONLINE 0 0 0 L6 ONLINE 0 0 0 U6 ONLINE 0 0 0 spares draid2-0-0 INUSE currently in use draid2-0-1 AVAIL ``` When adding test coverage for the new dRAID vdev type the following options were added to the ztest command. These options are leverages by zloop.sh to test a wide range of dRAID configurations. -K draid|raidz|random - kind of RAID to test -D <value> - dRAID data drives per group -S <value> - dRAID distributed hot spares -R <value> - RAID parity (raidz or dRAID) The zpool_create, zpool_import, redundancy, replacement and fault test groups have all been updated provide test coverage for the dRAID feature. Co-authored-by: Isaac Huang <he.huang@intel.com> Co-authored-by: Mark Maybee <mmaybee@cray.com> Co-authored-by: Don Brady <don.brady@delphix.com> Co-authored-by: Matthew Ahrens <mahrens@delphix.com> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mmaybee@cray.com> Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10102
2020-11-14 00:51:51 +03:00
raidz_row_t *rr = (raidz_row_t *)rrp; \
return (raidz_reconstruct_## code ## _impl(rr, tgtidx)); \
SIMD implementation of vdev_raidz generate and reconstruct routines This is a new implementation of RAIDZ1/2/3 routines using x86_64 scalar, SSE, and AVX2 instruction sets. Included are 3 parity generation routines (P, PQ, and PQR) and 7 reconstruction routines, for all RAIDZ level. On module load, a quick benchmark of supported routines will select the fastest for each operation and they will be used at runtime. Original implementation is still present and can be selected via module parameter. Patch contains: - specialized gen/rec routines for all RAIDZ levels, - new scalar raidz implementation (unrolled), - two x86_64 SIMD implementations (SSE and AVX2 instructions sets), - fastest routines selected on module load (benchmark). - cmd/raidz_test - verify and benchmark all implementations - added raidz_test to the ZFS Test Suite New zfs module parameters: - zfs_vdev_raidz_impl (str): selects the implementation to use. On module load, the parameter will only accept first 3 options, and the other implementations can be set once module is finished loading. Possible values for this option are: "fastest" - use the fastest math available "original" - use the original raidz code "scalar" - new scalar impl "sse" - new SSE impl if available "avx2" - new AVX2 impl if available See contents of `/sys/module/zfs/parameters/zfs_vdev_raidz_impl` to get the list of supported values. If an implementation is not supported on the system, it will not be shown. Currently selected option is enclosed in `[]`. Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4328
2016-04-25 11:04:31 +03:00
}
/*
* Define all gen methods for an implementation
*
* @impl name of the implementation
*/
#define DEFINE_GEN_METHODS(impl) \
_RAIDZ_GEN_WRAP(p, impl); \
_RAIDZ_GEN_WRAP(pq, impl); \
_RAIDZ_GEN_WRAP(pqr, impl)
/*
* Define all rec functions for an implementation
*
* @impl name of the implementation
*/
#define DEFINE_REC_METHODS(impl) \
_RAIDZ_REC_WRAP(p, impl); \
_RAIDZ_REC_WRAP(q, impl); \
_RAIDZ_REC_WRAP(r, impl); \
_RAIDZ_REC_WRAP(pq, impl); \
_RAIDZ_REC_WRAP(pr, impl); \
_RAIDZ_REC_WRAP(qr, impl); \
_RAIDZ_REC_WRAP(pqr, impl)
#define RAIDZ_GEN_METHODS(impl) \
{ \
[RAIDZ_GEN_P] = & impl ## _gen_p, \
[RAIDZ_GEN_PQ] = & impl ## _gen_pq, \
[RAIDZ_GEN_PQR] = & impl ## _gen_pqr \
}
#define RAIDZ_REC_METHODS(impl) \
{ \
[RAIDZ_REC_P] = & impl ## _rec_p, \
[RAIDZ_REC_Q] = & impl ## _rec_q, \
[RAIDZ_REC_R] = & impl ## _rec_r, \
[RAIDZ_REC_PQ] = & impl ## _rec_pq, \
[RAIDZ_REC_PR] = & impl ## _rec_pr, \
[RAIDZ_REC_QR] = & impl ## _rec_qr, \
[RAIDZ_REC_PQR] = & impl ## _rec_pqr \
}
typedef struct raidz_impl_kstat {
uint64_t gen[RAIDZ_GEN_NUM]; /* gen method speed B/s */
uint64_t rec[RAIDZ_REC_NUM]; /* rec method speed B/s */
SIMD implementation of vdev_raidz generate and reconstruct routines This is a new implementation of RAIDZ1/2/3 routines using x86_64 scalar, SSE, and AVX2 instruction sets. Included are 3 parity generation routines (P, PQ, and PQR) and 7 reconstruction routines, for all RAIDZ level. On module load, a quick benchmark of supported routines will select the fastest for each operation and they will be used at runtime. Original implementation is still present and can be selected via module parameter. Patch contains: - specialized gen/rec routines for all RAIDZ levels, - new scalar raidz implementation (unrolled), - two x86_64 SIMD implementations (SSE and AVX2 instructions sets), - fastest routines selected on module load (benchmark). - cmd/raidz_test - verify and benchmark all implementations - added raidz_test to the ZFS Test Suite New zfs module parameters: - zfs_vdev_raidz_impl (str): selects the implementation to use. On module load, the parameter will only accept first 3 options, and the other implementations can be set once module is finished loading. Possible values for this option are: "fastest" - use the fastest math available "original" - use the original raidz code "scalar" - new scalar impl "sse" - new SSE impl if available "avx2" - new AVX2 impl if available See contents of `/sys/module/zfs/parameters/zfs_vdev_raidz_impl` to get the list of supported values. If an implementation is not supported on the system, it will not be shown. Currently selected option is enclosed in `[]`. Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4328
2016-04-25 11:04:31 +03:00
} raidz_impl_kstat_t;
/*
* Enumerate various multiplication constants
* used in reconstruction methods
*/
typedef enum raidz_mul_info {
/* Reconstruct Q */
MUL_Q_X = 0,
/* Reconstruct R */
MUL_R_X = 0,
/* Reconstruct PQ */
MUL_PQ_X = 0,
MUL_PQ_Y = 1,
/* Reconstruct PR */
MUL_PR_X = 0,
MUL_PR_Y = 1,
/* Reconstruct QR */
MUL_QR_XQ = 0,
MUL_QR_X = 1,
MUL_QR_YQ = 2,
MUL_QR_Y = 3,
/* Reconstruct PQR */
MUL_PQR_XP = 0,
MUL_PQR_XQ = 1,
MUL_PQR_XR = 2,
MUL_PQR_YU = 3,
MUL_PQR_YP = 4,
MUL_PQR_YQ = 5,
MUL_CNT = 6
} raidz_mul_info_t;
/*
* Powers of 2 in the Galois field.
*/
extern const uint8_t vdev_raidz_pow2[256] __attribute__((aligned(256)));
/* Logs of 2 in the Galois field defined above. */
extern const uint8_t vdev_raidz_log2[256] __attribute__((aligned(256)));
/*
* Multiply a given number by 2 raised to the given power.
*/
static inline uint8_t
vdev_raidz_exp2(const uint8_t a, const unsigned exp)
{
if (a == 0)
return (0);
return (vdev_raidz_pow2[(exp + (unsigned)vdev_raidz_log2[a]) % 255]);
SIMD implementation of vdev_raidz generate and reconstruct routines This is a new implementation of RAIDZ1/2/3 routines using x86_64 scalar, SSE, and AVX2 instruction sets. Included are 3 parity generation routines (P, PQ, and PQR) and 7 reconstruction routines, for all RAIDZ level. On module load, a quick benchmark of supported routines will select the fastest for each operation and they will be used at runtime. Original implementation is still present and can be selected via module parameter. Patch contains: - specialized gen/rec routines for all RAIDZ levels, - new scalar raidz implementation (unrolled), - two x86_64 SIMD implementations (SSE and AVX2 instructions sets), - fastest routines selected on module load (benchmark). - cmd/raidz_test - verify and benchmark all implementations - added raidz_test to the ZFS Test Suite New zfs module parameters: - zfs_vdev_raidz_impl (str): selects the implementation to use. On module load, the parameter will only accept first 3 options, and the other implementations can be set once module is finished loading. Possible values for this option are: "fastest" - use the fastest math available "original" - use the original raidz code "scalar" - new scalar impl "sse" - new SSE impl if available "avx2" - new AVX2 impl if available See contents of `/sys/module/zfs/parameters/zfs_vdev_raidz_impl` to get the list of supported values. If an implementation is not supported on the system, it will not be shown. Currently selected option is enclosed in `[]`. Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4328
2016-04-25 11:04:31 +03:00
}
/*
* Galois Field operations.
*
* gf_exp2 - computes 2 raised to the given power
* gf_exp4 - computes 4 raised to the given power
SIMD implementation of vdev_raidz generate and reconstruct routines This is a new implementation of RAIDZ1/2/3 routines using x86_64 scalar, SSE, and AVX2 instruction sets. Included are 3 parity generation routines (P, PQ, and PQR) and 7 reconstruction routines, for all RAIDZ level. On module load, a quick benchmark of supported routines will select the fastest for each operation and they will be used at runtime. Original implementation is still present and can be selected via module parameter. Patch contains: - specialized gen/rec routines for all RAIDZ levels, - new scalar raidz implementation (unrolled), - two x86_64 SIMD implementations (SSE and AVX2 instructions sets), - fastest routines selected on module load (benchmark). - cmd/raidz_test - verify and benchmark all implementations - added raidz_test to the ZFS Test Suite New zfs module parameters: - zfs_vdev_raidz_impl (str): selects the implementation to use. On module load, the parameter will only accept first 3 options, and the other implementations can be set once module is finished loading. Possible values for this option are: "fastest" - use the fastest math available "original" - use the original raidz code "scalar" - new scalar impl "sse" - new SSE impl if available "avx2" - new AVX2 impl if available See contents of `/sys/module/zfs/parameters/zfs_vdev_raidz_impl` to get the list of supported values. If an implementation is not supported on the system, it will not be shown. Currently selected option is enclosed in `[]`. Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4328
2016-04-25 11:04:31 +03:00
* gf_mul - multiplication
* gf_div - division
* gf_inv - multiplicative inverse
*/
typedef unsigned gf_t;
typedef unsigned gf_log_t;
static inline gf_t
gf_mul(const gf_t a, const gf_t b)
{
gf_log_t logsum;
if (a == 0 || b == 0)
return (0);
logsum = (gf_log_t)vdev_raidz_log2[a] + (gf_log_t)vdev_raidz_log2[b];
SIMD implementation of vdev_raidz generate and reconstruct routines This is a new implementation of RAIDZ1/2/3 routines using x86_64 scalar, SSE, and AVX2 instruction sets. Included are 3 parity generation routines (P, PQ, and PQR) and 7 reconstruction routines, for all RAIDZ level. On module load, a quick benchmark of supported routines will select the fastest for each operation and they will be used at runtime. Original implementation is still present and can be selected via module parameter. Patch contains: - specialized gen/rec routines for all RAIDZ levels, - new scalar raidz implementation (unrolled), - two x86_64 SIMD implementations (SSE and AVX2 instructions sets), - fastest routines selected on module load (benchmark). - cmd/raidz_test - verify and benchmark all implementations - added raidz_test to the ZFS Test Suite New zfs module parameters: - zfs_vdev_raidz_impl (str): selects the implementation to use. On module load, the parameter will only accept first 3 options, and the other implementations can be set once module is finished loading. Possible values for this option are: "fastest" - use the fastest math available "original" - use the original raidz code "scalar" - new scalar impl "sse" - new SSE impl if available "avx2" - new AVX2 impl if available See contents of `/sys/module/zfs/parameters/zfs_vdev_raidz_impl` to get the list of supported values. If an implementation is not supported on the system, it will not be shown. Currently selected option is enclosed in `[]`. Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4328
2016-04-25 11:04:31 +03:00
return ((gf_t)vdev_raidz_pow2[logsum % 255]);
SIMD implementation of vdev_raidz generate and reconstruct routines This is a new implementation of RAIDZ1/2/3 routines using x86_64 scalar, SSE, and AVX2 instruction sets. Included are 3 parity generation routines (P, PQ, and PQR) and 7 reconstruction routines, for all RAIDZ level. On module load, a quick benchmark of supported routines will select the fastest for each operation and they will be used at runtime. Original implementation is still present and can be selected via module parameter. Patch contains: - specialized gen/rec routines for all RAIDZ levels, - new scalar raidz implementation (unrolled), - two x86_64 SIMD implementations (SSE and AVX2 instructions sets), - fastest routines selected on module load (benchmark). - cmd/raidz_test - verify and benchmark all implementations - added raidz_test to the ZFS Test Suite New zfs module parameters: - zfs_vdev_raidz_impl (str): selects the implementation to use. On module load, the parameter will only accept first 3 options, and the other implementations can be set once module is finished loading. Possible values for this option are: "fastest" - use the fastest math available "original" - use the original raidz code "scalar" - new scalar impl "sse" - new SSE impl if available "avx2" - new AVX2 impl if available See contents of `/sys/module/zfs/parameters/zfs_vdev_raidz_impl` to get the list of supported values. If an implementation is not supported on the system, it will not be shown. Currently selected option is enclosed in `[]`. Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4328
2016-04-25 11:04:31 +03:00
}
static inline gf_t
gf_div(const gf_t a, const gf_t b)
{
gf_log_t logsum;
ASSERT3U(b, >, 0);
if (a == 0)
return (0);
logsum = (gf_log_t)255 + (gf_log_t)vdev_raidz_log2[a] -
(gf_log_t)vdev_raidz_log2[b];
SIMD implementation of vdev_raidz generate and reconstruct routines This is a new implementation of RAIDZ1/2/3 routines using x86_64 scalar, SSE, and AVX2 instruction sets. Included are 3 parity generation routines (P, PQ, and PQR) and 7 reconstruction routines, for all RAIDZ level. On module load, a quick benchmark of supported routines will select the fastest for each operation and they will be used at runtime. Original implementation is still present and can be selected via module parameter. Patch contains: - specialized gen/rec routines for all RAIDZ levels, - new scalar raidz implementation (unrolled), - two x86_64 SIMD implementations (SSE and AVX2 instructions sets), - fastest routines selected on module load (benchmark). - cmd/raidz_test - verify and benchmark all implementations - added raidz_test to the ZFS Test Suite New zfs module parameters: - zfs_vdev_raidz_impl (str): selects the implementation to use. On module load, the parameter will only accept first 3 options, and the other implementations can be set once module is finished loading. Possible values for this option are: "fastest" - use the fastest math available "original" - use the original raidz code "scalar" - new scalar impl "sse" - new SSE impl if available "avx2" - new AVX2 impl if available See contents of `/sys/module/zfs/parameters/zfs_vdev_raidz_impl` to get the list of supported values. If an implementation is not supported on the system, it will not be shown. Currently selected option is enclosed in `[]`. Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4328
2016-04-25 11:04:31 +03:00
return ((gf_t)vdev_raidz_pow2[logsum % 255]);
SIMD implementation of vdev_raidz generate and reconstruct routines This is a new implementation of RAIDZ1/2/3 routines using x86_64 scalar, SSE, and AVX2 instruction sets. Included are 3 parity generation routines (P, PQ, and PQR) and 7 reconstruction routines, for all RAIDZ level. On module load, a quick benchmark of supported routines will select the fastest for each operation and they will be used at runtime. Original implementation is still present and can be selected via module parameter. Patch contains: - specialized gen/rec routines for all RAIDZ levels, - new scalar raidz implementation (unrolled), - two x86_64 SIMD implementations (SSE and AVX2 instructions sets), - fastest routines selected on module load (benchmark). - cmd/raidz_test - verify and benchmark all implementations - added raidz_test to the ZFS Test Suite New zfs module parameters: - zfs_vdev_raidz_impl (str): selects the implementation to use. On module load, the parameter will only accept first 3 options, and the other implementations can be set once module is finished loading. Possible values for this option are: "fastest" - use the fastest math available "original" - use the original raidz code "scalar" - new scalar impl "sse" - new SSE impl if available "avx2" - new AVX2 impl if available See contents of `/sys/module/zfs/parameters/zfs_vdev_raidz_impl` to get the list of supported values. If an implementation is not supported on the system, it will not be shown. Currently selected option is enclosed in `[]`. Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4328
2016-04-25 11:04:31 +03:00
}
static inline gf_t
gf_inv(const gf_t a)
{
gf_log_t logsum;
ASSERT3U(a, >, 0);
logsum = (gf_log_t)255 - (gf_log_t)vdev_raidz_log2[a];
SIMD implementation of vdev_raidz generate and reconstruct routines This is a new implementation of RAIDZ1/2/3 routines using x86_64 scalar, SSE, and AVX2 instruction sets. Included are 3 parity generation routines (P, PQ, and PQR) and 7 reconstruction routines, for all RAIDZ level. On module load, a quick benchmark of supported routines will select the fastest for each operation and they will be used at runtime. Original implementation is still present and can be selected via module parameter. Patch contains: - specialized gen/rec routines for all RAIDZ levels, - new scalar raidz implementation (unrolled), - two x86_64 SIMD implementations (SSE and AVX2 instructions sets), - fastest routines selected on module load (benchmark). - cmd/raidz_test - verify and benchmark all implementations - added raidz_test to the ZFS Test Suite New zfs module parameters: - zfs_vdev_raidz_impl (str): selects the implementation to use. On module load, the parameter will only accept first 3 options, and the other implementations can be set once module is finished loading. Possible values for this option are: "fastest" - use the fastest math available "original" - use the original raidz code "scalar" - new scalar impl "sse" - new SSE impl if available "avx2" - new AVX2 impl if available See contents of `/sys/module/zfs/parameters/zfs_vdev_raidz_impl` to get the list of supported values. If an implementation is not supported on the system, it will not be shown. Currently selected option is enclosed in `[]`. Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4328
2016-04-25 11:04:31 +03:00
return ((gf_t)vdev_raidz_pow2[logsum]);
SIMD implementation of vdev_raidz generate and reconstruct routines This is a new implementation of RAIDZ1/2/3 routines using x86_64 scalar, SSE, and AVX2 instruction sets. Included are 3 parity generation routines (P, PQ, and PQR) and 7 reconstruction routines, for all RAIDZ level. On module load, a quick benchmark of supported routines will select the fastest for each operation and they will be used at runtime. Original implementation is still present and can be selected via module parameter. Patch contains: - specialized gen/rec routines for all RAIDZ levels, - new scalar raidz implementation (unrolled), - two x86_64 SIMD implementations (SSE and AVX2 instructions sets), - fastest routines selected on module load (benchmark). - cmd/raidz_test - verify and benchmark all implementations - added raidz_test to the ZFS Test Suite New zfs module parameters: - zfs_vdev_raidz_impl (str): selects the implementation to use. On module load, the parameter will only accept first 3 options, and the other implementations can be set once module is finished loading. Possible values for this option are: "fastest" - use the fastest math available "original" - use the original raidz code "scalar" - new scalar impl "sse" - new SSE impl if available "avx2" - new AVX2 impl if available See contents of `/sys/module/zfs/parameters/zfs_vdev_raidz_impl` to get the list of supported values. If an implementation is not supported on the system, it will not be shown. Currently selected option is enclosed in `[]`. Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4328
2016-04-25 11:04:31 +03:00
}
static inline gf_t
gf_exp2(gf_log_t exp)
{
return (vdev_raidz_pow2[exp % 255]);
}
static inline gf_t
gf_exp4(gf_log_t exp)
{
ASSERT3U(exp, <=, 255);
return ((gf_t)vdev_raidz_pow2[(2 * exp) % 255]);
SIMD implementation of vdev_raidz generate and reconstruct routines This is a new implementation of RAIDZ1/2/3 routines using x86_64 scalar, SSE, and AVX2 instruction sets. Included are 3 parity generation routines (P, PQ, and PQR) and 7 reconstruction routines, for all RAIDZ level. On module load, a quick benchmark of supported routines will select the fastest for each operation and they will be used at runtime. Original implementation is still present and can be selected via module parameter. Patch contains: - specialized gen/rec routines for all RAIDZ levels, - new scalar raidz implementation (unrolled), - two x86_64 SIMD implementations (SSE and AVX2 instructions sets), - fastest routines selected on module load (benchmark). - cmd/raidz_test - verify and benchmark all implementations - added raidz_test to the ZFS Test Suite New zfs module parameters: - zfs_vdev_raidz_impl (str): selects the implementation to use. On module load, the parameter will only accept first 3 options, and the other implementations can be set once module is finished loading. Possible values for this option are: "fastest" - use the fastest math available "original" - use the original raidz code "scalar" - new scalar impl "sse" - new SSE impl if available "avx2" - new AVX2 impl if available See contents of `/sys/module/zfs/parameters/zfs_vdev_raidz_impl` to get the list of supported values. If an implementation is not supported on the system, it will not be shown. Currently selected option is enclosed in `[]`. Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4328
2016-04-25 11:04:31 +03:00
}
#ifdef __cplusplus
}
#endif
#endif /* _VDEV_RAIDZ_H */