Distributed Spare (dRAID) Feature

This patch adds a new top-level vdev type called dRAID, which stands
for Distributed parity RAID.  This pool configuration allows all dRAID
vdevs to participate when rebuilding to a distributed hot spare device.
This can substantially reduce the total time required to restore full
parity to a pool with a failed device.

A dRAID pool can be created using the new top-level `draid` type.
Like `raidz`, the desired redundancy is specified after the type:
`draid[1,2,3]`.  No additional information is required to create the
pool and reasonable default values will be chosen based on the number
of child vdevs in the dRAID vdev.

    zpool create <pool> draid[1,2,3] <vdevs...>

Unlike raidz, additional optional dRAID configuration values can be
provided as part of the draid type as colon-separated values.  This
allows administrators to fully specify a layout for either performance
or capacity reasons.  The supported options include:

    zpool create <pool> \
        draid[<parity>][:<data>d][:<children>c][:<spares>s] \
        <vdevs...>

    - draid[<parity>]     - Parity level (default 1)
    - draid[:<data>d]     - Data devices per group (default 8)
    - draid[:<children>c] - Expected number of child vdevs
    - draid[:<spares>s]   - Distributed hot spares (default 0)
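
As an illustration of the syntax above, the following minimal sketch
composes a draid specification string from its numeric parts.  The
`draid_spec` helper is hypothetical and not part of this patch; it only
assumes the colon-separated format described above.

```shell
# Hypothetical helper (not part of OpenZFS): build a draid vdev
# specification from its parts.  Pass an empty string for an optional
# field to omit it and leave that value to the defaults.
draid_spec() {
    ds_spec="draid$1"                                   # $1: parity level
    if [ -n "$2" ]; then ds_spec="${ds_spec}:${2}d"; fi # $2: data per group
    if [ -n "$3" ]; then ds_spec="${ds_spec}:${3}c"; fi # $3: child vdevs
    if [ -n "$4" ]; then ds_spec="${ds_spec}:${4}s"; fi # $4: spares
    printf '%s\n' "$ds_spec"
}

draid_spec 2 8 68 2     # prints: draid2:8d:68c:2s

# The result could then be passed to zpool, for example:
#   zpool create <pool> "$(draid_spec 2 8 68 2)" <vdevs...>
```

Omitted fields are simply left out of the string, so the defaults
described above still apply.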

Abbreviated example `zpool status` output for a 68-disk dRAID pool
with two distributed spares and special allocation classes:

```
  pool: tank
 state: ONLINE
config:

    NAME                  STATE     READ WRITE CKSUM
    tank                  ONLINE       0     0     0
      draid2:8d:68c:2s-0  ONLINE       0     0     0
        L0                ONLINE       0     0     0
        L1                ONLINE       0     0     0
        ...
        U25               ONLINE       0     0     0
        U26               ONLINE       0     0     0
        spare-53          ONLINE       0     0     0
          U27             ONLINE       0     0     0
          draid2-0-0      ONLINE       0     0     0
        U28               ONLINE       0     0     0
        U29               ONLINE       0     0     0
        ...
        U42               ONLINE       0     0     0
        U43               ONLINE       0     0     0
    special
      mirror-1            ONLINE       0     0     0
        L5                ONLINE       0     0     0
        U5                ONLINE       0     0     0
      mirror-2            ONLINE       0     0     0
        L6                ONLINE       0     0     0
        U6                ONLINE       0     0     0
    spares
      draid2-0-0          INUSE     currently in use
      draid2-0-1          AVAIL
```
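
The top-level vdev name in the output above (`draid2:8d:68c:2s-0`)
encodes the layout in the same colon-separated form used at creation
time.  A minimal sketch of splitting such a name back into its fields
follows.  The `parse_draid_name` helper is hypothetical, and it assumes
the name ends in a `-<index>` suffix as shown in the example.

```shell
# Hypothetical helper: split a dRAID top-level vdev name, as printed by
# `zpool status`, into parity, data, children and spares.
parse_draid_name() {
    pd_name=${1%-*}                 # strip the trailing "-<index>" suffix
    pd_parity= pd_data= pd_children= pd_spares=
    pd_old_ifs=$IFS
    IFS=:
    set -- $pd_name                 # split on ':' into positional parameters
    IFS=$pd_old_ifs
    for pd_field in "$@"; do
        case $pd_field in
            draid*) pd_parity=${pd_field#draid} ;;
            *d)     pd_data=${pd_field%d} ;;
            *c)     pd_children=${pd_field%c} ;;
            *s)     pd_spares=${pd_field%s} ;;
        esac
    done
    printf 'parity=%s data=%s children=%s spares=%s\n' \
        "$pd_parity" "$pd_data" "$pd_children" "$pd_spares"
}

parse_draid_name draid2:8d:68c:2s-0
# prints: parity=2 data=8 children=68 spares=2
```

This helper is meant for top-level vdev names only.  Distributed spare
names such as `draid2-0-0` follow a different pattern and are not
handled by this sketch.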

When adding test coverage for the new dRAID vdev type, the following
options were added to the ztest command.  These options are leveraged
by zloop.sh to test a wide range of dRAID configurations.

    -K draid|raidz|random - kind of RAID to test
    -D <value>            - dRAID data drives per group
    -S <value>            - dRAID distributed hot spares
    -R <value>            - RAID parity (raidz or dRAID)
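
To make the relationship between these options concrete, the sketch
below reproduces the child-count arithmetic from the zloop.sh change in
this commit, with explicit inputs in place of `$RANDOM` so the result
is repeatable.  The function names are hypothetical.  The `-r` option
for the number of children is taken from that zloop.sh change.

```shell
# Sketch only: same arithmetic as the dRAID branch in zloop.sh.
# $1 parity, $2 data per group, $3 spares,
# $4 group multiplier (zloop.sh uses 1..4), $5 slack (zloop.sh uses 0..3)
draid_children() {
    dc_stripe=$(($2 + $1))          # devices per redundancy group
    dc_extra=$(($3 + $5))           # spares plus some additional children
    echo $(($4 * dc_stripe + dc_extra))
}

# $1 parity, $2 data per group, $3 spares, $4 children
ztest_draid_opts() {
    echo "-K draid -r $4 -D $2 -S $3 -R $1"
}

children=$(draid_children 2 8 2 3 1)    # 3 * (8 + 2) + (2 + 1) = 33
ztest_draid_opts 2 8 2 "$children"
# prints: -K draid -r 33 -D 8 -S 2 -R 2
```

Because the multiplier is at least 1 and the slack is never negative,
the child count always covers one full group plus the distributed
spares.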

The zpool_create, zpool_import, redundancy, replacement and fault
test groups have all been updated to provide test coverage for the
dRAID feature.

Co-authored-by: Isaac Huang <he.huang@intel.com>
Co-authored-by: Mark Maybee <mmaybee@cray.com>
Co-authored-by: Don Brady <don.brady@delphix.com>
Co-authored-by: Matthew Ahrens <mahrens@delphix.com>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mmaybee@cray.com>
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10102
Brian Behlendorf
2020-11-13 13:51:51 -08:00
committed by GitHub
parent a724db0374
commit b2255edcc0
153 changed files with 10203 additions and 1882 deletions
+1
@@ -36,6 +36,7 @@ export ZPOOL_SCRIPT_DIR=$$CMD_DIR/zpool/zpool.d
export ZPOOL_SCRIPTS_PATH=$$CMD_DIR/zpool/zpool.d
export CONTRIB_DIR=@abs_top_builddir@/contrib
export LIB_DIR=@abs_top_builddir@/lib
export SYSCONF_DIR=@abs_top_builddir@/etc
export INSTALL_UDEV_DIR=@udevdir@
export INSTALL_UDEV_RULE_DIR=@udevruledir@
+3
@@ -166,6 +166,8 @@ if [ "${INSTALL}" = "yes" ]; then
"$INSTALL_UDEV_RULE_DIR/90-zfs.rules"
install "$CMD_DIR/zpool/zpool.d" \
"$INSTALL_SYSCONF_DIR/zfs/zpool.d"
install "$SYSCONF_DIR/zfs/draid.d" \
"$INSTALL_SYSCONF_DIR/zfs/draid.d"
install "$CONTRIB_DIR/pyzfs/libzfs_core" \
"$INSTALL_PYTHON_DIR/libzfs_core"
# Ideally we would install these in the configured ${libdir}, which is
@@ -185,6 +187,7 @@ else
remove "$INSTALL_UDEV_RULE_DIR/69-vdev.rules"
remove "$INSTALL_UDEV_RULE_DIR/90-zfs.rules"
remove "$INSTALL_SYSCONF_DIR/zfs/zpool.d"
remove "$INSTALL_SYSCONF_DIR/zfs/draid.d"
remove "$INSTALL_PYTHON_DIR/libzfs_core"
remove "/lib/libzfs_core.so"
remove "/lib/libnvpair.so"
+48 -14
@@ -18,6 +18,7 @@
#
# Copyright (c) 2015 by Delphix. All rights reserved.
# Copyright (C) 2016 Lawrence Livermore National Security, LLC.
# Copyright (c) 2017, Intel Corporation.
#
BASE_DIR=$(dirname "$0")
@@ -246,27 +247,60 @@ while [[ $timeout -eq 0 ]] || [[ $curtime -le $((starttime + timeout)) ]]; do
or_die rm -rf "$workdir"
or_die mkdir "$workdir"
# switch between common arrangements & fully randomized
if [[ $((RANDOM % 2)) -eq 0 ]]; then
mirrors=2
raidz=0
parity=1
vdevs=2
else
mirrors=$(((RANDOM % 3) * 1))
parity=$(((RANDOM % 3) + 1))
raidz=$((((RANDOM % 9) + parity + 1) * (RANDOM % 2)))
vdevs=$(((RANDOM % 3) + 3))
fi
# switch between three types of configs
# 1/3 basic, 1/3 raidz mix, and 1/3 draid mix
choice=$((RANDOM % 3))
# ashift range 9 - 15
align=$(((RANDOM % 2) * 3 + 9))
runtime=$((RANDOM % 100))
# randomly use special classes
class="special=random"
if [[ $choice -eq 0 ]]; then
# basic mirror only
parity=1
mirrors=2
draid_data=0
draid_spares=0
raid_children=0
vdevs=2
raid_type="raidz"
elif [[ $choice -eq 1 ]]; then
# fully randomized mirror/raidz (sans dRAID)
parity=$(((RANDOM % 3) + 1))
mirrors=$(((RANDOM % 3) * 1))
draid_data=0
draid_spares=0
raid_children=$((((RANDOM % 9) + parity + 1) * (RANDOM % 2)))
vdevs=$(((RANDOM % 3) + 3))
raid_type="raidz"
else
# fully randomized dRAID (sans mirror/raidz)
parity=$(((RANDOM % 3) + 1))
mirrors=0
draid_data=$(((RANDOM % 8) + 3))
draid_spares=$(((RANDOM % 2) + parity))
stripe=$((draid_data + parity))
extra=$((draid_spares + (RANDOM % 4)))
raid_children=$(((((RANDOM % 4) + 1) * stripe) + extra))
vdevs=$((RANDOM % 3))
raid_type="draid"
fi
# run from 30 to 120 seconds
runtime=$(((RANDOM % 90) + 30))
passtime=$((RANDOM % (runtime / 3 + 1) + 10))
zopt="$zopt -K $raid_type"
zopt="$zopt -m $mirrors"
zopt="$zopt -r $raidz"
zopt="$zopt -r $raid_children"
zopt="$zopt -D $draid_data"
zopt="$zopt -S $draid_spares"
zopt="$zopt -R $parity"
zopt="$zopt -v $vdevs"
zopt="$zopt -a $align"
zopt="$zopt -C $class"
zopt="$zopt -T $runtime"
zopt="$zopt -P $passtime"
zopt="$zopt -s $size"