Commit Graph

18 Commits

Author SHA1 Message Date
loli10K
d48091de81 zed: detect and offline physically removed devices
This commit adds a new test case to the ZFS Test Suite to verify ZED
can detect when a device is physically removed from a running system:
the device will be offlined if a spare is not available in the pool.

We implement this by using the existing libudev functionality and
without relying solely on the FM kernel module capabilities which have
been observed to be unreliable with some kernels.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #1537
Closes #7926
2018-11-09 11:17:24 -08:00
Don Brady
e89f1295d4 Add libzutil for libzfs or libzpool consumers
Adds a libzutil for utility functions that are common to libzfs and
libzpool consumers (most of what was in libzfs_import.c).  This
removes the need for utilities to link against both libzpool and
libzfs.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #8050
2018-11-05 11:22:33 -08:00
Brian Behlendorf
bea7578356
ZTS: Fix auto_replace_001_pos test
The root cause of these failures is that udev can notify the
ZED of newly created partition before its links are created.
Handle this by allowing an auto-replace to briefly wait until
udev confirms the links exist.

Distill this test case down to its essentials so it can be run
reliably.  What we need to check is that:

  1) A new disk, in the same physical location, is automatically
     brought online when added to the system,
  2) It completes the replacement process, and
  3) The pool is now ONLINE and healthy.

There is no need to remove the scsi_debug module.  After exporting
the pool the disk can be zeroed, removed, and then re-added to the
system as a new disk.

Reviewed by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8051
2018-10-29 15:05:14 -05:00
Brian Behlendorf
d441e85dd7
Add support for autoexpand property
While the autoexpand property may seem like a small feature it
depends on a significant amount of system infrastructure.  Enough
of that infrastructure is now in place that with a few modifications
for Linux it can be supported.

Auto-expand works as follows; when a block device is modified
(re-sized, closed after being open r/w, etc) a change uevent is
generated for udev.  The ZED, which is monitoring udev events,
passes the change event along to zfs_deliver_dle() if the disk
or partition contains a zfs_member as identified by blkid.

From here the device is matched against all imported pool vdevs
using the vdev_guid which was read from the label by blkid.  If
a match is found the ZED reopens the pool vdev.  This re-opening
is important because it allows the vdev to be briefly closed so
the disk partition table can be re-read.  Otherwise, it wouldn't
be possible to report the maximum possible expansion size.

Finally, if the property autoexpand=on a vdev expansion will be
attempted.  After performing some sanity checks on the disk to
verify that it is safe to expand,  the primary partition (-part1)
will be expanded and the partition table updated.  The partition
is then re-opened (again) to detect the updated size which allows
the new capacity to be used.

In order to make all of the above possible the following changes
were required:

* Updated the zpool_expand_001_pos and zpool_expand_003_pos tests.
  These tests now create a pool which is layered on a loopback,
  scsi_debug, and file vdev.  This allows for testing of non-
  partitioned block device (loopback), a partition block device
  (scsi_debug), and a file which does not receive udev change
  events.  This provided for better test coverage, and by removing
  the layering on ZFS volumes there issues surrounding layering
  one pool on another are avoided.

* zpool_find_vdev_by_physpath() updated to accept a vdev guid.
  This allows for matching by guid rather than path which is a
  more reliable way for the ZED to reference a vdev.

* Fixed zfs_zevent_wait() signal handling which could result
  in the ZED spinning when a signal was not handled.

* Removed vdev_disk_rrpart() functionality which can be abandoned
  in favor of kernel provided blkdev_reread_part() function.

* Added a rwlock which is held as a writer while a disk is being
  reopened.  This is important to prevent errors from occurring
  for any configuration related IOs which bypass the SCL_ZIO lock.
  The zpool_reopen_007_pos.ksh test case was added to verify IO
  error are never observed when reopening.  This is not expected
  to impact IO performance.

Additional fixes which aren't critical but were discovered and
resolved in the course of developing this functionality.

* Added PHYS_PATH="/dev/zvol/dataset" to the vdev configuration for
  ZFS volumes.  This is as good as a unique physical path, while the
  volumes are not used in the test cases anymore for other reasons
  this improvement was included.

Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Sara Hartse <sara.hartse@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #120
Closes #2437
Closes #5771
Closes #7366
Closes #7582
Closes #7629
2018-07-23 15:40:15 -07:00
Brian Behlendorf
93ce2b4ca5 Update build system and packaging
Minimal changes required to integrate the SPL sources in to the
ZFS repository build infrastructure and packaging.

Build system and packaging:
  * Renamed SPL_* autoconf m4 macros to ZFS_*.
  * Removed redundant SPL_* autoconf m4 macros.
  * Updated the RPM spec files to remove SPL package dependency.
  * The zfs package obsoletes the spl package, and the zfs-kmod
    package obsoletes the spl-kmod package.
  * The zfs-kmod-devel* packages were updated to add compatibility
    symlinks under /usr/src/spl-x.y.z until all dependent packages
    can be updated.  They will be removed in a future release.
  * Updated copy-builtin script for in-kernel builds.
  * Updated DKMS package to include the spl.ko.
  * Updated stale AUTHORS file to include all contributors.
  * Updated stale COPYRIGHT and included the SPL as an exception.
  * Renamed README.markdown to README.md
  * Renamed OPENSOLARIS.LICENSE to LICENSE.
  * Renamed DISCLAIMER to NOTICE.

Required code changes:
  * Removed redundant HAVE_SPL macro.
  * Removed _BOOT from nvpairs since it doesn't apply for Linux.
  * Initial header cleanup (removal of empty headers, refactoring).
  * Remove SPL repository clone/build from zimport.sh.
  * Use of DEFINE_RATELIMIT_STATE and DEFINE_SPINLOCK removed due
    to build issues when forcing C99 compilation.
  * Replaced legacy ACCESS_ONCE with READ_ONCE.
  * Include needed headers for `current` and `EXPORT_SYMBOL`.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
TEST_ZIMPORT_SKIP="yes"
Closes #7556
2018-05-29 16:00:33 -07:00
LOLi
4e9b156960 Various ZED fixes
* Teach ZED to handle spares usingi the configured ashift: if the zpool
   'ashift' property is set then ZED should use its value when kicking
   in a hotspare; with this change 512e disks can be used as spares
   for VDEVs that were created with ashift=9, even if ZFS natively
   detects them as 4K block devices.

 * Introduce an additional auto_spare test case which verifies that in
   the face of multiple device failures an appropiate number of spares
   are kicked in.

 * Fix zed_stop() in "libtest.shlib" which did not correctly wait the
   target pid.

 * Fix ZED crashing on startup caused by a race condition in libzfs
   when used in multi-threaded context.

 * Convert ZED over to using the tpool library which is already present
   in the Illumos FMA code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #2562 
Closes #6858
2017-12-08 16:58:41 -08:00
Arkadiusz Bubała
d3f2cd7e3b Added no_scrub_restart flag to zpool reopen
Added -n flag to zpool reopen that allows a running scrub
operation to continue if there is a device with Dirty Time Log.

By default if a component device has a DTL and zpool reopen
is executed all running scan operations will be restarted.

Added functional tests for `zpool reopen`

Tests covers following scenarios:
* `zpool reopen` without arguments,
* `zpool reopen` with pool name as argument,
* `zpool reopen` while scrubbing,
* `zpool reopen -n` while scrubbing,
* `zpool reopen -n` while resilvering,
* `zpool reopen` with bad arguments.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Arkadiusz Bubała <arkadiusz.bubala@open-e.com>
Closes #6076 
Closes #6746
2017-10-26 12:26:09 -07:00
Brian Behlendorf
95401cb6f7 Enable remaining tests
Enable most of the remaining test cases which were previously
disabled.  The required fixes are as follows:

* cache_001_pos - No changes required.

* cache_010_neg - Updated to use losetup under Linux.  Loopback
  cache devices are allowed, ZVOLs as cache devices are not.
  Disabled until all the builders pass reliably.

* cachefile_001_pos, cachefile_002_pos, cachefile_003_pos,
  cachefile_004_pos - Set set_device_dir path in cachefile.cfg,
  updated CPATH1 and CPATH2 to reference unique files.

* zfs_clone_005_pos - Wait for udev to create volumes.

* zfs_mount_007_pos - Updated mount options to expected Linux names.

* zfs_mount_009_neg, zfs_mount_all_001_pos - No changes required.

* zfs_unmount_005_pos, zfs_unmount_009_pos, zfs_unmount_all_001_pos -
  Updated to expect -f to not unmount busy mount points under Linux.

* rsend_019_pos - Observed to occasionally take a long time on both
  32-bit systems and the kmemleak builder.

* zfs_written_property_001_pos - Switched sync(1) to sync_pool.

* devices_001_pos, devices_002_neg - Updated create_dev_file() helper
  for Linux.

* exec_002_neg.ksh - Fixed mmap_exec.c to preserve errno.  Updated
  test case to expect EPERM from Linux as described by mmap(2).

* grow_pool_001_pos - Adding missing setup.ksh and cleanup.ksh
  scripts from OpenZFS.

* grow_replicas_001_pos.ksh - Added missing $SLICE_* variables.

* history_004_pos, history_006_neg, history_008_pos - Fixed by
  previous commits and were not enabled.  No changes required.

* zfs_allow_010_pos - Added missing spaces after assorted zfs
  commands in delegate_common.kshlib.

* inuse_* - Illumos dump device tests skipped.  Remaining test
  cases updated to correctly create required partitions.

* large_files_001_pos - Fixed largest_file.c to accept EINVAL
  as well as EFBIG as described in write(2).

* link_count_001 - Added nproc to required commands.

* umountall_001 - Updated to use umount -a.

* online_offline_001_* - Pull in OpenZFS change to file_trunc.c
  to make the '-c 0' option run the test in a loop.  Included
  online_offline.cfg file in all test cases.

* rename_dirs_001_pos - Updated to use the rename_dir test binary,
  pkill restricted to exact matches and total runtime reduced.

* slog_013_neg, write_dirs_002_pos - No changes required.

* slog_013_pos.ksh - Updated to use losetup under Linux.

* slog_014_pos.ksh - ZED will not be running, manually degrade
  the damaged vdev as expected.

* nopwrite_varying_compression, nopwrite_volume - Forced pool
  sync with sync_pool to ensure up to date property values.

* Fixed typos in ZED log messages.  Refactored zed_* helper
  functions to resolve all-syslog exit=1 errors in zedlog.

* zfs_copies_005_neg, zfs_get_004_pos, zpool_add_004_pos,
  zpool_destroy_001_pos, largest_pool_001_pos, clone_001_pos.ksh,
  clone_001_pos, - Skip until layering pools on zvols is solid.

* largest_pool_001_pos - Limited to 7eb pool, maximum
  supported size in 8eb-1 on Linux.

* zpool_expand_001_pos, zpool_expand_003_neg - Requires
  additional support from the ZED, updated skip reason.

* zfs_rollback_001_pos, zfs_rollback_002_pos - Properly cleanup
  busy mount points under Linux between test loops.

* privilege_001_pos, privilege_003_pos, rollback_003_pos,
  threadsappend_001_pos - Skip with log_unsupported.

* snapshot_016_pos - No changes required.

* snapshot_008_pos - Increased LIMIT from 512K to 2M and added
  sync_pool to avoid false positives.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6128
2017-05-22 12:34:32 -04:00
Giuseppe Di Natale
f02ad0dc75 Fix coverity defects: CID 161288
CID 161288:  Null pointer dereferences  (REVERSE_INULL)

Ensure physpath != NULL before the strcmp.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #5974
2017-04-06 13:18:22 -07:00
Sydney Vanda
7a4500a101 Added auto-replace FMA test for the ZFS Test Suite
Also included are updates to auto-online test

Automated auto-replace test to go along with ZED FMA integration
(PR 4673) auto-replace_001.pos works using a scsi_debug device
(the only usable virtual device currently due to whole_disk var
needing to be set)

Functionality for automated FMA auto-replace test to work with
scsi_debug devs:  Some functionality/exceptions needed to be
added for automation of auto-replace to work correctly.

In the test an alias vdev_id rule is added for any scsi_debug
device which sets the phys_path="scsidebug" after a udevadm
trigger command.

A symlink is created for the vdev_id.conf file (in /etc/zfs/ by
default) to be used in-tree for the test suite
(/var/tmp/zfs/vdev_id.conf).  "./scripts/zfs-helpers.sh -i" needs
to be run before fault tests in the ZTS (to use udev rules in-tree)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Don Brady <don.brady@intel.com>
Reviewed-by: David Quigley <david.quigley@intel.com>
Signed-off-by: Sydney Vanda <sydney.m.vanda@intel.com>
Closes #5944
2017-04-05 16:18:19 -07:00
ka7
4e33ba4c38 Fix spelling
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Haakan T Johansson <f96hajo@chalmers.se>
Closes #5547 
Closes #5543
2017-01-03 11:31:18 -06:00
Brian Behlendorf
02730c333c Use cstyle -cpP in make cstyle check
Enable picky cstyle checks and resolve the new warnings.  The vast
majority of the changes needed were to handle minor issues with
whitespace formatting.  This patch contains no functional changes.

Non-whitespace changes are as follows:

* 8 times ; to { } in for/while loop
* fix missing ; in cmd/zed/agents/zfs_diagnosis.c
* comment (confim -> confirm)
* change endline , to ; in cmd/zpool/zpool_main.c
* a number of /* BEGIN CSTYLED */ /* END CSTYLED */ blocks
* /* CSTYLED */ markers
* change == 0 to !
* ulong to unsigned long in module/zfs/dsl_scan.c
* rearrangement of module_param lines in module/zfs/metaslab.c
* add { } block around statement after for_each_online_node

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Håkan Johansson <f96hajo@chalmers.se>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5465
2016-12-12 10:46:26 -08:00
Tony Hutter
8720e9e748 Add -c to zpool iostat & status to run command
This patch adds a command (-c) option to zpool status and zpool iostat.  The
-c option allows you to run an arbitrary command on each vdev and display
the first line of output in zpool status/iostat.  The environment vars
VDEV_PATH and VDEV_UPATH are set to the vdev's path and "underlying path"
before running the command.  For device mapper, multipath, or partitioned
vdevs, VDEV_UPATH is the actual underlying /dev/sd* disk.  This can be useful
if the command you're running requires a /dev/sd* device.

The patch also uses /sys/block/<dev>/slaves/ to lookup the underlying device
instead of using libdevmapper.  This not only removes the libdevmapper
requirement at build time, but also allows you to resolve device mapper
devices without being root.  This means that UDEV_UPATH get set correctly
when running zpool status/iostat as an unprivileged user.

Example:

$ zpool status -c 'echo I am $VDEV_PATH, $VDEV_UPATH'

NAME        STATE     READ WRITE CKSUM
mypool      ONLINE       0     0     0
  mirror-0  ONLINE       0     0     0
    mpatha  ONLINE       0     0     0  I am /dev/mapper/mpatha, /dev/sdc
    sdb     ONLINE       0     0     0  I am /dev/sdb1, /dev/sdb

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #5368
2016-11-29 14:45:38 -07:00
Don Brady
976246fadd Add illumos FMD ZFS logic to ZED -- phase 2
The phase 2 work primarily entails the Diagnosis Engine and
the Retire Agent modules. It also includes infrastructure
to support a crude FMD environment to host these modules.

The Diagnosis Engine consumes I/O and checksum ereports and
feeds them into a SERD engine which will generate a corres-
ponding fault diagnosis when the SERD engine fires. All the
diagnosis state data is collected into cases, one case per
vdev being tracked.

The Retire Agent responds to diagnosed faults by isolating
the faulty VDEV. It will notify the ZFS kernel module of
the new VDEV state (degraded or faulted). This agent is
also responsible for managing hot spares across pools.
When it encounters a device fault or a device removal it
replaces the device with an appropriate spare if available.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@intel.com>
Closes #5343
2016-11-07 15:01:38 -08:00
Tony Hutter
1ad9de6d08 Allow autoreplace even when enclosure LED sysfs entries don't exist
The previous autoreplace code assumed that if you were using autoreplace, then
you also had the enclosure SES driver loaded.  This could lead to autoreplace
not working if the SES driver wasn't loaded, or if it wasn't creating the
proper enclosure_device symlinks (which has happened).  This patch removes
that assumption.

Reviewed by: Don Brady <don.brady@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #5363
2016-11-04 13:34:13 -07:00
Tony Hutter
1bbd877049 Turn on/off enclosure slot fault LED even when disk isn't present
Previously when a drive faulted, the statechange-led.sh script would lookup
the drive's LED sysfs entry in /sys/block/sd*/device/enclosure_device, and
turn it on.  During testing we noticed that if you pulled out a drive, or if
the drive was so badly broken that it no longer appeared to Linux, that the
/sys/block/sd* path would be removed, and the script could not lookup the
LED entry.

To fix this, this patch looks up the disks's more persistent
"/sys/class/enclosure/X:X:X:X/Slot N" LED sysfs path at pool import.  It then
passes that path to the statechange-led script to use, rather than having the
script look it up on the fly.  This allows the script to turn on/off the slot
LEDs even when the drive is missing.

Closes #5309 
Closes #2375
2016-10-24 10:45:59 -07:00
Tony Hutter
6078881aa1 Multipath autoreplace, control enclosure LEDs, event rate limiting
1. Enable multipath autoreplace support for FMA.

This extends FMA autoreplace to work with multipath disks.  This
requires libdevmapper to be installed at build time.

2. Turn on/off fault LEDs when VDEVs become degraded/faulted/online

Set ZED_USE_ENCLOSURE_LEDS=1 in zed.rc to have ZED turn on/off the enclosure
LED for a drive when a drive becomes FAULTED/DEGRADED.  Your enclosure must
be supported by the Linux SES driver for this to work.  The enclosure LED
scripts work for multipath devices as well.  The scripts will clear the LED
when the fault is cleared.

3. Rate limit ZIO delay and checksum events so as not to flood ZED

ZIO delay and checksum events are rate limited to 5/sec in the zfs module.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Don Brady <don.brady@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #2449 
Closes #3017 
Closes #5159
2016-10-19 12:55:59 -07:00
Don Brady
d02ca37979 Bring over illumos ZFS FMA logic -- phase 1
This first phase brings over the ZFS SLM module, zfs_mod.c, to handle
auto operations in response to disk events. Disk event monitoring is
provided from libudev and generates the expected payload schema for
zfs_mod. This work leverages the recently added devid and phys_path
strings in the vdev label.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@intel.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #4673
2016-09-01 11:39:45 -07:00