Commit Graph

87 Commits

Author SHA1 Message Date
Don Brady
976246fadd Add illumos FMD ZFS logic to ZED -- phase 2
The phase 2 work primarily entails the Diagnosis Engine and
the Retire Agent modules. It also includes infrastructure
to support a crude FMD environment to host these modules.

The Diagnosis Engine consumes I/O and checksum ereports and
feeds them into a SERD engine which will generate a corres-
ponding fault diagnosis when the SERD engine fires. All the
diagnosis state data is collected into cases, one case per
vdev being tracked.

The Retire Agent responds to diagnosed faults by isolating
the faulty VDEV. It will notify the ZFS kernel module of
the new VDEV state (degraded or faulted). This agent is
also responsible for managing hot spares across pools.
When it encounters a device fault or a device removal it
replaces the device with an appropriate spare if available.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@intel.com>
Closes #5343
2016-11-07 15:01:38 -08:00
Tony Hutter
1ad9de6d08 Allow autoreplace even when enclosure LED sysfs entries don't exist
The previous autoreplace code assumed that if you were using autoreplace, then
you also had the enclosure SES driver loaded.  This could lead to autoreplace
not working if the SES driver wasn't loaded, or if it wasn't creating the
proper enclosure_device symlinks (which has happened).  This patch removes
that assumption.

Reviewed by: Don Brady <don.brady@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #5363
2016-11-04 13:34:13 -07:00
Tony Hutter
6568379eea Fix statechange-led.sh & unnecessary libdevmapper warning
- Fix autoreplace behaviour on statechange-led.sh script.

ZED sends the following events on an auto-replace:

1. statechange: Disk goes UNAVAIL->ONLINE
2. statechange: Disk goes ONLINE->UNAVAIL
3. vdev_attach: Disk goes ONLINE

Events 1-2 happen when ZED first attempts to do an auto-online.  When that
fails, ZED then tries an auto-replace, generating the vdev_attach event in #3.

In the previous code, statechange-led was only looking at the UNAVAIL->ONLINE
transition to turn off the LED.  It ignored the #2 ONLINE->UNAVAIL transition,
assuming it was just the "old" VDEV going offline.  This is problematic, as
a drive can go from ONLINE->UNAVAIL when it's malfunctioning, and we don't want
to ignore that.

This new patch correctly turns on the fault LED every time a drive becomes
UNAVAIL.  It also monitors vdev_attach events to trigger turning off the LED
when an auto-replaced disk comes online.

- Remove unnecessary libdevmapper warning with --with-config=kernel

This fixes an unnecessary libdevmapper warning when building
--with-config=kernel.  Kernel code does not use libdevmapper, so the warning
is not needed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #2375 
Closes #5312 
Closes #5331
2016-10-25 11:05:30 -07:00
Tony Hutter
1bbd877049 Turn on/off enclosure slot fault LED even when disk isn't present
Previously when a drive faulted, the statechange-led.sh script would lookup
the drive's LED sysfs entry in /sys/block/sd*/device/enclosure_device, and
turn it on.  During testing we noticed that if you pulled out a drive, or if
the drive was so badly broken that it no longer appeared to Linux, that the
/sys/block/sd* path would be removed, and the script could not lookup the
LED entry.

To fix this, this patch looks up the disks's more persistent
"/sys/class/enclosure/X:X:X:X/Slot N" LED sysfs path at pool import.  It then
passes that path to the statechange-led script to use, rather than having the
script look it up on the fly.  This allows the script to turn on/off the slot
LEDs even when the drive is missing.

Closes #5309 
Closes #2375
2016-10-24 10:45:59 -07:00
Tony Hutter
6078881aa1 Multipath autoreplace, control enclosure LEDs, event rate limiting
1. Enable multipath autoreplace support for FMA.

This extends FMA autoreplace to work with multipath disks.  This
requires libdevmapper to be installed at build time.

2. Turn on/off fault LEDs when VDEVs become degraded/faulted/online

Set ZED_USE_ENCLOSURE_LEDS=1 in zed.rc to have ZED turn on/off the enclosure
LED for a drive when a drive becomes FAULTED/DEGRADED.  Your enclosure must
be supported by the Linux SES driver for this to work.  The enclosure LED
scripts work for multipath devices as well.  The scripts will clear the LED
when the fault is cleared.

3. Rate limit ZIO delay and checksum events so as not to flood ZED

ZIO delay and checksum events are rate limited to 5/sec in the zfs module.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Don Brady <don.brady@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #2449 
Closes #3017 
Closes #5159
2016-10-19 12:55:59 -07:00
Brian Behlendorf
7515f8f63d Fix file permissions
The following new test cases need to have execute permissions set:

  userquota/groupspace_003_pos.ksh
  userquota/userquota_013_pos.ksh
  userquota/userspace_003_pos.ksh
  upgrade/upgrade_userobj_001_pos.ksh
  upgrade/setup.ksh
  upgrade/cleanup.ksh

The following source files accidentally were marked executable:

  lib/libzpool/kernel.c
  lib/libshare/nfs.c
  lib/libzfs/libzfs_dataset.c
  lib/libzfs/libzfs_util.c
  tests/zfs-tests/cmd/rm_lnkcnt_zero_file/rm_lnkcnt_zero_file.c
  tests/zfs-tests/cmd/dir_rd_update/dir_rd_update.c
  cmd/zed/zed_exec.c
  module/icp/core/kcf_sched.c
  module/zfs/dsl_pool.c
  module/zfs/arc.c
  module/nvpair/nvpair.c
  man/man5/zfs-module-parameters.5

Reviewed-by: GeLiXin <ge.lixin@zte.com.cn>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5241
2016-10-08 14:57:56 -07:00
GeLiXin
ed3ea30fb9 Fix coverity defects: CID 147536, 147537, 147538
coverity scan CID:147536, type: Argument cannot be negative
- may write or close fd which is negative
coverity scan CID:147537, type: Argument cannot be negative
- may call dup2 with a negative fd
coverity scan CID:147538, type: Argument cannot be negative
- may read or fchown with a negative fd

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: GeLiXin <ge.lixin@zte.com.cn>
Closes #5185
2016-09-30 15:40:07 -07:00
Don Brady
d02ca37979 Bring over illumos ZFS FMA logic -- phase 1
This first phase brings over the ZFS SLM module, zfs_mod.c, to handle
auto operations in response to disk events. Disk event monitoring is
provided from libudev and generates the expected payload schema for
zfs_mod. This work leverages the recently added devid and phys_path
strings in the vdev label.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@intel.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #4673
2016-09-01 11:39:45 -07:00
Hans Rosenfeld
fb390aafc8 OpenZFS 5997 - FRU field not set during pool creation and never updated
Authored by: Hans Rosenfeld <hans.rosenfeld@nexenta.com>
Reviewed by: Dan Fields <dan.fields@nexenta.com>
Reviewed by: Josef Sipek <josef.sipek@nexenta.com>
Reviewed by: Richard Elling <richard.elling@gmail.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Signed-off-by: Don Brady <don.brady@intel.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://www.illumos.org/issues/5997
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/1437283

Porting Notes:

In addition to the OpenZFS changes this patch realigns the events
with those found in OpenZFS.

Events which would be logged as sysevents on illumos have been
been mapped to the 'sysevent' class for Linux.  In addition, several
subclass names have been changed to match what is used in OpenZFS.
In all cases this means a '.' was changed to an '_' in the subclass.

The scripts provided by ZoL have been updated, however users which
provide scripts for any of the following events will need to rename
them based on the new subclass names.

  ereport.fs.zfs.config.sync         sysevent.fs.zfs.config_sync
  ereport.fs.zfs.zpool.destroy       sysevent.fs.zfs.pool_destroy
  ereport.fs.zfs.zpool.reguid        sysevent.fs.zfs.pool_reguid
  ereport.fs.zfs.vdev.remove         sysevent.fs.zfs.vdev_remove
  ereport.fs.zfs.vdev.clear          sysevent.fs.zfs.vdev_clear
  ereport.fs.zfs.vdev.check          sysevent.fs.zfs.vdev_check
  ereport.fs.zfs.vdev.spare          sysevent.fs.zfs.vdev_spare
  ereport.fs.zfs.vdev.autoexpand     sysevent.fs.zfs.vdev_autoexpand
  ereport.fs.zfs.resilver.start      sysevent.fs.zfs.resilver_start
  ereport.fs.zfs.resilver.finish     sysevent.fs.zfs.resilver_finish
  ereport.fs.zfs.scrub.start         sysevent.fs.zfs.scrub_start
  ereport.fs.zfs.scrub.finish        sysevent.fs.zfs.scrub_finish
  ereport.fs.zfs.bootfs.vdev.attach  sysevent.fs.zfs.bootfs_vdev_attach
2016-08-12 13:06:48 -07:00
Chris Dunlap
048bb5bd49 Ensure zed _finish_daemonize() leaves fds 0-2 open
In zed's _finish_daemonize(), /dev/null is open()d onto a temporary
file descriptor which is then dup()d onto stdin, stdout, and stderr.
But if file descriptors 0, 1, or 2 are not already open at the start
of this function, then the temporary file descriptor will fall within
this range and be inadvertently closed when the function cleans up.

This commit adds a check to prevent inadvertently closing this
(presumably temporary) file descriptor when it shouldn't.

Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4384
2016-03-08 17:46:41 -08:00
Chris Dunlap
69520d6855 Rework zed_notify_email for configurable PROG/OPTS
This commit reworks the zed_notify_email() function to allow
configuration of the mail executable and command-line arguments.

ZED_EMAIL_PROG specifies the name or path of the executable responsible
for sending notifications via email.  This variable defaults to "mail".

ZED_EMAIL_OPTS specifies command-line options passed to ZED_EMAIL_PROG.
The following keyword substitutions are performed:
- @ADDRESS@ is replaced with the recipient email address(es)
- @SUBJECT@ is replaced with the notification subject
This variable defaults to "-s '@SUBJECT@' @ADDRESS@".

ZED_EMAIL_ADDR replaces ZED_EMAIL (although the latter is retained
for backward compatibility).  This variable can contain multiple
addresses as long as they are delimited by whitespace.

Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3634
Closes #3631
2015-07-30 11:52:56 -07:00
Chris Dunlap
6f1eccff2c Fix whitespace in zed_log_err
This commit fixes the two adjacent spaces that appear in zed_log_err()
messages when ZEVENT_EID is undefined.

Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2015-07-30 11:52:35 -07:00
Turbo Fredriksson
47a4a6fd5f Support parallel build trees (VPATH builds)
Build products from an out of tree build should be written
relative to the build directory.  Sources should be referred
to by their locations in the source directory.

This is accomplished by adding the 'src' and 'obj' variables
for the module Makefile.am, using relative paths to reference
source files, and by setting VPATH when source files are not
co-located with the Makefile.  This enables the following:

  $ mkdir build
  $ cd build
  $ ../configure \
    --with-spl=$HOME/src/git/spl/ \
    --with-spl-obj=$HOME/src/git/spl/build
  $ make -s

This change also has the advantage of resolving the following
warning which is generated by modern versions of automake.

  Makefile.am:00: warning: source file 'xxx' is in a subdirectory,
  Makefile.am:00: but option 'subdir-objects' is disabled

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1082
2015-07-17 13:42:51 -07:00
Chris Dunlap
492b1d2ef0 Update ZED copyright boilerplate
This commit updates the copyright boilerplate within the ZED subtree.

The instructions for appending a contributor copyright line have
been removed.  Manually maintaining copyright notices in this
manner is error-prone, imprecise at a file-scope granularity, and
oftentimes inaccurate.  These lines can become a pernicious source of
merge conflicts.  A commit log is better suited to maintaining this
information.  Consequently, a line has been added to the boilerplate
to refer to the git commit log for authoritative copyright attribution.

To account for the scenario where a file may become separated from
the codebase and commit history (i.e., it is copied somewhere else),
a line has been added to identify the file's origin.

http://softwarefreedom.org/resources/2012/ManagingCopyrightInformation.html

Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3384
2015-05-11 15:07:00 -07:00
Chris Dunlap
ce119da33d Combine data-notify.sh with io-notify.sh
The data-notify.sh ZEDLET serves a very similar purpose to
io-notify.sh, namely, to generate a notification in response to a
particular error event.  Initially, data-notify.sh was separated from
io-notify.sh since the "data" zevent does not (as I understand it)
pertain to a specific vdev device.  This stands in contrast to the
"checksum" and "io" zevents (both handled by io-notify.sh) that can
be attributed to a specific vdev.  At the time, it seemed simpler to
handle these two cases in separate scripts.

This commit adds support for the "data" zevent to io-notify.sh, and
symlinks io-notify.sh to data-notify.sh.  It also adds the counts
for vdev_read_errors, vdev_write_errors, and vdev_cksum_errors to
the notification message.

Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
2015-04-27 12:08:08 -07:00
Chris Dunlap
090b19158a Add notification to io-spare.sh
The io-spare.sh ZEDLET does not generate a notification when a failing
device is replaced with a hot spare.  Maybe it should tell someone.

This commit adds a notification message to the io-spare.sh ZEDLET.
This notification is triggered when a failing device is successfully
replaced with a hot spare after encountering a checksum or io error.

Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
2015-04-27 12:08:08 -07:00
Chris Dunlap
a0d065fa92 Add support for Pushbullet notifications
This commit adds the zed_notify_pushbullet() function and hooks
it into zed_notify(), thereby integrating it with the existing
"notify" ZEDLETs.  This enables ZED to push notifications to your
desktop computer and/or mobile device(s).  It is configured with the
ZED_PUSHBULLET_ACCESS_TOKEN and ZED_PUSHBULLET_CHANNEL_TAG variables
in zed.rc.

  https://www.pushbullet.com/

The Makefile install-data-local target has been replaced with
install-data-hook.  With the "-local" target, there is no particular
guarantee of execution order.  But with the zed.rc now potentially
containing sensitive information (i.e., the Pushbullet access token),
the recommended permissions have changed to 0600.  The "-hook" target
is always executed after the main rule's work is done; thus, the
chmod will always take place after the zed.rc file has been installed.

  https://www.gnu.org/software/automake/manual/automake.html#Extending

Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
2015-04-27 12:08:08 -07:00
Chris Dunlap
20967ff1a4 Replace "email" ZEDLETs with "notify" ZEDLETs
Several ZEDLETs already exist for sending email in reponse to a
particular zevent.  While email is ubiquitous, alternative methods may
be better suited for some configurations.  Instead of duplicating the
"email" ZEDLETs for every future notification method, it is preferable
to abstract the notification method into a function.  This has the
added benefit of reducing the amount of code duplicated between
ZEDLETs, and allowing related bugs to be fixed in a single location.

This commit replaces the existing "email" ZEDLETs with corresponding
"notify" ZEDLETs.  In addition, the ZEDLET code for sending an
email message has been moved into the zed_notify_email() function.
And this zed_notify_email() has been added to a generic zed_notify()
function for sending notifications via all available methods that
have been configured.

This commit also changes a couple of related zed.rc variables.
ZED_EMAIL_INTERVAL_SECS is changed to ZED_NOTIFY_INTERVAL_SECS,
and ZED_EMAIL_VERBOSE is changed to ZED_NOTIFY_VERBOSE.  Note that
ZED_EMAIL remains unchanged as its use is solely for the email
notification method.

Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
2015-04-27 12:08:07 -07:00
Chris Dunlap
aded9a6814 Cleanup ZEDLETs
This commit factors out several common ZEDLET code blocks into
zed-functions.sh.  This shortens the length of the scripts, thereby
(hopefully) making them easier to understand and maintain.

In addition, this commit revamps the coding style used by the
scripts to be more consistent and (again, hopefully) maintainable.
It now mostly follows the Google Shell Style Guide.  I've tried to
assimilate the following resources:

  Google Shell Style Guide
  https://google-styleguide.googlecode.com/svn/trunk/shell.xml

  Dash as /bin/sh
  https://wiki.ubuntu.com/DashAsBinSh

  Filenames and Pathnames in Shell: How to do it Correctly
  http://www.dwheeler.com/essays/filenames-in-shell.html

  Common shell script mistakes
  http://www.pixelbeat.org/programming/shell_script_mistakes.html

Finally, this commit updates the exit codes used by the ZEDLETs to be
more consistent with one another.

All scripts run cleanly through ShellCheck <http://www.shellcheck.net/>.
All scripts have been tested on bash and dash.

Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
2015-04-27 12:08:01 -07:00
Chris Dunlap
de6d197683 Fix io-spare.sh to work with disk vdevs
The "zpool status" output shows the full pathname for file-type vdevs,
but only the basename component for disk-type vdevs.  In commit
bee6665, the "basename" command was dropped from altering the vdev
name used when searching the "zpool status" output.  Consequently,
hot-disk sparing for disk vdevs broke since "zpool status" output
was now being searched for the full pathname to the disk vdev.

Parsing the "zpool status" output in this manner is rather brittle.
It would be preferable to search for the vdev based on its guid.
But until that happens, this commit adds back the "basename" command
to fix the vdev name breakage.

Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3310
2015-04-17 14:21:26 -07:00
Chunwei Chen
aa506dcb3d Fix build error when make deb
After 53698a4, the following error occurs when make deb.

  CCLD     zed
../../lib/libzfs/.libs/libzfs.so: undefined reference to `get_system_hostid'

Add libzpool.la to zed/Makefile.am to fix this

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3080
2015-02-06 09:16:32 -08:00
Chris Dunlap
2c41df5bf8 Cleanup _zed_event_add_nvpair()
When _zed_event_add_var() was updated to be the common routine
for adding zedlet environment variables, an additional snprintf()
was added to the processing of each nvpair.  This commit changes
_zed_event_add_nvpair() to directly call _zed_event_add_var()
for nvpair non-array types, thereby removing a superfluous call to
snprintf().  For consistency, the helper functions for converting
nvpair array types are similarly adjusted to add variables.

The _zed_event_value_is_hex() and _zed_event_add_var() functions have
been moved up in the file since forward declarations are not used,
but no changes have been made to these functions.

Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3042
2015-01-30 14:46:44 -08:00
Chris Dunlap
854f30a91f Protect against adding duplicate strings in ZED
The zed_strings container stores strings in an AVL, but does not
check for duplicate strings being added.  Within the AVL, strings
are indexed by the string value itself.  avl_add() requires the node
being added must not already exist in the tree, and will assert()
if this is not the case.

This should not cause problems in practice.  ZED uses this container
in two places.  In zed_conf.c, it is used to store the names of
enabled zedlets as zed scans the zedlet directory listing; duplicate
entries cannot occur here since duplicate names cannot occur within
a directory.  In zed_event.c, it is used to store the environment
variables (as "NAME=VALUE" strings) that will be passed to zedlets;
duplicate strings here should never happen unless there is a bug
resulting in a duplicate nvpair or environment variable.

This commit protects against adding a duplicate to a zed_strings
container by first checking for the string being added, and removing
the previous entry should one exist.  This implements a "last one
wins" policy.

This commit also changes the prototype for zed_strings_add() to allow
the string key (by which it is indexed in the AVL) to differ from
the string value.  By adding zedlet environment variables using the
variable name as the key, multiple adds for the same variable name
will result in only the last value being stored.

Finally, this commit routes all additions of zedlet environment
variables through the updated _zed_event_add_var().  This ensures
all zedlet environment variable names are properly converted.

Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3042
2015-01-30 14:46:17 -08:00
Chris Dunlap
8ac9b5e6b5 Cleanup struct zed_conf vars in zed_conf_destroy
Reset struct zed_conf file descriptors to -1 after close(),
and pointers to NULL after free().

Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2756
2014-10-06 13:18:11 -07:00
Chris Dunlap
56697c4264 Obtain advisory lock on ZED PID file
ZED uses an advisory lock on its state file to protect against
multiple instances running concurrently.  However, work is planned
to move this state information into the kernel, and ZED will still
need to protect against starting multiple instances.

This commit adds an advisory lock on the PID file to protect against
starting multiple instances.  A lock failure can be overridden with
the "-f" (force) command-line option.  The advisory lock on the state
file is being retained for as long as the state information is stored
in the state file.

Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2756
2014-10-06 13:17:40 -07:00
Chris Dunlap
dcca723ace Refer to ZED's scripts as ZEDLETs
The executables invoked by the ZED in response to a given zevent
have been generically referred to as "scripts".  By convention,
these scripts have aimed to be /bin/sh compatible for reasons of
portability and comprehensibility.  However, the ZED only requires
they be executable and (ideally) capable of reading environment
variables.  As such, these scripts are now referred to as ZEDLETs
(ZFS Event Daemon Linkage for Executable Tasks).

Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2735
2014-09-25 13:54:17 -07:00
Chris Dunlap
8cb8cf91df Replace zed's use of malloc with calloc
When zed allocates memory via malloc(), it typically follows that
with a memset().  However, calloc() implementations can often perform
optimizations when zeroing memory:

https://stackoverflow.com/questions/2688466/why-mallocmemset-is-slower-than-calloc

This commit replaces zed's use of malloc() with calloc().

Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2736
2014-09-25 13:43:57 -07:00
Chris Dunlap
bee6665b88 Fix zed io-spare.sh dash incompatibility
The zed's io-spare.sh script defines a vdev_status() function to query
the 'zpool status' output for obtaining the status of a specified vdev.
This function contains a small awk script that uses a parameter
expansion (${parameter/pattern/string}) supported in bash but not
in dash.  Under dash, this fails with a "Bad substitution" error.

This commit replaces the awk script with a (hopefully more portable)
sed script that has been tested under both bash and dash.

Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2536
2014-09-23 15:32:30 -07:00
Chris Dunlap
5043deaa40 Remove reverse indentation from zed comments.
Remove all occurrences of reverse indentation from zed comments for
consistency within the project code base.

Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2695
2014-09-22 12:17:53 -07:00
louwrentius
bcd9624d0f Change delimiter for ZED email scripts
When the ZED_EMAIL_INTERVAL_SECS="3600" option is set in zed.rc
configuration file then notification emails should be rate limited.

Rate limiting is accomplished by maintaining a colon delimited state
file which includes the device name.  Unfortunately there are valid
device names which include a colon and therefore prevent the rate
limiting for working properly.  For this reason the delimiter has
been changed to a semi-colon.

Signed-off-by: louwrentius <louwrentius@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Closes #2645
2014-09-02 14:18:54 -07:00
Chris Dunlap
8125fb7190 Cleanup zed logging
This is a set of minor cleanup changes related to zed logging:
- Remove the program identity prefix from messages written to stderr
  since systemd already prepends this output with the program name.
- Replace the copy of the program identity string with a ptr reference.
- Replace "pid" with "PID" for consistency in comments & strings.
- Rename the zed_log.c struct _ctx component "level" to "priority".
- Add the LOG_PID option for messages written to syslog.

Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2252
2014-09-02 14:18:53 -07:00
Chris Dunlap
5a8855b716 Fix race condition with zed pidfile creation
When the zed is started as a forking daemon (by default),
a race-condition exists where the parent process can terminate before
the pidfile has been created by the grandchild process.  When invoked
as a Type=forking systemd service, this can result in the following:

  systemd[1]: Starting ZFS Event Daemon (zed)...
  systemd[1]: PID file /var/run/zed.pid not readable (yet?) after start.

This commit adds a daemonize pipe to allow the grandchild process to
signal the parent process that initialization is complete (and the
pidfile has been created).  The parent process will wait for this
notification before exiting.

Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2252
2014-09-02 14:18:53 -07:00
Tim Chase
603cb25ca5 zed needs libzfs_core
As of a recent group of Illumos/Delphix updates, zed needs libzfs_core
in order to resolve lzc_get_bookmarks() and likely other functions
going forward.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2534
2014-07-31 09:49:01 -07:00
Chris Dunlap
6ac770b196 Replace zed_file_create_dirs() with mkdirp()
When processing directory components starting from the root dir,
zed_file_create_dirs() contained a bug in checking the return value of
mkdir().  A typo was made, and the test for (mkdir_errno != EEXIST) was
erroneously written as (mkdir_errno == EEXIST).  If some of the leading
directory components already existed, this bug would cause the routine
to exit before creating the remaining directory components.

Instead of fixing the above mkdir_errno test, this commit replaces
zed_file_create_dirs() with mkdirp().  This cleanup was already
planned, and zed_file_create_dirs() only existed because I didn't
realize mkdirp() was already in tree at the time.

Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2248
2014-04-09 13:32:54 -07:00
Chris Dunlap
518eba1492 Replace check for _POSIX_MEMLOCK w/ HAVE_MLOCKALL
zed supports a '-M' cmdline opt to lock all pages in memory via
mlockall().  The _POSIX_MEMLOCK define is checked to determine whether
this function is supported.  The current test assumes mlockall()
is supported if _POSIX_MEMLOCK is non-zero.  However, this test is
insufficient according to mlock(2) and sysconf(3).  If _POSIX_MEMLOCK
is -1, mlockall() is not supported; but if _POSIX_MEMLOCK is 0,
availability must be checked at runtime.

This commit adds an autoconf check for mlockall() to user.m4.  The zed
code block for mlockall() is now guarded with a test for HAVE_MLOCKALL.
If defined, mlockall() will be called and its runtime availability
checked via its return value.

Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2
2014-04-02 13:10:08 -07:00
Brian Behlendorf
904ea2763e Add automatic hot spare functionality
When a vdev starts getting I/O or checksum errors it is now
possible to automatically rebuild to a hot spare device.

To cleanly support this functionality in a shell script some
additional information was added to all zevent ereports which
include a vdev.  This covers both io and checksum zevents but
may be used but other scripts.

In the Illumos FMA solution the same information is required
but it is retrieved through the libzfs library interface.
Specifically the following members were added:

  vdev_spare_paths  - List of vdev paths for all hot spares.
  vdev_spare_guids  - List of vdev guids for all hot spares.
  vdev_read_errors  - Read errors for the problematic vdev
  vdev_write_errors - Write errors for the problematic vdev
  vdev_cksum_errors - Checksum errors for the problematic vdev.

By default the required hot spare scripts are installed but this
functionality is disabled.  To enable hot sparing uncomment the
ZED_SPARE_ON_IO_ERRORS and ZED_SPARE_ON_CHECKSUM_ERRORS in the
/etc/zfs/zed.d/zed.rc configuration file.

These scripts do no add support for the autoexpand property. At
a minimum this requires adding a new udev rule to detect when
a new device is added to the system.  It also requires that the
autoexpand policy be ported from Illumos, see:

  https://github.com/illumos/illumos-gate/blob/master/usr/src/cmd/syseventd/modules/zfs_mod/zfs_mod.c

Support for detecting the correct name of a vdev when it's not
a whole disk was added by Turbo Fredriksson.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Issue #2
2014-04-02 13:10:08 -07:00
Chris Dunlap
9e246ac3d8 Initial implementation of zed (ZFS Event Daemon)
zed monitors ZFS events.  When a zevent is posted, zed will run any
scripts that have been enabled for the corresponding zevent class.
Multiple scripts may be invoked for a given zevent.  The zevent
nvpairs are passed to the scripts as environment variables.

Events are processed synchronously by the single thread, and there is
no maximum timeout for script execution.  Consequently, a misbehaving
script can delay (or forever block) the processing of subsequent
zevents.  Plans are to address this in future commits.

Initial scripts have been developed to log events to syslog
and send email in response to checksum/data/io errors and
resilver.finish/scrub.finish events.  By default, email will only
be sent if the ZED_EMAIL variable is configured in zed.rc (which is
serving as a config file of sorts until a proper configuration file
is implemented).

Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2
2014-04-02 13:10:03 -07:00