Commit Graph

204 Commits

Author SHA1 Message Date
Don Brady
f0f330e121
Fix ZED auto-replace for VDEVs using by-id paths
The change is simple -- restore the original code so that the VDEV 
path is updated when using by-id paths.  The more challenging part 
was to devise a second ZTS test, that would test auto-replace for 
'by-id' and help prevent a future regression.

With that new test, we can now do an A|B test with , and without, 
the fix to confirm that auto-replace for by-id paths works. The 
existing auto-replace test, functional/fault/auto_replace_001_pos, 
will confirm that we didn't break auto-replace for 'by-vdev' paths.

In the original functional/fault/auto_replace_001_pos test, the disk 
wipe (using dd) was not effective in removing the partitioning since 
the kernel was never informed of the wipe.

Added a call to wipefs(8) so that the kernel is informed and ZED will 
re-partition the device.
    
Added a validation step that the re-partitioning occurred by
confirming  that the GPT partition UUID changes.

Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #15363
2023-10-20 09:29:02 -07:00
Tony Hutter
b53077a9e7
Add zfs_prepare_disk script for disk firmware install
Have libzfs call a special `zfs_prepare_disk` script before a disk is
included into the pool.  The user can edit this script to add things
like a disk firmware update or a disk health check.  Use of the script
is totally optional. See the zfs_prepare_disk manpage for full details.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15243
2023-09-21 08:36:26 -07:00
Tony Hutter
529bec7d7b
zed: Allow autoreplace and fault LEDs for removed vdevs
Allow zed to autoreplace vdevs marked as REMOVED.  Also update
statechange-led zedlet to toggle fault LEDs for REMOVED vdevs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15281
2023-09-18 16:25:58 -07:00
Serapheim Dimitropoulos
cad00d5180
checkstyle: fix action failures
Reviewed-by: Don Brady <dev.fs.zfs@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #15220
2023-08-29 09:12:40 -07:00
Brian Behlendorf
f0e34c8879
zed: update zed.d/statechange-slot_off.sh
The statechange-slot_off.sh zedlet which was added in #15200
needed to be installed so it's included by the packages.

Additional testing has also shown that multiple retries are
often needed for the script to operate reliably.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15210
2023-08-26 11:22:28 -07:00
Tony Hutter
11fbcacf37
zed: Add zedlet to power off slot when drive is faulted
If ZED_POWER_OFF_ENCLOUSRE_SLOT_ON_FAULT is enabled in zed.rc, then
power off the drive's slot in the enclosure if it becomes FAULTED.
This can help silence misbehaving drives.  This assumes your drive
enclosure fully supports slot power control via sysfs.

Reviewed-by: @AllKind
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15200
2023-08-24 11:59:03 -07:00
Brian Behlendorf
782312c612
zed: Reduce log noise for large JBODs
For large JBODs the log message "zfs_iter_vdev: no match" can
account for the bulk of the log messages (over 70%).  Since this
message is purely informational and not that useful we remove it.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15086
Closes #15094
2023-07-25 13:55:29 -07:00
Tony Hutter
ab0b0393cb
zed: Fix zed ASSERT on slot power cycle
We would see zed assert on one of our systems if we powered off a
slot.  Further examination showed zfs_retire_recv() was reporting
a GUID of 0, which in turn would return a NULL nvlist.  Add
in a check for a zero GUID.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15084
2023-07-21 11:46:58 -07:00
Alexander Motin
b3ad3f48d9
Use list_remove_head() where possible.
... instead of list_head() + list_remove().  On FreeBSD the list
functions are not inlined, so in addition to more compact code
this also saves another function call.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #14955
2023-06-09 10:12:52 -07:00
Brian Behlendorf
577e835f30
Probe vdevs before marking removed
Before allowing the ZED to mark a vdev as REMOVED due to a
hotplug event confirm that it is non-responsive with probe.
Any device which can be successfully probed should be left
ONLINE to prevent a healthy pool from being incorrectly
SUSPENDED.  This may occur for at least the following two
scenarios.

1) Drive expansion (zpool online -e) in VMware environments.
   If, during the partition resize operation, a partition is
   removed and re-created then udev will send a removed event.

2) Re-scanning the namespaces of an NVMe device (nvme ns-rescan)
   may result in a udev remove and add event being delivered.

Finally, update the ZED to only kick in a spare when the
removal was successful.

Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #14859
Closes #14861
2023-05-19 13:05:09 -07:00
Richard Yao
d1807f168e nvpair: Constify string functions
After addressing coverity complaints involving `nvpair_name()`, the
compiler started complaining about dropping const. This lead to a rabbit
hole where not only `nvpair_name()` needed to be constified, but also
`nvpair_value_string()`, `fnvpair_value_string()` and a few other static
functions, plus variable pointers throughout the code. The result became
a fairly big change, so it has been split out into its own patch.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14612
2023-03-14 15:25:50 -07:00
rob-wing
69f024a56e
Configure zed's diagnosis engine with vdev properties
Introduce four new vdev properties:
    checksum_n
    checksum_t
    io_n
    io_t

These properties can be used for configuring the thresholds of zed's
diagnosis engine and are interpeted as <N> events in T <seconds>.

When this property is set to a non-default value on a top-level vdev,
those thresholds will also apply to its leaf vdevs. This behavior can be
overridden by explicitly setting the property on the leaf vdev.

Note that, these properties do not persist across vdev replacement. For
this reason, it is advisable to set the property on the top-level vdev
instead of the leaf vdev.

The default values for zed's diagnosis engine (10 events, 600 seconds)
remains unchanged.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Rob Wing <rob.wing@klarasystems.com>
Sponsored-by: Seagate Technology LLC
Closes #13805
2023-01-23 13:14:25 -08:00
Ameer Hamza
5091867ee6
zed: add hotplug support for spare vdevs
This commit supports for spare vdev hotplug. The
spare vdev associated with all the pools will be
marked as "Removed" when the drive is physically
detached and will become "Available" when the
drive is reattached. Currently, the spare vdev
status does not change on the drive removal and
the same is the case with reattachment.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #14295
2023-01-09 12:43:03 -08:00
Marcel Menzel
70ac2654f5
Change ZEVENT_POOL_GUID to ZEVENT_POOL to display pool names
Outgoing mails for ZFS pool events include the pool GUID,
but not the actual pool name. Let's change this for better
readability, as it is already done in the mails for finished
pool resilvers.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Marcel Menzel <mail@mcl.gg>
Closes #14272
2022-12-13 17:26:10 -08:00
Ameer Hamza
e996c502e4
zed: unclean disk attachment faults the vdev
If the attached disk already contains a vdev GUID, it
means the disk is not clean. In such a scenario, the
physical path would be a match that makes the disk
faulted when trying to online it. So, we would only
want to proceed if either GUID matches with the last
attached disk or the disk is in a clean state.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #14181
2022-11-29 09:24:10 -08:00
Richard Yao
97143b9d31 Introduce kmem_scnprintf()
`snprintf()` is meant to protect against buffer overflows, but operating
on the buffer using its return value, possibly by calling it again, can
cause a buffer overflow, because it will return how many characters it
would have written if it had enough space even when it did not. In a
number of places, we repeatedly call snprintf() by successively
incrementing a buffer offset and decrementing a buffer length, by its
return value. This is a potentially unsafe usage of `snprintf()`
whenever the buffer length is reached. CodeQL complained about this.

To fix this, we introduce `kmem_scnprintf()`, which will return 0 when
the buffer is zero or the number of written characters, minus 1 to
exclude the NULL character, when the buffer was too small. In all other
cases, it behaves like snprintf(). The name is inspired by the Linux and
XNU kernels' `scnprintf()`. The implementation was written before I
thought to look at `scnprintf()` and had a good name for it, but it
turned out to have identical semantics to the Linux kernel version.
That lead to the name, `kmem_scnprintf()`.

CodeQL only catches this issue in loops, so repeated use of snprintf()
outside of a loop was not caught. As a result, a thorough audit of the
codebase was done to examine all instances of `snprintf()` usage for
potential problems and a few were caught. Fixes for them are included in
this patch.

Unfortunately, ZED is one of the places where `snprintf()` is
potentially used incorrectly. Since using `kmem_scnprintf()` in it would
require changing how it is linked, we modify its usage to make it safe,
no matter what buffer length is used. In addition, there was a bug in
the use of the return value where the NULL format character was not
being written by pwrite(). That has been fixed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14098
2022-10-29 13:05:11 -07:00
Ameer Hamza
0b2428da20
zed: Avoid core dump if wholedisk property does not exist
zed aborts and dumps core in vdev_whole_disk_from_config() if
wholedisk property does not exist. make_leaf_vdev() adds the
property but there may be already pools that don't have the
wholedisk in the label.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #14062
2022-10-21 10:46:38 -07:00
Richard Yao
09453dea6a
ZED: Fix uninitialized value reads
Coverity complained about a couple of uninitialized value reads in ZED.

 * zfs_deliver_dle() can pass an uninitialized string to zed_log_msg()
 * An uninitialized sev.sigev_signo is passed to timer_create()

The former would log garbage while the latter is not a real issue, but
we might as well suppress it by initializing the field to 0 for
consistency's sake.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14047
2022-10-18 12:42:14 -07:00
Richard Yao
72c99dc959
Handle possible null pointers from malloc/strdup/strndup()
GCC 12.1.1_p20220625's static analyzer caught these.

Of the two in the btree test, one had previously been caught by Coverity
and Smatch, but GCC flagged it as a false positive. Upon examining how
other test cases handle this, the solution was changed from
`ASSERT3P(node, !=, NULL);` to using `perror()` to be consistent with
the fixes to the other fixes done to the ZTS code.

That approach was also used in ZED since I did not see a better way of
handling this there. Also, upon inspection, additional unchecked
pointers from malloc()/calloc()/strdup() were found in ZED, so those
were handled too.

In other parts of the code, the existing methods to avoid issues from
memory allocators returning NULL were used, such as using
`umem_alloc(size, UMEM_NOFAIL)` or returning `ENOMEM`.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13979
2022-10-06 17:18:40 -07:00
Richard Yao
55d7afa4ad
Reduce false positives from Static Analyzers
Both Clang's Static Analyzer and Synopsys' Coverity would ignore
assertions. Following Clang's advice, we annotate our assertions:

https://clang-analyzer.llvm.org/annotations.html#custom_assertions

This makes both Clang's Static Analyzer and Coverity properly identify
assertions. This change reduced Clang's reported defects from 246 to
180. It also reduced the false positives reported by Coverityi by 10,
while enabling Coverity to find 9 more defects that previously were
false negatives.

A couple examples of this would be CID-1524417 and CID-1524423. After
submitting a build to coverity with the modified assertions, CID-1524417
disappeared while the report for CID-1524423 no longer claimed that the
assertion tripped.

Coincidentally, it turns out that it is possible to more accurately
annotate our headers than the Coverity modelling file permits in the
case of format strings. Since we can do that and this patch annotates
headers whenever `__coverity_panic__()` would have been used in the
model file, we drop all models that use `__coverity_panic__()` from the
model file.

Upon seeing the success in eliminating false positives involving
assertions, it occurred to me that we could also modify our headers to
eliminate coverity's false positives involving byte swaps. We now have
coverity specific byteswap macros, that do nothing, to disable
Coverity's false positives when we do byte swaps. This allowed us to
also drop the byteswap definitions from the model file.

Lastly, a model file update has been done beyond the mentioned
deletions:

 * The definitions of `umem_alloc_aligned()`, `umem_alloc()` andi
   `umem_zalloc()` were originally implemented in a way that was
   intended to inform coverity that when KM_SLEEP has been passed these
   functions, they do not return NULL. A small error in how this was
   done was found, so we correct it.

 * Definitions for umem_cache_alloc() and umem_cache_free() have been
   added.

In practice, no false positives were avoided by making these changes,
but in the interest of correctness from future coverity builds, we make
them anyway.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13902
2022-09-30 15:30:12 -07:00
Ameer Hamza
55c12724d3
zed: mark disks as REMOVED when they are removed
ZED does not take any action for disk removal events if there is no
spare VDEV available. Added zpool_vdev_remove_wanted() in libzfs
and vdev_remove_wanted() in vdev.c to remove the VDEV through ZED
on removal event.  This means that if you are running zed and
remove a disk, it will be properly marked as REMOVED.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #13797
2022-09-28 09:48:46 -07:00
Richard Yao
2a493a4c71
Fix unchecked return values and unused return values
Coverity complained about unchecked return values and unused values that
turned out to be unused return values.

Different approaches were used to handle the different cases of
unchecked return values:

* cmd/zdb/zdb.c: VERIFY0 was used in one place since the existing code
  had no error handling. An error message was printed in another to
  match the rest of the code.

* cmd/zed/agents/zfs_retire.c: We dismiss the return value with `(void)`
  because the value is expected to be potentially unset.

* cmd/zpool_influxdb/zpool_influxdb.c: We dismiss the return value with
  `(void)` because the values are expected to be potentially unset.

* cmd/ztest.c: VERIFY0 was used since we want failures if something goes
  wrong in ztest.

* module/zfs/dsl_dir.c: We dismiss the return value with `(void)`
  because there is no guarantee that the zap entry will always be there.
  For example, old pools imported readonly would not have it and we do
  not want to fail here because of that.

* module/zfs/zfs_fm.c: `fnvlist_add_*()` was used since the
  allocations sleep and thus can never fail.

* module/zfs/zvol.c: We dismiss the return value with `(void)` because
  we do not need it. This matches what is already done in the analogous
  `zfs_replay_write2()`.

* tests/zfs-tests/cmd/draid.c: We suppress one return value with
  `(void)` since the code handles errors already. The other return value
  is handled by switching to `fnvlist_lookup_uint8_array()`.

* tests/zfs-tests/cmd/file/file_fadvise.c: We add error handling.

* tests/zfs-tests/cmd/mmap_sync.c: We add error handling for munmap, but
  ignore failures on remove() with (void) since it is expected to be
  able to fail.

* tests/zfs-tests/cmd/mmapwrite.c: We add error handling.

As for unused return values, they were all in places where there was
error handling, so logic was added to handle the return values.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13920
2022-09-23 16:52:03 -07:00
Richard Yao
de6c0d3d8c
Fix potential NULL pointer dereference in zfsdle_vdev_online()
Coverity complained about this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@nutanix.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13903
2022-09-20 15:20:04 -07:00
Richard Yao
f272960d52
Fix usage of zed_log_msg() and zfs_panic_recover()
Coverity complained about the format specifiers not matching variables.
In one case, the variable is a constant, so we fix it. In another, we
were missing an argument (about which coverity also complained).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #13888
2022-09-19 17:32:18 -07:00
Tony Hutter
e27e692bcc
zed: Fix config_sync autoexpand flood
Users were seeing floods of `config_sync` events when autoexpand was
enabled.  This happened because all "disk status change" udev events
invoke the autoexpand codepath, which calls zpool_relabel_disk(),
which in turn cause another "disk status change" event to happen,
in a feedback loop.  Note that "disk status change" happens every time
a user calls close() on a block device.

This commit breaks the feedback loop by only allowing an autoexpand
to happen if the disk actually changed size.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes: #7132
Closes: #7366
Closes #13729
2022-09-08 10:32:30 -07:00
Paul Dagnelie
17e212652d
Prevent zevent list from consuming all of kernel memory
There are a couple changes included here. The first is to introduce 
a cap on the size the ZED will grow the zevent list to. One million 
entries is more than enough for most use cases, and if you are 
overflowing that value, the problem needs to be addressed another 
way. The value is also tunable, for those who want the limit to be 
higher or lower. 
 
The other change is to add a kernel module parameter that allows 
snapshot creation/deletion to be exempted from the history logging; 
for most workloads, having these things logged is valuable, but for 
some workloads it produces large quantities of log spam and isn't 
especially helpful.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Issue #13374 
Closes #13753
2022-08-22 12:36:22 -07:00
Tony Hutter
9fe2f262aa
zed: Look for NVMe DEVPATH if no ID_BUS
We tried replacing an NVMe drive using autoreplace, only
to see zed reject it with:

zed[27955]: zed_udev_monitor: /dev/nvme5n1 no devid source

This happened because ZED saw that ID_BUS was not set by udev
for the NVMe drive, and thus didn't think it was "real drive".
This commit allows NVMe drives to be autoreplaced even if
ID_BUS is not set.

Reviewed-by: Don Brady <don.brady@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #13512 
Closes #13646
2022-07-14 10:19:37 -07:00
Tino Reichardt
1d3ba0bf01
Replace dead opensolaris.org license link
The commit replaces all findings of the link:
http://www.opensolaris.org/os/licensing with this one:
https://opensource.org/licenses/CDDL-1.0

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #13619
2022-07-11 14:16:13 -07:00
Tony Hutter
e4ab3f40df
zed: Ignore false 'atari' partitions in autoreplace
libudev will sometimes falsely identify an 'atari' partition on a
blank disk, preventing it from being used in an autoreplace.  This
seems to be a known issue.  The workaround is to just ignore the
fake partition and continue with the autoreplace.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #13497
Closes #13632
2022-07-11 13:35:19 -07:00
наб
a926aab902 Enable -Wwrite-strings
Also, fix leak from ztest_global_vars_to_zdb_args()

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13348
2022-06-29 14:08:54 -07:00
Brian Behlendorf
d98a67a53a
Replace EXTRA_DIST with dist_noinst_DATA
The EXTRA_DIST variable is ignored when used in the FALSE conditional
of a Makefile.am.  This results in the `make dist` target omitting
these files from the generated tarball unless CONFIG_USER is defined.
This issue can be avoided by switching to use the dist_noinst_DATA
variable which is handled as expected by autoconf.

This change also adds support for --with-config=dist as an alias
for --with-config=srpm and updates the GitHub workflows to use it.

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13459
Closes #13505
2022-05-26 09:24:50 -07:00
Brian Behlendorf
cf70c0f8ae
zed: Take no action on scrub/resilver checksum errors
When scrubbing/resilvering a pool it can be counter productive to
cancel the scan and kick of a replace operation to a hot spare
when encountering checksum errors.  In this case, the best course
of action is to allow the scrub/resilver to complete as quickly
as possible and to keep the vdevs fully online if possible.

Realistically, this is less of an issue for a RAIDZ since a
traditional resilver must be used and checksums will be verified.
However, this is not the case for a mirror or dRAID pool which is
sequentially resilvered and checksum verification is deferred
until after the replace operation completes.

Regardless, we apply this policy to all pool types since it's
a good idea for all vdevs.  Degrading additional vdevs has the
potential to make a bad situation worse.  Note the checksum
errors will still be reported as both an event and by
`zpool status`.  This change only prevents the ZED from
proactively taking any action.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #13499
2022-05-24 09:36:07 -07:00
наб
89e81bc6ad Remove final K&R definitions
Clang trunk now warns -Wstrict-prototypes on this, and they're removed
in C2x

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13447
2022-05-18 12:10:47 -07:00
heeplr
08b32c6fa9
zed: support subject as header in zed_notify_email()
Some minimal MUAs don't support passing the subjects as cmdline option.
This commit checks if "@SUBJECT@" is missing in ZED_EMAIL_OPTS and then
prepends a subject header to the notification message.
Also set a default for ${subject}.

Reviewed-by: Ahelenia Ziemia<C5><84>ska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Daniel Hiepler <d-git@coderdu.de>
Closes #13440
2022-05-18 10:27:53 -07:00
наб
5cdca5b1da autoconf: use include directives instead of recursing down cmd
No installation diff, dist lost
  -zfs-2.1.99/cmd/fsck_zfs/fsck.zfs
which was distributed erroneously, since it's generated

Also clean gitrev on clean

Also add -e 'any possible bashisms' to default checkbashisms flags,
and fully parallelise it and shellcheck, and it works out-of-tree, too

Also align the Release in the dist META file correctly

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:18:38 -07:00
наб
c8970f52ed autoconf: use include directives instead of recursing down lib
As a bonus, this also adds zfs-mount-generator (previously undescended
down) and libzstd (not included) to CppCheck

As a bonus bonus, abigail rules work out-of-tree, too

Against current trunk:
  $ diff -U0 ./destdir.listing ~/store/code/zfs/destdir.listing
  -destdir/usr/local/include/libspl/sscanf.h

  $ diff --color -U0 ./zfs-2.1.99.tar.gz.listing ../oot/zfs-2.1.99.tar.gz.listing | grep -v @@ | grep -v /Makefile
  -zfs-2.1.99/config/Abigail.am
  -zfs-2.1.99/lib/libspl/include/util/
  -zfs-2.1.99/lib/libspl/include/util/sscanf.h

  $ diff --color -U0 ./zfs-2.1.99.tar.gz.listing ../oot/zfs-2.1.99.tar.gz.listing | grep -v @@ | grep /Makefile
  -zfs-2.1.99/lib/libavl/Makefile.in
  -zfs-2.1.99/lib/libefi/Makefile.in
  -zfs-2.1.99/lib/libicp/Makefile.in
  -zfs-2.1.99/lib/libnvpair/Makefile.in
  -zfs-2.1.99/lib/libshare/Makefile.in
  -zfs-2.1.99/lib/libspl/include/Makefile.in
  -zfs-2.1.99/lib/libspl/include/os/freebsd/Makefile.am
  -zfs-2.1.99/lib/libspl/include/os/freebsd/Makefile.in
  -zfs-2.1.99/lib/libspl/include/os/freebsd/sys/Makefile.am
  -zfs-2.1.99/lib/libspl/include/os/freebsd/sys/Makefile.in
  -zfs-2.1.99/lib/libspl/include/os/linux/Makefile.am
  -zfs-2.1.99/lib/libspl/include/os/linux/Makefile.in
  -zfs-2.1.99/lib/libspl/include/os/linux/sys/Makefile.am
  -zfs-2.1.99/lib/libspl/include/os/linux/sys/Makefile.in
  -zfs-2.1.99/lib/libspl/include/os/Makefile.am
  -zfs-2.1.99/lib/libspl/include/os/Makefile.in
  -zfs-2.1.99/lib/libspl/include/rpc/Makefile.am
  -zfs-2.1.99/lib/libspl/include/rpc/Makefile.in
  -zfs-2.1.99/lib/libspl/include/sys/dktp/Makefile.am
  -zfs-2.1.99/lib/libspl/include/sys/dktp/Makefile.in
  -zfs-2.1.99/lib/libspl/include/sys/Makefile.am
  -zfs-2.1.99/lib/libspl/include/sys/Makefile.in
  -zfs-2.1.99/lib/libspl/include/util/Makefile.am
  -zfs-2.1.99/lib/libspl/include/util/Makefile.in
  -zfs-2.1.99/lib/libspl/Makefile.in
  -zfs-2.1.99/lib/libtpool/Makefile.in
  -zfs-2.1.99/lib/libunicode/Makefile.in
  -zfs-2.1.99/lib/libuutil/Makefile.in
  -zfs-2.1.99/lib/libzfsbootenv/Makefile.in
  -zfs-2.1.99/lib/libzfs_core/Makefile.in
  -zfs-2.1.99/lib/libzfs/Makefile.in
  -zfs-2.1.99/lib/libzpool/Makefile.in
  -zfs-2.1.99/lib/libzstd/Makefile.in
  -zfs-2.1.99/lib/libzutil/Makefile.in
  -zfs-2.1.99/lib/Makefile.in

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13316
2022-05-10 10:18:11 -07:00
наб
ed715283de
cmd: zed: rc: drop "should be owned by root and 0600"
It doesn't matter, 0600 are Weird Permissions, and it's even weirder to
spec them for no reason ‒ it's perfectly fine if it's the usual 0:0 644,
or literally anything else, so long as unprivileged users can't edit it
(which (a) 644 accomplishes and (b) is at the administrator's
 discretion, it's not unheard of to have adm users and having it
 be 664 in that case is just as good; it's not our place to say)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12544
Closes #13276
2022-04-05 13:50:55 -07:00
наб
9185303b32 libspl: zed: event: use localtime_r()
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13284
2022-04-05 09:45:39 -07:00
наб
df6e0f0092 zed: functions: zed_log_err: forward to zed_log_msg
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:59:12 -07:00
наб
be866839f0 zed: lets: all-debug: printenv -> env
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #13259
2022-04-01 17:59:06 -07:00
наб
c21819026f Replace FMD_B_{TRUE,FALSE} with B_{TRUE,FALSE}
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:13:58 -07:00
наб
d465fc5844 Forbid b{copy,zero,cmp}(). Don't include <strings.h> for <string.h>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:13:48 -07:00
наб
861166b027 Remove bcopy(), bzero(), bcmp()
bcopy() has a confusing argument order and is actually a move, not a
copy; they're all deprecated since POSIX.1-2001 and removed in -2008,
and we shim them out to mem*() on Linux anyway

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12996
2022-03-15 15:13:42 -07:00
Tony Hutter
f2f6c18f17
zed: Misc multipath autoreplace fixes
We recently had a case where our operators replaced a bad
multipathed disk, only to see it fail to autoreplace.  The
zed logs showed that the multipath replacement disk did not pass
the 'is_dm' test in zfs_process_add() even though it should have.
is_dm is set if there exists a sysfs entry for to the
underlying /dev/sd* paths for the multipath disk.  It's
possible this path didn't exist due to a race condition where
the sysfs paths weren't created at the time the udev event came
in to zed, but this was never verified.

This patch updates the check to look for udev properties that
indicate if the new autoreplace disk is an empty multipath disk,
rather than looking for the underlying sysfs entries. It also
adds in additional logging, and fixes a bug where zed allowed
you to use an already zfs-formatted disk from another pool
as a multipath auto-replacement disk.

Furthermore, while testing this patch, I also ran across a case
where a force-faulted disk did not have a ZPOOL_CONFIG_PHYS_PATH
entry in its config.  This prevented it from being autoreplaced.
I added additional logic to derive the PHYS_PATH from the PATH if
the PATH was a /dev/disk/by-vdev/ path.  For example, if PATH
was /dev/disk/by-vdev/L28, then PHYS_PATH would be L28.  This is
safe since by-vdev paths represent physical locations and do not
change between boots.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #13023
2022-02-24 11:43:39 -08:00
Damian Szuberski
8df0bde321
Add --enable=all to ShellCheck by default
Change enforced shell type from `dash` to `sh` and excluded
`SC2039` and `SC3043` by default. `local` keyword is accepted by all
POSIX shells from practical point of view. There is no need anymore
to enforce dash so `local` is accepted.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #13020
2022-02-07 11:59:09 -08:00
Phil Kauffman
aa9905d89b
zed-functions.sh: escape newline to produce valid json
This was discovered when using Discords Slack compatible webhook.

Slack webhooks works without the escape, however Discord rightly refuses
the POST as it contains invalid JSON.

https://discord.com/developers/docs/resources/webhook#execute-slackcompatible-webhook

Valid (while escaping the newline:
```
+ msg_json='{"text": "*ZFS scrub_finish error for test on quartz*\nZFS has detected a data error:\n\n   eid: 124\n class: scrub_finish\n  host: quartz\n  time: \n error: \n objid: :\n  pool: test\n"}'
```

Invalid (no escape):
```
+ msg_json='{"text": "*ZFS scrub_finish error for test on quartz*
ZFS has detected a data error:\n\n   eid: 124\n class: scrub_finish\n  host: quartz\n  time: \n error: \n objid: :\n  pool: test\n"}'
```
The new line gets rendered and not sent inside the JSON as intended.

```
++ curl -X POST https://discord.com/api/webhooks/{webhook.id}/{webhook.token}/slack --header 'Content-Type: application/json' --data-binary '{"text": "*ZFS scrub_finish error for test on quartz*
ZFS has detected a data error:\n\n   eid: 124\n class: scrub_finish\n  host: quartz\n  time: \n error: \n objid: :\n  pool: test\n"}'
+ msg_out='{"message": "Cannot send an empty message", "code": 50006}'
```

Test method:
`root@quartz:/etc/zfs/zed.d# export ZED_ZEDLET_DIR=/etc/zfs/zed.d; export ZEVENT_EID=124; export ZEVENT_SUBCLASS=scrub_finish; export ZEVENT_POOL=test; export ZED_NOTIFY_DATA=1; bash -x ./data-notify.sh`

Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Philip Kauffman <philip@kauffman.me>
Closes #13049
2022-02-03 14:31:57 -08:00
Damian Szuberski
c1d3be19d7
Add ShellCheck's --enable=all inside cmd/
The only exception is `cmd/vdev_id/vdev_id` which might be a subject of
refactoring (see #12084)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #12912
2022-01-06 16:07:54 -08:00
shodanshok
bfb3b0d77a
zed: send notification email by default
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #12806
2021-12-21 16:24:05 -08:00
наб
724e6c68e5 zed: agents: fmd_api: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:11 -08:00
наб
d6fccfe62e zed: agents: zfs_retire: fix unused, remove argsused
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12835
2021-12-21 12:05:11 -08:00