Compare commits

270 Commits

Author SHA1 Message Date
Tony Hutter a8c2b7ebc6 Tag zfs-0.7.13
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2019-02-22 09:47:55 -08:00
John Wren Kennedy 2af898ee24 test-runner: python3 support
Updated to be compatible with Python 2.6, 2.7, 3.5 or newer.

Reviewed-by: John Ramsden <johnramsden@riseup.net>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: John Wren Kennedy <john.kennedy@delphix.com>
Closes #8096
2019-02-22 09:47:34 -08:00
Gregor Kopka c32c2f17d0 Fix flake8 style warnings
Ran zts-report.py and test-runner.py from ./tests/test-runner/bin/
through 2to3 (https://docs.python.org/2/library/2to3.html).
Checked the result and fixed:
- 'maxint' -> 'maxsize', which 2to3 missed.
- Replaced the 'cmp=' parameter of a 'sorted()' call with a 'key=' version.
- Wrapped the configparser import in try/except, as there are still
  python 2.7 systems that lack a compatibility shim.
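
As a hedged illustration of the three kinds of fixes listed above (this is not the actual diff from the commit), a script that runs on both Python 2 and Python 3 might look like the sketch below. The `results` records are made up for the example.

```python
import sys

# Python 2 ships the module as ConfigParser; Python 3 renamed it to
# configparser. Falling back covers 2.7 systems without a shim.
try:
    import configparser
except ImportError:
    import ConfigParser as configparser

# sys.maxint no longer exists in Python 3; sys.maxsize exists in both.
LARGEST = sys.maxsize

# Hypothetical (test name, runtime in seconds) records.
results = [("zpool_create", 12.5), ("zfs_send", 3.0), ("zdb_001", 7.25)]

# Python 2 only: sorted(results, cmp=lambda a, b: cmp(a[1], b[1]))
# Portable form: describe the sort key instead of a comparison function.
by_runtime = sorted(results, key=lambda record: record[1])
```

The `key=` form works unchanged on every Python version the test runner targets, which is why it is the usual replacement for `cmp=`.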

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gregor Kopka <gregor@kopka.net>
Closes #7925
Closes #7952
2019-02-22 09:47:34 -08:00
Tony Hutter 2254b2bbbe GCC 9.0: Fix ztest "directive argument is not a nul-terminated string"
GCC 9.0 is complaining because we're trying to print strings that
are defined like this:

.zo_pool = { 'z', 't', 'e', 's', 't', '\0' },

Fix them by making them actual strings.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #8330
2019-02-22 09:47:34 -08:00
Brian Behlendorf 5c4ec382a7 Linux 5.0 compat: Fix bio_set_dev()
The Linux 5.0 kernel updated the bio_set_dev() macro so it calls the
GPL-only bio_associate_blkg() symbol, thus inadvertently making the
entire macro GPL-only.  Provide a minimal version which always assigns
the request queue's root_blkg to the bio.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8287
2019-02-22 09:47:34 -08:00
Tony Hutter e22bfd8149 Linux 5.0 compat: Disable vector instructions on 5.0+ kernels
The 5.0 kernel no longer exports the functions we need to do vector
(SSE/SSE2/SSE3/AVX...) instructions.  Disable vector-based checksum
algorithms when building against those kernels.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #8259
2019-02-22 09:47:34 -08:00
Tony Hutter f45ad7bff6 Linux 5.0 compat: Fix SUBDIRs
SUBDIRs has been deprecated for a long time, and was finally removed in
the 5.0 kernel.  Use "M=" instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #8257
2019-02-22 09:47:34 -08:00
Tony Hutter 0a3a4d067a Linux 5.0 compat: Convert MS_* macros to SB_*
In the 5.0 kernel, only the mount namespace code should use the MS_*
macros. Filesystems should use the SB_* ones.

https://patchwork.kernel.org/patch/10552493/

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #8264
2019-02-22 09:47:34 -08:00
Tony Hutter ba8024a284 Linux 5.0 compat: Use totalram_pages()
totalram_pages() was converted to an atomic variable in 5.0:

https://patchwork.kernel.org/patch/10652795/

Its value should now be read through the totalram_pages() helper
function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #8263
2019-02-22 09:47:34 -08:00
Tony Hutter edc2675aed Linux 5.0 compat: access_ok() drops 'type' parameter
access_ok no longer needs a 'type' parameter in the 5.0 kernel.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #8261
2019-02-22 09:47:34 -08:00
ilbsmart 98bb45e27a deadlock between mm_sem and tx assign in zfs_write() and page fault
The sequence of events leading to the bug:
1. Thread #1, in `zfs_write`, is assigned txg "n".
2. In the same process, thread #2 takes an mmap page fault (which means
   `mm_sem` is held); `zfs_dirty_inode` fails to open a txg and waits
   for the previous txg "n" to complete.
3. Thread #1 calls `uiomove` to write, but a page fault occurs in
   `uiomove`, which needs `mm_sem`.  Since `mm_sem` is held by
   thread #2, thread #1 is stuck and cannot complete, so txg "n" never
   completes.

So thread #1 and thread #2 are deadlocked.

Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Grady Wong <grady.w@xtaotech.com>
Closes #7939
2019-02-22 09:47:34 -08:00
Neal Gompa (ニール・ゴンパ) 44f463824b dkms: Enable debuginfo option to be set with zfs sysconfig file
On some Linux distributions, the kernel module build will not
default to building with debuginfo symbols, which can make
debugging and testing difficult.

For this case, we provide a flag to override the build to force
debuginfo to be produced for the kernel module build.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Neal Gompa <ngompa@datto.com>
Co-authored-by: Simon Watson <swatson@datto.com>
Signed-off-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Simon Watson <swatson@datto.com>
Closes #8304
2019-02-22 09:47:34 -08:00
Neal Gompa (ニール・ゴンパ) b0d579bc55 Bump commit subject length to 72 characters
There's not really a reason to keep the subject length so short,
since the reason to make it this short was for making nice renders
of a summary list of the git log. With 72 characters, this still
works out fine, so let's just raise it to that so that it's easier
to give slightly more descriptive change summaries.

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Neal Gompa <ngompa@datto.com>
Closes #8250
2019-02-22 09:47:34 -08:00
Benjamin Gentil 7e5def8ae0 zfs.8 uses wrong snapshot names in Example 15
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: bunder2015 <omfgbunder@gmail.com>
Signed-off-by: Benjamin Gentil <benjamin@gentil.io>
Closes #8241
2019-02-22 09:47:34 -08:00
Tony Hutter 89019a846b Add enclosure_symlinks option to vdev_id
Add an 'enclosure_symlinks' option to vdev_id.conf.  This creates
consistently named symlinks to the enclosure devices (/dev/sg*) based
off the configuration in vdev_id.conf.  The enclosure symlinks show
up in /dev/by-enclosure/<prefix>-<channel><num>.  The links make it
easy to run sg_ses on a particular enclosure device.  The
enclosure links are created in addition to the normal
/dev/disk/by-vdev links.

'enclosure_symlinks' is only valid in sas_direct configurations.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Simon Guest <simon.guest@tesujimath.org>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #8194
2019-02-22 09:47:34 -08:00
Simon Guest 41f7723e9c vdev_id: new slot type ses
This extends vdev_id to support a new slot type, ses, for SCSI Enclosure
Services.  With slot type ses, the disk slot numbers are determined by
using the device slot number reported by sg_ses for the device with
matching SAS address, found by querying all available enclosures.

This is primarily of use on systems with a deficient driver omitting
support for bay_identifier in /sys/devices.  In my testing, I found that
the existing slot types of port and id were not stable across disk
replacement, so an alternative was required.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Simon Guest <simon.guest@tesujimath.org>
Closes #6956
2019-02-22 09:47:34 -08:00
Simon Guest 2b8c3cb0c8 vdev_id: extension for new scsi topology
On systems with SCSI rather than SAS disk topology, this change enables
the vdev_id script to match against the block device path, and therefore
create a vdev alias in /dev/disk/by-vdev.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Simon Guest <simon.guest@tesujimath.org>
Closes #6592
2019-02-22 09:47:34 -08:00
Olaf Faaland f325d76e96 Rename macro ZFS_MINOR due to Lustre conflict
Macro ZFS_MINOR, introduced in commit a6cc9756 to record the chosen
static minor number for /dev/zfs, conflicts with an existing macro
in Lustre.  The lustre macro (along with _MAJOR, _PATCH, _FIX) is
used to record the zfsonlinux version Lustre is being built against.

Since the Lustre macro came first, and is used in past versions of
lustre at least going back to 2.10, it makes sense to rename the
macro in ZFS instead of doing so in Lustre which would require
backporting the patch.

Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #8195
2019-02-22 09:47:34 -08:00
Brian Behlendorf e3fb781c5f Add kernel module auto-loading
Historically a dynamic misc minor number was registered for the
/dev/zfs device in order to prevent minor number collisions.  This
was fine but it prevented us from being able to use the kernel
module auto-loader which requires a known reserved value.

Resolve this issue by adding a configure test to find an available
misc minor number which can then be used in MODULE_ALIAS_MISCDEV at
build time.  By adding this alias the zfs kmod is added to the list
of known static-nodes and the systemd-tmpfiles-setup-dev service
will create a /dev/zfs character device at boot time.

This in turn allows us to update the 90-zfs.rules file to make it
aware this is a static node.  The upshot of this is that whenever
a process (zpool, zfs, zed) opens /dev/zfs the kmods will be
automatically loaded.  This even works for unprivileged users so there
is no longer a need to manually load the modules at boot time.

As an additional bonus the zed now no longer needs to start after
the zfs-import.service since it will trigger the module load.

In the unlikely event the minor number we selected conflicts with
another out of tree unregistered minor number the code falls back
to dynamically allocating it.  In this case the modules again
must be manually loaded.

Note that due to the change in the method of registering the minor
number the zimport.sh test case may incorrectly fail when the
static node for the installed packages is created instead of the
dynamic one.  This issue will only transiently impact zimport.sh
for this single commit when we transition and are mixing and
matching methods.

Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
TEST_ZIMPORT_SKIP="yes"
Closes #7287
2019-02-22 09:47:34 -08:00
Ben Wolsieffer 14a5e48fb9 Use autoconf variable for C preprocessor
This fixes the build when cross-compiling, where the preprocessor might
be prefixed.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ben Wolsieffer <benwolsieffer@gmail.com>
Closes #8180
2019-02-22 09:47:34 -08:00
Matthew Ahrens 01937958ce OpenZFS 9577 - remove zfs_dbuf_evict_key tsd
The zfs_dbuf_evict_key TSD (thread-specific data) is not necessary -
we can instead pass a flag down in a few places to prevent recursive
dbuf eviction. Making this change has 3 benefits:

1. The code semantics are easier to understand.
2. On Linux, performance is improved, because creating/removing
   TSD values (by setting to NULL vs non-NULL) is expensive, and
   we do it very often.
3. According to Nexenta, the current semantics can cause a
   deadlock when concurrently calling dmu_objset_evict_dbufs()
   (which is rare today, but they are working on a "parallel
   unmount" change that triggers this more easily):

Porting Notes:
* Minor conflict with OpenZFS 9337 which has not yet been ported.

Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://illumos.org/issues/9577
OpenZFS-commit: https://github.com/openzfs/openzfs/pull/645
External-issue: DLPX-58547
Closes #7602
2019-02-22 09:47:34 -08:00
LOLi edb504f9db Honor --with-mounthelperdir where applicable
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6962
2019-02-22 09:47:34 -08:00
LOLi 2428fbbfcf contrib/initramfs: switch to automake
Use automake to build initramfs scripts and hooks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6761
2019-02-22 09:47:33 -08:00
Tony Hutter 16d298188f Tag zfs-0.7.12
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2018-11-08 14:38:37 -08:00
Tony Hutter f42f8702ce Add BuildRequires gcc, make, elfutils-libelf-devel
This adds a BuildRequires for gcc, make, and elfutils-libelf-devel
into our spec files.  gcc has been a packaging requirement for
a while now:

https://fedoraproject.org/wiki/Packaging:C_and_C%2B%2B

These additional BuildRequires allow us to mock build in
Fedora 29.

Reviewed-by: Neal Gompa <ngompa@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:  Tony Hutter <hutter2@llnl.gov>
Closes #8095
Closes #8102
2018-11-08 14:38:28 -08:00
Brian Behlendorf 9e58d5ef38 Fix flake8 "invalid escape sequence 'x'" warning
From, https://lintlyci.github.io/Flake8Rules/rules/W605.html

As of Python 3.6, a backslash-character pair that is not a valid
escape sequence now generates a DeprecationWarning. Although this
will eventually become a SyntaxError, that will not be for several
Python releases.
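
A minimal sketch of the usual ways to silence W605, assuming the offending string is a regular expression (the pattern here is illustrative and not taken from the ZFS scripts):

```python
import re

# Raw string: the backslashes reach the regex engine untouched.
version_re = re.compile(r"\d+\.\d+")

# Equivalent non-raw spelling: every backslash is doubled.
same_pattern = "\\d+\\.\\d+"

# Writing the same pattern in a normal string with single backslashes
# is what W605 flags, because backslash-d and backslash-dot are not
# valid string escape sequences.
match = version_re.match("0.7.13")
```

Either spelling produces the same pattern text; the raw string is normally preferred because it stays readable.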

Note 'float_pobj' was simply removed from arcstat.py since it
was entirely unused.

Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8056
2018-11-08 14:38:28 -08:00
Brian Behlendorf 320f9de8ab ZTS: Update O_TMPFILE support check
In CentOS 7.5 the kernel provided a compatibility wrapper to support
O_TMPFILE.  This results in the test setup script correctly detecting
kernel support.  But the ZFS module was built without O_TMPFILE
support due to the non-standard CentOS kernel interface.

Handle this case by updating the setup check to fail either when
the kernel or the ZFS module fail to provide support.  The reason
will be clearly logged in the test results.

Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7528
2018-11-08 14:38:28 -08:00
George Melikov 262275ab26 Allow use of pool GUID as root pool
This is helpful when several pools share the same name
but only one of them should be used.

The main use case is twin servers, where some software
(e.g. Proxmox) requires the pools to have the same name.

Reviewed-by: Kash Pande <kash@tripleback.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Igor ‘guardian’ Lidin of Moscow, Russia
Closes #8052
2018-11-08 14:38:28 -08:00
Brian Behlendorf 55f39a01e6 Fix arc_release() refcount
Update arc_release to use arc_buf_size().  This hunk was accidentally
dropped when porting compressed send/recv, 2aa34383b.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8000
2018-11-08 14:38:28 -08:00
Tim Schumacher b884768e46 Prefix all refcount functions with zfs_
Recent changes in the Linux kernel made it necessary to prefix
the refcount_add() function with zfs_ due to a name collision.

To bring the other functions in line with that and to avoid future
collisions, prefix the other refcount functions as well.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Schumacher <timschumi@gmx.de>
Closes #7963
2018-11-08 14:38:28 -08:00
Tim Schumacher f8f4e13776 Linux 4.19-rc3+ compat: Remove refcount_t compat
torvalds/linux@59b57717f ("blkcg: delay blkg destruction until
after writeback has finished") added a refcount_t to the blkcg
structure. Due to the refcount_t compatibility code, zfs_refcount_t
was used by mistake.

Resolve this by removing the compatibility code and replacing the
occurrences of refcount_t with zfs_refcount_t.

Reviewed-by: Franz Pletz <fpletz@fnordicwalking.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Schumacher <timschumi@gmx.de>
Closes #7885
Closes #7932
2018-11-08 14:38:28 -08:00
Gregor Kopka 5f07d51751 Zpool iostat: remove latency/queue scaling
Bandwidth and iops are average per second while *_wait are averages
per request for latency or, for queue depths, an instantaneous
measurement at the end of an interval (according to man zpool).

When calculating the first two it makes sense to do
x/interval_duration (x being the increase in total bytes or number of
requests over the duration of the interval, interval_duration in
seconds) to 'scale' from amount/interval_duration to amount/second.

But applying the same math for the latter (*_wait latencies/queue) is
wrong as there is no interval_duration component in the values (these
are time/requests to get to average_time/request or already an
absolute number).

Because of this bug, 'zpool iostat -l/q' reports correct continuous
*_wait figures for latencies and queue depths only with duration=1,
as then the wrong math cancels itself out (x/1 is a nop).

This removes temporal scaling from latency and queue depth figures.
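
The distinction can be sketched as follows. This is a simplified model with hypothetical function and parameter names, not the zpool iostat source:

```python
def bandwidth_per_second(bytes_delta, interval_seconds):
    """Totals accumulated over an interval scale to a per-second rate."""
    return bytes_delta / float(interval_seconds)


def average_wait(wait_time_delta_ns, request_delta):
    """Latency is time per request; the interval length plays no part."""
    if request_delta == 0:
        return 0.0
    return wait_time_delta_ns / float(request_delta)


def buggy_average_wait(wait_time_delta_ns, request_delta, interval_seconds):
    """The old behaviour as described above: a per-request average that
    is additionally divided by the interval length."""
    return average_wait(wait_time_delta_ns, request_delta) / interval_seconds
```

With 1000 requests that waited 50,000,000 ns in total, the average wait is 50,000 ns per request regardless of the interval. The buggy variant reports 10,000 ns for a 5 second interval and is only right when the interval is 1 second.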

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gregor Kopka <gregor@kopka.net>
Closes #7945
Closes #7694
2018-11-08 14:38:28 -08:00
Brian Behlendorf b2f003c4f4 Fix statfs(2) for 32-bit user space
When handling a 32-bit statfs() system call the returned fields,
although 64-bit in the kernel, must be limited to 32-bits or an
EOVERFLOW error will be returned.

This is less of an issue for block counts since the default
reported block size is 128KiB. But since it is possible to
set a smaller block size, these values will be scaled as
needed to fit in a 32-bit unsigned long.

Unlike most other filesystems the total possible file counts
are more likely to overflow because they are calculated based
on the available free space in the pool. In order to prevent
this the reported value must be capped at 2^32-1. This is
only for statfs(2) reporting, there are no changes to the
internal ZFS limits.
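
A rough model of the two adjustments, for illustration only. The function names are hypothetical, and doubling the reported block size until the count fits is an assumption about how the scaling could be done, not a description of the actual patch:

```python
UINT32_MAX = 2 ** 32 - 1


def cap_file_count(files):
    """File counts are simply clamped to the largest 32-bit value."""
    return min(files, UINT32_MAX)


def scale_blocks(total_bytes, block_size):
    """Grow the reported block size until the block count fits in
    32 bits. Returns (block_count, reported_block_size)."""
    blocks = total_bytes // block_size
    while blocks > UINT32_MAX:
        block_size *= 2
        blocks = total_bytes // block_size
    return blocks, block_size
```

For a 1 PiB pool reported in 512-byte blocks, this sketch ends up reporting 2^31 blocks of 512 KiB each, so the product still describes the same capacity.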

Reviewed-by: Andreas Dilger <andreas.dilger@whamcloud.com>
Reviewed-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #7927
Closes #7122
Closes #7937
2018-11-08 14:38:28 -08:00
Olaf Faaland 9014da2b01 Skip import activity test in more zdb code paths
Since zdb opens the pools read-only, it cannot damage the pool in the
event the pool is already imported either on the same host or on
another one.

If the pool vdev structure is changing while zdb is importing the
pool, it may cause zdb to crash.  However this is unlikely, and in any
case it's a user space process and can simply be run again.

For this reason, zdb should disable the multihost activity test on
import that is normally run.

This commit fixes a few zdb code paths where that had been overlooked.
It also adds tests to ensure that several common use cases handle this
properly in the future.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Gu Zheng <guzheng2331314@163.com>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #7797
Closes #7801
2018-11-08 14:38:28 -08:00
Matthew Ahrens 45579c9515 Reduce taskq and context-switch cost of zio pipe
When doing a read from disk, ZFS creates 3 ZIO's: a zio_null(), the
logical zio_read(), and then a physical zio. Currently, each of these
results in a separate taskq_dispatch(zio_execute).

On high-read-iops workloads, this causes a significant performance
impact. By processing all 3 ZIO's in a single taskq entry, we reduce the
overhead on taskq locking and context switching.  We accomplish this by
allowing zio_done() to return a "next zio to execute" to zio_execute().

This results in a ~12% performance increase for random reads, from
96,000 iops to 108,000 iops (with recordsize=8k, on SSD's).
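
The idea of zio_done() handing back a "next zio to execute" can be sketched as a simple loop. This is a toy model: the `Zio` class and the parent chain are simplifications assumed for the example, and the real pipeline has many more stages:

```python
class Zio(object):
    """Toy stand-in for a ZIO; it only knows its parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def zio_done(zio, completed):
    """Complete one ZIO and hand back the next one to run, if any."""
    completed.append(zio.name)
    return zio.parent


def zio_execute(zio, completed):
    """Run the whole chain inside one (simulated) taskq entry instead
    of dispatching a new entry for every ZIO."""
    while zio is not None:
        zio = zio_done(zio, completed)


null_zio = Zio("zio_null")
logical = Zio("zio_read", parent=null_zio)
physical = Zio("physical", parent=logical)

completed = []
zio_execute(physical, completed)
```

All three ZIOs complete within a single call to `zio_execute()`, which stands in for the single taskq entry described above.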

Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: George Wilson <george.wilson@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-59292
Closes #7736
2018-11-08 14:38:28 -08:00
Tom Caputi b32f1279d4 Fix race in dnode_check_slots_free()
Currently, dnode_check_slots_free() works by checking dn->dn_type
in the dnode to determine if the dnode is reclaimable. However,
there is a small window of time between dnode_free_sync() in the
first call to dsl_dataset_sync() and when the useraccounting code
is run when the type is set to DMU_OT_NONE, but the dnode is not yet
evictable, leading to crashes. This patch adds the ability for
dnodes to track which txg they were last dirtied in and adds a
check for this before performing the reclaim.

This patch also corrects several instances when dn_dirty_link was
treated as a list_node_t when it is technically a multilist_node_t.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #7147
Closes #7388
2018-11-08 14:38:28 -08:00
Tony Hutter 1b0cd07131 Tag zfs-0.7.11
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2018-09-13 10:13:41 -07:00
Dr. András Korn 8c6867dae4 tx_waited -> tx_dirty_delayed in trace_dmu.h
This change was missed in 0735ecb334.

Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: András Korn <korn-github.com@elan.rulez.org>
Closes #7096
2018-09-13 10:12:22 -07:00
Tony Hutter 99310c0aa0 Revert "zpool reopen should detect expanded devices"
This reverts commit 2a16d4cfaf.

The commit was causing an "attempt to access beyond the end
of device" error:

list.zfsonlinux.org/pipermail/zfs-discuss/2018-September/032217.html
2018-09-13 10:11:42 -07:00
Tony Hutter d126980e5f Tag zfs-0.7.10
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2018-09-05 10:37:32 -07:00
Chris Siebenmann 88ef5b238b Correctly handle errors from kern_path
As a regular kernel function, kern_path() returns errors as negative
errnos, such as -ELOOP. zfsctl_snapdir_vget() must convert these into
the positive errnos used throughout the ZFS code when it returns them
to other ZFS functions so that the ZFS code properly sees them as
errors.
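
The sign convention can be modelled in a few lines. This is a Python illustration of the C convention with a hypothetical helper name, not the code in zfsctl_snapdir_vget():

```python
import errno


def to_zfs_error(kernel_rc):
    """Linux kernel functions report failure as a negative errno;
    ZFS code expects a positive errno and 0 for success."""
    return -kernel_rc if kernel_rc < 0 else kernel_rc


# A kern_path()-style failure such as -ELOOP becomes a positive ELOOP.
converted = to_zfs_error(-errno.ELOOP)
```

Without the conversion, a caller that tests for positive error values would treat the negative return as something other than the error it represents.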

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Siebenmann <cks.git01@cs.toronto.edu>
Closes #7764
Closes #7864
2018-07-06 02:46:51 -07:00
Georgy Yakovlev 30d8b85702 Fix build with CONFIG_GCC_PLUGIN_RANDSTRUCT
fs/zfs/zfs/metaslab.c:1055:2: error: positional initialization of field
in ‘struct’ declared with ‘designated_init’ attribute
[-Werror=designated-init]
  metaslab_rt_remove,

Signed-off-by: Georgy Yakovlev <ya@sysdump.net>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes: #7069
2018-07-06 02:46:51 -07:00
Tom Caputi 45f0437912 Fix 'zfs recv' of non large_dnode send streams
Currently, there is a bug where older send streams without the
DMU_BACKUP_FEATURE_LARGE_DNODE flag are not handled correctly.
The code in receive_object() fails to handle cases where
drro->drr_dn_slots is set to 0, which is always the case when the
sending code does not support this feature flag. This patch fixes
the issue by ensuring that a value of 0 is treated as
DNODE_MIN_SLOTS.
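
In outline, the fix amounts to a default for the missing field. The sketch below uses a hypothetical helper name and assumes DNODE_MIN_SLOTS is 1 (one 512-byte slot); it is not the code from receive_object():

```python
# Assumed value: a legacy dnode occupies a single 512-byte slot.
DNODE_MIN_SLOTS = 1


def effective_dn_slots(drr_dn_slots):
    """Streams from senders without large_dnode support leave the
    field at 0; treat that as the minimum slot count."""
    return drr_dn_slots if drr_dn_slots != 0 else DNODE_MIN_SLOTS
```

Streams that do carry a slot count pass through unchanged, so only older streams are affected.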

Tested-by:  DHE <git@dehacked.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #7617
Closes #7662
2018-07-06 02:46:51 -07:00
Tom Caputi dc3eea871a Fix object reclaim when using large dnodes
Currently, when the receive_object() code wants to reclaim an
object, it always assumes that the dnode is the legacy 512 bytes,
even when the incoming bonus buffer exceeds this length. This
causes a buffer overflow if --enable-debug is not provided and
triggers an ASSERT if it is. This patch resolves this issue and
adds an ASSERT to ensure this can't happen again.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #7097
Closes #7433
2018-07-06 02:46:51 -07:00
Tim Chase d2c8103a68 Fix problems receiving reallocated dnodes
This is a port of 047116ac - Raw sends must be able to decrease nlevels,
to the zfs-0.7-stable branch.  It includes the various fixes to the
problem of receiving incremental streams which include reallocated dnodes
in which the number of dnode slots has changed but excludes the parts
which are related to raw streams.

From 047116ac:

    Currently, when a raw zfs send file includes a
    DRR_OBJECT record that would decrease the number of
    levels of an existing object, the object is reallocated
    with dmu_object_reclaim() which creates the new dnode
    using the old object's nlevels. For non-raw sends this
    doesn't really matter, but raw sends require that
    nlevels on the receive side match that of the send
    side so that the checksum-of-MAC tree can be properly
    maintained. This patch corrects the issue by freeing
    the object completely before allocating it again in
    this case.

    This patch also corrects several issues with
    dnode_hold_impl() and related functions that prevented
    dnodes (particularly multi-slot dnodes) from being
    reallocated properly due to the fact that existing
    dnodes were not being fully cleaned up when they
    were freed.

    This patch adds a test to make sure that zfs recv
    functions properly with incremental streams containing
    dnodes of different sizes.

This also includes a one-liner fix from loli10K to fix a test failure:
https://github.com/zfsonlinux/zfs/pull/7792#discussion_r212769264

Authored-by: Tom Caputi <tcaputi@datto.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Ported-by: Tim Chase <tim@chase2k.com>

Closes #6821
Closes #6864

NOTE: This is the first of the port of 3 related patches to the
zfs-0.7-release branch of ZoL.  The other two patches should immediately
follow this one.
2018-07-06 02:46:51 -07:00
Joao Carlos Mendes Luis 3ea1f7f193 Fedora 28: Fix misc bounds check compiler warnings
Fix a bunch of truncation compiler warnings that show up
on Fedora 28 (GCC 8.0.1).

Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #7368
Closes #7826
Closes #7830
2018-07-06 02:46:51 -07:00
LOLi 4356dd23a9 Fix libaio-devel requirement for Debian-based distributions
BuildRequires tags for "-devel" packages in the RPM spec file do not
work when building on Debian-based distributions.

Fix this issue by making this requirement conditional to RPM-based
distributions.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7829
Closes #7831
2018-07-06 02:46:51 -07:00
Brian Behlendorf 75318ec497 Add libaio-devel BuildRequires
The zfs-test package needs a build requirement on the libaio-devel
package.  Without it ./configure will correctly determine that
mmap_libaio cannot be built and it will be skipped.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7821
Closes #7824
2018-07-06 02:46:51 -07:00
Brian Behlendorf c1629734ab Add missing zfs-dracut RPM dependencies
The zfs-dracut package requires the hostid, basename, head, awk,
and grep utilities be installed.  The first three are provided by
coreutils but additional dependencies are required for awk and grep.

Reviewed-by: Manuel Amador (Rudd-O) <rudd-o@rudd-o.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7729
Closes #7747
2018-07-06 02:46:51 -07:00
DeHackEd 778290d5bc Don't modify argv[] in user tools
argv[] gets modified during string parsing for input arguments. This
is reflected in the live process listing. Don't do that.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: DHE <git@dehacked.net>
Closes #7760
2018-07-06 02:46:51 -07:00
LOLi 98bc8e0b23 Fix arcstat.py handling of unsupported options
This change allows the arcstat.py script to handle unsupported options
gracefully and print both error and usage messages when one such option
is provided.

Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7799
2018-07-06 02:46:51 -07:00
LOLi caafa436eb Allow inherited properties in zfs_check_settable()
This change modifies how 'checksum' and 'dedup' properties are verified
in zfs_check_settable() handling the case where they are explicitly
inherited in the dataset hierarchy when receiving a recursive send
stream.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7755
Closes #7576
Closes #7757
2018-07-06 02:46:51 -07:00
LOLi fe8de1c8a6 Fix zfs incremental send remove '-o' properties
When receiving an incremental send stream with intermediary snapshots
zfs_receive_one() does not correctly identify the top-level dataset:
consequently we restore said snapshots as if they were children
datasets in the hierarchy, forcing inheritance of any property received
with 'zfs send -o' and effectively removing any locally set value.

The test case did not correctly verify this situation because it uses
adjacent snapshots, basically testing 'zfs send -i' instead of
'zfs send -I': this commit adds an additional intermediary snapshot to
the test script.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7478
2018-07-06 02:46:51 -07:00
Toomas Soome 1bd93ea1e0 OpenZFS 8906 - uts: illumos rootfs should support salted cksum
Porting notes:
* As of grub-2.02 these checksums are not supported.  However, as
  pointed out in #6501 there are alternatives such as EFISTUB which
  work and have no such restriction.  A warning was added to the
  checksum property section of the zfs.8 man page.

Authored by: Toomas Soome <tsoome@me.com>
Reviewed by: C Fraire <cfraire@me.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Yuri Pankov <yuripv@yuripv.net>
Approved by: Dan McDonald <danmcd@joyent.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://illumos.org/issues/8906
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/7dec52f
Closes #6501
Closes #7714
2018-07-06 02:46:51 -07:00
Brian Behlendorf 6857950e46 Fix zpl_mount() deadlock
Commit 93b43af10 inadvertently introduced the following scenario which
can result in a deadlock.  This issue was most easily reproduced by
LXD containers using a ZFS storage backend but should be reproducible
under any workload which is frequently mounting and unmounting.

-- THREAD A --
spa_sync()
  spa_sync_upgrades()
    rrw_enter(&dp->dp_config_rwlock, RW_WRITER, FTAG); <- Waiting on B

-- THREAD B --
mount_fs()
  zpl_mount()
    zpl_mount_impl()
      dmu_objset_hold()
        dmu_objset_hold_flags()
          dsl_pool_hold()
            dsl_pool_config_enter()
              rrw_enter(&dp->dp_config_rwlock, RW_READER, tag);
    sget()
      sget_userns()
        grab_super()
          down_write(&s->s_umount); <- Waiting on C

-- THREAD C --
cleanup_mnt()
  deactivate_super()
    down_write(&s->s_umount);
    deactivate_locked_super()
      zpl_kill_sb()
        kill_anon_super()
          generic_shutdown_super()
            sync_filesystem()
              zpl_sync_fs()
                zfs_sync()
                  zil_commit()
                    txg_wait_synced() <- Waiting on A
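
The three traces above form a cycle in a wait-for graph: A waits on B,
B waits on C, and C waits on A.  A minimal sketch of that reasoning,
where `find_cycle` is a hypothetical helper and not part of ZFS:

```python
def find_cycle(waits_on):
    """Return the nodes forming a cycle in a wait-for graph, or None.

    waits_on maps each waiter to the single party it is blocked on.
    """
    for start in waits_on:
        seen = []
        node = start
        # Follow the chain of waiters until it ends or repeats.
        while node in waits_on and node not in seen:
            seen.append(node)
            node = waits_on[node]
        if node in seen:
            return seen[seen.index(node):]
    return None


# Edges taken from the thread traces: A -> B -> C -> A.
DEADLOCK = {"A": "B", "B": "C", "C": "A"}
```

Removing any one edge, for example by not holding the config lock
while waiting on s_umount, leaves the graph acyclic.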

Reviewed by: Alek Pinchuk <apinchuk@datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7598 
Closes #7659 
Closes #7691 
Closes #7693
2018-07-06 02:46:51 -07:00
Brian Behlendorf 716ce2b89e Fix kernel unaligned access on sparc64
Update the SA_COPY_DATA macro to check at compile time whether the
architecture supports efficient unaligned memory accesses.  Otherwise
fall back to using the sa_copy_data() function.

The kernel provided CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is
used to determine availability in kernel space.  In user space
the x86_64, x86, powerpc, and sometimes arm architectures will
define the HAVE_EFFICIENT_UNALIGNED_ACCESS macro.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7642
Closes #7684
2018-07-06 02:46:51 -07:00
Troels Nørgaard 9daae583d8 Default ashift for Amazon EC2 NVMe devices
Add a default 4 KiB ashift for Amazon EC2 NVMe devices on instances with
NVMe ephemeral devices, such as the types c5d, f1, i3 and m5d.
As per the official documentation [1] a 4096 byte blocksize should be
used to match the underlying hardware.
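
For reference, ashift is the base-2 logarithm of the block size, so a
4096 byte block size corresponds to ashift=12.  A small sketch of the
arithmetic; `ashift_for_block_size` is a hypothetical helper name:

```python
def ashift_for_block_size(block_size):
    """Return log2(block_size) for a power-of-two block size in bytes."""
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block size must be a positive power of two")
    return block_size.bit_length() - 1
```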

The string was identified via:

$ sudo sginfo -M /dev/nvme0n1
INQUIRY response (cmd: 0x12)
----------------------------
Device Type                        0
Vendor:                    NVMe
Product:                   Amazon EC2 NVMe
Revision level:

$ lsblk -io KNAME,TYPE,SIZE,MODEL
KNAME   TYPE    SIZE MODEL
nvme0n1 disk  442.4G Amazon EC2 NVMe Instance Storage

[1] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/
    storage-optimized-instances.html
    Retrieved 2018-07-03

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Troels Nørgaard <tnn@tradeshift.com>
Closes #7676
2018-07-06 02:46:51 -07:00
Brian Behlendorf b5ee3df776 Linux 4.14 compat: blk_queue_stackable()
The blk_queue_stackable() function was replaced in the 4.14 kernel
by queue_is_rq_based(), commit torvalds/linux@5fdee212.  This change
resulted in the default elevator being used which can negatively
impact performance.

Rather than adding additional compatibility code to detect the
new interface unconditionally attempt to set the elevator.  Since
we expect this to fail for block devices without an elevator the
error message has been moved in to zfs_dbgmsg().

Finally, it was observed that the elevator_change() was removed
from the 4.12 kernel, commit torvalds/linux@c033269.  Update the
comment to clearly specify which kernels are expected to export the
elevator_change() symbol.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7645
2018-07-06 02:46:51 -07:00
Tony Hutter 17cd9a8e0c Add pool state /proc entry, "SUSPENDED" pools
1. Add a proc entry to display the pool's state:

$ cat /proc/spl/kstat/zfs/tank/state
ONLINE

This is done without using the spa config locks, so it will
never hang.

2. Fix 'zpool status' and 'zpool list -o health' output to print
"SUSPENDED" instead of "ONLINE" for suspended pools.

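A monitoring script could read the new entry directly.  The sketch
below assumes only the path layout shown in the example above; the
function name and the `kstat_root` parameter are illustrative:

```python
import os


def read_pool_state(pool, kstat_root="/proc/spl/kstat/zfs"):
    """Return the contents of <kstat_root>/<pool>/state, stripped."""
    path = os.path.join(kstat_root, pool, "state")
    with open(path) as f:
        return f.read().strip()

# Example (on a system with a pool named "tank"):
#   read_pool_state("tank")
```
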
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #7331
Closes #7563
2018-07-06 02:46:51 -07:00
Sara Hartse 2a16d4cfaf zpool reopen should detect expanded devices
Update bdev_capacity to have wholedisk vdevs query the
size of the underlying block device (correcting for the size
of the EFI partition and partition alignment) and therefore detect
expanded space.

Correct vdev_get_stats_ex so that the expandsize is aligned
to metaslab size and new space is only reported if it is large
enough for a new metaslab.
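
The alignment rule can be sketched as rounding the new space down to a
multiple of the metaslab size, which yields zero when less than one
metaslab is available.  The function name is hypothetical and the
sizes are plain byte counts:

```python
def reported_expandsize(new_space, metaslab_size):
    """Round new_space down to a whole number of metaslabs."""
    if metaslab_size <= 0:
        raise ValueError("metaslab size must be positive")
    return (new_space // metaslab_size) * metaslab_size
```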

Reviewed by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Wren Kennedy <jwk404@gmail.com>
Signed-off-by: sara hartse <sara.hartse@delphix.com>
External-issue: LX-165
Closes #7546
Issue #7582
2018-07-06 02:46:51 -07:00
Antonio Russo 3350a33908 Support Debian DKMS builds
scripts/dkms.mkconf calls configure with
`--with-linux=${kernel_source_dir}`, but Debian puts its kernel source at
`/lib/modules/<version>/source`. This patch adds the same logic to the
DKMS file produced by `scripts/dkms.mkconf` that Debian has shipped in
its official ZFS packaging: at DKMS build time, it checks if the system
is a Debian system, and adjusts the path accordingly.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com>
Closes #7358 
Closes #7540 
Closes #7554
2018-07-06 02:46:51 -07:00
Olaf Faaland 3eef58c9b6 module param callbacks check for initialized spa
Callbacks provided for module parameters are executed both
after the module is loaded, when a user alters it via sysfs, e.g.
	echo bar > /sys/module/zfs/parameters/foo

as well as when the module is loaded with an argument, e.g.
	modprobe zfs foo=bar

In the latter case, the init functions likely have not run yet,
including spa_init() which initializes the namespace lock so it is safe
to use.

Instead of immediately taking the namespace lock and attempting to
iterate over initialized spa structures, check whether spa_mode_global
is nonzero.  This is set by spa_init() after it has initialized the
namespace lock.
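
The guard pattern can be modelled as follows.  This is a toy model in
Python, not the kernel code: `SpaModel` and `param_set` are
hypothetical names, and only `spa_mode_global` and `spa_init` come
from the description above.

```python
import threading


class SpaModel:
    """Toy model of a module whose lock exists only after init."""

    def __init__(self):
        self.spa_mode_global = 0      # zero until spa_init() has run
        self.namespace_lock = None    # not usable before spa_init()
        self.pools = []

    def spa_init(self, mode):
        self.namespace_lock = threading.Lock()
        self.spa_mode_global = mode   # set after the lock is ready

    def param_set(self, apply):
        """Apply a parameter change to every pool; return pools visited."""
        if self.spa_mode_global == 0:
            # Called from "modprobe zfs foo=bar": nothing to iterate yet.
            return 0
        with self.namespace_lock:
            for pool in self.pools:
                apply(pool)
            return len(self.pools)
```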

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #7496 
Closes #7521
2018-07-06 02:46:51 -07:00
Brian Behlendorf 4805781c74 Trim new line from zfs_vdev_scheduler
Add a helper function to trim the trailing newline.  While we're
here use this new hook to immediately apply the new scheduler.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3356 
Closes #6573
2018-07-06 02:46:51 -07:00
Chunwei Chen b06f40ea9b Fix ENOSPC in "Handle zap_add() failures in ..."
Commit cc63068 caused ENOSPC errors when copying a large number of files
between two directories. The reason is that the patch limits zap leaf
expansion to 2 retries, and returns ENOSPC on failure.

The intent for limiting retries is to prevent pointlessly growing table
to max size when adding a block full of entries with same name in
different case in mixed mode. However, it turns out we cannot use any
limit on the retry. When we copy files from one directory in readdir
order, we are copying in hash order, one leaf block at a time. Which
means that if the leaf block in source directory has expanded 6 times,
and you copy those entries in that block, by the time you need to expand
the leaf in destination directory, you need to expand it 6 times in one
go. So any limit on the retry will result in error where it shouldn't.

Note that while we do use different salt for different directories, it
seems that the salt/hash function doesn't provide enough randomization
to the hash distance to prevent this from happening.

Since cc63068 has already been reverted, this patch adds it back and
removes the retry limit.

Also, as it turns out, failing on zap_add() has a serious side effect for
mzap_upgrade(). When upgrading from micro zap to fat zap, it will
call zap_add() to transfer entries one at a time. If it hits any error
halfway through, the remaining entries will be lost, causing those files
to become orphans. This patch adds a VERIFY to catch it.

Reviewed-by: Sanjeev Bagewadi <sanjeev.bagewadi@gmail.com>
Reviewed-by: Richard Yao <ryao@gentoo.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Albert Lee <trisk@forkgnu.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #7401 
Closes #7421
2018-07-06 02:46:51 -07:00
Olaf Faaland 6b5cc49d81 Fix divide-by-zero in mmp_delay_update()
vdev_count_leaves() in the denominator may return 0, caught by Coverity.
Introduced by

* 533ea04 Update mmp_delay on sync or skipped, failed write
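
One common way to guard such a division is to clamp the denominator to
at least one.  The sketch below shows that pattern only; it is an
assumption about the shape of the fix, and the function name is
hypothetical:

```python
def per_leaf_interval_ns(interval_ns, leaf_count):
    """Divide an interval across leaves, treating zero leaves as one."""
    return interval_ns // max(1, leaf_count)
```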

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #7391
2018-07-06 02:46:51 -07:00
Prakash Surya ef7a79488a OpenZFS 8997 - ztest assertion failure in zil_lwb_write_issue
PROBLEM
=======

When `dmu_tx_assign` is called from `zil_lwb_write_issue`, it's possible
for either `ERESTART` or `EIO` to be returned.

If `ERESTART` is returned, this will cause an assertion to fail directly
in `zil_lwb_write_issue`, where the code assumes the return value is
`EIO` if `dmu_tx_assign` returns a non-zero value. This can occur if the
SPA is suspended when `dmu_tx_assign` is called, and most often occurs
when running `zloop`.

If `EIO` is returned, this can cause assertions to fail elsewhere in the
ZIL code. For example, `zil_commit_waiter_timeout` contains the
following logic:

    lwb_t *nlwb = zil_lwb_write_issue(zilog, lwb);
    ASSERT3S(lwb->lwb_state, !=, LWB_STATE_OPENED);

In this case, if `dmu_tx_assign` returned `EIO` from within
`zil_lwb_write_issue`, the `lwb` variable passed in will not be issued
to disk. Thus, its `lwb_state` field will remain `LWB_STATE_OPENED` and
this assertion will fail. `zil_commit_waiter_timeout` assumes that after
it calls `zil_lwb_write_issue`, the `lwb` will be issued to disk, and
doesn't handle the case where this is not true; i.e. it doesn't handle
the case where `dmu_tx_assign` returns `EIO`.

SOLUTION
========

This change modifies the `dmu_tx_assign` function such that `txg_how` is
a bitmask, rather than of the `txg_how_t` enum type. Now, the previous
`TXG_WAITED` semantics can be used via `TXG_NOTHROTTLE`, along with
specifying either `TXG_NOWAIT` or `TXG_WAIT` semantics.

Previously, when `TXG_WAITED` was specified, `TXG_NOWAIT` semantics was
automatically invoked. This was not ideal when using `TXG_WAITED` within
`zil_lwb_write_issue`, leading to the problem described above. Rather, we
want to achieve the semantics of `TXG_WAIT`, while also preventing the
`tx` from being penalized via the dirty delay throttling.

With this change, `zil_lwb_write_issue` can achieve the semantics that
it requires by passing in the value `TXG_WAIT | TXG_NOTHROTTLE` to
`dmu_tx_assign`.

Further, consumers of `dmu_tx_assign` wishing to achieve the old
`TXG_WAITED` semantics can pass in the value `TXG_NOWAIT | TXG_NOTHROTTLE`.
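
The flag combinations can be illustrated with a small model.  The flag
names come from the text above; the numeric values and the
`describe_txg_how` helper are assumptions made for this sketch and
should not be read as the definitions in the headers:

```python
# Illustrative values only; the real definitions live in the ZFS headers.
TXG_NOWAIT = 0
TXG_WAIT = 1 << 0
TXG_NOTHROTTLE = 1 << 1


def describe_txg_how(txg_how):
    """Decode a txg_how bitmask into its two independent behaviours."""
    return {
        "wait": bool(txg_how & TXG_WAIT),
        "throttle": not (txg_how & TXG_NOTHROTTLE),
    }
```

Under this model, `TXG_WAIT | TXG_NOTHROTTLE` blocks without being
throttled, while `TXG_NOWAIT | TXG_NOTHROTTLE` matches the old
`TXG_WAITED` behaviour.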

Authored by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

Porting Notes:
- Additionally updated `zfs_tmpfile` to use `TXG_NOTHROTTLE`

OpenZFS-issue: https://www.illumos.org/issues/8997
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/19ea6cb0f9
Closes #7084
2018-07-06 02:46:51 -07:00
Brian Behlendorf a2f759146d Linux compat 4.18: check_disk_size_change()
Added support for the bops->check_events() interface which was
added in the 2.6.38 kernel to replace bops->media_changed().
Fully implementing this functionality allows the volume resize
code to rely on revalidate_disk(), which is the preferred
mechanism, and removes the need to use check_disk_size_change().

In order for bops->check_events() to lookup the zvol_state_t
stored in the disk->private_data the zvol_state_lock needs to
be held.  Since the check events interface may poll, the mutex
has been converted to a rwlock for better concurrency.  The
rwlock need only be taken as a writer in the zvol_free() path
when disk->private_data is set to NULL.

The configure checks for the block_device_operations structure
were consolidated in a single kernel-block-device-operations.m4
file.

The ZFS_AC_KERNEL_BDEV_BLOCK_DEVICE_OPERATIONS configure checks
and associated dead code were removed.  This interface was added
to the 2.6.28 kernel which predates the oldest supported 2.6.32
kernel and will therefore always be available.

Updated maximum Linux version in META file.  The 4.17 kernel
was released on 2018-06-03 and ZoL is compatible with the
finalized kernel.

Reviewed-by: Boris Protopopov <boris.protopopov@actifio.com>
Reviewed-by: Sara Hartse <sara.hartse@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7611
2018-07-06 02:46:51 -07:00
Brian Behlendorf f79c0de208 Linux 4.18 compat: inode timespec -> timespec64
Commit torvalds/linux@95582b0 changes the inode i_atime, i_mtime,
and i_ctime members from timespecs to timespec64s to make them
2038 safe.  As part of this change the current_time() function was
also updated to return the timespec64 type.

Resolve this issue by introducing a new inode_timespec_t type which
is defined to match the timespec type used by the inode.  It should
be used when working with inode timestamps to ensure matching types.

The timestruc_t type under Illumos was used in a similar fashion but
was specified to always be a timespec_t.  Rather than incorrectly
define this type all timespec_t types have been replaced by the new
inode_timespec_t type.

Finally, the kernel and user space 'sys/time.h' headers were aligned
with each other.  They define several constants as macros, as
appropriate for the context, and include static inline implementations of
gethrestime(), gethrestime_sec(), and gethrtime().

Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7643
Backported-by: Richard Yao <ryao@gentoo.org>
2018-07-06 02:46:51 -07:00
Boris Protopopov 1667816089 zv_suspend_lock in zvol_open()/zvol_release()
Acquire zv_suspend_lock on first open and last close only.
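
The idea can be modelled with an open counter: the lock is taken only
on the 0 -> 1 and 1 -> 0 transitions.  `ZvolModel` is a hypothetical
toy class that counts acquisitions instead of taking a real lock:

```python
class ZvolModel:
    """Count how often the suspend lock would be taken."""

    def __init__(self):
        self.open_count = 0
        self.suspend_lock_acquisitions = 0

    def open(self):
        if self.open_count == 0:
            # First open: the suspend lock is needed.
            self.suspend_lock_acquisitions += 1
        self.open_count += 1

    def release(self):
        if self.open_count == 0:
            raise RuntimeError("release without matching open")
        if self.open_count == 1:
            # Last close: the suspend lock is needed again.
            self.suspend_lock_acquisitions += 1
        self.open_count -= 1
```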

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Boris Protopopov <boris.protopopov@actifio.com>
Closes #6342
2018-07-06 02:46:51 -07:00
Tony Hutter d1ed1be3cd Tag zfs-0.7.9
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2018-05-08 13:33:38 -07:00
Tony Hutter e749242a99 Remove DEBUG_STACKFLAGS to bypass compiler error
'Support -fsanitize=address with --enable-asan' (fed9035) removed
DEBUG_STACKFLAGS="-fstack-check" from zfs-build.m4 in master.
However, that's too heavyweight a patch to merge in to the 0.7.x branch,
so just take the one-liner we need to get around a compiler error
on Fedora 28:

$ ./configure --enable-debug --enable-debuginfo && make pkg-utils
  CC       gethrtime.lo
cc1: error: '-fstack-check=' and '-fstack-clash_protection' are mutually
exclusive.  Disabling '-fstack-check=' [-Werror]

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>

Requires-spl: #701
2018-05-07 17:19:58 -07:00
Tony Hutter 9267ef84fd Fedora 28: Add BuildRequires: libtirpc-devel
Add "BuildRequires: libtirpc-devel" to fix mock builds on Fedora 28.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #7494
Closes #7495
2018-05-07 17:19:57 -07:00
Brian Behlendorf 0ee129199f RHEL 7.5 compat: FMODE_KABI_ITERATE
As of RHEL 7.5 the mainline fops.iterate() method was added to
the file_operations structure and is correctly detected by the
configure script.

Normally this is what we want, but in order to maintain KABI
compatibility the RHEL change additionally does the following:

* Requires that callers intending to use this extended interface
  set the FMODE_KABI_ITERATE flag on the file structure when
  opening the directory.
* Adds the fops.iterate() method to the end of the structure,
  without removing fops.readdir().

This change updates the configure check to ignore the RHEL 7.5+
variant of fops.iterate() when detected.  Instead fall back to
the fops.readdir() interface which will be available.

Finally, add the 'zpl_' prefix to the directory context wrappers
to avoid colliding with the kernel provided symbols when both
the fops.iterate() and fops.readdir() are provided by the kernel.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7460
Closes #7463
2018-05-07 17:19:57 -07:00
George Melikov 245be00597 Add back iostat -y or -w descriptions
The iostat -y and -w descriptions were dropped in cda0317e;
restore them.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #7479
Closes #7483
2018-05-07 17:19:57 -07:00
Antonio Russo c38d702330 Add test with two kinds of file creation orders
Data loss was identified in #7401 when many small files were copied.
This adds a reproducer for this bug and other similar ones: randomly
generate N files. Then, listing M of them by `ls -U` order, produce
those same files in a directory of the same name.

This triggers the bug consistently, provided N and M are large enough.
Here, N=2^16 and M=2^13.
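
The core of such a reproducer can be sketched as below.  This is not
the test script itself: the function name is hypothetical, and
`os.scandir()` stands in for `ls -U` since both return entries in
unsorted directory order.

```python
import os


def copy_in_directory_order(src, dst, m):
    """Recreate the first m entries of src in dst, in readdir order."""
    os.makedirs(dst, exist_ok=True)
    created = []
    with os.scandir(src) as entries:
        for entry in entries:
            if len(created) >= m:
                break
            # Only the names matter here, so create empty files.
            open(os.path.join(dst, entry.name), "w").close()
            created.append(entry.name)
    return created
```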

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com>
Closes #7411
2018-05-07 17:19:57 -07:00
Seth Forshee 3f729907c8 Allow mounting datasets more than once
Currently mounting an already mounted zfs dataset results in an
error, whereas it is typically allowed with other filesystems.
This causes some bad interactions with mount namespaces. Take
this sequence for example:

- Create a dataset
- Create a snapshot of the dataset
- Create a clone of the snapshot
- Create a new mount namespace
- Rename the original dataset

The rename results in unmounting and remounting the clone in the
original mount namespace, however the remount fails because the
dataset is still mounted in the new mount namespace. (Note that
this means the mount in the new mount namespace is never being
unmounted, so perhaps the unmount/remount of the clone isn't
actually necessary.)

The problem here is a result of the way mounting is implemented
in the kernel module. Since it is not mounting block devices it
uses mount_nodev() instead of the usual mount_bdev(). However,
mount_nodev() is written for filesystems for which each mount is
a new instance (i.e. a new super block), and zfs should be able
to detect when a mount request can be satisfied using an existing
super block.

Change zpl_mount() to call sget() directly with its own test
callback. Passing the objset_t object as the fs data allows
checking if a superblock already exists for the dataset, and in
that case we just need to return a new reference for the sb's
root dentry.
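
The lookup-or-create behaviour can be modelled in a few lines.  This
is a toy model of the idea, not the kernel API: `SuperBlock` and
`mount` are hypothetical names and the identity comparison plays the
role of the test callback.

```python
class SuperBlock:
    """Toy super block keyed by the dataset object it serves."""

    def __init__(self, objset):
        self.objset = objset
        self.refs = 0


def mount(superblocks, objset):
    """Return the existing super block for objset, or create one."""
    for sb in superblocks:
        if sb.objset is objset:   # the "test callback"
            sb.refs += 1
            return sb
    sb = SuperBlock(objset)
    sb.refs = 1
    superblocks.append(sb)
    return sb
```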

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Alek Pinchuk <apinchuk@datto.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Closes #5796
Closes #7207
2018-05-07 17:19:57 -07:00
beren12 cca220d7c6 Fix zfs_arc_max minimum tuning
When setting `zfs_arc_max` its minimum value is allowed
to be 64 MiB.  There was an off-by-1 error which can matter
on tiny systems.
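
An off-by-one of this kind typically comes from a strict comparison
that rejects the boundary value.  The sketch below shows an inclusive
check; the direction of the original comparison is an assumption, and
the names are illustrative:

```python
MIN_ARC_MAX = 64 << 20  # 64 MiB in bytes


def arc_max_accepted(zfs_arc_max):
    """Accept values of at least 64 MiB, including exactly 64 MiB."""
    return zfs_arc_max >= MIN_ARC_MAX
```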

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Zubrzycki <github@mid-earth.net>
Closes #7417
2018-05-07 17:19:57 -07:00
Brian Behlendorf 4ed30958ce Linux compat 4.16: blk_queue_flag_{set,clear}
The HAVE_BLK_QUEUE_WRITE_CACHE_GPL_ONLY case was overlooked in
the original 10f88c5c commit because blk_queue_write_cache()
was available for the in-kernel builds.

Update the blk_queue_flag_{set,clear} wrappers to call the locked
versions to avoid confusion.  This is safe for all existing callers.

The blk_queue_set_write_cache() function has been updated to use
these wrappers.  This means setting/clearing both QUEUE_FLAG_WC
and QUEUE_FLAG_FUA is no longer atomic but this is only done early
in zvol_alloc() prior to any requests so there is no issue.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Kash Pande <kash@tripleback.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7428
Closes #7431
2018-05-07 17:19:57 -07:00
Giuseppe Di Natale 2f118072cb Linux compat 4.16: blk_queue_flag_{set,clear}
queue_flag_{set,clear}_unlocked are now private interfaces in
the Linux kernel (https://github.com/torvalds/linux/commit/8a0ac14).
Use blk_queue_flag_{set,clear} interfaces which were introduced as
of https://github.com/torvalds/linux/commit/8814ce8.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #7410
2018-05-07 17:19:57 -07:00
Brian Behlendorf 7440f10ec1 Fix 'zfs send/recv' hang with 16M blocks
When using 16MB blocks the send/recv queues aren't quite big
enough.  This change leaves the default 16M queue size, which is a
good value for most pools.  But it additionally ensures that the
queue sizes are at least twice the allowed zfs_max_recordsize.
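
The sizing rule amounts to taking the larger of the default and twice
the maximum record size.  A sketch of the arithmetic, with a
hypothetical function name and byte units assumed:

```python
DEFAULT_QUEUE_LENGTH = 16 * 1024 * 1024  # the 16M default, in bytes


def effective_queue_length(zfs_max_recordsize,
                           queue_length=DEFAULT_QUEUE_LENGTH):
    """Return a queue length of at least 2 * zfs_max_recordsize."""
    return max(queue_length, 2 * zfs_max_recordsize)
```

With 16M records this gives a 32M queue; with 1M records the 16M
default is kept.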

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7365
Closes #7404
2018-05-07 17:19:57 -07:00
Giuseppe Di Natale 8bb800d6b4 Clean up (k)shlib and cfg file shebangs
Most kshlib files are imported by other scripts
and do not have a shebang at the top of their files.
Make all kshlib follow this convention.

Remove shebangs from cfg files as well.

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Close #7406
2018-05-07 17:19:57 -07:00
Tony Hutter bbf61c118f Fix "file is executable, but no shebang" warnings
Fedora 28's RPM build checks warn when executable files don't have a
shebang line.  These warnings are caused when we (incorrectly)
include data & config files in the _SCRIPTS automake lines. Files in
_SCRIPTS are marked executable by automake. This patch fixes the
issue by including non-executable scripts in a _DATA line instead.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #7359
Closes #7395
2018-05-07 17:19:57 -07:00
Tony Hutter d296b09456 Exclude python scripts from RPM shebang check
The newest Fedora packaging rules print warnings for scripts using the
/usr/bin/python shebang:

    *** WARNING: mangling shebang in /usr/bin/arc_summary.py from
    #!/usr/bin/python to #!/usr/bin/python2. This will become an ERROR,
    fix it manually!

Fedora wants all cross compatible scripts to pick python3.  Since we
don't want our users to have to pick a specific version of python, we
exclude our scripts from the RPM build check.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #7360
Closes #7399
2018-05-07 17:19:57 -07:00
Olaf Faaland 5ac017fc04 Update mmp_delay on sync or skipped, failed write
When an MMP write is skipped, or fails, and time since
mts->mmp_last_write is already greater than mts->mmp_delay, increase
mts->mmp_delay.  The original code only updated mts->mmp_delay when a
write succeeded, but this results in the write(s) after delays and
failed write(s) reporting an ub_mmp_delay which is too low.

Update mmp_last_write and mmp_delay if a txg sync was successful.  At
least one uberblock was written, thus extending the time we can be sure
the pool will not be imported by another host.

Do not allow mmp_delay to go below (MSEC2NSEC(zfs_multihost_interval) /
vdev_count_leaves()) so that a period of frequent successful MMP writes,
e.g. due to frequent txg syncs, does not result in an import activity
check so short it is not reliable based on mmp thread writes alone.
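
The lower bound can be written out as follows.  This sketch assumes a
positive leaf count and uses hypothetical function names; only the
formula itself comes from the paragraph above.

```python
def msec2nsec(ms):
    """Convert milliseconds to nanoseconds."""
    return ms * 1_000_000


def clamp_mmp_delay(mmp_delay_ns, multihost_interval_ms, leaf_count):
    """Keep mmp_delay at or above interval / number of leaves."""
    floor_ns = msec2nsec(multihost_interval_ms) // leaf_count
    return max(mmp_delay_ns, floor_ns)
```

For example, with an interval of 1000 ms and 4 leaves the delay never
drops below 250 ms.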

Remove unnecessary local variable, start.  We do not use the start time
of the loop iteration.

Add a debug message in spa_activity_check() to allow verification of the
import_delay value and to prove the activity check occurred.

Alter the tests that import pools and attempt to detect an activity
check.  Calculate the expected duration of spa_activity_check() based on
module parameters at the time the import is performed, rather than a
fixed time set in mmp.cfg.  The fixed time may be wrong.  Also, use the
default zfs_multihost_interval value so the activity check is longer and
easier to recognize.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #7330
2018-05-07 17:19:57 -07:00
Tony Hutter f5ecab3aef Fedora 28: Fix misc bounds check compiler warnings
Fix a bunch of (mostly) sprintf/snprintf truncation compiler
warnings that show up on Fedora 28 (GCC 8.0.1).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #7361
Closes #7368
2018-05-07 17:19:57 -07:00
LOLi fd01167ffd Fix hung z_zvol tasks during 'zfs receive'
During a receive operation zvol_create_minors_impl() can wait
needlessly for the prefetch thread because both share the same tasks
queue.  This results in hung tasks:

<3>INFO: task z_zvol:5541 blocked for more than 120 seconds.
<3>      Tainted: P           O  3.16.0-4-amd64
<3>"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

The first z_zvol:5541 (zvol_task_cb) is waiting for the long running
traverse_prefetch_thread:260

root@linux:~# cat /proc/spl/taskq
taskq                       act  nthr  spwn  maxt   pri  mina
spl_system_taskq/0            1     2     0    64   100     1
	active: [260]traverse_prefetch_thread [zfs](0xffff88003347ae40)
	wait: 5541
spl_delay_taskq/0             0     1     0     4   100     1
	delay: spa_deadman [zfs](0xffff880039924000)
z_zvol/1                      1     1     0     1   120     1
	active: [5541]zvol_task_cb [zfs](0xffff88001fde6400)
	pend: zvol_task_cb [zfs](0xffff88001fde6800)

This change adds a dedicated, per-pool, prefetch taskq to prevent the
traverse code from monopolizing the global (and limited) system_taskq by
inappropriately scheduling long running tasks on it.

Reviewed-by: Albert Lee <trisk@forkgnu.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6330
Closes #6890
Closes #7343
2018-05-07 17:19:57 -07:00
Don Brady 3b118f0a34 Add support for nvme based devids
Adds a devid for nvme devices. This is very similar to how the
other 'bus' (scsi|sata|usb) devids are generated. The devid
resides in a name/value pair in the leaf vdevs in a zpool config.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes #7356
2018-05-07 17:19:57 -07:00
Tony Hutter ebe443c8ff chmod -x on etc/init.d/zfs-*.in automake files
Clear executable bit on zfs-import.in, zfs-mount.in,
zfs-share.in, and zfs-zed.in.  These are automake files and
should not be marked executable.  This fixes a RPM build error
on Fedora 28.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #7355
Closes #7327
2018-05-07 17:19:57 -07:00
Brian Behlendorf 63f3396233 Fix mmap / libaio deadlock
Calling uiomove() in mappedread() under the page lock can result
in a deadlock if the user space page needs to be faulted in.

Resolve the issue by dropping the page lock before the uiomove().
The inode range lock protects against concurrent updates via
zfs_read() and zfs_write().

Reviewed-by: Albert Lee <trisk@forkgnu.org>
Reviewed-by: Chunwei Chen <david.chen@nutanix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7335
Closes #7339
2018-05-07 17:19:57 -07:00
DeHackEd 2deb4526ee Remove libattr requirement
RHEL/CentOS 6 supports sys/xattr.h eliminating the need for
libattr-devel as a dependency.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: DHE <git@dehacked.net>
Closes #7344
Closes #7351
2018-05-07 17:19:57 -07:00
Tony Hutter a1662ffcaa Fedora 28: Fix "Macro %_dracutdir has empty body"
If you run ./configure --with-config=srpm, it will not trigger
the user m4 scripts to populate the dracut and udev directories.
This causes a build error on Fedora 28.  Make the dracut and
udev lines conditional to get around this.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #7326
Closes #7328
2018-05-07 17:19:57 -07:00
kpande ea921bf6a6 modprobe zfs during dracut mount
Resolves importing root pool during boot in dracut.  This case was
inadvertently broken with the module autoloading change in #7287.

Reviewed-by: Matthew Thode <prometheanfire@gentoo.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Kash Pande <kash@tripleback.net>
Closes #7322
2018-05-07 17:19:57 -07:00
timor 6e627cc468 Add support for nvme disk detection
This treats /dev/nvme.. devices the same way as /dev/sd... devices.  The
motivation behind this is that whole disk detection did not work on nvme
SSDs without that, because DKC_UNKNOWN was returned for such devices.

Perhaps there should be a separate DKC_ type for this, but I don't know
enough about the code to know the implications of that.

Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: timor <timor.dd@googlemail.com>
Closes #7304
2018-05-07 17:19:56 -07:00
Olaf Faaland 3eb3a13628 Report pool suspended due to MMP
When the pool is suspended, record whether it was due to an I/O error or
due to MMP writes failing to succeed within the required time.

Change spa_suspended from uint8_t to zio_suspend_reason_t to store the
reason.

When userspace queries pool status via spa_tryimport(), report the
reason the pool was suspended in a new key,
ZPOOL_CONFIG_SUSPENDED_REASON.

In libzfs, when interpreting the returned config nvlist, report
suspension due to MMP with a new pool status enum value,
ZPOOL_STATUS_IO_FAILURE_MMP.

In status_callback(), which generates and emits the message when 'zpool
status' is executed, add a case to print an appropriate message for the
new pool status enum value.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #7296
2018-05-07 17:19:56 -07:00
Tim Chase c234706270 Add zfs_scan_ignore_errors tunable
When it's set, a DTL range will be cleared even if its scan/scrub had
errors.  This makes it possible to work around a resilver/scrub upon
import when the pool has errors.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes #7293
2018-05-07 17:19:56 -07:00
Tony Hutter 6059ba27c4 Allow to limit zed's syslog chattiness
Some usage patterns like send/recv of replication streams can
produce a large number of events. In such a case, the current
all-syslog.sh zedlet will live up to its name, and flood the
logs with mostly redundant information. To mitigate this
situation, this changeset introduces two new variables
ZED_SYSLOG_SUBCLASS_INCLUDE and ZED_SYSLOG_SUBCLASS_EXCLUDE
to zed.rc that give more control over which event classes end
up in the syslog.
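
As an illustration only (the real zedlet is a shell script), the
include/exclude decision could be sketched as below.  The sketch assumes
that each variable holds glob patterns separated by "|", that an empty
include list admits every subclass, and that exclusion is applied after
inclusion.  The function name should_log is hypothetical.

```python
from fnmatch import fnmatchcase


def should_log(subclass, include="", exclude=""):
    """Return True if an event of this subclass should reach syslog.

    include/exclude stand in for ZED_SYSLOG_SUBCLASS_INCLUDE and
    ZED_SYSLOG_SUBCLASS_EXCLUDE; the "|"-separated glob syntax is an
    assumption of this sketch.
    """
    def matches(patterns):
        return any(fnmatchcase(subclass, p) for p in patterns.split("|") if p)

    # An empty include list is treated as "include everything".
    if include and not matches(include):
        return False
    # Exclusion is checked last, so it wins over inclusion.
    return not (exclude and matches(exclude))


print(should_log("history_event", exclude="history_event"))
print(should_log("scrub_finish", include="checksum|scrub_*"))
```

With an exclude pattern of "history_event", the events generated in bulk
by send/recv would be dropped while all other subclasses still reach the
log.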

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Daniel Kobras <d.kobras@science-computing.de>
Closes #6886
Closes #7260
2018-05-07 17:19:56 -07:00
Olaf Faaland 927f40d089 Record skipped MMP writes in multihost_history
Once per pass through the MMP thread's loop, the vdev tree is walked to
find a suitable leaf to write the next MMP block to.  If no such leaf is
found, the thread sleeps for a while and resumes at the top of the loop.

Add an entry to multihost_history when no leaf can be found, and record
the reason in the error column.  The error code for such entries is a
bitfield, displayed in hex:

0x1  At least one vdev (interior or leaf) was not writeable.
0x2  At least one writeable leaf vdev was found, but it had a pending
MMP write.

timestamp = the time in seconds since the epoch when no leaf could be
found originally.

duration = the time (in ns) during which no MMP block was written for
this reason.  This does not include the preceding inter-write period
nor the following inter-write period.

vdev_guid = the number of sequential cycles of the MMP thread loop when
this occurred.

Sample output, truncated to fit:

For records of skipped MMP writes the right-most column, vdev_path, is
reported as "-".

id   txg  timestamp   error  duration    mmp_delay  vdev_guid     ...
936  11   1520036441  0      146264      891422313  1740883117838 ...
937  11   1520036441  0      163956      888356657  7320395061548 ...
938  11   1520036442  0      130690      885314969  7320395061548 ...
939  11   1520036442  0      2001068577  882296582  1740883117838 ...
940  11   1520036443  0      161806      882296582  7320395061548 ...
941  11   1520036443  0x2    0           998020546  1             ...
942  11   1520036444  0      136585      998020546  7320395061548 ...
943  11   1520036444  0x2    0           998020257  1             ...
944  11   1520036445  5      2002662964  994160219  1740883117838 ...
945  11   1520036445  0x2    998073118   994160219  3             ...
946  11   1520036447  0      247136      994160219  7320395061548 ...
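
A reader of this kstat could decode the hex error column of a skipped
write roughly as follows.  This is a sketch, not code from the commit:
the function name and the dictionary are hypothetical, and only the two
bit meanings listed above are taken from this message.

```python
# Bit meanings as described in this commit message.
SKIP_REASONS = {
    0x1: "at least one vdev (interior or leaf) was not writeable",
    0x2: "a writeable leaf vdev was found, but it had a pending MMP write",
}


def decode_skip_error(field):
    """Decode the error column of a skipped-write record, e.g. '0x2'."""
    bits = int(field, 16)
    return [text for bit, text in SKIP_REASONS.items() if bits & bit]


for reason in decode_skip_error("0x3"):
    print(reason)
```

Only entries shown with a 0x prefix in the sample are skipped-write
records; the other rows carry the error status of a completed write.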

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #7212
2018-05-07 17:19:56 -07:00
Giuseppe Di Natale 6356d50e67 Introduce a destroy_dataset helper
Datasets can be busy when calling zfs destroy. Introduce
a helper function to destroy datasets and use it to destroy
datasets in zfs_allow_004_pos, zfs_promote_008_pos, and
zfs_destroy_002_pos.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #7224
Closes #7246
Closes #7249
Closes #7267
2018-05-07 17:19:56 -07:00
Tony Hutter bd69ae3b53 Tag zfs-0.7.8
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2018-04-09 14:31:57 -07:00
Tony Hutter 9a2e90c9fc Revert "Handle zap_add() failures in mixed ... "
This reverts commit cc63068e95.

Under certain circumstances this change can result in an ENOSPC
error when adding new files to a directory.  See #7401 for full
details.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Issue #7401
Closes #7416
2018-04-09 17:29:59 -04:00
Tony Hutter 240ccfc13a Tag zfs-0.7.7
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2018-03-14 16:16:43 -07:00
Brian Behlendorf c30e716c81 Fix MMP write frequency for large pools
When a single pool contains more vdevs than the CONFIG_HZ for
the kernel, the mmp thread will not delay properly.  Switch
to using cv_timedwait_sig_hires() to handle higher resolution
delays.

This issue was reported on Arch Linux where HZ defaults to only
100 and this could be fairly easily reproduced with a reasonably
large pool.  Most distribution kernels set CONFIG_HZ=250 or
CONFIG_HZ=1000 and thus are unlikely to be impacted.
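
The arithmetic behind this can be sketched as below.  It assumes that
the delay between MMP writes is the multihost interval divided by the
number of leaf vdevs, and that a tick-based wait truncates to whole
scheduler ticks; the 1000 ms interval and the 150-vdev pool are example
values, not figures from this commit.

```python
def delay_in_ticks(interval_ms, leaf_vdevs, hz):
    """Per-write delay expressed in whole scheduler ticks (truncated).

    Assumes delay = interval / leaf_vdevs and tick length = 1000 / hz ms.
    """
    return (interval_ms * hz) // (1000 * leaf_vdevs)


# Hypothetical pool with 150 leaf vdevs and a 1000 ms interval.
print(delay_in_ticks(1000, 150, hz=100))
print(delay_in_ticks(1000, 150, hz=1000))
```

Under these assumptions a tick-based delay rounds down to zero once the
vdev count exceeds HZ, which is why a high-resolution wait is needed.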

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7205
Closes #7289
2018-03-14 16:10:38 -07:00
Olaf Faaland 267fd7b0f1 Handle zio_resume and mmp => off
When multihost is disabled on a pool, and the pool is resumed via zpool
clear, within a single cycle of the mmp thread's loop (e.g.  while it's
in the cv_timedwait call), both mmp_last_write and mmp_delay should be
updated.

The original code mistakenly treated the two cases as if they could not
occur at the same time.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #7286
2018-03-14 16:10:38 -07:00
LOLi dc0176eeec Fix zfs-kmod builds when using rpm >= 4.14
With rpm-software-management/rpm@5e94633 a package version containing
invalid characters (most commonly a double '-') causes the kmod package
generation to terminate with an error.  This change takes advantage of
the newly introduced rpm macro "_wrong_version_format_terminate_build"
to allow kmod packages to be built.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7284
2018-03-14 16:10:38 -07:00
Paul Zuchowski 0a0af41bd9 zdb and inuse tests don't pass with real disks
Due to zpool create auto-partitioning in Linux (i.e. sdb1),
certain utilities need to use the partition (sdb1) while
others use the whole disk name (sdb).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #6939
Closes #7261
2018-03-14 16:10:38 -07:00
Wolfgang Bumiller 3808006edf Take user namespaces into account in policy checks
Change file related checks to use user namespaces and make
sure involved uids/gids are mappable in the current
namespace.

Note that checks without file ownership information will
still not take user namespaces into account, as some of
these should be handled via 'zfs allow' (otherwise root in a
user namespace could issue commands such as `zpool export`).

This also adds an initial user namespace regression test
for the setgid bit loss, with a user_ns_exec helper usable
in further tests.

Additionally, configure checks for the required user
namespace related features are added for:
  * ns_capable
  * kuid/kgid_has_mapping()
  * user_ns in cred_t

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Closes #6800
Closes #7270
2018-03-14 16:10:38 -07:00
Olaf Faaland c17922b8a9 Detect long config lock acquisition in mmp
If something holds the config lock as a writer for too long, MMP will
fail to issue MMP writes in a timely manner.  This will result either in
the pool being suspended, or in an extreme case, in the pool not being
protected.

If the time to acquire the config lock exceeds 1/10 of the minimum
zfs_multihost_interval, report it in the zfs debug log.
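
The threshold test can be sketched as follows.  The function name is
hypothetical, and the 100 ms value used in the example for the minimum
interval is an assumption rather than a number stated in this commit.

```python
def lock_wait_is_slow(wait_ns, min_interval_ms):
    """True if acquiring the config lock took more than 1/10 of the
    minimum multihost interval (the minimum is passed in by the caller)."""
    threshold_ns = min_interval_ms * 1_000_000 // 10
    return wait_ns > threshold_ns


# Assuming a 100 ms minimum interval, the threshold is 10 ms.
print(lock_wait_is_slow(12_000_000, min_interval_ms=100))
```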

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #7212
2018-03-14 16:10:38 -07:00
Giuseppe Di Natale 8d7f17798d Linux 4.16 compat: get_disk_and_module()
As of https://github.com/torvalds/linux/commit/fb6d47a, get_disk()
is now get_disk_and_module(). Add a configure check to determine
if we need to use get_disk_and_module().

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #7264
2018-03-14 16:10:38 -07:00
Tony Hutter 6dc40e2ada Change checksum & IO delay ratelimit values
Change checksum & IO delay ratelimit thresholds from 5/sec to 20/sec.
This allows zed to actually trigger if a bunch of these events arrive in
a short period of time (zed has a threshold of 10 events in 10 sec).
Previously, if you had, say, 100 checksum errors in 1 sec, it would get
ratelimited to 5/sec which wouldn't trigger zed to fault the drive.
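
A small model of the interaction, using only the numbers quoted above
(a burst arriving within one second, and zed's threshold of 10 events
in 10 seconds), might look like this.  The function is hypothetical and
ignores everything else zed does.

```python
def zed_would_fault(burst, ratelimit_per_sec, zed_events=10):
    """Model a burst of events arriving within a single second.

    The kernel delivers at most ratelimit_per_sec of them; the burst fits
    inside zed's 10-second window, so zed triggers only if the delivered
    count reaches its event threshold.
    """
    delivered = min(burst, ratelimit_per_sec)
    return delivered >= zed_events


print(zed_would_fault(100, ratelimit_per_sec=5))
print(zed_would_fault(100, ratelimit_per_sec=20))
```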

Also, convert the checksum and IO delay thresholds to module params for
easy testing.

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #7252
2018-03-14 16:10:38 -07:00
chrisrd 792f88131c Increment zil_itx_needcopy_bytes properly
In zil_lwb_commit() with TX_WRITE, we copy the log write record (lrw)
into the log write block (lwb) and send it off using zil_lwb_add_txg().
If we also have WR_NEED_COPY, we additionally copy the lrw's data into
the lwb to be sent off.  If the lrw + data doesn't fit into the lwb, we
send the lrw and as much data as will fit (dnow bytes), then go back
and do the same with the remaining data.

Each time through this loop we're sending dnow data bytes. I.e.
zil_itx_needcopy_bytes should be incremented by dnow.
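
The accounting can be sketched with a simplified model of the loop.
This is not the zil_lwb_commit() code: it assumes every lwb offers the
same amount of space for data and ignores the record header, and the
function name is hypothetical.

```python
def commit_need_copy(data_len, lwb_space):
    """Split data_len bytes across lwbs that each hold lwb_space bytes.

    Returns the per-pass chunk sizes and the resulting value of the
    needcopy byte counter when it is incremented by dnow on each pass.
    """
    chunks = []
    needcopy_bytes = 0
    remaining = data_len
    while remaining > 0:
        dnow = min(remaining, lwb_space)  # as much data as fits this pass
        chunks.append(dnow)
        needcopy_bytes += dnow            # count exactly what was copied
        remaining -= dnow
    return chunks, needcopy_bytes


print(commit_need_copy(10, 4))
```

Incrementing by dnow on each pass makes the counter total equal the
number of data bytes actually copied, however many lwbs the write spans.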

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Closes #6988
Closes #7176
2018-03-14 16:10:38 -07:00
John Eismeier 33bb1e8256 Fix some typos
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: George Melikov <mail@gmelikov.ru>
Signed-off-by: John Eismeier <john.eismeier@gmail.com>
Closes #7237
2018-03-14 16:10:38 -07:00
Tomohiro Kusumi bcaba38e42 Fix zpool(8) list example to match actual format
a05dfd00 (Illumos 5147) has swapped FRAG and EXPANDSZ,
so it's natural to modify these examples.

 # zpool list | head -1
 NAME     SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
                              ^^^^^^^^^^^^^^^

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@osnexus.com>
Closes #7244
2018-03-14 16:10:38 -07:00
Tony Hutter 5e3085e360 Add SMART self-test results to zpool status -c
Add in SMART self-test results to zpool status|iostat -c.  This
works for both SAS and SATA drives.

Also, add plumbing to allow the 'smart' script to take smartctl
output from a directory of output text files instead of running
it against the vdevs.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #7178
2018-03-14 16:10:37 -07:00
Tony Hutter 99920d823e Add scrub after resilver zed script
* Add a zed script to kick off a scrub after a resilver.  The script is
disabled by default.

* Add an optional $PATH (-P) option to zed to allow it to use a custom
$PATH for its zedlets.  This is needed when you're running zed under
the ZTS in a local workspace.

* Update test scripts to not copy in all-debug.sh and all-syslog.sh by
default.  They can be optionally copied in as part of zed_setup().
These scripts slow down zed considerably under heavy event loads and
can cause events to be dropped or their delivery delayed. This was
causing some sporadic failures in the 'fault' tests.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #4662
Closes #7086
2018-03-14 16:10:37 -07:00
chrisrd 338523dd6e Fix free memory calculation on v3.14+
Provide infrastructure to auto-configure to enum and API changes in the
global page stats used for our free memory calculations.

arc_free_memory has been broken since an API change in Linux v3.14:

2016-07-28 v4.8 599d0c95 mm, vmscan: move LRU lists to node
2016-07-28 v4.8 75ef7184 mm, vmstat: add infrastructure for per-node
  vmstats

These commits moved some of global_page_state() into
global_node_page_state(). The API change was particularly egregious as,
instead of breaking the old code, it silently did the wrong thing and we
continued using global_page_state() where we should have been using
global_node_page_state(), thus indexing into the wrong array via
NR_SLAB_RECLAIMABLE et al.

There have been further API changes along the way:

2017-07-06 v4.13 385386cf mm: vmstat: move slab statistics from zone to
  node counters
2017-09-06 v4.14 c41f012a mm: rename global_page_state to
  global_zone_page_state

...and various (incomplete, as it turns out) attempts to accommodate
these changes in ZoL:

2017-08-24 2209e409 Linux 4.8+ compatibility fix for vm stats
2017-09-16 787acae0 Linux 3.14 compat: IO acct, global_page_state, etc
2017-09-19 661907e6 Linux 4.14 compat: IO acct, global_page_state, etc

The config infrastructure provided here resolves these issues going back
to the original API change in v3.14 and is robust against further Linux
changes in this area.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Closes #7170
2018-03-14 16:10:37 -07:00
Olaf Faaland 2644784f49 Report duration and error in mmp_history entries
After an MMP write completes, update the relevant mmp_history entry
with the time between submission and completion, and the error
status of the write.

[faaland1@toss3a zfs]$ cat /proc/spl/kstat/zfs/pool/multihost
39 0 0x01 100 8800 69147946270893 72723903122926
id       txg     timestamp  error  duration   mmp_delay    vdev_guid
10607    1166    1518985089 0      138301     637785455    4882...
10608    1166    1518985089 0      136154     635407747    1151...
10609    1166    1518985089 0      803618560  633048078    9740...
10610    1166    1518985090 0      144826     633048078    4882...
10611    1166    1518985090 0      164527     666187671    1151...

Where duration = gethrtime_in_done_fn - gethrtime_at_submission, and
error = zio->io_error.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #7190
2018-03-14 16:10:37 -07:00
Olaf Faaland b1f61f05b4 Do not initiate MMP writes while pool is suspended
While the pool is suspended on host A, it may be imported on host B.
If host A continued to write MMP blocks, it would be blindly
overwriting MMP blocks written by host B, and the blocks written by
host A would have outdated txg information.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #7182
2018-03-14 16:10:37 -07:00
Tony Hutter e5ba614d05 Linux 4.16 compat: use correct *_dec_and_test()
Use refcount_dec_and_test() on 4.16+ kernels, atomic_dec_and_test()
on older kernels.  https://lwn.net/Articles/714974/

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes: #7179
Closes: #7211
2018-03-14 16:10:37 -07:00
Matthew Thode 30ac8de48a Allow modprobe to fail when called within systemd
This allows for systems with zfs built into the kernel manually to run
these services.  Otherwise the service will fail to start.

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Kash Pande <kash@tripleback.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Thode <mthode@mthode.org>
Closes #7174
2018-03-14 16:10:37 -07:00
bunder2015 c705d8386b Add SMART attributes for SSD and NVMe
This adds the SMART attributes required to probe Samsung SSD and NVMe
(and possibly others) disks when using the "zpool status -c" command.

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Closes #7183
Closes #7193
2018-03-14 16:10:37 -07:00
Giuseppe Di Natale d5b10b3ef3 Correct count_uberblocks in mmp.kshlib
A log_must call was causing count_uberblocks to return more
than just the uberblock count. Remove the log_must since it
was only logging a sleep.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #7191
2018-03-14 16:10:37 -07:00
chrisrd 5a84c60fb9 Fix config issues: frame size and headers
1. With various (debug and/or tracing?) kernel options enabled it's
possible for 'struct inode' and 'struct super_block' to exceed the
default frame size, leaving errors like this in config.log:

build/conftest.c:116:1: error: the frame size of 1048 bytes is larger
than 1024 bytes [-Werror=frame-larger-than=]

Fix this by removing the frame size warning for config checks

2. Without the correct headers included, it's possible for declarations
to be missed, leaving errors like this in the config.log:

build/conftest.c:131:14: error: ‘struct nameidata’ declared inside
parameter list [-Werror]

Fix this by adding appropriate headers.

Note: Both these issues can result in silent config failures because
the compile failure is taken to mean "this option is not supported by
this kernel" rather than "there's something wrong with the config
test". The consequences range from merely annoying (compile failures) to
potentially serious (miscompiled or misused kernel primitives
or functions). E.g. the fixes included here resulted in these
additional defines in zfs_config.h with linux v4.14.19:

Also, drive-by whitespace fixes in config/* files which don't mention
"GNU" (those ones look to be imported from elsewhere so leave them
alone).

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Closes #7169
2018-03-14 16:10:37 -07:00
Olaf Faaland 26941ce90b Clarify zinject(8) explanation of -e
Error injection of EIO or ENXIO simply sets the zio's io_error value,
rather than preventing the read or write from occurring.  This is
important information as it affects how the probes must be used.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #7172
2018-03-14 16:10:37 -07:00
George Wilson 07ce5d7390 OpenZFS 8857 - zio_remove_child() panic due to already destroyed parent zio
PROBLEM
=======
It's possible for a parent zio to complete even though it has children
which have not completed. This can result in the following panic:
    > $C
    ffffff01809128c0 vpanic()
    ffffff01809128e0 mutex_panic+0x58(fffffffffb94c904, ffffff597dde7f80)
    ffffff0180912950 mutex_vector_enter+0x347(ffffff597dde7f80)
    ffffff01809129b0 zio_remove_child+0x50(ffffff597dde7c58, ffffff32bd901ac0,
    ffffff3373370908)
    ffffff0180912a40 zio_done+0x390(ffffff32bd901ac0)
    ffffff0180912a70 zio_execute+0x78(ffffff32bd901ac0)
    ffffff0180912b30 taskq_thread+0x2d0(ffffff33bae44140)
    ffffff0180912b40 thread_start+8()
    > ::status
    debugging crash dump vmcore.2 (64-bit) from batfs0390
    operating system: 5.11 joyent_20170911T171900Z (i86pc)
    image uuid: (not set)
    panic message: mutex_enter: bad mutex, lp=ffffff597dde7f80
    owner=ffffff3c59b39480 thread=ffffff0180912c40
    dump content: kernel pages only
The problem is that dbuf_prefetch along with l2arc can create a zio tree
which confuses the parent zio and allows it to complete while children
still exist. Here's the scenario:
    zio tree:
        pio
         |--- lio
The parent zio, pio, has entered the zio_done stage and begins to check its
children to see whether there are still some that have not completed. In zio_done(),
the children are checked in the following order:
    zio_wait_for_children(zio, ZIO_CHILD_VDEV, ZIO_WAIT_DONE)
    zio_wait_for_children(zio, ZIO_CHILD_GANG, ZIO_WAIT_DONE)
    zio_wait_for_children(zio, ZIO_CHILD_DDT, ZIO_WAIT_DONE)
    zio_wait_for_children(zio, ZIO_CHILD_LOGICAL, ZIO_WAIT_DONE)
If pio finds any child which has not completed, it stops executing and
goes to sleep. Each call to zio_wait_for_children() will grab the io_lock
while checking the particular child.
In this scenario, the pio has completed the first call to
zio_wait_for_children() to check for any ZIO_CHILD_VDEV children. Since
the only zio in the zio tree right now is the logical zio, lio, then it
completes that call and prepares to check the next child type.
In the meantime, the lio completes and in its callback creates a child vdev
zio, cio. The zio tree looks like this:
    zio tree:
        pio
         |--- lio
         |--- cio
The lio then grabs the parent's io_lock and removes itself.
    zio tree:
        pio
         |--- cio
The pio continues to run but has already completed its check for ZIO_CHILD_VDEV
and will erroneously complete. When the child zio, cio, completes it will panic
the system trying to reference the parent zio which has been destroyed.
SOLUTION
========
The fix is to rework the zio_wait_for_children() logic to accept a bitfield
for all the children types that it's interested in checking. The
io_lock is held the entire time we check all the children types. Since
the function now accepts a bitfield, a simple ZIO_CHILD_BIT() macro is provided
to allow for the conversion between a ZIO_CHILD type and the bitfield used by
the zio_wait_for_children logic.
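
The idea can be modelled outside the kernel as below.  This is a sketch
only: the numeric values given to the child types, the pending-counter
dictionary, and the class and function names are assumptions made for
illustration and are not taken from the patch.

```python
import threading
from enum import IntEnum


class ChildType(IntEnum):
    # The four child types named above; the numeric values are assumed.
    VDEV = 0
    GANG = 1
    DDT = 2
    LOGICAL = 3


def child_bit(child_type):
    """Rough analogue of ZIO_CHILD_BIT(): map a child type to its bit."""
    return 1 << child_type


class Zio:
    def __init__(self):
        self.io_lock = threading.Lock()
        self.pending = {t: 0 for t in ChildType}

    def wait_for_children(self, childbits):
        """Return True if any child of the requested types is pending.

        The lock is taken once and held while every requested type is
        checked, so no child can appear or vanish between the checks.
        """
        with self.io_lock:
            return any(self.pending[t] for t in ChildType
                       if childbits & child_bit(t))


pio = Zio()
pio.pending[ChildType.VDEV] = 1  # cio was added while pio was in zio_done
all_bits = sum(child_bit(t) for t in ChildType)
print(pio.wait_for_children(all_bits))
```

Because all four types are examined under a single hold of the lock, the
late-arriving vdev child in the scenario above is seen and pio waits.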

Authored by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Youzhong Yang <youzhong@gmail.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Dan McDonald <danmcd@omniti.com>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>

OpenZFS-issue: https://www.illumos.org/issues/8857
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/862ff6d99c
Issue #5918
Closes #7168
2018-03-14 16:10:37 -07:00
LOLi 1d805a534b 'zfs receive' fails with "dataset is busy"
Receiving an incremental stream after an interrupted "zfs receive -s"
fails with the message "dataset is busy": this is because we still have
the hidden clone ../%recv from the resumable receive.

Improve the error message suggesting the existence of a partially
complete resumable stream from "zfs receive -s" which can be either
aborted ("zfs receive -A") or resumed ("zfs send -t").

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7129
Closes #7154
2018-03-14 16:10:37 -07:00
LOLi a9ff89e05c contrib/initramfs: add missing conf.d/zfs
When upgrading from the distribution-provided zfs-initramfs package on
root-on-zfs Ubuntu and Debian the system may fail to boot: this change
adds the missing initramfs configuration file.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7158
2018-03-14 16:10:37 -07:00
sanjeevbagewadi d85011ed69 mmp should use a fixed tag for spa_config locks
mmp_write_uberblock() and mmp_write_done() should use the same tag
for spa_config_locks.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sanjeev Bagewadi <sanjeev.bagewadi@gmail.com>
Closes #6530
Closes #7155
2018-03-14 16:10:37 -07:00
sanjeevbagewadi b3da003ebf Handle zap_add() failures in mixed case mode
With "casesensitivity=mixed", zap_add() could fail when the number of
files/directories with the same name (varying in case) exceeds the
capacity of the leaf node of a Fatzap. This results in an ASSERT()
failure as zfs_link_create() does not expect zap_add() to fail. The fix
is to handle these failures and rollback the transactions.

Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Chunwei Chen <david.chen@nutanix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sanjeev Bagewadi <sanjeev.bagewadi@gmail.com>
Closes #7011
Closes #7054
2018-03-14 16:10:37 -07:00
Chunwei Chen 478754a8f5 Fix zdb -ed on objset for exported pool
zdb -ed on objset for exported pool would fail with:
  failed to own dataset 'qq/fs0': No such file or directory

The reason is that zdb passes the objset name to spa_import, which uses
that name to create a spa. Later, when dmu_objset_own tries to look up
the spa using the real pool name, it can't find one.

We fix this by making sure we pass the pool name rather than the objset
name to spa_import.
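
Deriving the pool name from an objset name can be sketched like this.
The helper is hypothetical and is not the code added to zdb; it relies
only on the pool name being the part before the first '/' or '@'.

```python
def pool_name(objset_name):
    """Return the pool component of a dataset or snapshot name."""
    for sep in "/@":
        objset_name = objset_name.split(sep, 1)[0]
    return objset_name


print(pool_name("qq/fs0"))
```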

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #7099
Closes #6464
2018-03-14 16:10:37 -07:00
Chunwei Chen 31ff122aa2 Fix zdb -E segfault
SPA_MAXBLOCKSIZE is too large for stack.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #7099
2018-03-14 16:10:36 -07:00
Chunwei Chen 18c662b845 Fix zdb -R decompression
There are some issues in the zdb -R decompression implementation.

The first is that ZLE can easily decompress non-ZLE streams. So we add
ZDB_NO_ZLE env to make zdb skip ZLE.

The second is the random bytes appended to pabd, the pbuf2 stuff. This
serves no purpose at all; those bytes shouldn't be read during
decompression anyway. Instead, we randomize lbuf2, so that we can make
sure decompression fills exactly to lsize by bcmp of lbuf and lbuf2.

The last is that the condition used to detect failure is wrong.
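
The two-buffer check can be illustrated with a toy model.  Everything
here is hypothetical: toy_decompress stands in for a real decompressor,
and the second buffer uses a fixed 0xFF fill instead of random bytes so
that the outcome is deterministic.

```python
def toy_decompress(src, dst):
    """Stand-in decompressor: writes its output into the front of dst."""
    n = min(len(src), len(dst))
    dst[:n] = src[:n]


def fills_exactly(src, lsize):
    """Decompress into two buffers that start with different contents.

    If the buffers are equal afterwards, every one of the lsize bytes was
    overwritten by the decompressor.
    """
    lbuf = bytearray(lsize)             # starts as all 0x00
    lbuf2 = bytearray(b"\xff" * lsize)  # starts as all 0xff
    toy_decompress(src, lbuf)
    toy_decompress(src, lbuf2)
    return lbuf == lbuf2


print(fills_exactly(b"abcd", 4))
print(fills_exactly(b"ab", 4))
```

A stream that produces fewer than lsize bytes leaves the tails of the
two buffers different, so the comparison rejects it.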

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #7099
Closes #4984
2018-03-14 16:10:36 -07:00
Chunwei Chen c797f0898e Fix racy assignment of zcb.zcb_haderrors
zcb_haderrors will be modified in zdb_blkptr_done, which is
asynchronous. So we must move this assignment after zio_wait.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #7099
2018-03-14 16:10:36 -07:00
Chunwei Chen 5e566c5772 Fix zle_decompress out of bound access
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #7099
2018-03-14 16:10:36 -07:00
Chunwei Chen 23227313a2 Fix zdb -c traverse stop on damaged objset root
If a corruption happens to be on a root block of an objset, zdb -c will
not correctly report the error, and it will not traverse the datasets
that come after. This is because traverse_visitbp, which does the
callback and resets the error for TRAVERSE_HARD, is skipped when
traversing the ZIL fails in traverse_impl.

Here's an example of what the 'zdb -eLcc' output looks like on a pool with
damaged objset root:

== before patch:

Traversing all blocks to verify checksums ...

Error counts:

	errno  count
block traversal size 379392 != alloc 33987072 (unreachable 33607680)

	bp count:             172
	ganged count:           0
	bp logical:       1678336      avg:   9757
	bp physical:       130560      avg:    759     compression:  12.85
	bp allocated:      379392      avg:   2205     compression:   4.42
	bp deduped:             0    ref>1:      0   deduplication:   1.00
	SPA allocated:   33987072     used:  0.80%

	additional, non-pointer bps of type 0:         71
	Dittoed blocks on same vdev: 101

== after patch:

Traversing all blocks to verify checksums ...

zdb_blkptr_cb: Got error 52 reading <54, 0, -1, 0>  -- skipping

Error counts:

	errno  count
	   52  1
block traversal size 33963520 != alloc 33987072 (unreachable 23552)

	bp count:             447
	ganged count:           0
	bp logical:      36093440      avg:  80745
	bp physical:     33699840      avg:  75391     compression:   1.07
	bp allocated:    33963520      avg:  75981     compression:   1.06
	bp deduped:             0    ref>1:      0   deduplication:   1.00
	SPA allocated:   33987072     used:  0.80%

	additional, non-pointer bps of type 0:         76
	Dittoed blocks on same vdev: 115

==

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #7099
2018-03-14 16:10:36 -07:00
Brian Behlendorf 3713b73335 Linux 4.11 compat: avoid refcount_t name conflict
Related to commit 4859fe796, when directly using the kernel's
refcount functions in kernel compatibility code do not map
refcount_t to zfs_refcount_t.  This leads to a type mismatch.

Longer term we should consider renaming refcount_t to
zfs_refcount_t in the zfs code base.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@nutanix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7148
2018-03-14 16:10:36 -07:00
Brian Behlendorf 310e63dfd1 Linux 4.16 compat: inode_set_iversion()
A new interface was added to manipulate the version field of an
inode.  Add an inode_set_iversion() wrapper for older kernels and
use the new interface when available.

The i_version field was dropped from the trace point due to the
switch to an atomic64_t i_version type.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@nutanix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7148
2018-03-14 16:10:36 -07:00
WHR a196b3bc3d OpenZFS 8966 - Source file zfs_acl.c, function zfs_aclset_common contains a use after end of the lifetime of a local variable
Authored by: WHR <msl0000023508@gmail.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: George Melikov <mail@gmelikov.ru>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Richard Lowe <richlowe@richlowe.net>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>

OpenZFS-issue: https://www.illumos.org/issues/8966
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c95549fcdc
Closes #7141
2018-03-14 16:10:36 -07:00
Richard Elling a58e1284d8 Remove deprecated zfs_arc_p_aggressive_disable
zfs_arc_p_aggressive_disable is no more. This PR removes docs
and module parameters for zfs_arc_p_aggressive_disable.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Richard Elling <Richard.Elling@RichardElling.com>
Closes #7135
2018-03-14 16:10:36 -07:00
Brian Behlendorf f1dde3fb20 Fix default libdir for Debian/Ubuntu
The distribution provided architecture specific RPM macro files
for x86_64 and other architectures on Debian/Ubuntu specify the
wrong default libdir install location.  When building deb packages
override _lib with the correct location.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7083
Closes #7101
2018-03-14 16:10:36 -07:00
wli5 5f38142e7b Bug fix in qat_compress.c for vmalloc addr check
Remove the unused vmalloc address check; the function mem_to_page
will handle non-vmalloc addresses when mapping them to a physical
address.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Weigang Li <weigang.li@intel.com>
Closes #7125
2018-03-14 16:10:36 -07:00
LOLi 29b79dcfe9 Fix systemd_ RPM macros usage on Debian-based distributions
Debian-based distributions do not seem to provide RPM macros for
dealing with systemd pre- and post- (un)install actions: this results
in errors when installing or upgrading .deb packages because the
resulting control scripts contain the following unresolved macros:

 * %systemd_post
 * %systemd_preun
 * %systemd_postun

Fix this by providing default values for postinstall, preuninstall and
postuninstall scripts when these macros are not defined.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7074
Closes #7100
2018-03-14 16:10:36 -07:00
John L. Hammond ecc972c7f0 Emit an error message before MMP suspends pool
In mmp_thread(), emit an MMP specific error message before calling
zio_suspend() so that the administrator will understand why the pool
is being suspended.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Closes #7048
2018-03-14 16:10:36 -07:00
LOLi 6c891ade8b ZTS: Fix create-o_ashift test case
The function that fills the uberblock ring buffer on every device label
has been reworked to avoid occasional failures caused by a race
condition that prevents 'zpool sync' from writing some uberblock
sequentially: this happens when the pool sync ioctl dispatch code calls
txg_wait_synced() while we're already waiting for a TXG to sync.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6924
Closes #6977
2018-03-14 16:10:36 -07:00
LOLi 03658d5081 Fix --with-systemd on Debian-based distributions (#6963)
These changes propagate the "--with-systemd" configure option to the
RPM spec file, allowing Debian-based distributions to package
systemd-related files.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6591
Closes #6963
2018-03-14 16:10:36 -07:00
Brian Behlendorf 5d62588032 Remove vn_rename and vn_remove dependency
The only place vn_rename and vn_remove are used is when writing
out an updated pool configuration file.  By truncating the file
instead of renaming and removing it we can avoid having to implement
these interfaces entirely.  Functionally an empty cache file is
treated the same as a missing cache file.  This is particularly
advantageous because the Linux kernel has never provided a way
to reliably implement vn_rename and vn_remove.

The cachefile_004_pos.ksh test case was updated to understand
that an empty cache file is the same as a missing one.

The zfs-import-* systemd service files were not updated to use
ConditionFileNotEmpty in place of ConditionPathExists.  This
means that after exporting all pools and rebooting new pools
will not be scanned for on the next boot.  This small change
should not impact normal usage since pools are not exported
as part of a normal shutdown.

Documentation was updated accordingly.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Arkadiusz Bubała <arkadiusz.bubala@open-e.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes zfsonlinux/spl#648
Closes #6753
2018-03-14 16:10:36 -07:00
Brian Behlendorf 6897ea475f Fix "--enable-code-coverage" debug build
When --enable-code-coverage is provided it should not result
in NDEBUG being defined.  This is controlled by --enable-debug.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6674
2018-03-14 16:10:36 -07:00
Brian Behlendorf 3790bfa80f Update codecov.yml
Update the codecov.yml to make the following functional changes.

* Do not require the CI testing to pass before posting results.
* Set red-yellow-green coverage percent from 50%-100%
* Allow a 1% drop in coverage to still be considered a pass.
* Reduce the size of the comment posted to the issue.

Additionally, the top level README.markdown has been updated
to include the codecov.io badge and the project summary reworded.

Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6669
2018-03-14 16:10:36 -07:00
Prakash Surya 6b278f3223 Add support for "--enable-code-coverage" option
This change adds support for a new option that can be passed to the
configure script: "--enable-code-coverage". Further, the "--enable-gcov"
option has been removed, as this new option provides the same
functionality (plus more).

When using this new option the following make targets are available:

 * check-code-coverage
 * code-coverage-capture
 * code-coverage-clean

Note: these make targets can only be run from the root of the project.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Prakash Surya <prakash.surya@delphix.com>
Closes #6670
2018-03-14 16:10:36 -07:00
Prakash Surya f1236ebf35 Make "-fno-inline" compile option more accessible
When functions are inlined, it can make the system much more difficult
to instrument using tools such as ftrace, BPF, crash, etc. Thus, to aid
development and increase the system's observability, when the
"--enable-debuginfo" flag is specified, the "-fno-inline" compilation
option will be used for both userspace and kernel modules.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Prakash Surya <prakash.surya@delphix.com>
Closes #6605
2018-03-14 16:10:36 -07:00
Brian Behlendorf 184087f822 Add configure option to enable gcov analysis
* Add configure option to enable gcov analysis.
* Includes a few minor ctime fixes.
* Add codecov.yml configuration.

Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6642
2018-03-14 16:10:36 -07:00
Richard Yao 834815e9f7 Implement --enable-debuginfo to force debuginfo
Inspection of an Ubuntu 14.04 x64 system revealed that the config file
used to build the kernel image differs from the config file used to
build kernel modules by the presence of CONFIG_DEBUG_INFO=y:

This in itself is insufficient to show that the kernel is built with
debuginfo, but a cursory analysis of the debuginfo provided and the
size of the kernel strongly suggests that it was built with
CONFIG_DEBUG_INFO=y while the modules were not. Installing
linux-image-$(uname -r)-dbgsym had no obvious effect on the debuginfo
provided by either the modules or the kernel.

The consequence is that issue reports from distributions such as Ubuntu
and its derivatives that build kernel modules without debuginfo contain
nonsensical backtraces. It is therefore desirable to force generation
of debuginfo, so we implement --enable-debuginfo. Since the build system
can build both userspace components and kernel modules, the generic
--enable-debuginfo option will force debuginfo for both. However, it
also supports --enable-debuginfo=kernel and --enable-debuginfo=user for
finer grained control.

Enabling debuginfo for the kernel modules works by injecting
CONFIG_DEBUG_INFO=y into the make environment. This enables
generation of debuginfo by the kernel build systems on all Linux
kernels, but the build environment is slightly different in that
CONFIG_DEBUG_INFO has not been defined in the CPP. Adding -DCONFIG_DEBUG_INFO
would fix that, but it would also cause build failures on kernels where
CONFIG_DEBUG_INFO=y is already set. That would complicate its use in
DKMS environments that support a range of kernels and is therefore
undesirable. We could write a compatibility shim to enable
CONFIG_DEBUG_INFO only when it is explicitly disabled, but we forgo
doing that because it is unnecessary. Nothing in ZoL or the kernel uses
CONFIG_DEBUG_INFO in the CPP at this time and that is unlikely to
change.

Enabling debuginfo for the userspace components is done by injecting -g
into CPPFLAGS. This is not necessary because the build system honors the
environment's CPPFLAGS by appending them to the actual CPPFLAGS used,
but it is supported for consistency.

Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@clusterhq.com>
Closes #2734
2018-03-14 16:10:35 -07:00
Richard Yao 0f1ff38476 Make --enable-debug fail when given bogus args
Currently, bogus options to --enable-debug become --disable-debug. That
means that passing --enable-debug=true is equivalent to --disable-debug,
which is counterintuitive. We switch to AS_CASE to allow us to
fail when given a bogus option.

Also, we modify the text printed to clarify that --enable-debug enables
assertions.

Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@clusterhq.com>
Closes #2734
2018-03-14 16:10:35 -07:00
Tony Hutter e3b28e16ce Tag zfs-0.7.6
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2018-02-01 10:02:58 -08:00
LOLi 2f62fdd644 Fix 'zfs receive -o' when used with '-e|-d'
When used in conjunction with one of '-e' or '-d' zfs receive options
none of the properties requested to be set (-o) are actually applied:
this is caused by a wrong assumption made about the toplevel dataset
in zfs_receive_one().

Fix this by correctly detecting the toplevel dataset.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7088

Requires-spl: refs/pull/679/head
2018-01-30 10:27:32 -06:00
Brian Behlendorf 137b3e6cff Extend zloop.sh for automated testing
In order to debug issues encountered by ztest during automated
testing it's important that as much debugging information as
possible be dumped at the time of the failure.  The following
changes extend the zloop.sh script in order to make it easier
to integrate with buildbot.

* Add the `-m <maximum cores>` option to zloop.sh to place a
  limit on the number of core dumps generated.  By default, the
  existing behavior is maintained and no limit is set.

* Add the `-l` option to create a 'ztest.core.N' symlink in the
  current directory to the core directory. This functionality
  is provided primarily for buildbot which expects log files to
  have well known names.

* Rename 'ztest.ddt' to 'ztest.zdb' and extend it to dump
  additional basic information on failure for later analysis.

Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6999

Conflicts:
	scripts/zloop.sh
2018-01-30 10:27:31 -06:00
Don Brady d1630dda58 Cleanup zloop working directory after each pass
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed by: John Kennedy <jwk404@gmail.com>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Issue #6595
Closes #6663
2018-01-30 10:27:31 -06:00
Alexander Motin 701ebd014a OpenZFS 8835 - Speculative prefetch in ZFS not working for misaligned reads
In case of misaligned I/O sequential requests are not detected as such
due to overlaps in logical block sequence:

    dmu_zfetch(fffff80198dd0ae0, 27347, 9, 1)
    dmu_zfetch(fffff80198dd0ae0, 27355, 9, 1)
    dmu_zfetch(fffff80198dd0ae0, 27363, 9, 1)
    dmu_zfetch(fffff80198dd0ae0, 27371, 9, 1)
    dmu_zfetch(fffff80198dd0ae0, 27379, 9, 1)
    dmu_zfetch(fffff80198dd0ae0, 27387, 9, 1)

This patch makes a single-block overlap count as a stream hit,
improving performance up to several times.
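
The trace above can be read with a small model. This is only a sketch, not the actual dmu_zfetch() code: it assumes the second and third arguments in the trace are the starting block and the block count, and the function name and the `allow_overlap` flag are hypothetical.

```python
# Toy model of sequential-stream detection; not the actual dmu_zfetch() code.
def count_stream_hits(requests, allow_overlap):
    """Count requests that continue the stream left by the previous request.

    requests is a list of (blkid, nblks) tuples in arrival order.
    """
    hits = 0
    next_blkid = None  # block where the stream expects the next read to start
    for blkid, nblks in requests:
        if next_blkid is not None:
            exact = blkid == next_blkid
            # Treat a read that starts one block early as a continuation.
            overlap = allow_overlap and blkid == next_blkid - 1
            if exact or overlap:
                hits += 1
        next_blkid = blkid + nblks
    return hits


# Block IDs and lengths taken from the trace above.
trace = [(27347, 9), (27355, 9), (27363, 9),
         (27371, 9), (27379, 9), (27387, 9)]
```

In this model every request starts one block before the expected position (27347 + 9 = 27356, but the next read starts at 27355), so none is recognized as sequential unless the single-block overlap is accepted.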

Authored by: Alexander Motin <mav@FreeBSD.org>
Approved by: Gordon Ross <gwr@nexenta.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Allan Jude <allanjude@freebsd.org>
Reviewed by: Gvozden Neskovic <neskovic@gmail.com>
Reviewed by: George Melikov <mail@gmelikov.ru>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://www.illumos.org/issues/8835
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/aab6dd482a
Closes #7062
2018-01-30 10:27:31 -06:00
LOLi 5b8ec2cf39 Fix Debian packaging on ARMv7/ARM64
When building packages on Debian-based systems specify the target
architecture used by 'alien' to convert .rpm packages into .deb: this
avoids detecting an incorrect value which results in the following
errors:

<package>.aarch64.rpm is for architecture aarch64 ; the package cannot be built on this system
<package>.armv7l.rpm is for architecture armel ; the package cannot be built on this system

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #7046
Closes #7058
2018-01-30 10:27:31 -06:00
Brian Behlendorf 9d1a39cec6 Fix shellcheck v0.4.6 warnings
Resolve new warnings reported after upgrading to shellcheck
version 0.4.6.  This patch contains no functional changes.

* egrep is non-standard and deprecated. Use grep -E instead. [SC2196]
* Check exit code directly with e.g. 'if mycmd;', not indirectly
  with $?.  [SC2181]  Suppressed.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7040
2018-01-30 10:27:31 -06:00
DeHackEd 2a7b736dce Remove l2arc_nocompress from zfs-module-parameters(5)
Parameter was removed in d3c2ae1c08
(OpenZFS 6950 - ARC should cache compressed data)

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: DHE <git@dehacked.net>
Closes #7043
2018-01-30 10:27:31 -06:00
Richard Yao ecc8af1812 Fix incompatibility with Reiser4 patched kernels
In ZFSOnLinux, our sources and build system are self contained such that
we do not need to make changes to the Linux kernel sources. Reiser4 on
the other hand exists solely as a kernel tree patch and opts to make
changes to the kernel rather than adapt to it. After Linux 4.1 made a
VFS change that replaced new_sync_read with do_sync_read, Reiser4's
maintainer decided to modify the kernel VFS to export the old function.
This caused our autotools check to misidentify the kernel API as
predating Linux 4.1 on kernels that have been patched with Reiser4
support, which breaks our build.

Reiser4 really should be patched to stop doing this, but let's modify our
check to be more strict to help the affected users of both filesystems.

Also, we were not checking the types of arguments and return value of
new_sync_read() and new_sync_write(). Let's fix that too.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #6241
Closes #7021
2018-01-30 10:27:31 -06:00
Alex Zhuravlev 129e3e8dc3 Use zap_count instead of cached z_size for unlink
As a performance optimization Lustre does not strictly update
the SA_ZPL_SIZE when adding/removing from non-directory entries.
This results in entries which cannot be removed through the ZPL
layer even though the ZAP is empty and safe to remove.

Resolve this issue by checking the zap_count() directly instead
of relying on the cached SA_ZPL_SIZE.  Micro-benchmarks show no
significant performance impact due to the additional overhead
of using zap_count().

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7019
2018-01-30 10:27:31 -06:00
Nathaniel Wesley Filardo 9fb09f79e5 Revert raidz_map and _col structure types
As part of the refactoring of ab9f4b0b82,
several uint64_t-s and uint8_t-s were changed to other types.  This
caused ZoL github issue #6981, an overflow of a size_t on a 32-bit ARM
machine.  In absence of any strong motivation for the type changes, this
simply puts them back, modulo the changes accumulated for ABD.
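
The kind of overflow described above can be illustrated with fixed-width integers. This is a sketch only: the value below is hypothetical and chosen just to exceed 32 bits, it is not taken from issue #6981, and ctypes is used here merely to emulate the two integer widths.

```python
import ctypes

# Hypothetical value chosen only to exceed 32 bits; not from issue #6981.
nbytes = 5 * 2**30

as_uint64 = ctypes.c_uint64(nbytes).value  # width of uint64_t
as_uint32 = ctypes.c_uint32(nbytes).value  # width of size_t on 32-bit ARM
```

The 64-bit field keeps the full value, while the 32-bit field silently keeps only the low 32 bits.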

Compile-tested on amd64 and run-tested on armhf.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Gvozden Neskovic <neskovic@gmail.com>
Signed-off-by: Nathaniel Wesley Filardo <nwf@cs.jhu.edu>
Closes #6981
Closes #7023
2018-01-30 10:27:31 -06:00
Nathaniel Wesley Filardo a2ee6568c6 zhack: fix getopt return type
This fixes zhack's command processing on ARM.  On ARM char
is unsigned, and so, in promotion to an int, it will never
compare equal to -1.
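
The promotion problem can be emulated as follows. This is an illustration of the C behavior using ctypes, not the zhack code itself; the variable names are hypothetical.

```python
import ctypes

GETOPT_END = -1  # getopt() returns -1 when no options remain

# Store the return value in a char, then read it back as an int.
as_signed_char = ctypes.c_byte(GETOPT_END).value     # char where it is signed
as_unsigned_char = ctypes.c_ubyte(GETOPT_END).value  # char where it is unsigned
```

With an unsigned char the stored value reads back as 255, so a loop that compares it against -1 never terminates on that condition. Keeping the result in an int avoids the problem.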

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Nathaniel Wesley Filardo <nwf@cs.jhu.edu>
Closes #7016
2018-01-30 10:27:31 -06:00
Brian Behlendorf 9c1a8eaa51 Fix ARC hit rate
When the compressed ARC feature was added in commit d3c2ae1
the method of reference counting in the ARC was modified.  As
part of this accounting change the arc_buf_add_ref() function
was removed entirely.

This would have been fine but the arc_buf_add_ref() function
served a second undocumented purpose of updating the ARC access
information when taking a hold on a dbuf.  Without this logic
in place a cached dbuf would not migrate its associated
arc_buf_hdr_t to the MFU list.  This would negatively impact
the ARC hit rate, particularly on systems with a small ARC.

This change reinstates the missing call to arc_access() from
dbuf_hold() by implementing a new arc_buf_access() function.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6171
Closes #6852
Closes #6989
2018-01-30 10:27:31 -06:00
LOLi a8fa31b50b Fix 'zpool add' handling of nested interior VDEVs
When replacing a faulted device which was previously handled by a spare
multiple levels of nested interior VDEVs will be present in the pool
configuration; the following example illustrates one of the possible
situations:

   NAME                          STATE     READ WRITE CKSUM
   testpool                      DEGRADED     0     0     0
     raidz1-0                    DEGRADED     0     0     0
       spare-0                   DEGRADED     0     0     0
         replacing-0             DEGRADED     0     0     0
           /var/tmp/fault-dev    UNAVAIL      0     0     0  cannot open
           /var/tmp/replace-dev  ONLINE       0     0     0
         /var/tmp/spare-dev1     ONLINE       0     0     0
       /var/tmp/safe-dev         ONLINE       0     0     0
   spares
     /var/tmp/spare-dev1         INUSE     currently in use

This is safe and allowed, but get_replication() needs to handle this
situation gracefully to let zpool add new devices to the pool.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6678
Closes #6996
2018-01-30 10:27:31 -06:00
lidongyang 8d82a19def Call commit callbacks from the tail of the list
Our zfs backed Lustre MDT had soft lockups while under heavy metadata
workloads while handling transaction callbacks from osd_zfs.

The problem is zfs is not taking advantage of the fast path in
Lustre's trans callback handling, where Lustre will skip the calls
to ptlrpc_commit_replies() when it already saw a higher transaction
number.

This patch corrects this; it also has a positive impact on metadata
performance on Lustre with osd_zfs, plus some cleanup in the headers.
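
Why the calling order matters can be shown with a toy model. This is not Lustre or ZFS code: it assumes the callback list is ordered oldest first so that the tail holds the highest transaction number, and the function name and transaction numbers are hypothetical.

```python
# Toy model of a commit handler with a fast path; not Lustre or ZFS code.
def run_callbacks(txns):
    """Return how many callbacks took the expensive slow path."""
    highest_seen = -1
    slow_path_calls = 0
    for txn in txns:
        if txn <= highest_seen:
            continue  # fast path: a higher transaction was already handled
        highest_seen = txn
        slow_path_calls += 1  # stands in for ptlrpc_commit_replies()
    return slow_path_calls


pending = [101, 102, 103, 104]  # hypothetical transaction numbers, oldest first
from_head = run_callbacks(pending)
from_tail = run_callbacks(reversed(pending))
```

Walking from the head, every callback sees a new highest number and takes the slow path; walking from the tail, only the first one does.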

A similar issue for ext4/ldiskfs is described on:
https://jira.hpdd.intel.com/browse/LU-6527

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Li Dongyang <dongyang.li@anu.edu.au>
Closes #6986
2018-01-30 10:27:31 -06:00
Giuseppe Di Natale c2aacf2087 Handle broken pipes in arc_summary
Using a command similar to 'arc_summary.py | head' causes
a broken pipe exception. Gracefully exit in the case of a
broken pipe in arc_summary.py.
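
One way to exit gracefully is sketched below. It is not the code from arc_summary.py; the function name `emit` is hypothetical, and the sketch only shows the general pattern of catching EPIPE around writes.

```python
import errno
import sys


def emit(lines, stream=sys.stdout):
    """Write lines to stream; return False if the reader closed the pipe."""
    try:
        for line in lines:
            stream.write(line + "\n")
        stream.flush()
    except IOError as err:
        if err.errno == errno.EPIPE:
            return False  # e.g. 'head' exited early; stop quietly
        raise
    return True
```

A caller could stop producing output and exit once `emit()` returns False.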

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6965 
Closes #6969
2018-01-30 10:27:31 -06:00
LOLi 9a6c57845a Handle invalid options in arc_summary
If an invalid option is provided to arc_summary.py we handle any error
thrown from the getopt Python module and print the usage help message.
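
A minimal sketch of this pattern follows. The option list and usage text are abbreviated and illustrative, not the full set accepted by arc_summary.py, and `parse_args` is a hypothetical name.

```python
import getopt
import sys

USAGE = "usage: arc_summary.py [-d] [-h]"  # abbreviated, illustrative text


def parse_args(argv):
    """Return parsed options, or None after printing usage on a bad option."""
    try:
        opts, _ = getopt.getopt(argv, "dh", ["description", "help"])
    except getopt.GetoptError as err:
        sys.stderr.write("Error: {0}\n{1}\n".format(err, USAGE))
        return None
    return opts
```

Here an unknown option such as `-x` prints the error and usage text instead of a traceback.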

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6983
2018-01-30 10:27:31 -06:00
Dominik Hassler d27a40d28f OpenZFS 8794 - cstyle generates warnings with recent perl
Authored by: Dominik Hassler <hadfl@omniosce.org>
Reviewed by: Andy Fiddaman <andy@omniosce.org>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Dan McDonald <danmcd@joyent.com>
Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov>

OpenZFS-issue: https://www.illumos.org/issues/8794
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/578f67364c
Closes #6973
2018-01-30 10:27:31 -06:00
Brian Behlendorf aebc5df418 Update for cppcheck v1.80
Resolve new warnings and errors from cppcheck v1.80.

* [lib/libshare/libshare.c:543]: (warning)
  Possible null pointer dereference: protocol
* [lib/libzfs/libzfs_dataset.c:2323]: (warning)
  Possible null pointer dereference: srctype
* [lib/libzfs/libzfs_import.c:318]: (error)
  Uninitialized variable: link
* [module/zfs/abd.c:353]: (error) Uninitialized variable: sg
* [module/zfs/abd.c:353]: (error) Uninitialized variable: i
* [module/zfs/abd.c:385]: (error) Uninitialized variable: sg
* [module/zfs/abd.c:385]: (error) Uninitialized variable: i
* [module/zfs/abd.c:553]: (error) Uninitialized variable: i
* [module/zfs/abd.c:553]: (error) Uninitialized variable: sg
* [module/zfs/abd.c:763]: (error) Uninitialized variable: i
* [module/zfs/abd.c:763]: (error) Uninitialized variable: sg
* [module/zfs/abd.c:305]: (error) Uninitialized variable: tmp_page
* [module/zfs/zpl_xattr.c:342]: (warning)
   Possible null pointer dereference: value
* [module/zfs/zvol.c:208]: (error) Uninitialized variable: p

Convert the following suppression to inline.

* [module/zfs/zfs_vnops.c:840]: (error)
  Possible null pointer dereference: aiov

Exclude HAVE_UIO_ZEROCOPY and HAVE_DNLC from analysis since
these macros will never be defined until this functionality
is implemented.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6879
2018-01-30 10:27:31 -06:00
Scot W. Stevenson 7a8bef3983 Fix data on evict_skips in arc_summary.py
Display correct data from kstat arcstats for evict_skips,
which is currently repeating the data from mutex_misses.
Fixes #6882

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Scot W. Stevenson <scot.stevenson@gmail.com>
Closes #6882 
Closes #6883
2018-01-30 10:27:31 -06:00
Scot W. Stevenson d486dee89e Minor code cleanups in arc_python.py
Remove unused library re and associated variable kstat_pobj. Add note
to documentation at start of program about required support for old
versions of Python. Change variable "format" (which is a built-in
function) to "fmt".

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Scot W. Stevenson <scot.stevenson@gmail.com>
Closes #6869
2018-01-30 10:27:31 -06:00
Scot W. Stevenson 7de8fb33a2 Fix arc_summary.py -d crash with Python3
Prevents arc_summary.py crashing when called with parameter -d or
long form --description with Python3.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Scot W. Stevenson <scot.stevenson@gmail.com>
Closes #6849 
Closes #6850
2018-01-30 10:27:31 -06:00
Scot W. Stevenson 904c03672b Sort output of tunables in arc_summary.py
Sort list of tunables printed by _tunable_summary()
alphabetically

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Scot W. Stevenson <scot.stevenson@gmail.com>
Closes #6828
2018-01-30 10:27:30 -06:00
Scot W. Stevenson 03955e3488 Add documentation strings to arc_summary.py
Include docstrings (PEP8, PEP257) for module and all functions.
Separately, remove outdated section in comment at start of
module. Separately, remove unused global constant "usetunable".

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Scot W. Stevenson <scot.stevenson@gmail.com>
Closes #6818
2018-01-30 10:27:30 -06:00
Scot W. Stevenson 88e4e0d5dd Rewrite fHits() in arc_summary.py with SI units
Complete rewrite of fHits(). Move units from non-standard English
abbreviations to SI units, thereby avoiding confusion because of
"long scale" and "short scale" numbers. Remove unused parameter
"Decimal". Add function string. Aim to confirm to PEP8.

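Formatting a count with SI prefixes could look like the sketch below. This is not the rewritten fHits(); the name `format_hits` and the exact output format are assumptions made for illustration.

```python
def format_hits(hits):
    """Format a count with decimal SI prefixes (k, M, G, ...)."""
    prefixes = ["", "k", "M", "G", "T", "P", "E", "Z", "Y"]
    value = float(hits)
    index = 0
    # Divide by 1000 until the value fits or the prefixes run out.
    while value >= 1000 and index < len(prefixes) - 1:
        value /= 1000
        index += 1
    if index == 0:
        return str(int(hits))  # small counts are shown unscaled
    return "{0:.2f}{1}".format(value, prefixes[index])
```

Because the prefixes are powers of 1000, the output does not depend on whether the reader assumes long-scale or short-scale number names.
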
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Scot W. Stevenson <scot.stevenson@gmail.com>
Closes #6815
2018-01-30 10:27:30 -06:00
Scot W. Stevenson 03f638a8ef Minor code cleanup in arc_summary.py
Simplify and inline single-use function div1(); inline twice-used
function div2(); add function comment to zfs_header(); replace
variable "unused" in get_Kstat() with "_" following convention.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Scot W. Stevenson <scot.stevenson@gmail.com>
Closes #6802
2018-01-30 10:27:30 -06:00
Scot W. Stevenson 5dc25de668 Rewrite of function fBytes() in arc_summary.py
Replace if-elif-else construction with shorter loop;
remove unused parameter "Decimal"; centralize format
string; add function documentation string; conform to
PEP8.
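
The loop-based approach can be sketched as follows. This is not the rewritten fBytes(); the name `format_bytes`, the unit labels, and the format string are assumptions made for illustration.

```python
def format_bytes(nbytes):
    """Format a byte count with binary units using a loop over unit names."""
    units = ["Bytes", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"]
    value = float(nbytes)
    for unit in units[:-1]:
        if value < 1024:
            break
        value /= 1024
    else:
        unit = units[-1]  # value was divided past every smaller unit
    return "{0:.2f} {1}".format(value, unit)
```

A single format string at the end replaces one per if/elif branch.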

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Scot W. Stevenson <scot.stevenson@gmail.com>
Closes #6784
2018-01-30 10:27:30 -06:00
David Quigley 53e5890cff Fix bug in distclean which removes needed files
Running distclean removes the following files because of an error
in Makefile.am

deleted:    tests/zfs-tests/include/commands.cfg
deleted:    tests/zfs-tests/include/libtest.shlib
deleted:    tests/zfs-tests/include/math.shlib
deleted:    tests/zfs-tests/include/properties.shlib
deleted:    tests/zfs-tests/include/zpool_script.shlib

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: David Quigley <david.quigley@intel.com>
Closes #6636
2018-01-30 10:27:30 -06:00
Gvozden Neskovic a94447ddf3 dmu_objset: release bonus buffer in failure path
Reported by kmemleak during testing of a new patch:

```
unreferenced object 0xffff9f1c12e38800 (size 1024):
  comm "z_upgrade", pid 17842, jiffies 4296870904 (age 8746.268s)
  backtrace:
    kmemleak_alloc+0x7a/0x100
    __kmalloc_node+0x26c/0x510
    range_tree_create+0x39/0xa0 [zfs]
    dmu_zfetch_init+0x73/0xe0 [zfs]
    dnode_create+0x12c/0x3b0 [zfs]
    dnode_hold_impl+0x1096/0x1130 [zfs]
    dnode_hold+0x23/0x30 [zfs]
    dmu_bonus_hold_impl+0x6b/0x370 [zfs]
    dmu_bonus_hold+0x1e/0x30 [zfs]
    dmu_objset_space_upgrade+0x114/0x310 [zfs]
    dmu_objset_userobjspace_upgrade_cb+0xd8/0x150 [zfs]
    dmu_objset_upgrade_task_cb+0x136/0x1e0 [zfs]    
    kthread+0x119/0x150
```

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Closes #6575
2018-01-30 10:27:30 -06:00
Chunwei Chen 7192ec7942 Fix zfs_ioc_pool_sync should not use fnvlist
Using fnvlist on user input would allow a user to easily panic zfs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Alek Pinchuk <apinchuk@datto.com>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #6529
2018-01-30 10:27:30 -06:00
Gvozden Neskovic 06acbbc429 vdev_mirror: load balancing fixes
vdev_queue:
- Track the last position of each vdev, including the io size,
  in order to detect linear access of the following zio.
- Remove duplicate `vq_lastoffset`

vdev_mirror:
- Correctly calculate the zio offset (signedness issue)
- Deprecate `vdev_queue_register_lastoffset()`
- Add `VDEV_LABEL_START_SIZE` to zio offset of leaf vdevs

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Closes #6461
2018-01-30 10:27:30 -06:00
BtbN 6116bbd744 Use /sbin/openrc-run for openrc init scripts
Using /sbin/runscript is deprecated and throws a QA warning
when still used in init scripts.

Reviewed-by: bunder2015 <omfgbunder@gmail.com>
Signed-off-by: BtbN <btbn@btbn.de>
Closes #6519
2018-01-30 10:27:30 -06:00
Tony Hutter a803eacf26 Tag zfs-0.7.5
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2017-12-18 10:57:47 -08:00
Brian Behlendorf 504bfc8b49 Fix multihost stale cache file import
When the multihost property is enabled it should be impossible to
import an active pool even using the force (-f) option.  This patch
prevents a forced import from succeeding when importing with a
stale cache file.

The root cause of the problem is that the kernel modules trusted
the hostid provided in configuration.  This is always correct when
the configuration is generated by scanning for the pool.  However,
when using an existing cache file the hostid could be stale which
would result in the activity check being skipped.

Resolve the issue by always using the hostid read from the label
configuration where the best uberblock was found.

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6933
Closes #6971
2017-12-18 10:31:01 -08:00
Olaf Faaland 53a8cbd70e Fix ZTS MMP tests and ztest -M behavior
Quote "$MMP_IMPORT_MSG" when it is passed as an argument, as it is a
multi-word string.  Some tests were passing when they should not have,
because the grep was only testing for the first word.
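
The effect of the missing quotes can be reproduced with POSIX-style word splitting. This is a sketch: the message text and log file name are hypothetical, and shlex is used only to emulate how a shell would split the command line.

```python
import shlex

MMP_IMPORT_MSG = "pool may be in use"  # hypothetical message text

# POSIX word splitting, as a shell would apply to an unquoted expansion.
unquoted = shlex.split("grep {0} import.log".format(MMP_IMPORT_MSG))
quoted = shlex.split(
    "grep {0} import.log".format(shlex.quote(MMP_IMPORT_MSG)))
```

Unquoted, grep receives only the first word as its pattern and treats the remaining words as file names; quoted, the whole message is a single pattern argument.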

Correct the message expected when no hostid is set and the test attempts
to enable multihost.  It did not match the actual output in that
situation.

Disable ztest_reguid() when ztest is invoked with the -M option.  If
ztest performs a reguid, a concurrent import attempt may fail with the
error "one or more devices is currently unavailable" if the guid sum is
calculated on the original device guids but compared against the guid
sum ztest wrote based on the new device guids.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #6666
2017-12-18 10:14:39 -08:00
David Qian 505b97ae20 Enable QAT support in zfs-dkms RPM
Enable QAT-accelerated gzip compression in the zfs-dkms RPM package when
the environment variable ICP_ROOT is set to the QAT driver source code
folder and QAT hardware is present.  Otherwise, use default gzip compression.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: David Qian <david.qian@intel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6932
2017-12-18 10:02:19 -08:00
Lalufu 30a64ebaed Add zfs-import.target services in spec file
Add missing zfs-import.target to list of systemd services in zfs
RPM spec file.

Reviewed-by: Niklas Wagner <Skaro@Skaronator.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ralf Ertzinger <ralf@skytale.net>
Issue #6953
Closes #6955
2017-12-18 09:45:01 -08:00
Antonio Russo da16fc5739 Enable zfs-import.target in systemd preset (#6968)
Cherry picked line from PR #6822, this enables the new
target introduced in PR #6764.

Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com>
2017-12-18 09:43:55 -08:00
Tony Hutter 3c7fa6ca33 Tag zfs-0.7.4
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2017-12-07 10:25:36 -08:00
Tony Hutter 36e0ddb744 Revert "Long hold the dataset during upgrade"
This reverts commit a5c8119eba.

The commit (which was modified to remove encryption) was hitting
ASSERT(dsl_pool_config_held(dmu_objset_pool(os))) in
dmu_objset_upgrade() during automated testing.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2017-12-06 13:25:40 -06:00
Giuseppe Di Natale cf21b5b5b2 Allow test-runner to filter test groups by tag
Enable test-runner to accept a list of tags to identify
which test groups the user wishes to run.

Also allow test-runner to perform multiple iterations
of a test run.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6788
2017-12-06 13:25:40 -06:00
Brian Behlendorf 1030f807ba Fix NFS sticky bit permission denied error
When zfs_sticky_remove_access() was originally adapted for Linux
a typo was made which altered the intended behavior.  As described
in the block comment, the intended behavior is that permission
should be granted when the entry is a regular file and you have
write access.  That is, S_ISREG should have been used instead of
S_ISDIR.

Restricting permission to regular files made good sense for older
systems where setting the bit on executable files would instruct
the system to save the program's text segment on the swap device.

On modern systems this behavior has been replaced by the sticky
bit acting as a restricted deletion flag and the plain file
restriction has been relaxed.
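
A minimal sketch of the corrected clause, written in Python for
illustration rather than the kernel's C.  The function name
write_clause_allows_removal and its has_write_access argument are
hypothetical; only the S_ISREG versus S_ISDIR distinction comes from
the description above.

```python
import stat

def write_clause_allows_removal(entry_mode, has_write_access):
    """Hypothetical model of one clause of the sticky-bit check.

    Grant removal when the entry is a regular file and the caller
    has write access to it.  The typo tested S_ISDIR here instead.
    """
    return stat.S_ISREG(entry_mode) and has_write_access

regular_file = stat.S_IFREG | 0o644
directory = stat.S_IFDIR | 0o755
print(write_clause_allows_removal(regular_file, True))
print(write_clause_allows_removal(directory, True))
```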

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6889
Closes #6910
2017-12-04 17:22:47 -08:00
JKDingwall d45702bcfa Add /usr/bin/env to COPY_EXEC_LIST initramfs hook
5dc1ff29 changed the user space program to mount a zfs snapshot
from /bin/sh to /usr/bin/env.  If the executable is not present
in the initramfs then snapshots cannot be automounted.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: James Dingwall <james.dingwall@zynstra.com>
Closes #5360
Closes #6913
Conflicts:
	contrib/initramfs/hooks/zfs
2017-12-04 17:22:36 -08:00
Brian Behlendorf ddd20dbe0b Fix 'zpool create|add' replication level check
When the pool configuration contains a hole due to a previous device
removal, ignore this top-level vdev.  Failure to do so will result in
the current configuration being assessed to have a non-uniform
replication level and the expected warning will be disabled.

The zpool_add_010_pos test case was extended to cover this scenario.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6907
Closes #6911
2017-12-04 17:21:39 -08:00
Brian Behlendorf 4a98780933 Preserve itx alloc size for zio_data_buf_free()
Using zio_data_buf_alloc() to allocate the itx's may be unsafe
because the itx->itx_lr.lrc_reclen field is not constant from
allocation to free.  Using a different itx->itx_lr.lrc_reclen
size in zio_data_buf_free() can result in the allocation being
returned to the wrong kmem cache.

This issue can be avoided entirely by storing the allocation size
in itx->itx_size and using that for zio_data_buf_free().
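
The hazard can be modelled with a toy allocator, sketched here in
Python.  SizeClassAllocator and the dictionary standing in for an itx
are hypothetical; the sketch only shows why the size recorded at
allocation time, not the current record length, should select the
cache on free.

```python
class SizeClassAllocator:
    """Toy allocator with one free list per size (hypothetical)."""

    def __init__(self):
        self.free_lists = {}

    def alloc(self, size):
        return {"size_at_alloc": size}

    def free(self, buf, size):
        # The caller-supplied size picks the free list, much like a
        # per-size kmem cache.
        self.free_lists.setdefault(size, []).append(buf)

allocator = SizeClassAllocator()
itx = {"reclen": 512}
itx["buf"] = allocator.alloc(itx["reclen"])
itx["alloc_size"] = itx["reclen"]   # remember the allocation size
itx["reclen"] = 256                 # the record length changes later
allocator.free(itx["buf"], itx["alloc_size"])
```

Freeing with itx["reclen"] instead would have placed the buffer on the
256-byte list.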

Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6912
2017-12-04 17:21:39 -08:00
LOLi 6db8f1a0d1 Fix 'zfs get {user|group}objused@' functionality
Fix a regression accidentally introduced in 1b81ab4 that prevents
'zfs get {user|group}objused@' from correctly reporting the requested
value.

Update "userspace_003_pos.ksh" and "groupspace_003_pos.ksh" to verify
this functionality.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6908
2017-12-04 17:21:39 -08:00
Mark Wright e06711412b Linux 4.14 compat: CONFIG_GCC_PLUGIN_RANDSTRUCT
Fix build errors with gcc 7.2.0 on Gentoo with kernel 4.14
built with CONFIG_GCC_PLUGIN_RANDSTRUCT=y such as:

module/nvpair/nvpair.c:2810:2:error:
positional initialization of field in 'struct' declared with
'designated_init' attribute [-Werror=designated-init]
  nvs_native_nvlist,
  ^~~~~~~~~~~~~~~~~

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mark Wright <gienah@gentoo.org>
Closes #5390
Closes #6903
2017-12-04 17:21:39 -08:00
Richard Laager 68ba1d2fa9 initramfs: Honor canmount=off
The initramfs script was not honoring canmount=off.  With this change,
it does.  If the administrator has asked that a filesystem not be
mounted, that should be honored.

As an exception, the initramfs script ignores canmount=off on the
rootfs.  The rootfs should not have canmount=off set either.  However,
mounting it anyway seems harmless because it is being asked for
explicitly.  The point of this exception is to avoid the risk of
breaking existing systems, just in case someone has canmount=off set on
their rootfs.

The initramfs still mounts filesystems with canmount=noauto.  This is
necessary because it is typical to set that on the rootfs so that it can
be cloned.  Without canmount=noauto, the clones' duplicate mountpoints
would conflict.

This is the remainder of the fix for:
https://github.com/zfsonlinux/pkg-zfs/issues/221

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #6897
2017-12-04 17:21:38 -08:00
Richard Laager 4e11137989 initramfs: Honor mountpoint=none/legacy
For filesystems that are children of the rootfs, when mountpoint=none or
mountpoint=legacy, the initramfs script would assume a mountpoint based
on the dataset path.  Given that the rootfs should have mountpoint=/ and
mountpoint inheritance is the default behavior of ZFS, this behavior
seems unnecessary.  In any event, it turns mountpoint=none into a no-op.
That removes this option from the administrator, and if someone uses it,
it does not work as expected.  Worse yet, if the mountpoint directory
does not exist (which is the typical case for mountpoint=none), the
mounting and thus the boot process will fail.  For the case of
mountpoint=legacy, the assumed mountpoint may not be the correct value
set in /etc/fstab.

This change makes the initramfs script not mount the filesystem in
either case.  For mountpoint=none, this means we are correctly honoring
the setting.  For mountpoint=legacy, there are two scenarios:  If
canmount=on, the filesystem will be mounted by the normal mechanisms
later in the boot process.  If canmount=noauto, the filesystem will not
be mounted at all, unless the administrator has done something special.
If they're not doing something special and they want it mounted by the
initramfs, they can simply not set mountpoint=legacy.

This is part of the fix for:
https://github.com/zfsonlinux/pkg-zfs/issues/221

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #6897
2017-12-04 17:21:38 -08:00
DeHackEd be9be1cc3e zpool(8): Fix "zpool import -t"
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: DHE <git@dehacked.net>
Closes #6894
2017-12-04 17:21:38 -08:00
George G eab4536081 Fix column alignment with long zpool names
`zpool status` normally aligns NAME/STATE/etc columns:

    NAME                       STATE     READ WRITE CKSUM
    dummy                      ONLINE       0     0     0
      mirror-0                 ONLINE       0     0     0
        /tmp/dummy-long-1.bin  ONLINE       0     0     0
        /tmp/dummy-long-2.bin  ONLINE       0     0     0
      mirror-1                 ONLINE       0     0     0
        /tmp/dummy-long-3.bin  ONLINE       0     0     0
        /tmp/dummy-long-4.bin  ONLINE       0     0     0

However, if the zpool name is longer than the vdev names, alignment
issues arise:

    NAME                  STATE     READ WRITE CKSUM
    dummy-very-very-long-zpool-name  ONLINE       0     0     0
      mirror-0            ONLINE       0     0     0
        /tmp/dummy-1.bin  ONLINE       0     0     0
        /tmp/dummy-2.bin  ONLINE       0     0     0
      mirror-1            ONLINE       0     0     0
        /tmp/dummy-3.bin  ONLINE       0     0     0
        /tmp/dummy-4.bin  ONLINE       0     0     0

`zpool iostat` and `zpool import` are also affected:

                  capacity     operations     bandwidth
    pool        alloc   free   read  write   read  write
    ----------  -----  -----  -----  -----  -----  -----
    dummy        104K  1.97G      0      0    152  9.84K
    dummy-very-very-long-zpool-name   152K  1.97G      0      1    144  13.1K
    ----------  -----  -----  -----  -----  -----  -----

    dummy-very-very-long-zpool-name  ONLINE
      mirror-0            ONLINE
        /tmp/dummy-1.bin  ONLINE
        /tmp/dummy-2.bin  ONLINE
      mirror-1            ONLINE
        /tmp/dummy-3.bin  ONLINE
        /tmp/dummy-4.bin  ONLINE
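
One way to avoid the misalignment, sketched in Python, is to include
the pool name itself when computing the width of the name column.
The helper max_name_width and the tuple-based tree are hypothetical
and are not taken from the zpool source; the two-space indent per
level matches the listings above.

```python
def max_name_width(name, children, depth=0):
    """Widest indented name in a pool tree, the root included."""
    width = len(name) + 2 * depth
    for child_name, grandchildren in children:
        width = max(width,
                    max_name_width(child_name, grandchildren, depth + 1))
    return width

vdevs = [
    ("mirror-0", [("/tmp/dummy-1.bin", []), ("/tmp/dummy-2.bin", [])]),
    ("mirror-1", [("/tmp/dummy-3.bin", []), ("/tmp/dummy-4.bin", [])]),
]
width = max_name_width("dummy-very-very-long-zpool-name", vdevs)
print("NAME".ljust(width), " STATE")
```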

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Gaydarov <git@gg7.io>
Closes #6786
2017-12-04 17:21:38 -08:00
Brian Behlendorf 954516cec1 Emit history events for 'zpool create'
History commands and events were being suppressed for the
'zpool create' command since the history object did not
yet exist.  Create the object earlier so this history
doesn't get lost.

Split the pool_destroy event into pool_destroy and
pool_export so they may be distinguished.

Updated events_001_pos and events_002_pos test cases.  They
now check for the expected history events and were reworked
to be more reliable.

Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6712
Closes #6486
Conflicts:
	tests/zfs-tests/tests/functional/events/events_002_pos.ksh
2017-12-04 17:21:03 -08:00
Brian Behlendorf 841cb5ee2a Fix dirty check in dmu_offset_next()
The correct way to determine if a dnode is dirty is to check
if any of the dn->dn_dirty_link's are active.  Relying solely
on the dn->dn_dirtyctx can result in the dnode being mistakenly
reported as clean.

Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3125 
Closes #6867
2017-11-21 13:11:29 -06:00
Brian Behlendorf d4cf31275b Disable automatic dependencies in zfs-test package
All of the ZTS test scripts specify /bin/ksh as the interpreter.
Unfortunately, as of Fedora 27 only /usr/bin/ksh is provided by
the package manager.  Rather than change all the scripts to
accommodate the latest Fedora, disable automatic dependencies
for the zfs-test package.  Functionally this will not cause
any problems since /bin is a symlink to /usr/bin.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6868
2017-11-21 13:11:29 -06:00
LOLi fedc1d96a8 Fix truncate(2) mtime and ctime handling
On Linux, ftruncate(2) always changes the file timestamps, even if the
file size is not changed. However, in case of a successful
truncate(2), the timestamps are updated only if the file size changes.
This translates to the VFS calling the ZFS Posix Layer "setattr"
function (zpl_setattr) with ATTR_MTIME and ATTR_CTIME unconditionally
set on the iattr mask only when doing a ftruncate(2), while the
truncate(2) is left to the filesystem implementation to be dealt with.

This behaviour is consistent with POSIX:2004/SUSv3 specifications
where there's no explicit requirement for file size changes to update
the timestamps only for ftruncate(2):

http://pubs.opengroup.org/onlinepubs/009695399/functions/truncate.html
http://pubs.opengroup.org/onlinepubs/009695399/functions/ftruncate.html

This has been later updated in POSIX:2008/SUSv4 where, for both
truncate(2)/ftruncate(2), there's no mention of this size change
requirement:

http://austingroupbugs.net/view.php?id=489
http://pubs.opengroup.org/onlinepubs/9699919799/functions/truncate.html
http://pubs.opengroup.org/onlinepubs/9699919799/functions/ftruncate.html

Unfortunately the Linux VFS is still calling into the ZPL without
ATTR_MTIME/ATTR_CTIME set in the truncate(2) case: we fix this by
explicitly updating the timestamps when detecting the ATTR_SIZE bit,
which is always set in do_truncate(), on the iattr mask.
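
The mask adjustment can be sketched as follows, in Python for
illustration only.  The helper name fixup_truncate_mask is
hypothetical, and the bit values are intended to mirror the ATTR_*
flags in the Linux headers rather than being copied from the ZFS
change.

```python
# Bit positions intended to mirror ATTR_* in Linux's include/linux/fs.h.
ATTR_SIZE = 1 << 3
ATTR_MTIME = 1 << 5
ATTR_CTIME = 1 << 6

def fixup_truncate_mask(ia_valid):
    """Hypothetical helper: request timestamp updates on any size change."""
    if ia_valid & ATTR_SIZE:
        ia_valid |= ATTR_MTIME | ATTR_CTIME
    return ia_valid

# truncate(2) arrives with ATTR_SIZE only; the helper adds both timestamps.
mask = fixup_truncate_mask(ATTR_SIZE)
print(bool(mask & ATTR_MTIME), bool(mask & ATTR_CTIME))
```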

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6811
Closes #6819
2017-11-21 13:11:29 -06:00
benrubson 59511072b4 OpenZFS 7531 - Assign correct flags to prefetched buffers
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Authored by: abraunegg <alex.braunegg@gmail.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://www.illumos.org/issues/7531
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/468008cb
2017-11-21 13:11:29 -06:00
Arkadiusz Bubała a5c8119eba Long hold the dataset during upgrade
If a receive or rollback is performed while the filesystem is upgrading,
the objset may be evicted in `dsl_dataset_clone_swap_sync_impl`. This
will lead to a NULL pointer dereference when the upgrade tries to access
the evicted objset.

This commit adds a long hold on the dataset for the whole upgrade
process.  The receive and rollback will return an EBUSY error until the
upgrade has finished.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Arkadiusz Bubała <arkadiusz.bubala@open-e.com>
Closes #5295
Closes #6837
2017-11-21 13:03:21 -06:00
Tim Chase d7881a6dca Handle compressed buffers in __dbuf_hold_impl()
In __dbuf_hold_impl(), if a buffer is currently syncing and is still
referenced from db_data, a copy is made in case it is dirtied again in
the txg.  Previously, the buffer for the copy was simply allocated with
arc_alloc_buf() which doesn't handle compressed or encrypted buffers
(which are a special case of a compressed buffer).  The result was
typically an invalid memory access because the newly-allocated buffer
was of the uncompressed size.

This commit fixes the problem by handling the 2 compressed cases,
encrypted and unencrypted, respectively, with arc_alloc_raw_buf() and
arc_alloc_compressed_buf().

Although using the proper allocation functions fixes the invalid memory
access by allocating a buffer of the compressed size, another unrelated
issue made it impossible to properly detect compressed buffers in the
first place.  The header's compression flag was set to ZIO_COMPRESS_OFF
in arc_write() when it was possible that an attached buffer was actually
compressed.  This commit adds logic to only set ZIO_COMPRESS_OFF in
the non-ZIO_RAW case which will handle both cases of compressed buffers
(encrypted or unencrypted).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes #5742
Closes #6797
2017-11-21 13:01:30 -06:00
LOLi 951e62169e Fix undefined %{systemd_svcs} in RPM scriptlets
This allows RPM-based systems to properly control package installation
and removal when using systemd.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6838 
Closes #6841
2017-11-20 16:48:26 -06:00
wli5 9add19b37d Bug fix in qat_compress.c when compressed size is < 4KB
When the 128KB block is compressed to less than 4KB, the pointer
to the Footer is not at the end of the compressed buffer because
the Header offset was added twice in this case.  This leaves a gap
between the Footer and the compressed buffer.
1. Always compute the Footer pointer address from the start of the
last page.
2. Remove the unused workaround code; the issue it addressed has been
verified fixed with the latest driver and this fix.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Weigang Li <weigang.li@intel.com>
Closes #6827
2017-11-20 16:48:26 -06:00
Brian Behlendorf b2d633202d Disable automatic dependencies in DKMS package
By default additional dependencies are generated automatically for
packages.  This is normally a good thing because it helps ensure
things just work.  It doesn't make sense for the DKMS package which
requires minimal dependencies that can be easily listed.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6467 
Closes #6835
2017-11-20 16:48:25 -06:00
Brian Behlendorf 414f4a9c54 Initramfs fixes
* initramfs: Fix inconsistent whitespace
* initramfs: Fix a spelling error
* initramfs: Set elevator=noop on the rpool's disks

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #6807
2017-11-20 16:23:33 -06:00
Antonio Russo 1c4f5e7d92 systemd zfs-import.target and documentation
zfs-import-{cache,scan}.service must complete before any mounting of
filesystems can occur. To simplify this dependency, create a target
that is reached After (in the systemd sense) the pool is imported.

Additionally, recommend that legacy zfs mounts use the option

x-systemd.requires=zfs-import.target

to codify this requirement.

Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com>
Closes #6764
2017-11-20 16:20:08 -06:00
abraunegg 246e515cf8 Update zfs module parameters man5
Update zfs module parameters man5 with missing parameter details
for multiple tunings.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Alex Braunegg <alex.braunegg@gmail.com>
Closes #6785
2017-11-20 16:19:54 -06:00
Brian Behlendorf 2d41e75e52 Fix status command options in zpool(8)
The 'zpool status' command supports the -P option for printing full
path names.  It does not support the -p parsable option for printing
exact values.
    
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6792 
Closes #6794
2017-11-20 16:19:23 -06:00
Fabian-Gruenbichler d834d6811b arcstat: flush stdout / outfile after each line
Otherwise, if arcstat gets interrupted before the desired number of
iterations is reached, the output file will be empty (both if set via
'-o' or via shell redirection).
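
A minimal sketch of the pattern, assuming a plain text output file.
The helper name write_stat_line and the temporary path are
hypothetical and are not taken from arcstat itself.

```python
import os
import tempfile

def write_stat_line(out, line):
    """Write one line and flush so it reaches the file immediately."""
    out.write(line + "\n")
    out.flush()

path = os.path.join(tempfile.mkdtemp(), "arcstat.out")
out = open(path, "w")
write_stat_line(out, "time  read  miss")
# The writer is still open, yet a second reader already sees the line.
with open(path) as reader:
    seen = reader.read()
out.close()
```

Without the flush, a short line like this would normally sit in the
writer's buffer and be lost if the process were killed.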

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Closes #6775
2017-11-20 16:19:23 -06:00
Giuseppe Di Natale 029a1b0c20 Ensure arc_size_break is filled in arc_summary.py
Use mfu_size and mru_size pulled from the arcstats
kstat file to calculate the mfu and mru percentages
for arc size breakdown.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: AndCycle <andcycle@andcycle.idv.tw>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #5526 
Closes #6770
2017-11-20 16:19:23 -06:00
Giuseppe Di Natale c45254b0ec Correct flake8 errors after STYLE builder update
Fix new flake8 errors related to bare excepts and ambiguous
variable names due to a STYLE builder update.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6776
2017-11-20 16:19:23 -06:00
wli5 318fdeb51f Support integration with new QAT products
Support integration with new QAT products: Intel(R) C62x Chipset,
or Atom(R) C3000 Processor Product Family SoC:
1. Detect new file name in auto-conf.
2. Change MAX_INSTANCES to 48.
3. Change "num_inst" to U16 to clean a build warning.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Weigang Li <weigang.li@intel.com>
Closes #6767
2017-11-20 16:19:23 -06:00
Olaf Faaland d3d20bf442 Reimplement vdev_random_leaf and rename it
Rename it as mmp_random_leaf() since it is defined in mmp.c.

The earlier implementation could end up spinning forever if a pool had a
vdev marked writeable, none of whose children were writeable.  It also
did not guarantee that if a writeable leaf vdev existed, it would be
found.

Reimplement to recursively walk the device tree to select the leaf.  It
searches the entire tree, so that a return value of (NULL) indicates
there were no usable leaves in the pool; all were either not writeable
or had pending mmp writes.

It still chooses the starting child randomly at each level of the tree,
so if the pool's devices are healthy, the mmp writes go to random leaves
with an even distribution.  This was verified by testing using
zfs_multihost_history enabled.
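
The recursive selection described above can be sketched in Python.
This is an illustration under assumed data structures (dictionaries
with hypothetical "writeable", "mmp_pending" and "children" keys), not
the C implementation in mmp.c.

```python
import random

def random_leaf(vdev, rng=random):
    """Return a usable leaf under vdev, or None if there is none."""
    if not vdev["writeable"]:
        return None
    children = vdev.get("children", [])
    if not children:
        # A leaf is usable only if it has no pending mmp write.
        return None if vdev.get("mmp_pending") else vdev
    # Start at a random child, then try every sibling exactly once.
    start = rng.randrange(len(children))
    for i in range(len(children)):
        leaf = random_leaf(children[(start + i) % len(children)], rng)
        if leaf is not None:
            return leaf
    return None

# A writeable mirror with no writeable children, plus one healthy leaf.
pool = {"writeable": True, "children": [
    {"writeable": True, "children": [{"writeable": False},
                                     {"writeable": False}]},
    {"writeable": True, "name": "healthy-leaf"},
]}
print(random_leaf(pool)["name"])
```

Because every sibling is visited at most once per level, the walk
terminates even when a writeable vdev has no writeable children.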

Reviewed by: Thomas Caputi <tcaputi@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #6631 
Closes #6665
2017-11-20 16:19:23 -06:00
Tony Hutter 99598264fc Tag zfs-0.7.3
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2017-10-18 11:00:26 -07:00
Neal Gompa (ニール・ゴンパ) abe30b7b40 Add DKMS package on Debian-based distributions
* config/deb.am: Enable building DKMS packages for Debian
* rpm/generic/zfs-dkms.spec.in: Adjust spec to be Debian-compatible
  * Condition kernel-devel Req to RPM distros
  * Adjust the DKMS Req to have a minimum of a version only
  * Ensure that --rpm_safe_upgrade isn't used on non-RPM distros
* config/deb.am: Drop CONFIG_KERNEL and CONFIG_USER guards
* Makefile.am: Add pkg-dkms target

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Neal Gompa <ngompa@datto.com>
Closes #6044
Closes #6731
2017-10-17 16:49:19 -07:00
Tobin Harding f90ee0ca3d Fix function documentation to correctly mirror code
Currently the function documentation states that two strings are
allocated; this is outdated. Only one char ** parameter is passed
into the function now, so only a pointer to a single string
is returned and needs to be free'd.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tobin C. Harding <me@tobin.cc>
Closes #6754
2017-10-17 16:49:14 -07:00
Brian Behlendorf 4ed955e280 Increase default zloop.sh vdev size
The default 128M vdev size used by zloop.sh isn't always large
enough and can result in ENOSPC failures which suspend the pool.
Increase the default size to 512M and provide a -s option which
can be used to specify an alternate size.

This does increase the free space requirements to run zloop.sh.
However, since the vdevs are sparse 4x the space is not required.

Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6758
2017-10-17 16:49:08 -07:00
Damian Wojsław 1721f13e76 Typo in dsl_dataset.h
The parameter dsl_dataset_t *os in the function prototype should be
renamed to dsl_dataset_t *ds.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Damian Wojsław <damian@wojslaw.pl>
Closes #6756
Closes #6273
2017-10-17 16:49:03 -07:00
Brian Behlendorf 6e893ef62a Fix chattr/cleanup failure
The chattr cleanup step may fail to delete the user if there is still
an active process running as that user.  Retry the userdel when this
occurs to eliminate spurious false positives.

  ERROR: userdel quser1 exited 8
  userdel: user quser1 is currently used by process 26814

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6749
2017-10-17 16:48:58 -07:00
Brian Behlendorf e0eaaf8144 Fixes for SPARC support
The current code base almost compiles on SPARC, but a few fixes are
required for the code to compile (and work efficiently).  The code in
this PR comes from the OpenZFS project and was initially dropped when
porting the crypto framework.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pengcheng Xu <i@jsteward.moe>
Closes #6733
Closes #6738
Closes #6750
2017-10-16 10:57:55 -07:00
Antonio Russo cb8a074dcb Explicitly depend on icp module in initramfs hook
Automatic dependency resolution is unreliable on many systems.
Follow suit with existing code, and explicitly include icp
in module dependencies.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com>
Closes #6751
2017-10-16 10:57:55 -07:00
aun c3ac4ccabb Fix boot from ZFS issues
* Correct ZFS snapshot listing
* Disable "lvm is not available" message on quiet boot

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alar Aun <spamtoaun@gmail.com>
Closes #6700
Closes #6747
2017-10-16 10:57:55 -07:00
Fabian Grünbichler 8d688ce66a Skip FREEOBJECTS for objects which can't exist
When sending an incremental stream based on a snapshot, the receiving
side must have the same base snapshot.  Thus we do not need to send
FREEOBJECTS records for any objects past the maximum one which exists
locally.

This allows us to send incremental streams (again) to older ZFS
implementations (e.g. ZoL < 0.7) which actually try to free all objects
in a FREEOBJECTS record, instead of bailing out early.

Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Closes #5699
Closes #6507
Closes #6616
2017-10-16 10:57:55 -07:00
Fabian Grünbichler b544fe4123 Free objects when receiving full stream as clone
All objects after the last written or freed object are not supposed to
exist after receiving the stream.  Free them accordingly, as if a
freeobjects record for them had been included in the stream.

Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Closes #5699
Closes #6507
Closes #6616
2017-10-16 10:57:55 -07:00
LOLi 926c6ec453 Fix intra-pool resumable 'zfs send -t <token>'
Because resuming from a token requires "guid" -> "snapshot" mapping
we have to walk the whole dataset hierarchy to find the right snapshot
to send; when both source and destination exist, for an incremental
resumable stream, libzfs gets confused and picks up the wrong snapshot
to send from: this results in attempting to send

   "destination@snap1 -> source@snap2"

instead of

   "source@snap1 -> source@snap2"

which fails with an "Invalid cross-device link" error (EXDEV).

Fix this by adjusting the logic behind dataset traversal in
zfs_iter_children() to pick the right snapshot to send from.

Additionally update dry-run 'zfs send -t' to print its output to
stderr: this is consistent with other dry-run commands.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6618
Closes #6619
Closes #6623
2017-10-16 10:57:55 -07:00
Brian Behlendorf 91b2f6ab1c Fix ARC behavior on 32-bit systems
With the addition of the ABD changes consumption of the virtual
address space has been greatly reduced.  This exposed an issue on
CONFIG_HIGHMEM systems where free memory was being calculated
incorrectly.  Functionally this didn't cause any major problems
prior to ABD because a lack of available virtual address space
was used as an indicator of low memory.

This patch makes the following changes to address the issue and
in the process realigns the code further with OpenZFS.  There
are no substantive changes in behavior for 64-bit systems.

* Added CONFIG_HIGHMEM case to the arc_all_memory() and
  arc_free_memory() functions to only consider low memory pages
  on CONFIG_HIGHMEM systems.

* The arc_free_memory() function was updated to return bytes
  instead of pages to be consistent with the other helper
  functions.  In user space we make up some reasonable values
  since currently only testing is performed in this context.

* Adds three new values to the arcstats kstat to provide visibility
  into the ARC's assessment of the memory situation:
  memory_all_bytes, memory_free_bytes, and memory_available_bytes.

* Added kmem_reap() call to arc_available_memory() for 32-bit
  builds to realign code with OpenZFS.

* Reduced size of test file in /async_destroy_001_pos.ksh to
  speed up test case.  Multiple txgs are still required.

* Move vdevs used by zpool_clear_001_pos and zpool_upgrade_002_pos
  to TEST_BASE_DIR location to speed up test cases.

Reviewed-by: David Quigley <david.quigley@intel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5352
Closes #6734
2017-10-16 10:57:55 -07:00
privb0x23 851a7cd833 Fix inclusion of libgcc_s.so on Void
On Void Linux (x86_64 musl) libgcc_s.so is located in "/usr/lib"
so it is not found by dracut and it produces an error.

Add a simple additional path check for "/usr/lib/libgcc_s.so*"
and install it in the initramfs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: privb0x23 <privb0x23@users.noreply.github.com>
Closes #6715
2017-10-16 10:57:55 -07:00
Tobin Harding 83d4d1a784 Use bitwise '&' instead of logical '&&'
Make two instances of the same change. Change bitwise AND (&) to logical
AND (&&).

Currently the code uses a bitwise AND between two boolean values.

In the first instance;

The first operand is a flag that has been bitwise combined with a bit
mask to get a boolean value as to whether a file has group write
permissions set.

The second operand used is a struct member that is intended as a
boolean flag not a bit mask.

In the second instance the argument is the same except with world write
permissions instead of group write (S_IWOTH, S_IWGRP).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Tobin C. Harding <me@tobin.cc>
Closes #6684
Closes #6722
2017-10-16 10:57:55 -07:00
Tobin Harding 80cc2f6111 Remove unnecessary equality check
Currently the `if` statement includes an assignment (from a function return
value) and an equality check. The parentheses are in the wrong place, so
the code clobbers the function return value.

We can fix this by simplifying the `if` statement.

`if (foo != 0)`

can be more succinctly expressed as

`if (foo)`

Remove the equality check and add parentheses to correct the statement.
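
The effect of the misplaced parentheses can be sketched as follows. This is
an illustration only: `do_work()`, `clobbered()`, and `preserved()` are
hypothetical names, and the error value 5 is arbitrary.

```c
/* Hypothetical stand-in for a function returning an errno-style code. */
static int
do_work(void)
{
	return (5);
}

/*
 * Misplaced parentheses: "!=" binds tighter than "=", so err receives
 * the result of the comparison (0 or 1), not the return value.
 */
static int
clobbered(void)
{
	int err;

	if ((err = do_work() != 0))
		return (err);
	return (0);
}

/* Simplified form: err keeps the value returned by do_work(). */
static int
preserved(void)
{
	int err;

	if ((err = do_work()))
		return (err);
	return (0);
}
```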

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Tobin C. Harding <me@tobin.cc>
Closes #6685
Close #6719
2017-10-16 10:57:55 -07:00
Isaac Huang b97948276d Use linear abd in vdev_copy_uberblocks()
The vdev_copy_uberblocks() function should use abd_alloc_linear() to
allocate ub_abd, because abd_to_buf(ub_abd) is used later.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Isaac Huang <he.huang@intel.com>
Closes #6718
Closes #6713
2017-10-16 10:57:55 -07:00
Ned Bass 4cfc086e4d receive_freeobjects() skips freeing some objects
When receiving a FREEOBJECTS record, receive_freeobjects()
incorrectly skips a freed object in some cases. Specifically, this
happens when the first object in the range to be freed doesn't exist,
but the second object does. This leaves an object allocated on disk
on the receiving side which is unallocated on the sending side, which
may cause receiving subsequent incremental streams to fail.

The bug was caused by an incorrect increment of the object index
variable when the current object being freed doesn't exist.  The
increment is incorrect because incrementing the object index is
handled by a call to dmu_object_next() in the increment portion of
the for loop statement.
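
The double advance can be reproduced with a toy model. The sketch below is
not the ZFS code: `next_object()` is a hypothetical stand-in for
dmu_object_next(), the object set is a plain array, and the
`extra_increment` switch exists only to contrast the two behaviors.

```c
#include <stdbool.h>
#include <stddef.h>

#define	NOBJ	8

/* Advance *obj to the next allocated object after it, or return -1. */
static int
next_object(const bool *allocated, size_t *obj)
{
	for (size_t i = *obj + 1; i < NOBJ; i++) {
		if (allocated[i]) {
			*obj = i;
			return (0);
		}
	}
	return (-1);
}

/* Free allocated objects in [first, first + count); return how many. */
static size_t
free_range(bool *allocated, size_t first, size_t count, bool extra_increment)
{
	size_t freed = 0;
	int next_err = 0;

	for (size_t obj = first; obj < first + count && next_err == 0;
	    next_err = next_object(allocated, &obj)) {
		if (!allocated[obj]) {
			if (extra_increment)
				obj++;	/* the bug: the loop advances again */
			continue;
		}
		allocated[obj] = false;
		freed++;
	}
	return (freed);
}
```

When object 0 is missing and objects 1 and 2 exist, the extra increment
moves the index to 1 and the loop's own step then moves it to 2, so object 1
is never freed.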

Add test case that exposes this bug.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Closes #6694
Closes #6695
2017-10-16 10:57:55 -07:00
chrisrd 25d232f407 Scale the dbuf cache with arc_c
Commit d3c2ae1 introduced a dbuf cache with a default size of the
minimum of 100M or 1/32 maximum ARC size. (These figures may be adjusted
using dbuf_cache_max_bytes and dbuf_cache_max_shift.) The dbuf cache
is counted as metadata for the purposes of ARC size calculations.

On a 1GB box the ARC maximum size defaults to c_max 493M which gives a
dbuf cache default minimum size of 15.4M, and the ARC metadata defaults
to minimum 16M. I.e. the dbuf cache is a significant proportion of the
minimum metadata size. With other overheads involved this actually means
the ARC metadata doesn't get down to the minimum.

This patch dynamically scales the dbuf cache to the target ARC size
instead of statically scaling it to the maximum ARC size. (The scale is
still set by dbuf_cache_max_shift and the maximum size is still fixed by
dbuf_cache_max_bytes.) Using the target ARC size rather than the current
ARC size is done to help the ARC reach the target rather than simply
focusing on the current size.
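
The sizing rule described above can be sketched as a small helper. The
function name and parameter names are hypothetical; only the rule itself
(target ARC size shifted by dbuf_cache_max_shift, capped at
dbuf_cache_max_bytes) comes from the description.

```c
#include <stdint.h>

/*
 * Hypothetical helper: scale the dbuf cache to the target ARC size,
 * never exceeding the fixed upper bound.
 */
static uint64_t
dbuf_cache_target(uint64_t arc_target, uint64_t max_bytes, unsigned max_shift)
{
	uint64_t scaled = arc_target >> max_shift;

	return (scaled < max_bytes ? scaled : max_bytes);
}
```

With a shift of 5 (1/32) and a 100M cap, a 493M target gives roughly 15.4M,
while a multi-gigabyte target is clamped to the cap.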

Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Issue #6506
Closes #6561
2017-10-16 10:57:54 -07:00
Tony Hutter edd7c24623 Tag zfs-0.7.2
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2017-09-22 11:14:01 -07:00
Giuseppe Di Natale bef6a8bc3a Correct cppcheck errors (#6662)
ZFS buildbot STYLE builder was moved to Ubuntu 17.04
which has a newer version of cppcheck. Handle the
new cppcheck errors.

uu_* functions removed in this commit were unused
and effectively dead code. They are now retired.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6653
2017-09-20 12:59:21 -07:00
Brian Behlendorf 266b181e75 Increase default arc_c_min
Increase the default arc_c_min value to whichever is larger,
either 32M or 1/32 of total system memory.  This is advantageous for
systems with more than 1G of memory where performance issues may
occur when the ARC is allowed to collapse below a minimum size.
At the same time we want to use the bare minimum value which is
still functional so the filesystem can be used in very low memory
environments.
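
As a rough sketch of the rule, assuming the caller passes total system
memory in bytes (the function name `default_arc_min` is hypothetical):

```c
#include <stdint.h>

/* Hypothetical helper: the larger of 32M and 1/32 of system memory. */
static uint64_t
default_arc_min(uint64_t allmem)
{
	uint64_t min_floor = 32ULL << 20;
	uint64_t scaled = allmem / 32;

	return (scaled > min_floor ? scaled : min_floor);
}
```

Systems with 1G of memory or less stay at the 32M floor; larger systems
scale with total memory.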

Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6659
2017-09-20 10:25:54 -07:00
Brian Behlendorf c474f5e9a7 Export symbol dmu_tx_mark_netfree()
This symbol is needed by Lustre for the same reason it was needed
by the ZPL.  It should have been exported when the original patch
was merged.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6660
2017-09-20 10:25:54 -07:00
Brian Behlendorf 4e6a9e4598 ZTS fix slog_replay_volume.ksh failure
The slog_replay_volume.ksh test case will fail when the pool is
layered on files in a filesystem which does not support discard.
Avoid this issue by creating the pool using DISKS which will
either be loopback devices or real disks.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6654
2017-09-20 10:25:54 -07:00
Brian Behlendorf 661907e6bc Linux 4.14 compat: IO acct, global_page_state, etc (#6655)
generic_start_io_acct/generic_end_io_acct in the master
branch of the linux kernel requires that the request_queue
be provided.

Move the logic from freemem in the spl to arc_free_memory
in arc.c. Do this so we can take advantage of global_page_state
interface checks in zfs.

Upstream kernel replaced struct block_device with
struct gendisk in struct bio. Determine if the
function bio_set_dev exists during configure
and have zfs use that if it exists.

bio_set_dev https://github.com/torvalds/linux/commit/74d4699
global_node_page_state https://github.com/torvalds/linux/commit/75ef718
io acct https://github.com/torvalds/linux/commit/d62e26b

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6635

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2017-09-19 14:24:34 -07:00
Gaurav Kumar d3e7d981d4 Modifying XATTRs doesnt change the ctime
Changing any metadata should modify the ctime.

Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: gaurkuma <gauravk.18@gmail.com>
Closes #3644
Closes #6586
2017-09-13 16:05:18 -07:00
Brian Behlendorf a2a0440918 Fix volume WR_INDIRECT log replay (#6620)
The portion of the zvol_replay_write() handler responsible for
replaying indirect log records for some reason never existed.
As a result indirect log records were not being correctly replayed.

This went largely unnoticed since the majority of zvol log records
were of the type WR_COPIED or WR_NEED_COPY prior to OpenZFS 7578.

This patch updates zvol_replay_write() to correctly handle these
log records and adds a new test case which verifies volume replay
to prevent any regression.  The existing test case which verified
replay on filesystem was renamed slog_replay_fs.ksh for clarity.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6603
2017-09-13 16:04:16 -07:00
Giuseppe Di Natale 45d1abc74d Improved dnode allocation and dmu_hold_impl() (#6611)
Refactor dmu_object_alloc_dnsize() and dnode_hold_impl() to simplify the
code, fix errors introduced by commit dbeb879 (PR #6117) interacting
badly with large dnodes, and improve performance.

* When allocating a new dnode in dmu_object_alloc_dnsize(), update the
percpu object ID for the core's metadnode chunk immediately.  This
eliminates most lock contention when taking the hold and creating the
dnode.

* Correct detection of the chunk boundary to work properly with large
dnodes.

* Separate the dmu_hold_impl() code for the FREE case from the code for
the ALLOCATED case to make it easier to read.

* Fully populate the dnode handle array immediately after reading a
block of the metadnode from disk.  Subsequently the dnode handle array
provides enough information to determine which dnode slots are in use
and which are free.

* Add several kstats to allow the behavior of the code to be examined.

* Verify dnode packing in large_dnode_008_pos.ksh.  Since the test
only performs creates, it should leave very few holes in the metadnode.

* Add test large_dnode_009_pos.ksh, which performs concurrent creates
and deletes, to complement existing test which does only creates.

With the above fixes, there is very little contention in a test of about
200,000 racing dnode allocations produced by tests 'large_dnode_008_pos'
and 'large_dnode_009_pos'.

name                            type data
dnode_hold_dbuf_hold            4    0
dnode_hold_dbuf_read            4    0
dnode_hold_alloc_hits           4    3804690
dnode_hold_alloc_misses         4    216
dnode_hold_alloc_interior       4    3
dnode_hold_alloc_lock_retry     4    0
dnode_hold_alloc_lock_misses    4    0
dnode_hold_alloc_type_none      4    0
dnode_hold_free_hits            4    203105
dnode_hold_free_misses          4    4
dnode_hold_free_lock_misses     4    0
dnode_hold_free_lock_retry      4    0
dnode_hold_free_overflow        4    0
dnode_hold_free_refcount        4    57
dnode_hold_free_txg             4    0
dnode_allocate                  4    203154
dnode_reallocate                4    0
dnode_buf_evict                 4    23918
dnode_alloc_next_chunk          4    4887
dnode_alloc_race                4    0
dnode_alloc_next_block          4    18

The performance is slightly improved for concurrent creates with
16+ threads, and unchanged for low thread counts.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
2017-09-13 15:46:15 -07:00
dbavatar 89950722c6 Linux 4.8+ compatibility fix for vm stats
vm_node_stat must be used instead of vm_zone_stat. Unfortunately, the
old code still compiles, potentially leading to silent failure of
arc_evictable_memory().

AKAMAI: CR 3816601: Regression in zfs dropcache test

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
Closes #6528
2017-09-13 14:21:59 -07:00
LOLi 4810a108e8 Disable mount(8) canonical paths in do_mount()
By default the mount(8) command, as invoked by 'zfs mount', will try
to resolve any path parameter in its canonical form: this could lead
to mount failures when the cwd contains a symlink having the same name
as the dataset being mounted.

Fix this by explicitly disabling mount(8) path canonicalization.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #1791 
Closes #6429 
Closes #6437
2017-08-21 16:46:55 -07:00
LOLi ae5b4a05ff Fix range locking in ZIL commit codepath
Since OpenZFS 7578 (1b7c1e5) if we have a ZVOL with logbias=throughput
we will force WR_INDIRECT itxs in zvol_log_write() setting itx->itx_lr
offset and length to the offset and length of the BIO from
zvol_write()->zvol_log_write(): these offset and length are later used
to take a range lock in the zilog->zl_get_data function: zvol_get_data().

Now suppose we have a ZVOL with blocksize=8K and push 4K writes to
offset 0: we will only be range-locking 0-4096. This means the
ASSERTion we make in dbuf_unoverride() is no longer valid because now
dmu_sync() is called from zilog's get_data functions holding a partial
lock on the dbuf.

Fix this by taking a range lock on the whole block in zvol_get_data().
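
A minimal sketch of the idea, assuming a power-of-two block size and a write
that lies within a single block. The `lock_range_t` type and
`whole_block_range()` are hypothetical and not part of the ZFS range lock
API.

```c
#include <stdint.h>

/* Hypothetical description of the byte range to lock. */
typedef struct lock_range {
	uint64_t offset;
	uint64_t size;
} lock_range_t;

/*
 * Widen a write at "offset" to the whole block that contains it.
 * Assumes blocksize is a power of two.
 */
static lock_range_t
whole_block_range(uint64_t offset, uint64_t blocksize)
{
	lock_range_t r;

	r.offset = offset & ~(blocksize - 1);	/* round down to block start */
	r.size = blocksize;
	return (r);
}
```

For the example above (blocksize=8K, a 4K write at offset 0), the locked
range becomes 0-8192 instead of 0-4096.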

Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6238 
Closes #6315 
Closes #6356 
Closes #6477
2017-08-21 16:46:54 -07:00
LOLi 3468fdbd34 Fix remounting snapshots read-write
It's not enough to preserve/restore MS_RDONLY on the superblock flags
to avoid remounting a snapshot read-write: be explicit about our
intentions to the VFS layer so the readonly bit is updated correctly
in do_remount_sb().

Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6510
Closes #6515
2017-08-21 16:46:52 -07:00
Brian Behlendorf fb3f1fdbd6 Fix ZTS grow_pool/setup
The addition of the large_dnode_008_pos test case, which runs
right before this one, exposed some racy behavior in grow_pool
setup.sh on the Ubuntu kmemleak builder.  Before creating
partitions on a device, destroy any existing ones.

  ERROR: set_partition 1  100mb loop0 exited 1

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6499 
Closes #6516
2017-08-21 16:41:22 -07:00
sckobras 426563be70 vdev_id: implement slot numbering by port id
With HPE hardware and hpsa-driven SAS adapters, only a single phy is
reported, but no individual per-port phys (ie. no phy* entry below
port_dir), which breaks topology detection in the current sas_handler
code. Instead, slot information can be derived directly from the port
number. This change implements a new slot keyword "port" similar to
"id" and "lun", and assumes a default phy/port of 0 if no individual
phy entry can be found. This allows the "sas_direct" topology to be used
with current HPE Dxxxx and Apollo 45xx JBODs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Daniel Kobras <d.kobras@science-computing.de>
Closes #6484
2017-08-21 16:41:22 -07:00
Chunwei Chen aec4318870 Fix NULL pointer when O_SYNC read in snapshot
When reading a file opened with O_SYNC, the read triggers zil_commit.
However, a snapshot has no ZIL, so we shouldn't be doing that.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #6478 
Closes #6494
2017-08-21 16:41:22 -07:00
sanjeevbagewadi 2d9b57d39f zio_dva_throttle_done() should allow zinjected ZIO
If fault injection is enabled, the ZIO_FLAG_IO_RETRY could be set by
zio_handle_device_injection() to generate the FMA events and update
stats. Hence, ignore the flag and process such zios.

A better fix would be to add another flag in the zio_t to indicate that
the zio failed because of a zinject rule. However, considering the
fact that we do this in debug bits, we can make do with the crude check
using the global flag zio_injection_enabled which is set to 1 when
zinject records are added.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sanjeev Bagewadi <sanjeev.bagewadi@gmail.com>
Closes #6383 
Closes #6384
2017-08-21 16:41:22 -07:00
Fabian-Gruenbichler 4bdb8fcfa8 Man page fixes
* ztest.1 man page: fix typo
* zfs-module-parameters.5 man page: fix grammar

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Closes #6492
2017-08-21 16:41:22 -07:00
gaurkuma 58c1c40a5e Crash in dbuf_evict_one with DTRACE_PROBE
Update the dbuf__evict__one() tracepoint so that it can safely
handle a NULL dmu_buf_impl_t pointer.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>    
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: gaurkuma <gauravk.18@gmail.com>
Closes #6463
2017-08-21 16:41:22 -07:00
Tony Hutter 751575fe6f Tag zfs-0.7.1
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2017-08-08 13:14:32 -07:00
Brian Behlendorf 751941e248 Fix dnode allocation race
When performing concurrent object allocations using the new
multi-threaded allocator and large dnodes it's possible to
allocate overlapping large dnodes.

This case should have been handled by detecting an error
returned by dnode_hold_impl().  But that logic only checked
the returned dnp was not-NULL, and the dnp variable was not
reset to NULL when retrying.  Resolve this issue by properly
checking the return value of dnode_hold_impl().
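
Why the pointer check is unreliable across retries can be sketched with a
toy hold function. Everything here is hypothetical: `node_t`,
`hold_node()`, and the `busy` parameter only model a hold that fails
without touching its output argument.

```c
#include <stddef.h>

typedef struct node {
	int id;
} node_t;

/* On failure, return nonzero and leave *out untouched. */
static int
hold_node(node_t *slots, int idx, int busy, node_t **out)
{
	if (idx == busy)
		return (-1);
	*out = &slots[idx];
	return (0);
}

/* Unreliable: a stale pointer from an earlier attempt looks like success. */
static int
acquired_by_pointer(node_t *slots, int idx, int busy, node_t *prev)
{
	node_t *dn = prev;	/* not reset to NULL before retrying */

	(void) hold_node(slots, idx, busy, &dn);
	return (dn != NULL);
}

/* Reliable: success is decided by the return value alone. */
static int
acquired_by_return(node_t *slots, int idx, int busy, node_t *prev)
{
	node_t *dn = prev;

	return (hold_node(slots, idx, busy, &dn) == 0);
}
```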

Additionally, it was possible that dnode_hold_impl() would
misreport a dnode as free when it was in fact in use.  This
could occur for two reasons:

* The per-slot zrl_lock must be held over the entire critical
  section which includes the alloc/free until the new dnode
  is assigned to children_dnodes.  Additionally, all of the
  zrl_lock's in the range must be held to protect moving
  dnodes.

* The dn->dn_ot_type cannot be solely relied upon to check
  the type.  When allocating a new dnode its type will be
  DMU_OT_NONE after dnode_create().  Only later, when
  dnode_allocate() is called, will it transition to the new
  type.  This means there's a window when allocating where
  it can be mistaken for a free dnode.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Ned Bass <bass6@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6414
Closes #6439
2017-08-08 10:17:33 -07:00
Ned Bass ef605a5517 Add debug log entries for failed receive records
Log contents of a receive record if an error occurs while writing
it out to the pool. This may help determine the cause when backup
streams are rejected as invalid.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Closes #6465
2017-08-08 10:17:23 -07:00
Karsten Kretschmer 8eb6dcec7d dracut: Install commands required for vdev_id
The vdev_id script requires awk, grep, and head.  Use dracut_install to
ensure that these commands are available in the initrd environment.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Karsten Kretschmer <kkretschmer@gmail.com>
Closes #6443
Closes #6452
2017-08-07 09:37:30 -07:00
Tony Hutter 07cbcd5089 Only record zio->io_delay on reads and writes
While investigating https://github.com/zfsonlinux/zfs/issues/6425 I
noticed that ioctl ZIOs were not setting zio->io_delay correctly.  They
would set the start time in zio_vdev_io_start(), but never set the end
time in zio_vdev_io_done(), since ioctls skip it and go straight to
zio_done().  This was causing spurious "delayed IO" events to appear,
which would eventually get rate-limited and displayed as
"Missed events" messages in zed.

To get around the problem, this patch only sets zio->io_delay for read
and write ZIOs, since that's all we care about anyway.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #6425
Closes #6440
2017-08-02 11:37:18 -07:00
Giuseppe Di Natale 12acabe2a4 mmp_on_uberblocks: Use kstat for uberblock counts
Use kstat to get a more accurate count of uberblock updates.
Using a loop with zdb can potentially miss some uberblocks.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6407
Closes #6419
2017-08-02 11:21:33 -07:00
LOLi 20c88dc3ef Fix volmode=none property behavior at import time
At import time spa_import() calls zvol_create_minors() directly: with
the current implementation we have no way to avoid device node
creation when volmode=none.

Fix this by enforcing volmode=none directly in zvol_alloc().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6426
2017-08-02 11:21:14 -07:00
Brian Behlendorf 0c8fedeb35 Fix aarch64 build
Add aarch64 to the list of architectures which do not sanitize the
LDFLAGS from the environment.  See fb963d33 for details.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6424
2017-08-02 11:20:50 -07:00
Giuseppe Di Natale affb7141d7 Disable zfs_send_007_pos
Test case zfs_send_007_pos is regularly killed
by test-runner during zfs-tests on buildbot. Disable
it for now until further investigation can be done.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes #6422
2017-08-02 11:20:32 -07:00
bunder2015 e0031d86b7 Correct man page generation
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Closes #6409
Closes #6411
2017-07-28 11:01:53 -07:00
3196 changed files with 88809 additions and 404236 deletions
-10
@@ -1,10 +0,0 @@
root = true
[*]
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true
[*.{c,h}]
tab_width = 8
indent_style = tab
+66 -122
@@ -1,12 +1,10 @@
# Contributing to OpenZFS # Contributing to ZFS on Linux
<p align="center"> <p align="center"><img src="http://zfsonlinux.org/images/zfs-linux.png"/></p>
<img alt="OpenZFS Logo"
src="https://openzfs.github.io/openzfs-docs/_static/img/logo/480px-Open-ZFS-Secondary-Logo-Colour-halfsize.png"/>
</p>
*First of all, thank you for taking the time to contribute!* *First of all, thank you for taking the time to contribute!*
By using the following guidelines, you can help us make OpenZFS even better. By using the following guidelines, you can help us make ZFS on Linux even
better.
## Table Of Contents ## Table Of Contents
[What should I know before I get [What should I know before I get
@@ -29,22 +27,19 @@ started?](#what-should-i-know-before-i-get-started)
* [Commit Message Formats](#commit-message-formats) * [Commit Message Formats](#commit-message-formats)
* [New Changes](#new-changes) * [New Changes](#new-changes)
* [OpenZFS Patch Ports](#openzfs-patch-ports) * [OpenZFS Patch Ports](#openzfs-patch-ports)
* [Coverity Defect Fixes](#coverity-defect-fixes)
* [Signed Off By](#signed-off-by)
Helpful resources Helpful resources
* [OpenZFS Documentation](https://openzfs.github.io/openzfs-docs/) * [ZFS on Linux wiki](https://github.com/zfsonlinux/zfs/wiki)
* [OpenZFS Developer Resources](http://open-zfs.org/wiki/Developer_resources) * [OpenZFS Documentation](http://open-zfs.org/wiki/Developer_resources)
* [Git and GitHub for beginners](https://openzfs.github.io/openzfs-docs/Developer%20Resources/Git%20and%20GitHub%20for%20beginners.html)
## What should I know before I get started? ## What should I know before I get started?
### Get ZFS ### Get ZFS
You can build zfs packages by following [these You can build zfs packages by following [these
instructions](https://openzfs.github.io/openzfs-docs/Developer%20Resources/Building%20ZFS.html), instructions](https://github.com/zfsonlinux/zfs/wiki/Building-ZFS),
or install stable packages from [your distribution's or install stable packages from [your distribution's
repository](https://openzfs.github.io/openzfs-docs/Getting%20Started/index.html). repository](https://github.com/zfsonlinux/zfs/wiki/Getting-Started).
### Debug ZFS ### Debug ZFS
A variety of methods and tools are available to aid ZFS developers. A variety of methods and tools are available to aid ZFS developers.
@@ -53,30 +48,28 @@ configure option should be set. This will enable additional correctness
checks and all the ASSERTs to help quickly catch potential issues. checks and all the ASSERTs to help quickly catch potential issues.
In addition, there are numerous utilities and debugging files which In addition, there are numerous utilities and debugging files which
provide visibility into the inner workings of ZFS. The most useful provide visibility in to the inner workings of ZFS. The most useful
of these tools are discussed in detail on the [Troubleshooting of these tools are discussed in detail on the [debugging ZFS wiki
page](https://openzfs.github.io/openzfs-docs/Basic%20Concepts/Troubleshooting.html). page](https://github.com/zfsonlinux/zfs/wiki/Debugging).
### Where can I ask for help? ### Where can I ask for help?
The [zfs-discuss mailing The [mailing list](https://github.com/zfsonlinux/zfs/wiki/Mailing-Lists)
list](https://openzfs.github.io/openzfs-docs/Project%20and%20Community/Mailing%20Lists.html) is the best place to ask for help.
or IRC are the best places to ask for help. Please do not file
support requests on the GitHub issue tracker.
## How Can I Contribute? ## How Can I Contribute?
### Reporting Bugs ### Reporting Bugs
*Please* contact us via the [zfs-discuss mailing *Please* contact us via the [mailing
list](https://openzfs.github.io/openzfs-docs/Project%20and%20Community/Mailing%20Lists.html) list](https://github.com/zfsonlinux/zfs/wiki/Mailing-Lists) if you aren't
or IRC if you aren't certain that you are experiencing a bug. certain that you are experiencing a bug.
If you run into an issue, please search our [issue If you run into an issue, please search our [issue
tracker](https://github.com/openzfs/zfs/issues) *first* to ensure the tracker](https://github.com/zfsonlinux/zfs/issues) *first* to ensure the
issue hasn't been reported before. Open a new issue only if you haven't issue hasn't been reported before. Open a new issue only if you haven't
found anything similar to your issue. found anything similar to your issue.
You can open a new issue and search existing issues using the public [issue You can open a new issue and search existing issues using the public [issue
tracker](https://github.com/openzfs/zfs/issues). tracker](https://github.com/zfsonlinux/zfs/issues).
#### When opening a new issue, please include the following information at the top of the issue: #### When opening a new issue, please include the following information at the top of the issue:
* What distribution (with version) you are using. * What distribution (with version) you are using.
@@ -108,13 +101,13 @@ information like:
* Stack traces which may be logged to `dmesg`. * Stack traces which may be logged to `dmesg`.
### Suggesting Enhancements ### Suggesting Enhancements
OpenZFS is a widely deployed production filesystem which is under active ZFS on Linux is a widely deployed production filesystem which is under
development. The team's primary focus is on fixing known issues, improving active development. The team's primary focus is on fixing known issues,
performance, and adding compelling new features. improving performance, and adding compelling new features.
You can view the list of proposed features You can view the list of proposed features
by filtering the issue tracker by the ["Type: Feature" by filtering the issue tracker by the ["Feature"
label](https://github.com/openzfs/zfs/issues?q=is%3Aopen+is%3Aissue+label%3A%22Type%3A+Feature%22). label](https://github.com/zfsonlinux/zfs/issues?q=is%3Aopen+is%3Aissue+label%3AFeature).
If you have an idea for a feature first check this list. If your idea already If you have an idea for a feature first check this list. If your idea already
appears then add a +1 to the top most comment, this helps us gauge interest appears then add a +1 to the top most comment, this helps us gauge interest
in that feature. in that feature.
@@ -123,11 +116,8 @@ Otherwise, open a new issue and describe your proposed feature. Why is this
feature needed? What problem does it solve? feature needed? What problem does it solve?
### Pull Requests ### Pull Requests
* All pull requests must be based on the current master branch and apply
#### General without conflicts.
* All pull requests, except backports and releases, must be based on the current master branch
and should apply without conflicts.
* Please attempt to limit pull requests to a single commit which resolves * Please attempt to limit pull requests to a single commit which resolves
one specific issue. one specific issue.
* Make sure your commit messages are in the correct format. See the * Make sure your commit messages are in the correct format. See the
@@ -139,28 +129,16 @@ logically independent patches which build on each other. This makes large
changes easier to review and approve which speeds up the merging process. changes easier to review and approve which speeds up the merging process.
* Try to keep pull requests simple. Simple code with comments is much easier * Try to keep pull requests simple. Simple code with comments is much easier
to review and approve. to review and approve.
* All proposed changes must be approved by an OpenZFS organization member.
* If you have an idea you'd like to discuss or which requires additional testing, consider opening it as a draft pull request.
Once everything is in good shape and the details have been worked out you can remove its draft status.
Any required reviews can then be finalized and the pull request merged.
#### Tests and Benchmarks
* Every pull request will by tested by the buildbot on multiple platforms by running the [zfs-tests.sh and zloop.sh](
https://openzfs.github.io/openzfs-docs/Developer%20Resources/Building%20ZFS.html#running-zloop-sh-and-zfs-tests-sh) test suites.
* To verify your changes conform to the [style guidelines](
https://github.com/openzfs/zfs/blob/master/.github/CONTRIBUTING.md#style-guides
), please run `make checkstyle` and resolve any warnings.
* Static code analysis of each pull request is performed by the buildbot; run `make lint` to check your changes.
* Test cases should be provided when appropriate. * Test cases should be provided when appropriate.
This includes making sure new features have adequate code coverage.
* If your pull request improves performance, please include some benchmarks. * If your pull request improves performance, please include some benchmarks.
* The pull request must pass all required [ZFS * The pull request must pass all required [ZFS
Buildbot](http://build.zfsonlinux.org/) builders before Buildbot](http://build.zfsonlinux.org/) builders before
being accepted. If you are experiencing intermittent TEST being accepted. If you are experiencing intermittent TEST
builder failures, you may be experiencing a [test suite builder failures, you may be experiencing a [test suite
issue](https://github.com/openzfs/zfs/issues?q=is%3Aissue+is%3Aopen+label%3A%22Type%3A+Test+Suite%22). issue](https://github.com/zfsonlinux/zfs/issues?q=is%3Aissue+is%3Aopen+label%3A%22Test+Suite%22).
There are also various [buildbot options](https://openzfs.github.io/openzfs-docs/Developer%20Resources/Buildbot%20Options.html) There are also various [buildbot options](https://github.com/zfsonlinux/zfs/wiki/Buildbot-Options)
to control how changes are tested. to control how changes are tested.
* All proposed changes must be approved by a ZFS on Linux organization member.
### Testing ### Testing
All help is appreciated! If you're in a position to run the latest code All help is appreciated! If you're in a position to run the latest code
@@ -170,41 +148,16 @@ range of realistic workloads, configurations and architectures we're better
able quickly identify and resolve potential issues. able quickly identify and resolve potential issues.
Users can also run the [ZFS Test Users can also run the [ZFS Test
Suite](https://github.com/openzfs/zfs/tree/master/tests) on their systems Suite](https://github.com/zfsonlinux/zfs/tree/master/tests) on their systems
to verify ZFS is behaving as intended. to verify ZFS is behaving as intended.
## Style Guides ## Style Guides
### Repository Structure
OpenZFS uses a standardised branching structure.
- The "development and main branch", is the branch all development should be based on.
- "Release branches" contain the latest released code for said version.
- "Staging branches" contain selected commits prior to being released.
**Branch Names:**
- Development and Main branch: `master`
- Release branches: `zfs-$VERSION-release`
- Staging branches: `zfs-$VERSION-staging`
`$VERSION` should be replaced with the `major.minor` version number.
_(This is the version number without the `.patch` version at the end)_
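
A minimal sketch of how the naming scheme above maps a version string to its branch names, assuming versions are written as `major.minor` or `major.minor.patch`. The helper `branch_names` is hypothetical and is not part of the repository.

```python
import re


# Hypothetical helper: shown only to illustrate the naming scheme above.
def branch_names(version):
    """Return (release, staging) branch names for a version string."""
    # Accept "major.minor" with an optional ".patch" suffix, which is dropped.
    match = re.fullmatch(r"(\d+)\.(\d+)(?:\.\d+)?", version)
    if match is None:
        raise ValueError("expected major.minor[.patch], got %r" % version)
    major_minor = "%s.%s" % (match.group(1), match.group(2))
    return ("zfs-%s-release" % major_minor, "zfs-%s-staging" % major_minor)


print(branch_names("2.1.5"))
```

For example, `branch_names("2.1.5")` yields `zfs-2.1-release` and `zfs-2.1-staging`.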
### Coding Conventions ### Coding Conventions
We currently use [C Style and Coding Standards for We currently use [C Style and Coding Standards for
SunOS](http://www.cis.upenn.edu/%7Elee/06cse480/data/cstyle.ms.pdf) as our SunOS](http://www.cis.upenn.edu/%7Elee/06cse480/data/cstyle.ms.pdf) as our
coding convention. coding convention.
This repository has an `.editorconfig` file. If your editor [supports
editorconfig](https://editorconfig.org/#download), it will
automatically respect most of this project's whitespace preferences.
Additionally, Git can help warn on whitespace problems as well:
```
git config --local core.whitespace trailing-space,space-before-tab,indent-with-non-tab,-tab-in-indent
```
### Commit Message Formats ### Commit Message Formats
#### New Changes #### New Changes
Commit messages for new changes must meet the following guidelines: Commit messages for new changes must meet the following guidelines:
@@ -214,10 +167,18 @@ first line in the commit message.
please summarize important information such as why the proposed please summarize important information such as why the proposed
approach was chosen or a brief description of the bug you are resolving. approach was chosen or a brief description of the bug you are resolving.
Each line of the body must be 72 characters or less. Each line of the body must be 72 characters or less.
* The last line must be a `Signed-off-by:` tag. See the * The last line must be a `Signed-off-by:` tag with the developer's
[Signed Off By](#signed-off-by) section for more information. name followed by their email. This is the developer's certification
that they have the right to submit the patch for inclusion into
the code base and indicates agreement to the [Developer's Certificate
of Origin](https://www.kernel.org/doc/html/latest/process/submitting-patches.html#sign-your-work-the-developer-s-certificate-of-origin).
Code without a proper signoff cannot be merged.
An example commit message for new changes is provided below. Git can append the `Signed-off-by` line to your commit messages. Simply
provide the `-s` or `--signoff` option when performing a `git commit`.
For more information about writing commit messages, visit [How to Write
a Git Commit Message](https://chris.beams.io/posts/git-commit/).
An example commit message is provided below.
``` ```
This line is a brief summary of your change This line is a brief summary of your change
@@ -230,52 +191,35 @@ attempting to solve.
Signed-off-by: Contributor <contributor@email.com> Signed-off-by: Contributor <contributor@email.com>
``` ```
#### Coverity Defect Fixes #### OpenZFS Patch Ports
If you are submitting a fix to a If you are porting an OpenZFS patch, the commit message must meet
[Coverity defect](https://scan.coverity.com/projects/zfsonlinux-zfs), the following guidelines:
the commit message should meet the following guidelines: * The first line must be the summary line from the OpenZFS commit.
* Provides a subject line in the format of It must begin with `OpenZFS dddd - ` where `dddd` is the OpenZFS issue number.
`Fix coverity defects: CID dddd, dddd...` where `dddd` represents * Provides a `Authored by:` line to attribute the patch to the original author.
each CID fixed by the commit. * Provides the `Reviewed by:` and `Approved by:` lines from the original
* Provides a body which lists each Coverity defect and how it was corrected. OpenZFS commit.
* The last line must be a `Signed-off-by:` tag. See the * Provides a `Ported-by:` line with the developer's name followed by
[Signed Off By](#signed-off-by) section for more information. their email.
* Provides a `OpenZFS-issue:` line which is a link to the original illumos
issue.
* Provides a `OpenZFS-commit:` line which links back to the original OpenZFS
commit.
* If necessary, provide some porting notes to describe any deviations from
the original OpenZFS commit.
An example Coverity defect fix commit message is provided below. An example OpenZFS patch port commit message is provided below.
``` ```
Fix coverity defects: CID 12345, 67890 OpenZFS 1234 - Summary from the original OpenZFS commit
CID 12345: Logically dead code (DEADCODE) Authored by: Original Author <original@email.com>
Reviewed by: Reviewer One <reviewer1@email.com>
Reviewed by: Reviewer Two <reviewer2@email.com>
Approved by: Approver One <approver1@email.com>
Ported-by: ZFS Contributor <contributor@email.com>
Removed the if(var != 0) block because the condition could never be Provide some porting notes here if necessary.
satisfied.
CID 67890: Resource Leak (RESOURCE_LEAK) OpenZFS-issue: https://www.illumos.org/issues/1234
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/abcd1234
Ensure free is called after allocating memory in function().
Signed-off-by: Contributor <contributor@email.com>
``` ```
#### Signed Off By
A line tagged as `Signed-off-by:` must contain the developer's
name followed by their email. This is the developer's certification
that they have the right to submit the patch for inclusion into
the code base and indicates agreement to the [Developer's Certificate
of Origin](https://www.kernel.org/doc/html/latest/process/submitting-patches.html#sign-your-work-the-developer-s-certificate-of-origin).
Code without a proper signoff cannot be merged.
Git can append the `Signed-off-by` line to your commit messages. Simply
provide the `-s` or `--signoff` option when performing a `git commit`.
For more information about writing commit messages, visit [How to Write
a Git Commit Message](https://chris.beams.io/posts/git-commit/).
#### Co-authored By
If someone else had part in your pull request, please add the following to the commit:
`Co-authored-by: Name <gitregistered@email.address>`
This is useful if their authorship was lost during squashing, rebasing, etc.,
but may be used in any situation where there are co-authors.
The email address used here should be the same as on the GitHub profile of said user.
If said user does not have their email address public, please use the following instead:
`Co-authored-by: Name <[username]@users.noreply.github.com>`
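
Two of the commit message rules above can be checked mechanically: body lines of 72 characters or less, and a closing `Signed-off-by:` tag with a name and email. The following is a rough sketch under those assumptions only. The function `check_commit_message` and its regular expression are illustrative, not a tool shipped with the project, and the check deliberately ignores the subject-line rules.

```python
import re

# Assumed shape of a sign-off line: "Signed-off-by: Name <user@host>".
SIGNOFF = re.compile(r"Signed-off-by: \S.* <[^<>\s]+@[^<>\s]+>")


# Hypothetical checker covering only the two rules named above.
def check_commit_message(message):
    """Return a list of problems found in a commit message."""
    problems = []
    lines = message.rstrip("\n").split("\n")
    # Line 1 is the subject; every later line counts as body here.
    for number, line in enumerate(lines[1:], start=2):
        if len(line) > 72:
            problems.append("line %d is longer than 72 characters" % number)
    if SIGNOFF.fullmatch(lines[-1]) is None:
        problems.append("last line is not a Signed-off-by: tag")
    return problems


example = (
    "This line is a brief summary of your change\n"
    "\n"
    "Please provide at least a couple sentences describing the change.\n"
    "\n"
    "Signed-off-by: Contributor <contributor@email.com>\n"
)
print(check_commit_message(example))
```

An empty list means neither rule was violated. Running `git commit -s` remains the simplest way to get the sign-off line right in the first place.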
+46
@@ -0,0 +1,46 @@
<!--
Thank you for reporting an issue.
*IMPORTANT* - Please search our issue tracker *before* making a new issue.
If you cannot find a similar issue, then create a new issue.
https://github.com/zfsonlinux/zfs/issues
*IMPORTANT* - This issue tracker is for *bugs* and *issues* only.
Please search the wiki and the mailing list archives before asking
questions on the mailing list.
https://github.com/zfsonlinux/zfs/wiki/Mailing-Lists
Please fill in as much of the template as possible.
-->
### System information
<!-- add version after "|" character -->
Type | Version/Name
--- | ---
Distribution Name |
Distribution Version |
Linux Kernel |
Architecture |
ZFS Version |
SPL Version |
<!--
Commands to find ZFS/SPL versions:
modinfo zfs | grep -iw version
modinfo spl | grep -iw version
-->
### Describe the problem you're observing
### Describe how to reproduce the problem
### Include any warning/errors/backtraces from the system logs
<!--
*IMPORTANT* - Please mark logs and text output from terminal commands
or else Github will not display them correctly.
An example is provided below.
Example:
```
this is an example how log text should be marked (wrap it with ```)
```
-->
-55
@@ -1,55 +0,0 @@
---
name: Bug report
about: Create a report to help us improve OpenZFS
title: ''
labels: 'Type: Defect'
assignees: ''
---
<!-- Please fill out the following template, which will help other contributors address your issue. -->
<!--
Thank you for reporting an issue.
*IMPORTANT* - Please check our issue tracker before opening a new issue.
Additional valuable information can be found in the OpenZFS documentation
and mailing list archives.
Please fill in as much of the template as possible.
-->
### System information
<!-- add version after "|" character -->
Type | Version/Name
--- | ---
Distribution Name |
Distribution Version |
Kernel Version |
Architecture |
OpenZFS Version |
<!--
Command to find OpenZFS version:
zfs version
Commands to find kernel version:
uname -r # Linux
freebsd-version -r # FreeBSD
-->
### Describe the problem you're observing
### Describe how to reproduce the problem
### Include any warning/errors/backtraces from the system logs
<!--
*IMPORTANT* - Please mark logs and text output from terminal commands
or else Github will not display them correctly.
An example is provided below.
Example:
```
this is an example how log text should be marked (wrap it with ```)
```
-->
-14
@@ -1,14 +0,0 @@
blank_issues_enabled: false
contact_links:
- name: OpenZFS Questions
url: https://github.com/openzfs/zfs/discussions/new
about: Ask the community for help
- name: OpenZFS Community Support Mailing list (Linux)
url: https://zfsonlinux.topicbox.com/groups/zfs-discuss
about: Get community support for OpenZFS on Linux
- name: FreeBSD Community Support Mailing list
url: https://lists.freebsd.org/mailman/listinfo/freebsd-fs
about: Get community support for OpenZFS on FreeBSD
- name: OpenZFS on IRC
url: https://web.libera.chat/#openzfs
about: Use IRC to get community support for OpenZFS
-33
@@ -1,33 +0,0 @@
---
name: Feature request
about: Suggest a feature for OpenZFS
title: ''
labels: 'Type: Feature'
assignees: ''
---
<!--
Thank you for suggesting a feature.
Please check our issue tracker before opening a new feature request.
Filling out the following template will help other contributors better understand your proposed feature.
-->
### Describe the feature would like to see added to OpenZFS
<!--
Provide a clear and concise description of the feature.
-->
### How will this feature improve OpenZFS?
<!--
What problem does this feature solve?
-->
### Additional context
<!--
Any additional information you can add about the proposal?
-->
+10 -13
@@ -1,25 +1,22 @@
<!--- Please fill out the following template, which will help other contributors review your Pull Request. -->
<!--- Provide a general summary of your changes in the Title above --> <!--- Provide a general summary of your changes in the Title above -->
<!--- <!---
Documentation on ZFS Buildbot options can be found at Documentation on ZFS Buildbot options can be found at
https://openzfs.github.io/openzfs-docs/Developer%20Resources/Buildbot%20Options.html https://github.com/zfsonlinux/zfs/wiki/Buildbot-Options
--> -->
### Description
<!--- Describe your changes in detail -->
### Motivation and Context ### Motivation and Context
<!--- Why is this change required? What problem does it solve? --> <!--- Why is this change required? What problem does it solve? -->
<!--- If it fixes an open issue, please link to the issue here. --> <!--- If it fixes an open issue, please link to the issue here. -->
### Description
<!--- Describe your changes in detail -->
### How Has This Been Tested? ### How Has This Been Tested?
<!--- Please describe in detail how you tested your changes. --> <!--- Please describe in detail how you tested your changes. -->
<!--- Include details of your testing environment, and the tests you ran to --> <!--- Include details of your testing environment, and the tests you ran to -->
<!--- see how your change affects other areas of the code, etc. --> <!--- see how your change affects other areas of the code, etc. -->
<!--- If your change is a performance enhancement, please provide benchmarks here. --> <!--- If your change is a performance enhancement, please provide benchmarks here. -->
<!--- Please think about using the draft PR feature if appropriate -->
### Types of changes ### Types of changes
<!--- What types of changes does your code introduce? Put an `x` in all the boxes that apply: --> <!--- What types of changes does your code introduce? Put an `x` in all the boxes that apply: -->
@@ -28,15 +25,15 @@ https://openzfs.github.io/openzfs-docs/Developer%20Resources/Buildbot%20Options.
- [ ] Performance enhancement (non-breaking change which improves efficiency) - [ ] Performance enhancement (non-breaking change which improves efficiency)
- [ ] Code cleanup (non-breaking change which makes code smaller or more readable) - [ ] Code cleanup (non-breaking change which makes code smaller or more readable)
- [ ] Breaking change (fix or feature that would cause existing functionality to change) - [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Library ABI change (libzfs, libzfs\_core, libnvpair, libuutil and libzfsbootenv)
- [ ] Documentation (a change to man pages or other documentation) - [ ] Documentation (a change to man pages or other documentation)
### Checklist: ### Checklist:
<!--- Go over all the following points, and put an `x` in all the boxes that apply. --> <!--- Go over all the following points, and put an `x` in all the boxes that apply. -->
<!--- If you're unsure about any of these, don't hesitate to ask. We're here to help! --> <!--- If you're unsure about any of these, don't hesitate to ask. We're here to help! -->
- [ ] My code follows the OpenZFS [code style requirements](https://github.com/openzfs/zfs/blob/master/.github/CONTRIBUTING.md#coding-conventions). - [ ] My code follows the ZFS on Linux code style requirements.
- [ ] I have updated the documentation accordingly. - [ ] I have updated the documentation accordingly.
- [ ] I have read the [**contributing** document](https://github.com/openzfs/zfs/blob/master/.github/CONTRIBUTING.md). - [ ] I have read the **CONTRIBUTING** document.
- [ ] I have added [tests](https://github.com/openzfs/zfs/tree/master/tests) to cover my changes. - [ ] I have added tests to cover my changes.
- [ ] I have run the ZFS Test Suite with this change applied. - [ ] All new and existing tests passed.
- [ ] All commit messages are properly formatted and contain [`Signed-off-by`](https://github.com/openzfs/zfs/blob/master/.github/CONTRIBUTING.md#signed-off-by). - [ ] All commit messages are properly formatted and contain `Signed-off-by`.
- [ ] Change has been approved by a ZFS on Linux member.
+17 -12
@@ -1,25 +1,30 @@
codecov: codecov:
notify: notify:
require_ci_to_pass: false # always post require_ci_to_pass: no
after_n_builds: 2 # user and kernel
coverage: coverage:
precision: 0 # 0 decimals of precision precision: 2
round: nearest # Round to nearest precision point round: down
range: "50...90" # red -> yellow -> green range: "50...100"
status: status:
project: project:
default: default:
threshold: 1% # allow 1% coverage variance threshold: 1%
patch: patch:
default: default:
threshold: 1% # allow 1% coverage variance threshold: 1%
parsers:
gcov:
branch_detection:
conditional: yes
loop: yes
method: no
macro: no
comment: comment:
layout: "reach, diff, flags, footer" layout: "header, sunburst, diff"
behavior: once # update if exists; post new; skip if deleted behavior: default
require_changes: yes # only post when coverage changes require_changes: no
# ignore: Please place any ignores in config/ax_code_coverage.m4 instead
-13
@@ -1,13 +0,0 @@
# Configuration for probot-no-response - https://github.com/probot/no-response
# Number of days of inactivity before an Issue is closed for lack of response
daysUntilClose: 31
# Label requiring a response
responseRequiredLabel: "Status: Feedback requested"
# Comment to post when closing an Issue for lack of response. Set to `false` to disable
closeComment: >
This issue has been automatically closed because there has been no response
to our request for more information from the original author. With only the
information that is currently in the issue, we don't have enough information
to take action. Please reach out if you have or find the answers we need so
that we can investigate further.
-26
@@ -1,26 +0,0 @@
# Number of days of inactivity before an issue becomes stale
daysUntilStale: 365
# Number of days of inactivity before a stale issue is closed
daysUntilClose: 90
# Limit to only `issues` or `pulls`
only: issues
# Issues with these labels will never be considered stale
exemptLabels:
- "Type: Feature"
- "Bot: Not Stale"
- "Status: Work in Progress"
# Set to true to ignore issues in a project (defaults to false)
exemptProjects: true
# Set to true to ignore issues in a milestone (defaults to false)
exemptMilestones: true
# Set to true to ignore issues with an assignee (defaults to false)
exemptAssignees: true
# Label to use when marking an issue as stale
staleLabel: "Status: Stale"
# Comment to post when marking an issue as stale. Set to `false` to disable
markComment: >
This issue has been automatically marked as "stale" because it has not had
any activity for a while. It will be closed in 90 days if no further activity occurs.
Thank you for your contributions.
# Limit the number of actions per hour, from 1-30. Default is 30
limitPerRun: 6
+3
@@ -0,0 +1,3 @@
preprocessorErrorDirective:./module/zfs/vdev_raidz_math_avx512f.c:243
preprocessorErrorDirective:./module/zfs/vdev_raidz_math_sse2.c:266
-50
@@ -1,50 +0,0 @@
name: checkstyle
on:
push:
pull_request:
jobs:
checkstyle:
runs-on: ubuntu-20.04
steps:
- uses: actions/checkout@v3
with:
ref: ${{ github.event.pull_request.head.sha }}
- name: Install dependencies
run: |
sudo apt-get update
sudo apt-get install --yes -qq build-essential autoconf libtool gawk alien fakeroot linux-headers-$(uname -r)
sudo apt-get install --yes -qq zlib1g-dev uuid-dev libattr1-dev libblkid-dev libselinux-dev libudev-dev libssl-dev python-dev python-setuptools python-cffi python3 python3-dev python3-setuptools python3-cffi
# packages for tests
sudo apt-get install --yes -qq parted lsscsi ksh attr acl nfs-kernel-server fio
sudo apt-get install --yes -qq mandoc cppcheck pax-utils devscripts
sudo -E pip --quiet install flake8
- name: Prepare
run: |
sh ./autogen.sh
./configure
make -j$(nproc)
- name: Checkstyle
run: |
make checkstyle
- name: Lint
run: |
make lint
- name: CheckABI
id: CheckABI
run: |
sudo docker run -v $(pwd):/source ghcr.io/openzfs/libabigail make checkabi
- name: StoreABI
if: failure() && steps.CheckABI.outcome == 'failure'
run: |
sudo docker run -v $(pwd):/source ghcr.io/openzfs/libabigail make storeabi
- name: Prepare artifacts
if: failure() && steps.CheckABI.outcome == 'failure'
run: |
find -name *.abi | tar -cf abi_files.tar -T -
- uses: actions/upload-artifact@v3
if: failure() && steps.CheckABI.outcome == 'failure'
with:
name: New ABI files (use only if you're sure about interface changes)
path: abi_files.tar
@@ -1,83 +0,0 @@
name: zfs-tests-functional
on:
push:
pull_request:
jobs:
tests-functional-ubuntu:
strategy:
fail-fast: false
matrix:
os: [20.04]
runs-on: ubuntu-${{ matrix.os }}
steps:
- uses: actions/checkout@v3
with:
ref: ${{ github.event.pull_request.head.sha }}
- name: Install dependencies
run: |
sudo apt-get update
sudo apt-get install --yes -qq build-essential autoconf libtool gdb lcov \
git alien fakeroot wget curl bc fio acl \
sysstat mdadm lsscsi parted gdebi attr dbench watchdog ksh \
nfs-kernel-server samba rng-tools xz-utils \
zlib1g-dev uuid-dev libblkid-dev libselinux-dev \
xfslibs-dev libattr1-dev libacl1-dev libudev-dev libdevmapper-dev \
libssl-dev libffi-dev libaio-dev libelf-dev libmount-dev \
libpam0g-dev pamtester python-dev python-setuptools python-cffi \
python3 python3-dev python3-setuptools python3-cffi python3-packaging \
libcurl4-openssl-dev
- name: Autogen.sh
run: |
sh autogen.sh
- name: Configure
run: |
./configure --enable-debug --enable-debuginfo
- name: Make
run: |
make --no-print-directory -s pkg-utils pkg-kmod
- name: Install
run: |
sudo dpkg -i *.deb
# Update order of directories to search for modules, otherwise
# Ubuntu will load kernel-shipped ones.
sudo sed -i.bak 's/updates/extra updates/' /etc/depmod.d/ubuntu.conf
sudo depmod
sudo modprobe zfs
# Workaround for cloud-init bug
# see https://github.com/openzfs/zfs/issues/12644
FILE=/lib/udev/rules.d/10-cloud-init-hook-hotplug.rules
if [ -r "${FILE}" ]; then
HASH=$(md5sum "${FILE}" | awk '{ print $1 }')
if [ "${HASH}" = "121ff0ef1936cd2ef65aec0458a35772" ]; then
# Just shove a zd* exclusion right above the hotplug hook...
sudo sed -i -e s/'LABEL="cloudinit_hook"'/'KERNEL=="zd*", GOTO="cloudinit_end"\n&'/ "${FILE}"
sudo udevadm control --reload-rules
fi
fi
# Workaround to provide additional free space for testing.
# https://github.com/actions/virtual-environments/issues/2840
sudo rm -rf /usr/share/dotnet
sudo rm -rf /opt/ghc
sudo rm -rf "/usr/local/share/boost"
sudo rm -rf "$AGENT_TOOLSDIRECTORY"
- name: Tests
run: |
/usr/share/zfs/zfs-tests.sh -vR -s 3G
timeout-minutes: 330
- name: Prepare artifacts
if: failure()
run: |
RESULTS_PATH=$(readlink -f /var/tmp/test_results/current)
sudo dmesg > $RESULTS_PATH/dmesg
sudo cp /var/log/syslog $RESULTS_PATH/
sudo chmod +r $RESULTS_PATH/*
# Replace ':' in dir names, actions/upload-artifact doesn't support it
for f in $(find /var/tmp/test_results -name '*:*'); do mv "$f" "${f//:/__}"; done
- uses: actions/upload-artifact@v3
if: failure()
with:
name: Test logs Ubuntu-${{ matrix.os }}
path: /var/tmp/test_results/20*/
if-no-files-found: ignore
-79
@@ -1,79 +0,0 @@
name: zfs-tests-sanity
on:
push:
pull_request:
jobs:
tests:
runs-on: ubuntu-20.04
steps:
- uses: actions/checkout@v3
with:
ref: ${{ github.event.pull_request.head.sha }}
- name: Install dependencies
run: |
sudo apt-get update
sudo apt-get install --yes -qq build-essential autoconf libtool gdb lcov \
git alien fakeroot wget curl bc fio acl \
sysstat mdadm lsscsi parted gdebi attr dbench watchdog ksh \
nfs-kernel-server samba rng-tools xz-utils \
zlib1g-dev uuid-dev libblkid-dev libselinux-dev \
xfslibs-dev libattr1-dev libacl1-dev libudev-dev libdevmapper-dev \
libssl-dev libffi-dev libaio-dev libelf-dev libmount-dev \
libpam0g-dev pamtester python-dev python-setuptools python-cffi \
python3 python3-dev python3-setuptools python3-cffi python3-packaging \
libcurl4-openssl-dev
- name: Autogen.sh
run: |
sh autogen.sh
- name: Configure
run: |
./configure --enable-debug --enable-debuginfo
- name: Make
run: |
make --no-print-directory -s pkg-utils pkg-kmod
- name: Install
run: |
sudo dpkg -i *.deb
# Update order of directories to search for modules, otherwise
# Ubuntu will load kernel-shipped ones.
sudo sed -i.bak 's/updates/extra updates/' /etc/depmod.d/ubuntu.conf
sudo depmod
sudo modprobe zfs
# Workaround for cloud-init bug
# see https://github.com/openzfs/zfs/issues/12644
FILE=/lib/udev/rules.d/10-cloud-init-hook-hotplug.rules
if [ -r "${FILE}" ]; then
HASH=$(md5sum "${FILE}" | awk '{ print $1 }')
if [ "${HASH}" = "121ff0ef1936cd2ef65aec0458a35772" ]; then
# Just shove a zd* exclusion right above the hotplug hook...
sudo sed -i -e s/'LABEL="cloudinit_hook"'/'KERNEL=="zd*", GOTO="cloudinit_end"\n&'/ "${FILE}"
sudo udevadm control --reload-rules
fi
fi
# Workaround to provide additional free space for testing.
# https://github.com/actions/virtual-environments/issues/2840
sudo rm -rf /usr/share/dotnet
sudo rm -rf /opt/ghc
sudo rm -rf "/usr/local/share/boost"
sudo rm -rf "$AGENT_TOOLSDIRECTORY"
- name: Tests
run: |
/usr/share/zfs/zfs-tests.sh -vR -s 3G -r sanity
timeout-minutes: 330
- name: Prepare artifacts
if: failure()
run: |
RESULTS_PATH=$(readlink -f /var/tmp/test_results/current)
sudo dmesg > $RESULTS_PATH/dmesg
sudo cp /var/log/syslog $RESULTS_PATH/
sudo chmod +r $RESULTS_PATH/*
# Replace ':' in dir names, actions/upload-artifact doesn't support it
for f in $(find /var/tmp/test_results -name '*:*'); do mv "$f" "${f//:/__}"; done
- uses: actions/upload-artifact@v3
if: failure()
with:
name: Test logs
path: /var/tmp/test_results/20*/
if-no-files-found: ignore
-67
@@ -1,67 +0,0 @@
name: zloop
on:
push:
pull_request:
jobs:
tests:
runs-on: ubuntu-20.04
env:
TEST_DIR: /var/tmp/zloop
steps:
- uses: actions/checkout@v3
with:
ref: ${{ github.event.pull_request.head.sha }}
- name: Install dependencies
run: |
sudo apt-get update
sudo apt-get install --yes -qq build-essential autoconf libtool gdb \
git alien fakeroot \
zlib1g-dev uuid-dev libblkid-dev libselinux-dev \
xfslibs-dev libattr1-dev libacl1-dev libudev-dev libdevmapper-dev \
libssl-dev libffi-dev libaio-dev libelf-dev libmount-dev \
libpam0g-dev \
python-dev python-setuptools python-cffi python-packaging \
python3 python3-dev python3-setuptools python3-cffi python3-packaging
- name: Autogen.sh
run: |
sh autogen.sh
- name: Configure
run: |
./configure --enable-debug --enable-debuginfo
- name: Make
run: |
make --no-print-directory -s pkg-utils pkg-kmod
- name: Install
run: |
sudo dpkg -i *.deb
# Update order of directories to search for modules, otherwise
# Ubuntu will load kernel-shipped ones.
sudo sed -i.bak 's/updates/extra updates/' /etc/depmod.d/ubuntu.conf
sudo depmod
sudo modprobe zfs
- name: Tests
run: |
sudo mkdir -p $TEST_DIR
# run for 20 minutes to have a total runner time of 30 minutes
sudo /usr/share/zfs/zloop.sh -t 1200 -l -m1 -- -T 120 -P 60
- name: Prepare artifacts
if: failure()
run: |
sudo chmod +r -R $TEST_DIR/
- uses: actions/upload-artifact@v3
if: failure()
with:
name: Logs
path: |
/var/tmp/zloop/*/
!/var/tmp/zloop/*/vdev/
if-no-files-found: ignore
- uses: actions/upload-artifact@v3
if: failure()
with:
name: Pool files
path: |
/var/tmp/zloop/*/vdev/
if-no-files-found: ignore
+2 -11
@@ -14,7 +14,6 @@
# Normal rules # Normal rules
# #
*.[oa] *.[oa]
*.o.ur-safe
*.lo *.lo
*.la *.la
*.mod.c *.mod.c
@@ -22,8 +21,6 @@
*.swp *.swp
*.gcno *.gcno
*.gcda *.gcda
*.pyc
*.pyo
.deps .deps
.libs .libs
.dirstamp .dirstamp
@@ -36,7 +33,6 @@ Makefile.in
# Top level generated files specific to this top level dir # Top level generated files specific to this top level dir
# #
/bin /bin
/build
/configure /configure
/config.log /config.log
/config.status /config.status
@@ -45,6 +41,8 @@ Makefile.in
/zfs_config.h.in /zfs_config.h.in
/zfs.release /zfs.release
/stamp-h1 /stamp-h1
/.script-config
/zfs-script-config.sh
/aclocal.m4 /aclocal.m4
/autom4te.cache /autom4te.cache
@@ -61,10 +59,3 @@ cscope.*
*.tar.gz *.tar.gz
*.patch *.patch
*.orig *.orig
*.log
*.tmp
venv
*.so
*.so.debug
*.so.full
+1 -1
@@ -1,3 +1,3 @@
[submodule "scripts/zfs-images"] [submodule "scripts/zfs-images"]
path = scripts/zfs-images path = scripts/zfs-images
url = https://github.com/openzfs/zfs-images url = https://github.com/zfsonlinux/zfs-images
+90 -303
@@ -1,308 +1,95 @@
MAINTAINERS: Brian Behlendorf is the principal developer of the ZFS on Linux port.
He works full time as a computer scientist at Lawrence Livermore
National Laboratory on the ZFS and Lustre filesystems. However,
this port would not have been possible without the help of many
others who have contributed their time, effort, and insight.
Brian Behlendorf <behlendorf1@llnl.gov> Brian Behlendorf <behlendorf1@llnl.gov>
Tony Hutter <hutter2@llnl.gov>
PAST MAINTAINERS: First and foremost the hard working ZFS developers at Sun/Oracle.
They are responsible for the bulk of the code in this project and
without their efforts there never would have been a ZFS filesystem.
Ned Bass <bass6@llnl.gov> The ZFS Development Team at Sun/Oracle
CONTRIBUTORS: Next all the developers at KQ Infotech who implemented a prototype
ZFS Posix Layer (ZPL). Their implementation provided an excellent
reference for adding the ZPL functionality.
Aaron Fineman <abyxcos@gmail.com> Anand Mitra <mitra@kqinfotech.com>
Adam Leventhal <ahl@delphix.com> Anurag Agarwal <anurag@kqinfotech.com>
Adam Stevko <adam.stevko@gmail.com> Neependra Khare <neependra@kqinfotech.com>
Ahmed G <ahmedg@delphix.com> Prasad Joshi <prasad@kqinfotech.com>
Akash Ayare <aayare@delphix.com> Rohan Puri <rohan@kqinfotech.com>
Alan Somers <asomers@gmail.com> Sandip Divekar <sandipd@kqinfotech.com>
Alar Aun <spamtoaun@gmail.com> Shoaib <shoaib@kqinfotech.com>
Albert Lee <trisk@nexenta.com> Shrirang <shrirang@kqinfotech.com>
Alec Salazar <alec.j.salazar@gmail.com>
Alejandro R. Sedeño <asedeno@mit.edu> Additionally the following individuals have all made contributions
Alek Pinchuk <alek@nexenta.com> to the project and deserve to be acknowledged.
Alex Braunegg <alex.braunegg@gmail.com>
Alex McWhirter <alexmcwhirter@triadic.us> Albert Lee <trisk@nexenta.com>
Alex Reece <alex@delphix.com> Alejandro R. Sedeño <asedeno@mit.edu>
Alex Wilson <alex.wilson@joyent.com> Alex Zhuravlev <bzzz@whamcloud.com>
Alex Zhuravlev <alexey.zhuravlev@intel.com> Alexander Eremin <a.eremin@nexenta.com>
Alexander Eremin <a.eremin@nexenta.com> Alexander Stetsenko <ams@nexenta.com>
Alexander Motin <mav@freebsd.org> Alexey Shvetsov <alexxy@gentoo.org>
Alexander Pyhalov <apyhalov@gmail.com> Andreas Dilger <adilger@whamcloud.com>
Alexander Stetsenko <ams@nexenta.com> Andrew Reid <ColdCanuck@nailedtotheperch.com>
Alexey Shvetsov <alexxy@gentoo.org> Andrew Stormont <andrew.stormont@nexenta.com>
Alexey Smirnoff <fling@member.fsf.org> Andrew Tselischev <andrewtselischev@gmail.com>
Allan Jude <allanjude@freebsd.org> Andriy Gapon <avg@FreeBSD.org>
AndCycle <andcycle@andcycle.idv.tw> Aniruddha Shankar <k@191a.net>
Andreas Buschmann <andreas.buschmann@tech.net.de> Bill Pijewski <wdp@joyent.com>
Andreas Dilger <adilger@intel.com> Chris Dunlap <cdunlap@llnl.gov>
Andrew Barnes <barnes333@gmail.com> Chris Dunlop <chris@onthe.net.au>
Andrew Hamilton <ahamilto@tjhsst.edu> Chris Siden <chris.siden@delphix.com>
Andrew Reid <ColdCanuck@nailedtotheperch.com> Chris Wedgwood <cw@f00f.org>
Andrew Stormont <andrew.stormont@nexenta.com> Christian Kohlschütter <christian@kohlschutter.com>
Andrew Tselischev <andrewtselischev@gmail.com> Christopher Siden <chris.siden@delphix.com>
Andrey Vesnovaty <andrey.vesnovaty@gmail.com> Craig Sanders <github@taz.net.au>
Andriy Gapon <avg@freebsd.org> Cyril Plisko <cyril.plisko@mountall.com>
Andy Bakun <github@thwartedefforts.org> Dan McDonald <danmcd@nexenta.com>
Aniruddha Shankar <k@191a.net> Daniel Verite <daniel@verite.pro>
Antonio Russo <antonio.e.russo@gmail.com> Darik Horn <dajhorn@vanadac.com>
Arkadiusz Bubała <arkadiusz.bubala@open-e.com> Eric Schrock <Eric.Schrock@delphix.com>
Arne Jansen <arne@die-jansens.de> Etienne Dechamps <etienne.dechamps@ovh.net>
Aron Xu <happyaron.xu@gmail.com> Fajar A. Nugraha <github@fajar.net>
Bart Coddens <bart.coddens@gmail.com> Frederik Wessels <wessels147@gmail.com>
Basil Crow <basil.crow@delphix.com> Garrett D'Amore <garrett@nexenta.com>
Huang Liu <liu.huang@zte.com.cn> George Wilson <george.wilson@delphix.com>
Ben Allen <bsallen@alcf.anl.gov> Gordon Ross <gwr@nexenta.com>
Ben Rubson <ben.rubson@gmail.com> Gregor Kopka <mailfrom-github.com@kopka.net>
Benjamin Albrecht <git@albrecht.io> Gunnar Beutner <gunnar@beutner.name>
Bill McGonigle <bill-github.com-public1@bfccomputing.com> James H <james@kagisoft.co.uk>
Bill Pijewski <wdp@joyent.com> Javen Wu <wu.javen@gmail.com>
Boris Protopopov <boris.protopopov@nexenta.com> Jeremy Gill <jgill@parallax-innovations.com>
Brad Lewis <brad.lewis@delphix.com> Jorgen Lundman <lundman@lundman.net>
Brian Behlendorf <behlendorf1@llnl.gov> KORN Andras <korn@elan.rulez.org>
Brian J. Murrell <brian@sun.com> Kyle Fuller <inbox@kylefuller.co.uk>
Caleb James DeLisle <calebdelisle@lavabit.com> Manuel Amador (Rudd-O) <rudd-o@rudd-o.com>
Cao Xuewen <cao.xuewen@zte.com.cn> Martin Matuska <mm@FreeBSD.org>
Carlo Landmeter <clandmeter@gmail.com> Massimo Maggi <massimo@mmmm.it>
Carlos Alberto Lopez Perez <clopez@igalia.com> Matthew Ahrens <mahrens@delphix.com>
Chaoyu Zhang <zhang.chaoyu@zte.com.cn> Michael Martin <mgmartin.mgm@gmail.com>
Chen Can <chen.can2@zte.com.cn> Mike Harsch <mike@harschsystems.com>
Chen Haiquan <oc@yunify.com> Ned Bass <bass6@llnl.gov>
Chip Parker <aparker@enthought.com> Oleg Stepura <oleg@stepura.com>
Chris Burroughs <chris.burroughs@gmail.com> P.SCH <p88@yahoo.com>
Chris Dunlap <cdunlap@llnl.gov> Pawel Jakub Dawidek <pawel@dawidek.net>
Chris Dunlop <chris@onthe.net.au> Prakash Surya <surya1@llnl.gov>
Chris Siden <chris.siden@delphix.com> Prasad Joshi <pjoshi@stec-inc.com>
Chris Wedgwood <cw@f00f.org> Ricardo M. Correia <Ricardo.M.Correia@Sun.COM>
Chris Williamson <chris.williamson@delphix.com> Richard Laager <rlaager@wiktel.com>
Chris Zubrzycki <github@mid-earth.net> Richard Lowe <richlowe@richlowe.net>
Christ Schlacta <aarcane@aarcane.info> Richard Yao <ryao@cs.stonybrook.edu>
Christer Ekholm <che@chrekh.se> Rohan Puri <rohan.puri15@gmail.com>
Christian Kohlschütter <christian@kohlschutter.com> Shampavman <sham.pavman@nexenta.com>
Christian Neukirchen <chneukirchen@gmail.com> Simon Klinkert <klinkert@webgods.de>
Christian Schwarz <me@cschwarz.com> Suman Chakravartula <suman@gogrid.com>
Christopher Voltz <cjunk@voltz.ws> Tim Haley <Tim.Haley@Sun.COM>
Chunwei Chen <david.chen@nutanix.com> Turbo Fredriksson <turbo@bayour.com>
Clemens Fruhwirth <clemens@endorphin.org> Xin Li <delphij@FreeBSD.org>
Coleman Kane <ckane@colemankane.org> Yuxuan Shui <yshuiv7@gmail.com>
Colin Ian King <colin.king@canonical.com> Zachary Bedell <zac@thebedells.org>
Craig Loomis <cloomis@astro.princeton.edu> nordaux <nordaux@gmail.com>
Craig Sanders <github@taz.net.au>
Cyril Plisko <cyril.plisko@infinidat.com>
DHE <git@dehacked.net>
Damian Wojsław <damian@wojslaw.pl>
Dan Kimmel <dan.kimmel@delphix.com>
Dan McDonald <danmcd@nexenta.com>
Dan Swartzendruber <dswartz@druber.com>
Dan Vatca <dan.vatca@gmail.com>
Daniel Hoffman <dj.hoffman@delphix.com>
Daniel Verite <daniel@verite.pro>
Daniil Lunev <d.lunev.mail@gmail.com>
Darik Horn <dajhorn@vanadac.com>
Dave Eddy <dave@daveeddy.com>
David Lamparter <equinox@diac24.net>
David Qian <david.qian@intel.com>
David Quigley <david.quigley@intel.com>
Debabrata Banerjee <dbanerje@akamai.com>
Denys Rtveliashvili <denys@rtveliashvili.name>
Derek Dai <daiderek@gmail.com>
Dimitri John Ledkov <xnox@ubuntu.com>
Dmitry Khasanov <pik4ez@gmail.com>
Dominik Hassler <hadfl@omniosce.org>
Dominik Honnef <dominikh@fork-bomb.org>
Don Brady <don.brady@delphix.com>
Dr. András Korn <korn-github.com@elan.rulez.org>
Eli Rosenthal <eli.rosenthal@delphix.com>
Eric Desrochers <eric.desrochers@canonical.com>
Eric Dillmann <eric@jave.fr>
Eric Schrock <Eric.Schrock@delphix.com>
Etienne Dechamps <etienne@edechamps.fr>
Evan Susarret <evansus@gmail.com>
Fabian Grünbichler <f.gruenbichler@proxmox.com>
Fajar A. Nugraha <github@fajar.net>
Fan Yong <fan.yong@intel.com>
Feng Sun <loyou85@gmail.com>
Frederik Wessels <wessels147@gmail.com>
Frédéric Vanniere <f.vanniere@planet-work.com>
Garrett D'Amore <garrett@nexenta.com>
Garrison Jensen <garrison.jensen@gmail.com>
Gary Mills <gary_mills@fastmail.fm>
Gaurav Kumar <gauravk.18@gmail.com>
GeLiXin <ge.lixin@zte.com.cn>
George Amanakis <g_amanakis@yahoo.com>
George Melikov <mail@gmelikov.ru>
George Wilson <gwilson@delphix.com>
Georgy Yakovlev <ya@sysdump.net>
Giuseppe Di Natale <guss80@gmail.com>
Gordan Bobic <gordan@redsleeve.org>
Gordon Ross <gwr@nexenta.com>
Gregor Kopka <gregor@kopka.net>
Grischa Zengel <github.zfsonlinux@zengel.info>
Gunnar Beutner <gunnar@beutner.name>
Gvozden Neskovic <neskovic@gmail.com>
Hajo Möller <dasjoe@gmail.com>
Hans Rosenfeld <hans.rosenfeld@nexenta.com>
Håkan Johansson <f96hajo@chalmers.se>
Igor Kozhukhov <ikozhukhov@gmail.com>
Igor Lvovsky <ilvovsky@gmail.com>
Isaac Huang <he.huang@intel.com>
JK Dingwall <james@dingwall.me.uk>
Jacek Fefliński <feflik@gmail.com>
James Cowgill <james.cowgill@mips.com>
James Lee <jlee@thestaticvoid.com>
James Pan <jiaming.pan@yahoo.com>
Jan Engelhardt <jengelh@inai.de>
Jan Kryl <jan.kryl@nexenta.com>
Jan Sanislo <oystr@cs.washington.edu>
Jason King <jason.brian.king@gmail.com>
Jason Zaman <jasonzaman@gmail.com>
Javen Wu <wu.javen@gmail.com>
Jeremy Gill <jgill@parallax-innovations.com>
Jeremy Jones <jeremy@delphix.com>
Jerry Jelinek <jerry.jelinek@joyent.com>
Jinshan Xiong <jinshan.xiong@intel.com>
Joe Stein <joe.stein@delphix.com>
John Albietz <inthecloud247@gmail.com>
John Eismeier <john.eismeier@gmail.com>
John L. Hammond <john.hammond@intel.com>
John Layman <jlayman@sagecloud.com>
John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
John Wren Kennedy <john.kennedy@delphix.com>
Johnny Stenback <github@jstenback.com>
Jorgen Lundman <lundman@lundman.net>
Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Joshua M. Clulow <josh@sysmgr.org>
Justin Bedő <cu@cua0.org>
Justin Lecher <jlec@gentoo.org>
Justin T. Gibbs <gibbs@FreeBSD.org>
Jörg Thalheim <joerg@higgsboson.tk>
KORN Andras <korn@elan.rulez.org>
Kamil Domański <kamil@domanski.co>
Karsten Kretschmer <kkretschmer@gmail.com>
Kash Pande <kash@tripleback.net>
Keith M Wesolowski <wesolows@foobazco.org>
Kevin Tanguy <kevin.tanguy@ovh.net>
KireinaHoro <i@jsteward.moe>
Kjeld Schouten-Lebbing <kjeld@schouten-lebbing.nl>
Kohsuke Kawaguchi <kk@kohsuke.org>
Kyle Blatter <kyleblatter@llnl.gov>
Kyle Fuller <inbox@kylefuller.co.uk>
Loli <ezomori.nozomu@gmail.com>
Lars Johannsen <laj@it.dk>
Li Dongyang <dongyang.li@anu.edu.au>
Li Wei <W.Li@Sun.COM>
Lukas Wunner <lukas@wunner.de>
Madhav Suresh <madhav.suresh@delphix.com>
Manoj Joseph <manoj.joseph@delphix.com>
Manuel Amador (Rudd-O) <rudd-o@rudd-o.com>
Marcel Huber <marcelhuberfoo@gmail.com>
Marcel Telka <marcel.telka@nexenta.com>
Marcel Wysocki <maci.stgn@gmail.com>
Mark Shellenbaum <Mark.Shellenbaum@Oracle.COM>
Mark Wright <markwright@internode.on.net>
Martin Matuska <mm@FreeBSD.org>
Massimo Maggi <me@massimo-maggi.eu>
Matt Johnston <matt@fugro-fsi.com.au>
Matt Kemp <matt@mattikus.com>
Matthew Ahrens <matt@delphix.com>
Matthew Thode <mthode@mthode.org>
Matus Kral <matuskral@me.com>
Max Grossman <max.grossman@delphix.com>
Maximilian Mehnert <maximilian.mehnert@gmx.de>
Michael Gebetsroither <michael@mgeb.org>
Michael Kjorling <michael@kjorling.se>
Michael Martin <mgmartin.mgm@gmail.com>
Michael Niewöhner <foss@mniewoehner.de>
Mike Gerdts <mike.gerdts@joyent.com>
Mike Harsch <mike@harschsystems.com>
Mike Leddy <mike.leddy@gmail.com>
Mike Swanson <mikeonthecomputer@gmail.com>
Milan Jurik <milan.jurik@xylab.cz>
Morgan Jones <mjones@rice.edu>
Moritz Maxeiner <moritz@ucworks.org>
Nathaniel Clark <Nathaniel.Clark@misrule.us>
Nathaniel Wesley Filardo <nwf@cs.jhu.edu>
Nav Ravindranath <nav@delphix.com>
Neal Gompa (ニール・ゴンパ) <ngompa13@gmail.com>
Ned Bass <bass6@llnl.gov>
Neependra Khare <neependra@kqinfotech.com>
Neil Stockbridge <neil@dist.ro>
Nick Garvey <garvey.nick@gmail.com>
Nikolay Borisov <n.borisov.lkml@gmail.com>
Olaf Faaland <faaland1@llnl.gov>
Oleg Drokin <green@linuxhacker.ru>
Oleg Stepura <oleg@stepura.com>
Patrik Greco <sikevux@sikevux.se>
Paul B. Henson <henson@acm.org>
Paul Dagnelie <pcd@delphix.com>
Paul Zuchowski <pzuchowski@datto.com>
Pavel Boldin <boldin.pavel@gmail.com>
Pavel Zakharov <pavel.zakharov@delphix.com>
Pawel Jakub Dawidek <pjd@FreeBSD.org>
Pedro Giffuni <pfg@freebsd.org>
Peng <peng.hse@xtaotech.com>
Peter Ashford <ashford@accs.com>
Prakash Surya <prakash.surya@delphix.com>
Prasad Joshi <prasadjoshi124@gmail.com>
Ralf Ertzinger <ralf@skytale.net>
Randall Mason <ClashTheBunny@gmail.com>
Remy Blank <remy.blank@pobox.com>
Ricardo M. Correia <ricardo.correia@oracle.com>
Rich Ercolani <rincebrain@gmail.com>
Richard Elling <Richard.Elling@RichardElling.com>
Richard Laager <rlaager@wiktel.com>
Richard Lowe <richlowe@richlowe.net>
Richard Sharpe <rsharpe@samba.org>
Richard Yao <ryao@gentoo.org>
Rohan Puri <rohan.puri15@gmail.com>
Romain Dolbeau <romain.dolbeau@atos.net>
Roman Strashkin <roman.strashkin@nexenta.com>
Ruben Kerkhof <ruben@rubenkerkhof.com>
Saso Kiselkov <saso.kiselkov@nexenta.com>
Scot W. Stevenson <scot.stevenson@gmail.com>
Sean Eric Fagan <sef@ixsystems.com>
Sebastian Gottschall <s.gottschall@dd-wrt.com>
Sen Haerens <sen@senhaerens.be>
Serapheim Dimitropoulos <serapheim@delphix.com>
Seth Forshee <seth.forshee@canonical.com>
Shampavman <sham.pavman@nexenta.com>
Shen Yan <shenyanxxxy@qq.com>
Simon Guest <simon.guest@tesujimath.org>
Simon Klinkert <simon.klinkert@gmail.com>
Sowrabha Gopal <sowrabha.gopal@delphix.com>
Stanislav Seletskiy <s.seletskiy@gmail.com>
Steffen Müthing <steffen.muething@iwr.uni-heidelberg.de>
Stephen Blinick <stephen.blinick@delphix.com>
Steve Dougherty <sdougherty@barracuda.com>
Steven Burgess <sburgess@dattobackup.com>
Steven Hartland <smh@freebsd.org>
Steven Johnson <sjohnson@sakuraindustries.com>
Stian Ellingsen <stian@plaimi.net>
Suman Chakravartula <schakrava@gmail.com>
Sydney Vanda <sydney.m.vanda@intel.com>
Sören Tempel <soeren+git@soeren-tempel.net>
Thijs Cramer <thijs.cramer@gmail.com>
Tim Chase <tim@chase2k.com>
Tim Connors <tconnors@rather.puzzling.org>
Tim Crawford <tcrawford@datto.com>
Tim Haley <Tim.Haley@Sun.COM>
Tobin Harding <me@tobin.cc>
Tom Caputi <tcaputi@datto.com>
Tom Matthews <tom@axiom-partners.com>
Tom Prince <tom.prince@ualberta.net>
Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Tony Hutter <hutter2@llnl.gov>
Toomas Soome <tsoome@me.com>
Trey Dockendorf <treydock@gmail.com>
Turbo Fredriksson <turbo@bayour.com>
Tyler J. Stachecki <stachecki.tyler@gmail.com>
Vitaut Bajaryn <vitaut.bayaryn@gmail.com>
Weigang Li <weigang.li@intel.com>
Will Andrews <will@freebsd.org>
Will Rouesnel <w.rouesnel@gmail.com>
Wolfgang Bumiller <w.bumiller@proxmox.com>
Xin Li <delphij@FreeBSD.org>
Ying Zhu <casualfisher@gmail.com>
YunQiang Su <syq@debian.org>
Yuri Pankov <yuri.pankov@gmail.com>
Yuxuan Shui <yshuiv7@gmail.com>
Zachary Bedell <zac@thebedells.org>
-2
@@ -1,2 +0,0 @@
The [OpenZFS Code of Conduct](https://openzfs.org/wiki/Code_of_Conduct)
applies to spaces associated with the OpenZFS project, including GitHub.
+27 -25
@@ -1,31 +1,33 @@
Refer to the git commit log for authoritative copyright attribution. The majority of the code in the ZFS on Linux port comes from OpenSolaris
which has been released under the terms of the CDDL open source license.
This includes the core ZFS code, libavl, libnvpair, libefi, libunicode,
and libutil. The original OpenSolaris source can be downloaded from:
The original ZFS source code was obtained from Open Solaris which was http://dlc.sun.com/osol/on/downloads/b121/on-src.tar.bz2
released under the terms of the CDDL open source license. Additional
changes have been included from OpenZFS and the Illumos project which
are similarly licensed. These projects can be found on Github at:
* https://github.com/illumos/illumos-gate Files which do not originate from OpenSolaris are noted in the file header
* https://github.com/openzfs/openzfs and attributed properly. These exceptions include, but are not limited
to, the vdev_disk.c and zvol.c implementation which are licensed under
the CDDL.
The zpios test code is originally derived from the Lustre pios test code
which is licensed under the GPLv2. As such the heavily modified zpios
kernel test code also remains licensed under the GPLv2.
The latest stable and development versions of this port can be downloaded
from the official ZFS on Linux site located at:
http://zfsonlinux.org/
This ZFS on Linux port was produced at the Lawrence Livermore National
Laboratory (LLNL) under Contract No. DE-AC52-07NA27344 (Contract 44)
between the U.S. Department of Energy (DOE) and Lawrence Livermore
National Security, LLC (LLNS) for the operation of LLNL. It has been
approved for release under LLNL-CODE-403049.
Unless otherwise noted, all files in this distribution are released Unless otherwise noted, all files in this distribution are released
under the Common Development and Distribution License (CDDL). under the Common Development and Distribution License (CDDL).
Exceptions are noted within the associated source files. See the file
OPENSOLARIS.LICENSE for more information.
Exceptions are noted within the associated source files headers and Refer to the git commit log for authoritative copyright attribution.
by including a THIRDPARTYLICENSE file with the license terms. A few
notable exceptions and their respective licenses include:
* Skein Checksum Implementation: module/icp/algs/skein/THIRDPARTYLICENSE
* AES Implementation: module/icp/asm-x86_64/aes/THIRDPARTYLICENSE.gladman
* AES Implementation: module/icp/asm-x86_64/aes/THIRDPARTYLICENSE.openssl
* PBKDF2 Implementation: lib/libzfs/THIRDPARTYLICENSE.openssl
* SPL Implementation: module/os/linux/spl/THIRDPARTYLICENSE.gplv2
* GCM Implementation: module/icp/asm-x86_64/modes/THIRDPARTYLICENSE.cryptogams
* GCM Implementation: module/icp/asm-x86_64/modes/THIRDPARTYLICENSE.openssl
* GHASH Implementation: module/icp/asm-x86_64/modes/THIRDPARTYLICENSE.cryptogams
* GHASH Implementation: module/icp/asm-x86_64/modes/THIRDPARTYLICENSE.openssl
This product includes software developed by the OpenSSL Project for use
in the OpenSSL Toolkit (http://www.openssl.org/)
See the LICENSE and NOTICE for more information.
+24
@@ -0,0 +1,24 @@
This work was produced at the Lawrence Livermore National Laboratory
(LLNL) under Contract No. DE-AC52-07NA27344 (Contract 44) between
the U.S. Department of Energy (DOE) and Lawrence Livermore National
Security, LLC (LLNS) for the operation of LLNL.
This work was prepared as an account of work sponsored by an agency of
the United States Government. Neither the United States Government nor
Lawrence Livermore National Security, LLC nor any of their employees,
makes any warranty, express or implied, or assumes any liability or
responsibility for the accuracy, completeness, or usefulness of any
information, apparatus, product, or process disclosed, or represents
that its use would not infringe privately-owned rights.
Reference herein to any specific commercial products, process, or
services by trade name, trademark, manufacturer or otherwise does
not necessarily constitute or imply its endorsement, recommendation,
or favoring by the United States Government or Lawrence Livermore
National Security, LLC. The views and opinions of authors expressed
herein do not necessarily state or reflect those of the United States
Government or Lawrence Livermore National Security, LLC, and shall
not be used for advertising or product endorsement purposes.
The precise terms and conditions for copying, distribution, and
modification are specified in the file OPENSOLARIS.LICENSE.
+8 -10
@@ -1,10 +1,8 @@
Meta: 1 Meta: 1
Name: zfs Name: zfs
Branch: 1.0 Branch: 1.0
Version: 2.1.9 Version: 0.7.13
Release: 1 Release: 1
Release-Tags: relext Release-Tags: relext
License: CDDL License: CDDL
Author: OpenZFS Author: OpenZFS on Linux
Linux-Maximum: 6.1
Linux-Minimum: 3.10
+32 -179
@@ -1,69 +1,32 @@
include $(top_srcdir)/config/Shellcheck.am
ACLOCAL_AMFLAGS = -I config ACLOCAL_AMFLAGS = -I config
SUBDIRS = include include config/rpm.am
if BUILD_LINUX include config/deb.am
SUBDIRS += rpm include config/tgz.am
endif
SUBDIRS = include rpm
if CONFIG_USER if CONFIG_USER
SUBDIRS += man scripts lib tests cmd etc contrib SUBDIRS += udev etc man scripts lib tests cmd contrib
if BUILD_LINUX
SUBDIRS += udev
endif
endif endif
if CONFIG_KERNEL if CONFIG_KERNEL
SUBDIRS += module SUBDIRS += module
extradir = $(prefix)/src/zfs-$(VERSION) extradir = @prefix@/src/zfs-$(VERSION)
extra_HEADERS = zfs.release.in zfs_config.h.in extra_HEADERS = zfs.release.in zfs_config.h.in
if BUILD_LINUX kerneldir = @prefix@/src/zfs-$(VERSION)/$(LINUX_VERSION)
kerneldir = $(prefix)/src/zfs-$(VERSION)/$(LINUX_VERSION)
nodist_kernel_HEADERS = zfs.release zfs_config.h module/$(LINUX_SYMBOLS) nodist_kernel_HEADERS = zfs.release zfs_config.h module/$(LINUX_SYMBOLS)
endif endif
endif
AUTOMAKE_OPTIONS = foreign AUTOMAKE_OPTIONS = foreign
EXTRA_DIST = autogen.sh copy-builtin EXTRA_DIST = autogen.sh copy-builtin
EXTRA_DIST += config/config.awk config/rpm.am config/deb.am config/tgz.am EXTRA_DIST += config/config.awk config/rpm.am config/deb.am config/tgz.am
EXTRA_DIST += AUTHORS CODE_OF_CONDUCT.md COPYRIGHT LICENSE META NEWS NOTICE EXTRA_DIST += META DISCLAIMER COPYRIGHT README.markdown OPENSOLARIS.LICENSE
EXTRA_DIST += README.md RELEASES.md
EXTRA_DIST += module/lua/README.zfs module/os/linux/spl/README.md
# Include all the extra licensing information for modules
EXTRA_DIST += module/icp/algs/skein/THIRDPARTYLICENSE
EXTRA_DIST += module/icp/algs/skein/THIRDPARTYLICENSE.descrip
EXTRA_DIST += module/icp/asm-x86_64/aes/THIRDPARTYLICENSE.gladman
EXTRA_DIST += module/icp/asm-x86_64/aes/THIRDPARTYLICENSE.gladman.descrip
EXTRA_DIST += module/icp/asm-x86_64/aes/THIRDPARTYLICENSE.openssl
EXTRA_DIST += module/icp/asm-x86_64/aes/THIRDPARTYLICENSE.openssl.descrip
EXTRA_DIST += module/icp/asm-x86_64/modes/THIRDPARTYLICENSE.cryptogams
EXTRA_DIST += module/icp/asm-x86_64/modes/THIRDPARTYLICENSE.cryptogams.descrip
EXTRA_DIST += module/icp/asm-x86_64/modes/THIRDPARTYLICENSE.openssl
EXTRA_DIST += module/icp/asm-x86_64/modes/THIRDPARTYLICENSE.openssl.descrip
EXTRA_DIST += module/os/linux/spl/THIRDPARTYLICENSE.gplv2
EXTRA_DIST += module/os/linux/spl/THIRDPARTYLICENSE.gplv2.descrip
EXTRA_DIST += module/zfs/THIRDPARTYLICENSE.cityhash
EXTRA_DIST += module/zfs/THIRDPARTYLICENSE.cityhash.descrip
@CODE_COVERAGE_RULES@ @CODE_COVERAGE_RULES@
GITREV = include/zfs_gitrev.h
PHONY = gitrev
gitrev:
$(AM_V_GEN)$(top_srcdir)/scripts/make_gitrev.sh $(GITREV)
all: gitrev
# Double-colon rules are allowed; there are multiple independent definitions.
maintainer-clean-local::
-$(RM) $(GITREV)
distclean-local:: distclean-local::
-$(RM) -R autom4te*.cache build -$(RM) -R autom4te*.cache
-find . \( -name SCCS -o -name BitKeeper -o -name .svn -o -name CVS \ -find . \( -name SCCS -o -name BitKeeper -o -name .svn -o -name CVS \
-o -name .pc -o -name .hg -o -name .git \) -prune -o \ -o -name .pc -o -name .hg -o -name .git \) -prune -o \
\( -name '*.orig' -o -name '*.rej' -o -name '*~' \ \( -name '*.orig' -o -name '*.rej' -o -name '*~' \
@@ -74,173 +37,63 @@ distclean-local::
-o -name '*.gcno' \) \ -o -name '*.gcno' \) \
-type f -print | xargs $(RM) -type f -print | xargs $(RM)
all-local:
-[ -x ${top_builddir}/scripts/zfs-tests.sh ] && \
${top_builddir}/scripts/zfs-tests.sh -c
dist-hook: dist-hook:
$(AM_V_GEN)$(top_srcdir)/scripts/make_gitrev.sh -D $(distdir) $(GITREV) sed -i 's/Release:[[:print:]]*/Release: $(RELEASE)/' \
$(SED) ${ac_inplace} -e 's/Release:[[:print:]]*/Release: $(RELEASE)/' \
$(distdir)/META $(distdir)/META
if BUILD_LINUX checkstyle: cstyle shellcheck flake8 commitcheck
# For compatibility, create a matching spl-x.y.z directly which contains
# symlinks to the updated header and object file locations. These
# compatibility links will be removed in the next major release.
if CONFIG_KERNEL
install-data-hook:
rm -rf $(DESTDIR)$(prefix)/src/spl-$(VERSION) && \
mkdir $(DESTDIR)$(prefix)/src/spl-$(VERSION) && \
cd $(DESTDIR)$(prefix)/src/spl-$(VERSION) && \
ln -s ../zfs-$(VERSION)/include/spl include && \
ln -s ../zfs-$(VERSION)/$(LINUX_VERSION) $(LINUX_VERSION) && \
ln -s ../zfs-$(VERSION)/zfs_config.h.in spl_config.h.in && \
ln -s ../zfs-$(VERSION)/zfs.release.in spl.release.in && \
cd $(DESTDIR)$(prefix)/src/zfs-$(VERSION)/$(LINUX_VERSION) && \
ln -fs zfs_config.h spl_config.h && \
ln -fs zfs.release spl.release
endif
endif
PHONY += codecheck
codecheck: cstyle shellcheck checkbashisms flake8 mancheck testscheck vcscheck zstdcheck
PHONY += checkstyle
checkstyle: codecheck commitcheck
PHONY += commitcheck
commitcheck: commitcheck:
@if git rev-parse --git-dir > /dev/null 2>&1; then \ @if git rev-parse --git-dir > /dev/null 2>&1; then \
${top_srcdir}/scripts/commitcheck.sh; \ scripts/commitcheck.sh; \
fi fi
if HAVE_PARALLEL
cstyle_line = -print0 | parallel -X0 ${top_srcdir}/scripts/cstyle.pl -cpP {}
else
cstyle_line = -exec ${top_srcdir}/scripts/cstyle.pl -cpP {} +
endif
PHONY += cstyle
cstyle: cstyle:
@find ${top_srcdir} -name build -prune \ @find ${top_srcdir} -name '*.[hc]' ! -name 'zfs_config.*' \
-o -type f -name '*.[hc]' \ ! -name '*.mod.c' -type f -exec scripts/cstyle.pl -cpP {} \+
! -name 'zfs_config.*' ! -name '*.mod.c' \
! -name 'opt_global.h' ! -name '*_if*.h' \
! -name 'zstd_compat_wrapper.h' \
! -path './module/zstd/lib/*' \
$(cstyle_line)
filter_executable = -exec test -x '{}' \; -print shellcheck:
@if type shellcheck > /dev/null 2>&1; then \
SHELLCHECKDIRS = cmd contrib etc scripts tests shellcheck --exclude=SC1090 --format=gcc scripts/paxcheck.sh \
SHELLCHECKSCRIPTS = autogen.sh scripts/zloop.sh \
scripts/zfs-tests.sh \
PHONY += checkabi storeabi scripts/zfs.sh \
scripts/commitcheck.sh \
checklibabiversion: $$(find cmd/zed/zed.d/*.sh -type f) \
libabiversion=`abidw -v | $(SED) 's/[^0-9]//g'`; \ $$(find cmd/zpool/zpool.d/* -executable); \
if test $$libabiversion -lt "200"; then \
/bin/echo -e "\n" \
"*** Please use libabigail 2.0.0 version or newer;\n" \
"*** otherwise results are not consistent!\n" \
"(or see https://github.com/openzfs/libabigail-docker )\n"; \
exit 1; \
fi;
checkabi: checklibabiversion lib
$(MAKE) -C lib checkabi
storeabi: checklibabiversion lib
$(MAKE) -C lib storeabi
PHONY += mancheck
mancheck:
${top_srcdir}/scripts/mancheck.sh ${top_srcdir}/man ${top_srcdir}/tests/test-runner/man
if BUILD_LINUX
stat_fmt = -c '%A %n'
else
stat_fmt = -f '%Sp %N'
endif
PHONY += testscheck
testscheck:
@find ${top_srcdir}/tests/zfs-tests -type f \
\( -name '*.ksh' -not ${filter_executable} \) -o \
\( -name '*.kshlib' ${filter_executable} \) -o \
\( -name '*.shlib' ${filter_executable} \) -o \
\( -name '*.cfg' ${filter_executable} \) | \
xargs -r stat ${stat_fmt} | \
awk '{c++; print} END {if(c>0) exit 1}'
PHONY += vcscheck
vcscheck:
@if git rev-parse --git-dir > /dev/null 2>&1; then \
git ls-files . --exclude-standard --others | \
awk '{c++; print} END {if(c>0) exit 1}' ; \
fi fi
PHONY += zstdcheck
zstdcheck:
@$(MAKE) -C module/zstd checksymbols
PHONY += lint
lint: cppcheck paxcheck lint: cppcheck paxcheck
CPPCHECKDIRS = cmd lib module cppcheck:
PHONY += cppcheck @if type cppcheck > /dev/null 2>&1; then \
cppcheck: $(CPPCHECKDIRS) cppcheck --quiet --force --error-exitcode=2 --inline-suppr \
@if test -n "$(CPPCHECK)"; then \ --suppressions-list=.github/suppressions.txt \
set -e ; for dir in $(CPPCHECKDIRS) ; do \ -UHAVE_SSE2 -UHAVE_AVX512F -UHAVE_UIO_ZEROCOPY \
$(MAKE) -C $$dir cppcheck ; \ -UHAVE_DNLC ${top_srcdir}; \
done \
else \
echo "skipping cppcheck because cppcheck is not installed"; \
fi fi
PHONY += paxcheck
paxcheck: paxcheck:
@if type scanelf > /dev/null 2>&1; then \ @if type scanelf > /dev/null 2>&1; then \
${top_srcdir}/scripts/paxcheck.sh ${top_builddir}; \ scripts/paxcheck.sh ${top_srcdir}; \
else \
echo "skipping paxcheck because scanelf is not installed"; \
fi fi
PHONY += flake8
flake8: flake8:
@if type flake8 > /dev/null 2>&1; then \ @if type flake8 > /dev/null 2>&1; then \
flake8 ${top_srcdir}; \ flake8 ${top_srcdir}; \
else \
echo "skipping flake8 because flake8 is not installed"; \
fi fi
PHONY += ctags
ctags: ctags:
$(RM) tags $(RM) tags
find $(top_srcdir) -name '.?*' -prune \ find $(top_srcdir) -name .git -prune -o -name '*.[hc]' | xargs ctags
-o -type f -name '*.[hcS]' -print | xargs ctags -a
PHONY += etags
etags: etags:
$(RM) TAGS $(RM) TAGS
find $(top_srcdir) -name '.?*' -prune \ find $(top_srcdir) -name .pc -prune -o -name '*.[hc]' | xargs etags -a
-o -type f -name '*.[hcS]' -print | xargs etags -a
PHONY += cscopelist
cscopelist:
find $(top_srcdir) -name '.?*' -prune \
-o -type f -name '*.[hc]' -print >cscope.files
PHONY += tags
tags: ctags etags tags: ctags etags
PHONY += pkg pkg-dkms pkg-kmod pkg-utils
pkg: @DEFAULT_PACKAGE@ pkg: @DEFAULT_PACKAGE@
pkg-dkms: @DEFAULT_PACKAGE@-dkms pkg-dkms: @DEFAULT_PACKAGE@-dkms
pkg-kmod: @DEFAULT_PACKAGE@-kmod pkg-kmod: @DEFAULT_PACKAGE@-kmod
pkg-utils: @DEFAULT_PACKAGE@-utils pkg-utils: @DEFAULT_PACKAGE@-utils
include config/rpm.am
include config/deb.am
include config/tgz.am
.PHONY: $(PHONY)
-3
@@ -1,3 +0,0 @@
Descriptions of all releases can be found on github:
https://github.com/openzfs/zfs/releases
-16
@@ -1,16 +0,0 @@
This work was produced under the auspices of the U.S. Department of Energy by
Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.
This work was prepared as an account of work sponsored by an agency of the
United States Government. Neither the United States Government nor Lawrence
Livermore National Security, LLC, nor any of their employees makes any warranty,
expressed or implied, or assumes any legal liability or responsibility for the
accuracy, completeness, or usefulness of any information, apparatus, product, or
process disclosed, or represents that its use would not infringe privately owned
rights. Reference herein to any specific commercial product, process, or service
by trade name, trademark, manufacturer, or otherwise does not necessarily
constitute or imply its endorsement, recommendation, or favoring by the United
States Government or Lawrence Livermore National Security, LLC. The views and
opinions of authors expressed herein do not necessarily state or reflect those
of the United States Government or Lawrence Livermore National Security, LLC,
and shall not be used for advertising or product endorsement purposes.
+19
@@ -0,0 +1,19 @@
![img](http://zfsonlinux.org/images/zfs-linux.png)
ZFS on Linux is an advanced file system and volume manager which was originally
developed for Solaris and is now maintained by the OpenZFS community.
[![codecov](https://codecov.io/gh/zfsonlinux/zfs/branch/master/graph/badge.svg)](https://codecov.io/gh/zfsonlinux/zfs)
# Official Resources
* [Site](http://zfsonlinux.org)
* [Wiki](https://github.com/zfsonlinux/zfs/wiki)
* [Mailing lists](https://github.com/zfsonlinux/zfs/wiki/Mailing-Lists)
* [OpenZFS site](http://open-zfs.org/)
# Installation
Full documentation for installing ZoL on your favorite Linux distribution can
be found at [our site](http://zfsonlinux.org/).
# Contribute & Develop
We have a separate document with [contribution guidelines](./.github/CONTRIBUTING.md).
-35
@@ -1,35 +0,0 @@
![img](https://openzfs.github.io/openzfs-docs/_static/img/logo/480px-Open-ZFS-Secondary-Logo-Colour-halfsize.png)
OpenZFS is an advanced file system and volume manager which was originally
developed for Solaris and is now maintained by the OpenZFS community.
This repository contains the code for running OpenZFS on Linux and FreeBSD.
[![codecov](https://codecov.io/gh/openzfs/zfs/branch/master/graph/badge.svg)](https://codecov.io/gh/openzfs/zfs)
[![coverity](https://scan.coverity.com/projects/1973/badge.svg)](https://scan.coverity.com/projects/openzfs-zfs)
# Official Resources
* [Documentation](https://openzfs.github.io/openzfs-docs/) - for using and developing this repo
* [ZoL Site](https://zfsonlinux.org) - Linux release info & links
* [Mailing lists](https://openzfs.github.io/openzfs-docs/Project%20and%20Community/Mailing%20Lists.html)
* [OpenZFS site](https://openzfs.org/) - for conference videos and info on other platforms (illumos, OSX, Windows, etc)
# Installation
Full documentation for installing OpenZFS on your favorite operating system can
be found at the [Getting Started Page](https://openzfs.github.io/openzfs-docs/Getting%20Started/index.html).
# Contribute & Develop
We have a separate document with [contribution guidelines](./.github/CONTRIBUTING.md).
We have a [Code of Conduct](./CODE_OF_CONDUCT.md).
# Release
OpenZFS is released under a CDDL license.
For more details see the NOTICE, LICENSE and COPYRIGHT files; `UCRL-CODE-235197`
# Supported Kernels
* The `META` file contains the officially recognized supported Linux kernel versions.
* Supported FreeBSD versions are any supported branches and releases starting from 12.2-RELEASE.
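
The removed README points readers at the `META` file for the supported Linux kernel range (the `Linux-Minimum` and `Linux-Maximum` fields shown in the META hunk). A minimal sketch of reading such a field from a `Key: value` file follows; the helper name `meta_field` is hypothetical and not part of the repository, and the sketch assumes values contain no `: ` sequence of their own.

```shell
#!/bin/sh
# Hypothetical helper (not from the repository): print the value of one
# field from a META-style file made of "Key: value" lines.
#   $1 = field name, e.g. Linux-Minimum
#   $2 = path to the META file
meta_field() {
    # Split each line on a colon followed by optional spaces, so the
    # padding after the colon is discarded; stop at the first match.
    awk -v key="$1" -F': *' '$1 == key { print $2; exit }' "$2"
}

# Example use (assumes a META file in the current directory):
#   min=$(meta_field Linux-Minimum ./META)
#   max=$(meta_field Linux-Maximum ./META)
#   echo "supported kernels: $min through $max"
```

The function prints nothing when the field is absent, so callers may want to test for an empty result before using it.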
-37
@@ -1,37 +0,0 @@
OpenZFS uses the MAJOR.MINOR.PATCH versioning scheme described here:
* MAJOR - Incremented at the discretion of the OpenZFS developers to indicate
a particularly noteworthy feature or change. An increase in MAJOR number
does not indicate any incompatible on-disk format change. The ability
to import a ZFS pool is controlled by the feature flags enabled on the
pool and the feature flags supported by the installed OpenZFS version.
Increasing the MAJOR version is expected to be an infrequent occurrence.
* MINOR - Incremented to indicate new functionality such as a new feature
flag, pool/dataset property, zfs/zpool sub-command, new user/kernel
interface, etc. MINOR releases may introduce incompatible changes to the
user space library APIs (libzfs.so). Existing user/kernel interfaces are
considered to be stable to maximize compatibility between OpenZFS releases.
Additions to the user/kernel interface are backwards compatible.
* PATCH - Incremented when applying documentation updates, important bug
fixes, minor performance improvements, and kernel compatibility patches.
The user space library APIs and user/kernel interface are considered to
be stable. PATCH releases for a MAJOR.MINOR are published as needed.
Two release branches are maintained for OpenZFS, they are:
* OpenZFS LTS - A designated MAJOR.MINOR release with periodic PATCH
releases that incorporate important changes backported from newer OpenZFS
releases. This branch is intended for use in environments using an
LTS, enterprise, or similarly managed kernel (RHEL, Ubuntu LTS, Debian).
Minor changes to support these distribution kernels will be applied as
needed. New kernel versions released after the OpenZFS LTS release are
not supported. LTS releases will receive patches for at least 2 years.
The current LTS release is OpenZFS 2.1.
* OpenZFS current - Tracks the newest MAJOR.MINOR release. This branch
includes support for the latest OpenZFS features and recently releases
kernels. When a new MINOR release is tagged the previous MINOR release
will no longer be maintained (unless it is an LTS release). New MINOR
releases are planned to occur roughly annually.
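
The MAJOR.MINOR.PATCH scheme described in the removed RELEASES.md can be illustrated with a small shell sketch. Both function names below are hypothetical, introduced only for illustration, and the sketch assumes a plain three-part version string such as `2.1.9` or `0.7.13` with no suffix.

```shell
#!/bin/sh
# Hypothetical helpers (not from the repository).

# Split "MAJOR.MINOR.PATCH" into the variables MAJOR, MINOR and PATCH.
split_version() {
    v=$1
    MAJOR=${v%%.*}      # everything before the first dot
    rest=${v#*.}        # everything after the first dot
    MINOR=${rest%%.*}   # first component of the remainder
    PATCH=${rest#*.}    # what is left after the second dot
}

# Succeed when two versions differ only in PATCH, i.e. they belong to
# the same MAJOR.MINOR release series.
same_series() {
    [ "${1%.*}" = "${2%.*}" ]
}

# Example use:
#   split_version 2.1.9 && echo "$MAJOR $MINOR $PATCH"
#   same_series 2.1.9 2.1.12 && echo "PATCH-level update"
```

Under the scheme as described, two versions in the same series are expected to keep the user space library APIs and the user/kernel interface stable, which is the property `same_series` is meant to flag.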
+52 -8
@@ -1,15 +1,17 @@
#!/bin/sh #!/bin/sh
### prepare ### prepare
#TEST_PREPARE_WATCHDOG="yes" #TEST_PREPARE_WATCHDOG="no"
#TEST_PREPARE_SHARES="yes"
### SPLAT
#TEST_SPLAT_SKIP="yes"
#TEST_SPLAT_OPTIONS="-acvx"
### ztest ### ztest
#TEST_ZTEST_SKIP="yes" #TEST_ZTEST_SKIP="yes"
#TEST_ZTEST_TIMEOUT=1800 #TEST_ZTEST_TIMEOUT=1800
#TEST_ZTEST_DIR="/var/tmp/" #TEST_ZTEST_DIR="/var/tmp/"
#TEST_ZTEST_OPTIONS="-V" #TEST_ZTEST_OPTIONS="-V"
#TEST_ZTEST_CORE_DIR="/mnt/zloop"
### zimport ### zimport
#TEST_ZIMPORT_SKIP="yes" #TEST_ZIMPORT_SKIP="yes"
@@ -29,13 +31,9 @@
### zfs-tests.sh ### zfs-tests.sh
#TEST_ZFSTESTS_SKIP="yes" #TEST_ZFSTESTS_SKIP="yes"
#TEST_ZFSTESTS_DIR="/mnt/"
#TEST_ZFSTESTS_DISKS="vdb vdc vdd" #TEST_ZFSTESTS_DISKS="vdb vdc vdd"
#TEST_ZFSTESTS_DISKSIZE="8G" #TEST_ZFSTESTS_DISKSIZE="8G"
#TEST_ZFSTESTS_ITERS="1"
#TEST_ZFSTESTS_OPTIONS="-vx"
#TEST_ZFSTESTS_RUNFILE="linux.run" #TEST_ZFSTESTS_RUNFILE="linux.run"
#TEST_ZFSTESTS_TAGS="functional"
### zfsstress ### zfsstress
#TEST_ZFSSTRESS_SKIP="yes" #TEST_ZFSSTRESS_SKIP="yes"
@@ -44,7 +42,53 @@
#TEST_ZFSSTRESS_RUNTIME=300 #TEST_ZFSSTRESS_RUNTIME=300
#TEST_ZFSSTRESS_POOL="tank" #TEST_ZFSSTRESS_POOL="tank"
#TEST_ZFSSTRESS_FS="fish" #TEST_ZFSSTRESS_FS="fish"
#TEST_ZFSSTRESS_FSOPT="-o overlay=on"
#TEST_ZFSSTRESS_VDEV="/var/tmp/vdev" #TEST_ZFSSTRESS_VDEV="/var/tmp/vdev"
#TEST_ZFSSTRESS_DIR="/$TEST_ZFSSTRESS_POOL/$TEST_ZFSSTRESS_FS" #TEST_ZFSSTRESS_DIR="/$TEST_ZFSSTRESS_POOL/$TEST_ZFSSTRESS_FS"
#TEST_ZFSSTRESS_OPTIONS="" #TEST_ZFSSTRESS_OPTIONS=""
### per-builder customization
#
# BB_NAME=builder-name <distribution-version-architecture-type>
# - distribution=Amazon,Debian,Fedora,RHEL,SUSE,Ubuntu
# - version=x.y
# - architecture=x86_64,i686,arm,aarch64
# - type=build,test
#
case "$BB_NAME" in
Amazon*)
# ZFS enabled xfstests fails to build
TEST_XFSTESTS_SKIP="yes"
;;
CentOS-7*)
# ZFS enabled xfstests fails to build
TEST_XFSTESTS_SKIP="yes"
;;
CentOS-6*)
;;
Debian*)
;;
Fedora*)
;;
RHEL*)
;;
SUSE*)
;;
Ubuntu-16.04*)
# ZFS enabled xfstests fails to build
TEST_XFSTESTS_SKIP="yes"
;;
Ubuntu*)
;;
*)
;;
esac
###
#
# Disable the following test suites on 32-bit systems.
#
if [ $(getconf LONG_BIT) = "32" ]; then
TEST_ZTEST_SKIP="yes"
TEST_XFSTESTS_SKIP="yes"
TEST_ZFSSTRESS_SKIP="yes"
fi
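The per-builder `case` block above depends on the shell trying patterns in order and stopping at the first match, which is why `Ubuntu-16.04*` has to appear before the broader `Ubuntu*`. A minimal Python sketch of the same first-match-wins lookup follows. It uses `fnmatch` as a stand-in for shell glob patterns, and the `BUILDER_OVERRIDES` table and `overrides_for` helper are hypothetical names for illustration, not part of the test configuration.

```python
from fnmatch import fnmatchcase

# Ordered (pattern, overrides) pairs mirroring the case statement above.
# The table and helper names are illustrative assumptions only.
BUILDER_OVERRIDES = [
    ("Amazon*", {"TEST_XFSTESTS_SKIP": "yes"}),
    ("CentOS-7*", {"TEST_XFSTESTS_SKIP": "yes"}),
    ("CentOS-6*", {}),
    ("Ubuntu-16.04*", {"TEST_XFSTESTS_SKIP": "yes"}),
    ("Ubuntu*", {}),
    ("*", {}),
]


def overrides_for(builder_name):
    """Return the overrides of the first matching pattern, like `case`."""
    for pattern, overrides in BUILDER_OVERRIDES:
        if fnmatchcase(builder_name, pattern):
            return dict(overrides)
    return {}


print(overrides_for("Ubuntu-16.04-x86_64-test"))
print(overrides_for("Ubuntu-18.04-x86_64-test"))
```

If `Ubuntu*` were listed first, the 16.04 builder would match it and never reach its own entry.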
+1 -1
@@ -1,4 +1,4 @@
#!/bin/sh #!/bin/sh
autoreconf -fiv || exit 1 autoreconf -fiv
rm -Rf autom4te.cache rm -Rf autom4te.cache
+3 -27
@@ -1,27 +1,3 @@
include $(top_srcdir)/config/Shellcheck.am SUBDIRS = zfs zpool zdb zhack zinject zstreamdump ztest zpios
SUBDIRS += mount_zfs fsck_zfs zvol_id vdev_id arcstat dbufstat zed
SUBDIRS = zfs zpool zdb zhack zinject zstream ztest SUBDIRS += arc_summary raidz_test zgenhostid
SUBDIRS += fsck_zfs vdev_id raidz_test zfs_ids_to_path
SUBDIRS += zpool_influxdb
CPPCHECKDIRS = zfs zpool zdb zhack zinject zstream ztest
CPPCHECKDIRS += raidz_test zfs_ids_to_path zpool_influxdb
# TODO: #12084: SHELLCHECKDIRS = fsck_zfs vdev_id zpool
SHELLCHECKDIRS = fsck_zfs zpool
if USING_PYTHON
SUBDIRS += arcstat arc_summary dbufstat
endif
if BUILD_LINUX
SUBDIRS += mount_zfs zed zgenhostid zvol_id zvol_wait
CPPCHECKDIRS += mount_zfs zed zgenhostid zvol_id
SHELLCHECKDIRS += zed
endif
PHONY = cppcheck
cppcheck: $(CPPCHECKDIRS)
set -e ; for dir in $(CPPCHECKDIRS) ; do \
$(MAKE) -C $$dir cppcheck ; \
done
-1
@@ -1 +0,0 @@
arc_summary
+1 -13
@@ -1,13 +1 @@
bin_SCRIPTS = arc_summary dist_bin_SCRIPTS = arc_summary.py
CLEANFILES = arc_summary
EXTRA_DIST = arc_summary2 arc_summary3
if USING_PYTHON_2
SCRIPT = arc_summary2
else
SCRIPT = arc_summary3
endif
arc_summary: $(SCRIPT)
cp $< $@
@@ -1,4 +1,4 @@
#!/usr/bin/env python2 #!/usr/bin/python
# #
# $Id: arc_summary.pl,v 388:e27800740aa2 2011-07-08 02:53:29Z jhell $ # $Id: arc_summary.pl,v 388:e27800740aa2 2011-07-08 02:53:29Z jhell $
# #
@@ -35,14 +35,12 @@
# Note some of this code uses older code (eg getopt instead of argparse, # Note some of this code uses older code (eg getopt instead of argparse,
# subprocess.Popen() instead of subprocess.run()) because we need to support # subprocess.Popen() instead of subprocess.run()) because we need to support
# some very old versions of Python. # some very old versions of Python.
#
"""Print statistics on the ZFS Adaptive Replacement Cache (ARC) """Print statistics on the ZFS Adaptive Replacement Cache (ARC)
Provides basic information on the ARC, its efficiency, the L2ARC (if present), Provides basic information on the ARC, its efficiency, the L2ARC (if present),
the Data Management Unit (DMU), Virtual Devices (VDEVs), and tunables. See the the Data Management Unit (DMU), Virtual Devices (VDEVs), and tunables. See the
in-source documentation and code at in-source documentation and code at
https://github.com/openzfs/zfs/blob/master/module/zfs/arc.c for details. https://github.com/zfsonlinux/zfs/blob/master/module/zfs/arc.c for details.
""" """
import getopt import getopt
@@ -54,64 +52,46 @@ import errno
from subprocess import Popen, PIPE from subprocess import Popen, PIPE
from decimal import Decimal as D from decimal import Decimal as D
if sys.platform.startswith('freebsd'):
# Requires py27-sysctl on FreeBSD
import sysctl
def is_value(ctl):
return ctl.type != sysctl.CTLTYPE_NODE
def load_kstats(namespace):
"""Collect information on a specific subsystem of the ARC"""
base = 'kstat.zfs.misc.%s.' % namespace
fmt = lambda kstat: (kstat.name, D(kstat.value))
kstats = sysctl.filter(base)
return [fmt(kstat) for kstat in kstats if is_value(kstat)]
def load_tunables():
ctls = sysctl.filter('vfs.zfs')
return dict((ctl.name, ctl.value) for ctl in ctls if is_value(ctl))
elif sys.platform.startswith('linux'):
def load_kstats(namespace):
"""Collect information on a specific subsystem of the ARC"""
kstat = 'kstat.zfs.misc.%s.%%s' % namespace
path = '/proc/spl/kstat/zfs/%s' % namespace
with open(path) as f:
entries = [line.strip().split() for line in f][2:] # Skip header
return [(kstat % name, D(value)) for name, _, value in entries]
def load_tunables():
basepath = '/sys/module/zfs/parameters'
tunables = {}
for name in os.listdir(basepath):
if not name:
continue
path = '%s/%s' % (basepath, name)
with open(path) as f:
value = f.read()
tunables[name] = value.strip()
return tunables
show_tunable_descriptions = False show_tunable_descriptions = False
alternate_tunable_layout = False alternate_tunable_layout = False
def handle_Exception(ex_cls, ex, tb):
if ex is IOError:
if ex.errno == errno.EPIPE:
sys.exit()
if ex is KeyboardInterrupt:
sys.exit()
sys.excepthook = handle_Exception
def get_Kstat(): def get_Kstat():
"""Collect information on the ZFS subsystem from the /proc virtual """Collect information on the ZFS subsystem from the /proc virtual
file system. The name "kstat" is a holdover from the Solaris utility file system. The name "kstat" is a holdover from the Solaris utility
of the same name. of the same name.
""" """
def load_proc_kstats(fn, namespace):
"""Collect information on a specific subsystem of the ARC"""
kstats = [line.strip() for line in open(fn)]
del kstats[0:2]
for kstat in kstats:
kstat = kstat.strip()
name, _, value = kstat.split()
Kstat[namespace + name] = D(value)
Kstat = {} Kstat = {}
Kstat.update(load_kstats('arcstats')) load_proc_kstats('/proc/spl/kstat/zfs/arcstats',
Kstat.update(load_kstats('zfetchstats')) 'kstat.zfs.misc.arcstats.')
Kstat.update(load_kstats('vdev_cache_stats')) load_proc_kstats('/proc/spl/kstat/zfs/zfetchstats',
'kstat.zfs.misc.zfetchstats.')
load_proc_kstats('/proc/spl/kstat/zfs/vdev_cache_stats',
'kstat.zfs.misc.vdev_cache_stats.')
return Kstat return Kstat
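Both versions of `get_Kstat()` above do the same work: drop the two header lines of a kstat file, split each remaining row into name, type, and value, and store the value as a `Decimal` under a `kstat.zfs.misc.<namespace>.` key. A minimal sketch of that parsing step, operating on a string so it runs without ZFS loaded. The `parse_kstat_text` helper and the sample text are assumptions for illustration; unlike the code above, the sketch skips malformed rows rather than raising.

```python
from decimal import Decimal


def parse_kstat_text(text, namespace):
    """Parse kstat-style text into {prefixed_name: Decimal(value)}.

    Assumes two header lines followed by "name type value" rows, which is
    the layout the code above expects. The helper name is hypothetical.
    """
    prefix = "kstat.zfs.misc.%s." % namespace
    stats = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        fields = line.split()
        if len(fields) != 3:
            continue  # ignore blank or malformed rows instead of raising
        name, _type, value = fields
        stats[prefix + name] = Decimal(value)
    return stats


# Illustrative sample only; real files live under /proc/spl/kstat/zfs/.
SAMPLE = """\
13 1 0x01 96 4608 1000 2000
name                            type data
hits                            4    1234
misses                          4    56
"""

stats = parse_kstat_text(SAMPLE, "arcstats")
print(stats["kstat.zfs.misc.arcstats.hits"])
```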
@@ -213,39 +193,17 @@ def get_arc_summary(Kstat):
deleted = Kstat["kstat.zfs.misc.arcstats.deleted"] deleted = Kstat["kstat.zfs.misc.arcstats.deleted"]
mutex_miss = Kstat["kstat.zfs.misc.arcstats.mutex_miss"] mutex_miss = Kstat["kstat.zfs.misc.arcstats.mutex_miss"]
evict_skip = Kstat["kstat.zfs.misc.arcstats.evict_skip"] evict_skip = Kstat["kstat.zfs.misc.arcstats.evict_skip"]
evict_l2_cached = Kstat["kstat.zfs.misc.arcstats.evict_l2_cached"]
evict_l2_eligible = Kstat["kstat.zfs.misc.arcstats.evict_l2_eligible"]
evict_l2_eligible_mfu = Kstat["kstat.zfs.misc.arcstats.evict_l2_eligible_mfu"]
evict_l2_eligible_mru = Kstat["kstat.zfs.misc.arcstats.evict_l2_eligible_mru"]
evict_l2_ineligible = Kstat["kstat.zfs.misc.arcstats.evict_l2_ineligible"]
evict_l2_skip = Kstat["kstat.zfs.misc.arcstats.evict_l2_skip"]
# ARC Misc. # ARC Misc.
output["arc_misc"] = {} output["arc_misc"] = {}
output["arc_misc"]["deleted"] = fHits(deleted) output["arc_misc"]["deleted"] = fHits(deleted)
output["arc_misc"]["mutex_miss"] = fHits(mutex_miss) output["arc_misc"]['mutex_miss'] = fHits(mutex_miss)
output["arc_misc"]["evict_skips"] = fHits(evict_skip) output["arc_misc"]['evict_skips'] = fHits(evict_skip)
output["arc_misc"]["evict_l2_skip"] = fHits(evict_l2_skip)
output["arc_misc"]["evict_l2_cached"] = fBytes(evict_l2_cached)
output["arc_misc"]["evict_l2_eligible"] = fBytes(evict_l2_eligible)
output["arc_misc"]["evict_l2_eligible_mfu"] = {
'per': fPerc(evict_l2_eligible_mfu, evict_l2_eligible),
'num': fBytes(evict_l2_eligible_mfu),
}
output["arc_misc"]["evict_l2_eligible_mru"] = {
'per': fPerc(evict_l2_eligible_mru, evict_l2_eligible),
'num': fBytes(evict_l2_eligible_mru),
}
output["arc_misc"]["evict_l2_ineligible"] = fBytes(evict_l2_ineligible)
# ARC Sizing # ARC Sizing
arc_size = Kstat["kstat.zfs.misc.arcstats.size"] arc_size = Kstat["kstat.zfs.misc.arcstats.size"]
mru_size = Kstat["kstat.zfs.misc.arcstats.mru_size"] mru_size = Kstat["kstat.zfs.misc.arcstats.mru_size"]
mfu_size = Kstat["kstat.zfs.misc.arcstats.mfu_size"] mfu_size = Kstat["kstat.zfs.misc.arcstats.mfu_size"]
meta_limit = Kstat["kstat.zfs.misc.arcstats.arc_meta_limit"]
meta_size = Kstat["kstat.zfs.misc.arcstats.arc_meta_used"]
dnode_limit = Kstat["kstat.zfs.misc.arcstats.arc_dnode_limit"]
dnode_size = Kstat["kstat.zfs.misc.arcstats.dnode_size"]
target_max_size = Kstat["kstat.zfs.misc.arcstats.c_max"] target_max_size = Kstat["kstat.zfs.misc.arcstats.c_max"]
target_min_size = Kstat["kstat.zfs.misc.arcstats.c_min"] target_min_size = Kstat["kstat.zfs.misc.arcstats.c_min"]
target_size = Kstat["kstat.zfs.misc.arcstats.c"] target_size = Kstat["kstat.zfs.misc.arcstats.c"]
@@ -270,22 +228,6 @@ def get_arc_summary(Kstat):
'per': fPerc(target_size, target_max_size), 'per': fPerc(target_size, target_max_size),
'num': fBytes(target_size), 'num': fBytes(target_size),
} }
output['arc_sizing']['meta_limit'] = {
'per': fPerc(meta_limit, target_max_size),
'num': fBytes(meta_limit),
}
output['arc_sizing']['meta_size'] = {
'per': fPerc(meta_size, meta_limit),
'num': fBytes(meta_size),
}
output['arc_sizing']['dnode_limit'] = {
'per': fPerc(dnode_limit, meta_limit),
'num': fBytes(dnode_limit),
}
output['arc_sizing']['dnode_size'] = {
'per': fPerc(dnode_size, dnode_limit),
'num': fBytes(dnode_size),
}
# ARC Hash Breakdown # ARC Hash Breakdown
output['arc_hash_break'] = {} output['arc_hash_break'] = {}
@@ -352,26 +294,8 @@ def _arc_summary(Kstat):
sys.stdout.write("\tDeleted:\t\t\t\t%s\n" % arc['arc_misc']['deleted']) sys.stdout.write("\tDeleted:\t\t\t\t%s\n" % arc['arc_misc']['deleted'])
sys.stdout.write("\tMutex Misses:\t\t\t\t%s\n" % sys.stdout.write("\tMutex Misses:\t\t\t\t%s\n" %
arc['arc_misc']['mutex_miss']) arc['arc_misc']['mutex_miss'])
sys.stdout.write("\tEviction Skips:\t\t\t\t%s\n" % sys.stdout.write("\tEvict Skips:\t\t\t\t%s\n" %
arc['arc_misc']['evict_skips']) arc['arc_misc']['evict_skips'])
sys.stdout.write("\tEviction Skips Due to L2 Writes:\t%s\n" %
arc['arc_misc']['evict_l2_skip'])
sys.stdout.write("\tL2 Cached Evictions:\t\t\t%s\n" %
arc['arc_misc']['evict_l2_cached'])
sys.stdout.write("\tL2 Eligible Evictions:\t\t\t%s\n" %
arc['arc_misc']['evict_l2_eligible'])
sys.stdout.write("\tL2 Eligible MFU Evictions:\t%s\t%s\n" % (
arc['arc_misc']['evict_l2_eligible_mfu']['per'],
arc['arc_misc']['evict_l2_eligible_mfu']['num'],
)
)
sys.stdout.write("\tL2 Eligible MRU Evictions:\t%s\t%s\n" % (
arc['arc_misc']['evict_l2_eligible_mru']['per'],
arc['arc_misc']['evict_l2_eligible_mru']['num'],
)
)
sys.stdout.write("\tL2 Ineligible Evictions:\t\t%s\n" %
arc['arc_misc']['evict_l2_ineligible'])
sys.stdout.write("\n") sys.stdout.write("\n")
# ARC Sizing # ARC Sizing
@@ -409,26 +333,6 @@ def _arc_summary(Kstat):
arc['arc_size_break']['frequently_used_cache_size']['num'], arc['arc_size_break']['frequently_used_cache_size']['num'],
) )
) )
sys.stdout.write("\tMetadata Size (Hard Limit):\t%s\t%s\n" % (
arc['arc_sizing']['meta_limit']['per'],
arc['arc_sizing']['meta_limit']['num'],
)
)
sys.stdout.write("\tMetadata Size:\t\t\t%s\t%s\n" % (
arc['arc_sizing']['meta_size']['per'],
arc['arc_sizing']['meta_size']['num'],
)
)
sys.stdout.write("\tDnode Size (Hard Limit):\t%s\t%s\n" % (
arc['arc_sizing']['dnode_limit']['per'],
arc['arc_sizing']['dnode_limit']['num'],
)
)
sys.stdout.write("\tDnode Size:\t\t\t%s\t%s\n" % (
arc['arc_sizing']['dnode_size']['per'],
arc['arc_sizing']['dnode_size']['num'],
)
)
sys.stdout.write("\n") sys.stdout.write("\n")
@@ -707,11 +611,6 @@ def get_l2arc_summary(Kstat):
l2_writes_done = Kstat["kstat.zfs.misc.arcstats.l2_writes_done"] l2_writes_done = Kstat["kstat.zfs.misc.arcstats.l2_writes_done"]
l2_writes_error = Kstat["kstat.zfs.misc.arcstats.l2_writes_error"] l2_writes_error = Kstat["kstat.zfs.misc.arcstats.l2_writes_error"]
l2_writes_sent = Kstat["kstat.zfs.misc.arcstats.l2_writes_sent"] l2_writes_sent = Kstat["kstat.zfs.misc.arcstats.l2_writes_sent"]
l2_mfu_asize = Kstat["kstat.zfs.misc.arcstats.l2_mfu_asize"]
l2_mru_asize = Kstat["kstat.zfs.misc.arcstats.l2_mru_asize"]
l2_prefetch_asize = Kstat["kstat.zfs.misc.arcstats.l2_prefetch_asize"]
l2_bufc_data_asize = Kstat["kstat.zfs.misc.arcstats.l2_bufc_data_asize"]
l2_bufc_metadata_asize = Kstat["kstat.zfs.misc.arcstats.l2_bufc_metadata_asize"]
l2_access_total = (l2_hits + l2_misses) l2_access_total = (l2_hits + l2_misses)
output['l2_health_count'] = (l2_writes_error + l2_cksum_bad + l2_io_error) output['l2_health_count'] = (l2_writes_error + l2_cksum_bad + l2_io_error)
@@ -734,7 +633,7 @@ def get_l2arc_summary(Kstat):
output["io_errors"] = fHits(l2_io_error) output["io_errors"] = fHits(l2_io_error)
output["l2_arc_size"] = {} output["l2_arc_size"] = {}
output["l2_arc_size"]["adaptive"] = fBytes(l2_size) output["l2_arc_size"]["adative"] = fBytes(l2_size)
output["l2_arc_size"]["actual"] = { output["l2_arc_size"]["actual"] = {
'per': fPerc(l2_asize, l2_size), 'per': fPerc(l2_asize, l2_size),
'num': fBytes(l2_asize) 'num': fBytes(l2_asize)
@@ -743,26 +642,6 @@ def get_l2arc_summary(Kstat):
'per': fPerc(l2_hdr_size, l2_size), 'per': fPerc(l2_hdr_size, l2_size),
'num': fBytes(l2_hdr_size), 'num': fBytes(l2_hdr_size),
} }
output["l2_arc_size"]["mfu_asize"] = {
'per': fPerc(l2_mfu_asize, l2_asize),
'num': fBytes(l2_mfu_asize),
}
output["l2_arc_size"]["mru_asize"] = {
'per': fPerc(l2_mru_asize, l2_asize),
'num': fBytes(l2_mru_asize),
}
output["l2_arc_size"]["prefetch_asize"] = {
'per': fPerc(l2_prefetch_asize, l2_asize),
'num': fBytes(l2_prefetch_asize),
}
output["l2_arc_size"]["bufc_data_asize"] = {
'per': fPerc(l2_bufc_data_asize, l2_asize),
'num': fBytes(l2_bufc_data_asize),
}
output["l2_arc_size"]["bufc_metadata_asize"] = {
'per': fPerc(l2_bufc_metadata_asize, l2_asize),
'num': fBytes(l2_bufc_metadata_asize),
}
output["l2_arc_evicts"] = {} output["l2_arc_evicts"] = {}
output["l2_arc_evicts"]['lock_retries'] = fHits(l2_evict_lock_retry) output["l2_arc_evicts"]['lock_retries'] = fHits(l2_evict_lock_retry)
@@ -827,7 +706,7 @@ def _l2arc_summary(Kstat):
sys.stdout.write("\n") sys.stdout.write("\n")
sys.stdout.write("L2 ARC Size: (Adaptive)\t\t\t\t%s\n" % sys.stdout.write("L2 ARC Size: (Adaptive)\t\t\t\t%s\n" %
arc["l2_arc_size"]["adaptive"]) arc["l2_arc_size"]["adative"])
sys.stdout.write("\tCompressed:\t\t\t%s\t%s\n" % ( sys.stdout.write("\tCompressed:\t\t\t%s\t%s\n" % (
arc["l2_arc_size"]["actual"]["per"], arc["l2_arc_size"]["actual"]["per"],
arc["l2_arc_size"]["actual"]["num"], arc["l2_arc_size"]["actual"]["num"],
@@ -838,36 +717,11 @@ def _l2arc_summary(Kstat):
arc["l2_arc_size"]["head_size"]["num"], arc["l2_arc_size"]["head_size"]["num"],
) )
) )
sys.stdout.write("\tMFU Alloc. Size:\t\t%s\t%s\n" % (
arc["l2_arc_size"]["mfu_asize"]["per"],
arc["l2_arc_size"]["mfu_asize"]["num"],
)
)
sys.stdout.write("\tMRU Alloc. Size:\t\t%s\t%s\n" % (
arc["l2_arc_size"]["mru_asize"]["per"],
arc["l2_arc_size"]["mru_asize"]["num"],
)
)
sys.stdout.write("\tPrefetch Alloc. Size:\t\t%s\t%s\n" % (
arc["l2_arc_size"]["prefetch_asize"]["per"],
arc["l2_arc_size"]["prefetch_asize"]["num"],
)
)
sys.stdout.write("\tData (buf content) Alloc. Size:\t%s\t%s\n" % (
arc["l2_arc_size"]["bufc_data_asize"]["per"],
arc["l2_arc_size"]["bufc_data_asize"]["num"],
)
)
sys.stdout.write("\tMetadata (buf content) Size:\t%s\t%s\n" % (
arc["l2_arc_size"]["bufc_metadata_asize"]["per"],
arc["l2_arc_size"]["bufc_metadata_asize"]["num"],
)
)
sys.stdout.write("\n") sys.stdout.write("\n")
if arc["l2_arc_evicts"]['lock_retries'] != '0' or \ if arc["l2_arc_evicts"]['lock_retries'] != '0' or \
arc["l2_arc_evicts"]["reading"] != '0': arc["l2_arc_evicts"]["reading"] != '0':
sys.stdout.write("L2 ARC Evictions:\n") sys.stdout.write("L2 ARC Evicts:\n")
sys.stdout.write("\tLock Retries:\t\t\t\t%s\n" % sys.stdout.write("\tLock Retries:\t\t\t\t%s\n" %
arc["l2_arc_evicts"]['lock_retries']) arc["l2_arc_evicts"]['lock_retries'])
sys.stdout.write("\tUpon Reading:\t\t\t\t%s\n" % sys.stdout.write("\tUpon Reading:\t\t\t\t%s\n" %
@@ -1025,7 +879,14 @@ def _tunable_summary(Kstat):
global show_tunable_descriptions global show_tunable_descriptions
global alternate_tunable_layout global alternate_tunable_layout
tunables = load_tunables() names = os.listdir("/sys/module/zfs/parameters/")
values = {}
for name in names:
with open("/sys/module/zfs/parameters/" + name) as f:
value = f.read()
values[name] = value.strip()
descriptions = {} descriptions = {}
if show_tunable_descriptions: if show_tunable_descriptions:
@@ -1063,17 +924,22 @@ def _tunable_summary(Kstat):
sys.stderr.write("Tunable descriptions will be disabled.\n") sys.stderr.write("Tunable descriptions will be disabled.\n")
sys.stdout.write("ZFS Tunables:\n") sys.stdout.write("ZFS Tunables:\n")
names.sort()
if alternate_tunable_layout: if alternate_tunable_layout:
fmt = "\t%s=%s\n" fmt = "\t%s=%s\n"
else: else:
fmt = "\t%-50s%s\n" fmt = "\t%-50s%s\n"
for name in sorted(tunables.keys()): for name in names:
if not name:
continue
if show_tunable_descriptions and name in descriptions: if show_tunable_descriptions and name in descriptions:
sys.stdout.write("\t# %s\n" % descriptions[name]) sys.stdout.write("\t# %s\n" % descriptions[name])
sys.stdout.write(fmt % (name, tunables[name])) sys.stdout.write(fmt % (name, values[name]))
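The tunables page reads one value per file from a parameters directory and prints the entries sorted by name in one of two layouts. A minimal sketch of that flow, run against a temporary directory so it does not need `/sys/module/zfs/parameters`. The `load_params` and `format_params` names are hypothetical; only the two format strings are taken from the code above.

```python
import os
import tempfile


def load_params(basepath):
    """Read every file in basepath into a {name: stripped_contents} dict."""
    params = {}
    for name in os.listdir(basepath):
        with open(os.path.join(basepath, name)) as f:
            params[name] = f.read().strip()
    return params


def format_params(params, alternate=False):
    """Return one formatted line per parameter, sorted by name."""
    fmt = "\t%s=%s\n" if alternate else "\t%-50s%s\n"
    return [fmt % (name, params[name]) for name in sorted(params)]


# Demo against a throwaway directory so the sketch runs without ZFS loaded.
with tempfile.TemporaryDirectory() as demo_dir:
    for name, value in (("zfs_b", "2\n"), ("zfs_a", "1\n")):
        with open(os.path.join(demo_dir, name), "w") as f:
            f.write(value)
    lines = format_params(load_params(demo_dir), alternate=True)

print("".join(lines), end="")
```

Sorting the dictionary keys at print time, as the newer side of the diff does, keeps the output order independent of the order in which `os.listdir()` returns the entries.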
unSub = [ unSub = [
@@ -1099,7 +965,7 @@ def zfs_header():
def usage(): def usage():
"""Print usage information""" """Print usage information"""
sys.stdout.write("Usage: arc_summary [-h] [-a] [-d] [-p PAGE]\n\n") sys.stdout.write("Usage: arc_summary.py [-h] [-a] [-d] [-p PAGE]\n\n")
sys.stdout.write("\t -h, --help : " sys.stdout.write("\t -h, --help : "
"Print this help message and exit\n") "Print this help message and exit\n")
sys.stdout.write("\t -a, --alternate : " sys.stdout.write("\t -a, --alternate : "
@@ -1112,10 +978,10 @@ def usage():
"should be an integer between 1 and " + "should be an integer between 1 and " +
str(len(unSub)) + "\n\n") str(len(unSub)) + "\n\n")
sys.stdout.write("Examples:\n") sys.stdout.write("Examples:\n")
sys.stdout.write("\tarc_summary -a\n") sys.stdout.write("\tarc_summary.py -a\n")
sys.stdout.write("\tarc_summary -p 4\n") sys.stdout.write("\tarc_summary.py -p 4\n")
sys.stdout.write("\tarc_summary -ad\n") sys.stdout.write("\tarc_summary.py -ad\n")
sys.stdout.write("\tarc_summary --page=2\n") sys.stdout.write("\tarc_summary.py --page=2\n")
def main(): def main():
@@ -1125,55 +991,48 @@ def main():
global alternate_tunable_layout global alternate_tunable_layout
try: try:
try: opts, args = getopt.getopt(
opts, args = getopt.getopt( sys.argv[1:],
sys.argv[1:], "adp:h", ["alternate", "description", "page=", "help"]
"adp:h", ["alternate", "description", "page=", "help"] )
) except getopt.error as e:
except getopt.error as e: sys.stderr.write("Error: %s\n" % e.msg)
sys.stderr.write("Error: %s\n" % e.msg) usage()
sys.exit(1)
args = {}
for opt, arg in opts:
if opt in ('-a', '--alternate'):
args['a'] = True
if opt in ('-d', '--description'):
args['d'] = True
if opt in ('-p', '--page'):
args['p'] = arg
if opt in ('-h', '--help'):
usage() usage()
sys.exit(1)
args = {}
for opt, arg in opts:
if opt in ('-a', '--alternate'):
args['a'] = True
if opt in ('-d', '--description'):
args['d'] = True
if opt in ('-p', '--page'):
args['p'] = arg
if opt in ('-h', '--help'):
usage()
sys.exit(0)
Kstat = get_Kstat()
alternate_tunable_layout = 'a' in args
show_tunable_descriptions = 'd' in args
pages = []
if 'p' in args:
try:
pages.append(unSub[int(args['p']) - 1])
except IndexError:
sys.stderr.write('the argument to -p must be between 1 and ' +
str(len(unSub)) + '\n')
sys.exit(1)
else:
pages = unSub
zfs_header()
for page in pages:
page(Kstat)
sys.stdout.write("\n")
except IOError as ex:
if (ex.errno == errno.EPIPE):
sys.exit(0) sys.exit(0)
raise
except KeyboardInterrupt: Kstat = get_Kstat()
sys.exit(0)
alternate_tunable_layout = 'a' in args
show_tunable_descriptions = 'd' in args
pages = []
if 'p' in args:
try:
pages.append(unSub[int(args['p']) - 1])
except IndexError:
sys.stderr.write('the argument to -p must be between 1 and ' +
str(len(unSub)) + '\n')
sys.exit(1)
else:
pages = unSub
zfs_header()
for page in pages:
page(Kstat)
sys.stdout.write("\n")
if __name__ == '__main__': if __name__ == '__main__':
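The `-p`/`--page` handling in `main()` catches only `IndexError`, so a value of `0` would index `unSub[-1]` and select the last page, and a non-numeric value would raise an uncaught `ValueError`. A minimal sketch of stricter validation using the same `getopt` option strings follows. The `parse_page` helper is hypothetical and is offered as one possible approach, not as the project's method.

```python
import getopt


def parse_page(argv, page_count):
    """Return a zero-based page index, or None when -p/--page is absent.

    Raises ValueError for non-numeric or out-of-range values, including 0,
    which a bare `pages[int(p) - 1]` lookup would map to the last page.
    """
    opts, _rest = getopt.getopt(
        argv, "adp:h", ["alternate", "description", "page=", "help"])
    for opt, arg in opts:
        if opt in ("-p", "--page"):
            page = int(arg)  # ValueError if arg is not an integer
            if not 1 <= page <= page_count:
                raise ValueError(
                    "page must be between 1 and %d" % page_count)
            return page - 1
    return None


print(parse_page(["-p", "4"], 8))
print(parse_page(["-ad"], 8))
```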
-986
@@ -1,986 +0,0 @@
#!/usr/bin/env python3
#
# Copyright (c) 2008 Ben Rockwood <benr@cuddletech.com>,
# Copyright (c) 2010 Martin Matuska <mm@FreeBSD.org>,
# Copyright (c) 2010-2011 Jason J. Hellenthal <jhell@DataIX.net>,
# Copyright (c) 2017 Scot W. Stevenson <scot.stevenson@gmail.com>
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY AUTHOR AND CONTRIBUTORS ``AS IS'' AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL AUTHOR OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
# SUCH DAMAGE.
"""Print statistics on the ZFS ARC Cache and other information
Provides basic information on the ARC, its efficiency, the L2ARC (if present),
the Data Management Unit (DMU), Virtual Devices (VDEVs), and tunables. See
the in-source documentation and code at
https://github.com/openzfs/zfs/blob/master/module/zfs/arc.c for details.
The original introduction to arc_summary can be found at
http://cuddletech.com/?p=454
"""
import argparse
import os
import subprocess
import sys
import time
import errno
# We can't use env -S portably, and we need python3 -u to handle pipes in
# the shell abruptly closing the way we want to, so...
import io
if isinstance(sys.__stderr__.buffer, io.BufferedWriter):
os.execv(sys.executable, [sys.executable, "-u"] + sys.argv)
DESCRIPTION = 'Print ARC and other statistics for OpenZFS'
INDENT = ' '*8
LINE_LENGTH = 72
DATE_FORMAT = '%a %b %d %H:%M:%S %Y'
TITLE = 'ZFS Subsystem Report'
SECTIONS = 'arc archits dmu l2arc spl tunables vdev zil'.split()
SECTION_HELP = 'print info from one section ('+' '.join(SECTIONS)+')'
# Tunables and SPL are handled separately because they come from
# different sources
SECTION_PATHS = {'arc': 'arcstats',
'dmu': 'dmu_tx',
'l2arc': 'arcstats', # L2ARC stuff lives in arcstats
'vdev': 'vdev_cache_stats',
'zfetch': 'zfetchstats',
'zil': 'zil'}
parser = argparse.ArgumentParser(description=DESCRIPTION)
parser.add_argument('-a', '--alternate', action='store_true', default=False,
help='use alternate formatting for tunables and SPL',
dest='alt')
parser.add_argument('-d', '--description', action='store_true', default=False,
help='print descriptions with tunables and SPL',
dest='desc')
parser.add_argument('-g', '--graph', action='store_true', default=False,
help='print graph on ARC use and exit', dest='graph')
parser.add_argument('-p', '--page', type=int, dest='page',
help='print page by number (DEPRECATED, use "-s")')
parser.add_argument('-r', '--raw', action='store_true', default=False,
help='dump all available data with minimal formatting',
dest='raw')
parser.add_argument('-s', '--section', dest='section', help=SECTION_HELP)
ARGS = parser.parse_args()
if sys.platform.startswith('freebsd'):
# Requires py36-sysctl on FreeBSD
import sysctl
VDEV_CACHE_SIZE = 'vdev.cache_size'
def is_value(ctl):
return ctl.type != sysctl.CTLTYPE_NODE
def namefmt(ctl, base='vfs.zfs.'):
# base is removed from the name
cut = len(base)
return ctl.name[cut:]
def load_kstats(section):
base = 'kstat.zfs.misc.{section}.'.format(section=section)
fmt = lambda kstat: '{name} : {value}'.format(name=namefmt(kstat, base),
value=kstat.value)
kstats = sysctl.filter(base)
return [fmt(kstat) for kstat in kstats if is_value(kstat)]
def get_params(base):
ctls = sysctl.filter(base)
return {namefmt(ctl): str(ctl.value) for ctl in ctls if is_value(ctl)}
def get_tunable_params():
return get_params('vfs.zfs')
def get_vdev_params():
return get_params('vfs.zfs.vdev')
def get_version_impl(request):
# FreeBSD reports versions for zpl and spa instead of zfs and spl.
name = {'zfs': 'zpl',
'spl': 'spa'}[request]
mib = 'vfs.zfs.version.{}'.format(name)
version = sysctl.filter(mib)[0].value
return '{} version {}'.format(name, version)
def get_descriptions(_request):
ctls = sysctl.filter('vfs.zfs')
return {namefmt(ctl): ctl.description for ctl in ctls if is_value(ctl)}
elif sys.platform.startswith('linux'):
KSTAT_PATH = '/proc/spl/kstat/zfs'
SPL_PATH = '/sys/module/spl/parameters'
TUNABLES_PATH = '/sys/module/zfs/parameters'
VDEV_CACHE_SIZE = 'zfs_vdev_cache_size'
def load_kstats(section):
path = os.path.join(KSTAT_PATH, section)
with open(path) as f:
return list(f)[2:] # Get rid of header
def get_params(basepath):
"""Collect information on the Solaris Porting Layer (SPL) or the
tunables, depending on the PATH given. Does not check if PATH is
legal.
"""
result = {}
for name in os.listdir(basepath):
path = os.path.join(basepath, name)
with open(path) as f:
value = f.read()
result[name] = value.strip()
return result
def get_spl_params():
return get_params(SPL_PATH)
def get_tunable_params():
return get_params(TUNABLES_PATH)
def get_vdev_params():
return get_params(TUNABLES_PATH)
def get_version_impl(request):
# The original arc_summary called /sbin/modinfo/{spl,zfs} to get
# the version information. We switch to /sys/module/{spl,zfs}/version
# to make sure we get what is really loaded in the kernel
try:
with open("/sys/module/{}/version".format(request)) as f:
return f.read().strip()
except:
return "(unknown)"
def get_descriptions(request):
"""Get the descriptions of the Solaris Porting Layer (SPL) or the
tunables, return with minimal formatting.
"""
if request not in ('spl', 'zfs'):
print('ERROR: description of "{0}" requested)'.format(request))
sys.exit(1)
descs = {}
target_prefix = 'parm:'
# We would prefer to do this with /sys/modules -- see the discussion at
# get_version() -- but there isn't a way to get the descriptions from
# there, so we fall back on modinfo
command = ["/sbin/modinfo", request, "-0"]
# The recommended way to do this is with subprocess.run(). However,
# some installed versions of Python are < 3.5, so we offer them
# the option of doing it the old way (for now)
info = ''
try:
if 'run' in dir(subprocess):
info = subprocess.run(command, stdout=subprocess.PIPE,
universal_newlines=True)
raw_output = info.stdout.split('\0')
else:
info = subprocess.check_output(command,
universal_newlines=True)
raw_output = info.split('\0')
except subprocess.CalledProcessError:
print("Error: Descriptions not available",
"(can't access kernel module)")
sys.exit(1)
for line in raw_output:
if not line.startswith(target_prefix):
continue
line = line[len(target_prefix):].strip()
name, raw_desc = line.split(':', 1)
desc = raw_desc.rsplit('(', 1)[0]
if desc == '':
desc = '(No description found)'
descs[name.strip()] = desc.strip()
return descs
def handle_unraisableException(exc_type, exc_value=None, exc_traceback=None,
err_msg=None, object=None):
handle_Exception(exc_type, object, exc_traceback)
def handle_Exception(ex_cls, ex, tb):
if ex_cls is KeyboardInterrupt:
sys.exit()
if ex_cls is BrokenPipeError:
# It turns out that while sys.exit() triggers an exception
# not handled message on Python 3.8+, os._exit() does not.
os._exit(0)
if ex_cls is OSError:
if ex.errno == errno.ENOTCONN:
sys.exit()
raise ex
if hasattr(sys,'unraisablehook'): # Python 3.8+
sys.unraisablehook = handle_unraisableException
sys.excepthook = handle_Exception
def cleanup_line(single_line):
"""Format a raw line of data from /proc and isolate the name value
part, returning a tuple with each. Currently, this gets rid of the
middle '4'. For example "arc_no_grow 4 0" returns the tuple
("arc_no_grow", "0").
"""
name, _, value = single_line.split()
return name, value
def draw_graph(kstats_dict):
"""Draw a primitive graph representing the basic information on the
ARC -- its size and the proportion used by MFU and MRU -- and quit.
We use max size of the ARC to calculate how full it is. This is a
very rough representation.
"""
arc_stats = isolate_section('arcstats', kstats_dict)
GRAPH_INDENT = ' '*4
GRAPH_WIDTH = 60
arc_size = f_bytes(arc_stats['size'])
arc_perc = f_perc(arc_stats['size'], arc_stats['c_max'])
mfu_size = f_bytes(arc_stats['mfu_size'])
mru_size = f_bytes(arc_stats['mru_size'])
meta_limit = f_bytes(arc_stats['arc_meta_limit'])
meta_size = f_bytes(arc_stats['arc_meta_used'])
dnode_limit = f_bytes(arc_stats['arc_dnode_limit'])
dnode_size = f_bytes(arc_stats['dnode_size'])
info_form = ('ARC: {0} ({1}) MFU: {2} MRU: {3} META: {4} ({5}) '
'DNODE {6} ({7})')
info_line = info_form.format(arc_size, arc_perc, mfu_size, mru_size,
meta_size, meta_limit, dnode_size,
dnode_limit)
info_spc = ' '*int((GRAPH_WIDTH-len(info_line))/2)
info_line = GRAPH_INDENT+info_spc+info_line
graph_line = GRAPH_INDENT+'+'+('-'*(GRAPH_WIDTH-2))+'+'
mfu_perc = float(int(arc_stats['mfu_size'])/int(arc_stats['c_max']))
mru_perc = float(int(arc_stats['mru_size'])/int(arc_stats['c_max']))
arc_perc = float(int(arc_stats['size'])/int(arc_stats['c_max']))
total_ticks = float(arc_perc)*GRAPH_WIDTH
mfu_ticks = mfu_perc*GRAPH_WIDTH
mru_ticks = mru_perc*GRAPH_WIDTH
other_ticks = total_ticks-(mfu_ticks+mru_ticks)
core_form = 'F'*int(mfu_ticks)+'R'*int(mru_ticks)+'O'*int(other_ticks)
core_spc = ' '*(GRAPH_WIDTH-(2+len(core_form)))
core_line = GRAPH_INDENT+'|'+core_form+core_spc+'|'
for line in ('', info_line, graph_line, core_line, graph_line, ''):
print(line)
def f_bytes(byte_string):
"""Return human-readable representation of a byte value in
powers of 2 (eg "KiB" for "kibibytes", etc) to two decimal
points. Values smaller than one KiB are returned without
decimal points. Note "bytes" is a reserved keyword.
"""
prefixes = ([2**80, "YiB"], # yobibytes (yotta)
[2**70, "ZiB"], # zebibytes (zetta)
[2**60, "EiB"], # exbibytes (exa)
[2**50, "PiB"], # pebibytes (peta)
[2**40, "TiB"], # tebibytes (tera)
[2**30, "GiB"], # gibibytes (giga)
[2**20, "MiB"], # mebibytes (mega)
[2**10, "KiB"]) # kibibytes (kilo)
bites = int(byte_string)
if bites >= 2**10:
for limit, unit in prefixes:
if bites >= limit:
value = bites / limit
break
result = '{0:.1f} {1}'.format(value, unit)
else:
result = '{0} Bytes'.format(bites)
return result
def f_hits(hits_string):
"""Create a human-readable representation of the number of hits.
The single-letter symbols used are SI to avoid the confusion caused
by the different "short scale" and "long scale" representations in
English, which use the same words for different values. See
https://en.wikipedia.org/wiki/Names_of_large_numbers and:
https://physics.nist.gov/cuu/Units/prefixes.html
"""
numbers = ([10**24, 'Y'], # yotta (septillion)
[10**21, 'Z'], # zetta (sextillion)
[10**18, 'E'], # exa (quintillion)
[10**15, 'P'], # peta (quadrillion)
[10**12, 'T'], # tera (trillion)
[10**9, 'G'], # giga (billion)
[10**6, 'M'], # mega (million)
[10**3, 'k']) # kilo (thousand)
hits = int(hits_string)
if hits >= 1000:
for limit, symbol in numbers:
if hits >= limit:
value = hits/limit
break
result = "%0.1f%s" % (value, symbol)
else:
result = "%d" % hits
return result


def f_perc(value1, value2):
    """Calculate percentage and return in human-readable form. If
    rounding produces the result '0.0' though the first number is
    not zero, include a 'less-than' symbol to avoid confusion.
    Division by zero is handled by returning 'n/a'; no error
    is raised.
    """

    v1 = float(value1)
    v2 = float(value2)

    try:
        perc = 100 * v1/v2
    except ZeroDivisionError:
        result = 'n/a'
    else:
        result = '{0:0.1f} %'.format(perc)

    if result == '0.0 %' and v1 > 0:
        result = '< 0.1 %'

    return result
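
The three cases named in the f_perc docstring (a normal ratio, a ratio that rounds to zero, and a zero denominator) can be checked with a small stand-alone sketch. The name percent is an assumption for illustration; the logic follows the function above.

```python
def percent(part, whole):
    """Hypothetical stand-alone copy of the f_perc logic."""
    part, whole = float(part), float(whole)

    try:
        text = '{0:0.1f} %'.format(100 * part / whole)
    except ZeroDivisionError:
        # Float division by zero raises ZeroDivisionError in Python
        return 'n/a'

    # A non-zero value that rounds to 0.0 would look like "nothing"
    if text == '0.0 %' and part > 0:
        text = '< 0.1 %'

    return text


for pair in ((1, 4), (1, 100000), (5, 0)):
    print(pair, percent(*pair))
```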


def format_raw_line(name, value):
    """For the --raw option for the tunable and SPL outputs, decide on the
    correct formatting based on the --alternate flag.
    """

    if ARGS.alt:
        result = '{0}{1}={2}'.format(INDENT, name, value)
    else:
        # Right-align the value within the line length if it fits,
        # otherwise just separate it from the name by a single space.
        fit = LINE_LENGTH - len(INDENT) - len(name)
        overflow = len(value) + 1
        w = max(fit, overflow)
        result = '{0}{1}{2:>{w}}'.format(INDENT, name, value, w=w)

    return result
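
The width calculation in format_raw_line is easier to follow with concrete numbers. This sketch assumes a line length of 40 and a four-space indent, and replaces the ARGS.alt lookup with a hypothetical alt parameter; none of these values are taken from the script itself.

```python
LINE_LENGTH = 40   # assumed value, for illustration only
INDENT = ' ' * 4   # assumed value, for illustration only


def raw_line(name, value, alt=False):
    """Hypothetical stand-alone copy of the format_raw_line logic."""
    if alt:
        return '{0}{1}={2}'.format(INDENT, name, value)

    # Space left for the value after the indent and the name
    fit = LINE_LENGTH - len(INDENT) - len(name)
    # Minimum field width: the value plus one separating space
    overflow = len(value) + 1
    width = max(fit, overflow)
    return '{0}{1}{2:>{w}}'.format(INDENT, name, value, w=width)


print(raw_line('zfs_arc_max', '0'))
print(raw_line('zfs_arc_max', '0', alt=True))
```

When the name is short, the value ends exactly at column LINE_LENGTH; when the name is too long, the line simply grows and the value follows after one space.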


def get_kstats():
    """Collect information on the ZFS subsystem. The step does not perform any
    further processing, giving us the option to only work on what is actually
    needed. The name "kstat" is a holdover from the Solaris utility of the same
    name.
    """

    result = {}

    for section in SECTION_PATHS.values():
        if section not in result:
            result[section] = load_kstats(section)

    return result


def get_version(request):
    """Get the version number of ZFS or SPL on this machine for header.
    Returns an error string, but does not raise an error, if we can't
    get the ZFS/SPL version.
    """

    if request not in ('spl', 'zfs'):
        error_msg = '(ERROR: "{0}" requested)'.format(request)
        return error_msg

    return get_version_impl(request)


def print_header():
    """Print the initial heading with date and time as well as info on the
    kernel and ZFS versions. This is not called for the graph.
    """

    # datetime is now recommended over time but we keep the exact formatting
    # from the older version of arc_summary in case there are scripts
    # that expect it in this way
    daydate = time.strftime(DATE_FORMAT)
    spc_date = LINE_LENGTH-len(daydate)
    sys_version = os.uname()

    sys_msg = sys_version.sysname+' '+sys_version.release
    zfs = get_version('zfs')
    spc_zfs = LINE_LENGTH-len(zfs)

    machine_msg = 'Machine: '+sys_version.nodename+' ('+sys_version.machine+')'
    spl = get_version('spl')
    spc_spl = LINE_LENGTH-len(spl)

    print('\n'+('-'*LINE_LENGTH))
    print('{0:<{spc}}{1}'.format(TITLE, daydate, spc=spc_date))
    print('{0:<{spc}}{1}'.format(sys_msg, zfs, spc=spc_zfs))
    print('{0:<{spc}}{1}\n'.format(machine_msg, spl, spc=spc_spl))


def print_raw(kstats_dict):
    """Print all available data from the system in a minimally sorted format.
    This can be used as a source to be piped through 'grep'.
    """

    sections = sorted(kstats_dict.keys())

    for section in sections:

        print('\n{0}:'.format(section.upper()))
        lines = sorted(kstats_dict[section])

        for line in lines:
            name, value = cleanup_line(line)
            print(format_raw_line(name, value))

    # Tunables and SPL must be handled separately because they come from a
    # different source and have descriptions the user might request
    print()
    section_spl()
    section_tunables()


def isolate_section(section_name, kstats_dict):
    """From the complete information on all sections, retrieve only those
    for one section.
    """

    try:
        section_data = kstats_dict[section_name]
    except KeyError:
        # Report the requested name; section_data is not bound at this point
        print('ERROR: Data on {0} not available'.format(section_name))
        sys.exit(1)

    section_dict = dict(cleanup_line(l) for l in section_data)

    return section_dict
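
isolate_section turns a list of raw kstat lines into a name-to-value dictionary. The helper cleanup_line is not shown in this part of the listing, so the sketch below assumes that each line consists of three whitespace-separated columns (name, type, value) and uses a hypothetical stand-in called split_kstat_line. The sample lines are made up.

```python
def split_kstat_line(line):
    """Hypothetical stand-in for cleanup_line: keep name and value only."""
    name, _, value = line.split()
    return name, value


# Made-up sample in the assumed "name type value" layout
sample_section = ['hits    4    1200',
                  'misses  4    300']

# dict() accepts any iterable of (key, value) pairs
section_dict = dict(split_kstat_line(line) for line in sample_section)
print(section_dict)
```

The values stay strings, which is why the section functions wrap them in int() before doing arithmetic.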


# Formatted output helper functions


def prt_1(text, value):
    """Print text and one value, no indent"""
    spc = ' '*(LINE_LENGTH-(len(text)+len(value)))
    print('{0}{spc}{1}'.format(text, value, spc=spc))


def prt_i1(text, value):
    """Print text and one value, with indent"""
    spc = ' '*(LINE_LENGTH-(len(INDENT)+len(text)+len(value)))
    print(INDENT+'{0}{spc}{1}'.format(text, value, spc=spc))


def prt_2(text, value1, value2):
    """Print text and two values, no indent"""
    values = '{0:>9} {1:>9}'.format(value1, value2)
    spc = ' '*(LINE_LENGTH-(len(text)+len(values)+2))
    print('{0}{spc}  {1}'.format(text, values, spc=spc))


def prt_i2(text, value1, value2):
    """Print text and two values, with indent"""
    values = '{0:>9} {1:>9}'.format(value1, value2)
    spc = ' '*(LINE_LENGTH-(len(INDENT)+len(text)+len(values)+2))
    print(INDENT+'{0}{spc}  {1}'.format(text, values, spc=spc))


# The section output concentrates on important parameters instead of
# being exhaustive (that is what the --raw parameter is for)


def section_arc(kstats_dict):
    """Give basic information on the ARC, MRU and MFU. This is the first
    and most used section.
    """

    arc_stats = isolate_section('arcstats', kstats_dict)

    throttle = arc_stats['memory_throttle_count']

    if throttle == '0':
        health = 'HEALTHY'
    else:
        health = 'THROTTLED'

    prt_1('ARC status:', health)
    prt_i1('Memory throttle count:', throttle)
    print()

    arc_size = arc_stats['size']
    arc_target_size = arc_stats['c']
    arc_max = arc_stats['c_max']
    arc_min = arc_stats['c_min']
    mfu_size = arc_stats['mfu_size']
    mru_size = arc_stats['mru_size']
    meta_limit = arc_stats['arc_meta_limit']
    meta_size = arc_stats['arc_meta_used']
    dnode_limit = arc_stats['arc_dnode_limit']
    dnode_size = arc_stats['dnode_size']
    target_size_ratio = '{0}:1'.format(int(arc_max) // int(arc_min))

    prt_2('ARC size (current):',
          f_perc(arc_size, arc_max), f_bytes(arc_size))
    prt_i2('Target size (adaptive):',
           f_perc(arc_target_size, arc_max), f_bytes(arc_target_size))
    prt_i2('Min size (hard limit):',
           f_perc(arc_min, arc_max), f_bytes(arc_min))
    prt_i2('Max size (high water):',
           target_size_ratio, f_bytes(arc_max))
    caches_size = int(mfu_size)+int(mru_size)
    prt_i2('Most Frequently Used (MFU) cache size:',
           f_perc(mfu_size, caches_size), f_bytes(mfu_size))
    prt_i2('Most Recently Used (MRU) cache size:',
           f_perc(mru_size, caches_size), f_bytes(mru_size))
    prt_i2('Metadata cache size (hard limit):',
           f_perc(meta_limit, arc_max), f_bytes(meta_limit))
    prt_i2('Metadata cache size (current):',
           f_perc(meta_size, meta_limit), f_bytes(meta_size))
    prt_i2('Dnode cache size (hard limit):',
           f_perc(dnode_limit, meta_limit), f_bytes(dnode_limit))
    prt_i2('Dnode cache size (current):',
           f_perc(dnode_size, dnode_limit), f_bytes(dnode_size))
    print()

    print('ARC hash breakdown:')
    prt_i1('Elements max:', f_hits(arc_stats['hash_elements_max']))
    prt_i2('Elements current:',
           f_perc(arc_stats['hash_elements'], arc_stats['hash_elements_max']),
           f_hits(arc_stats['hash_elements']))
    prt_i1('Collisions:', f_hits(arc_stats['hash_collisions']))

    prt_i1('Chain max:', f_hits(arc_stats['hash_chain_max']))
    prt_i1('Chains:', f_hits(arc_stats['hash_chains']))
    print()

    print('ARC misc:')
    prt_i1('Deleted:', f_hits(arc_stats['deleted']))
    prt_i1('Mutex misses:', f_hits(arc_stats['mutex_miss']))
    prt_i1('Eviction skips:', f_hits(arc_stats['evict_skip']))
    prt_i1('Eviction skips due to L2 writes:',
           f_hits(arc_stats['evict_l2_skip']))
    prt_i1('L2 cached evictions:', f_bytes(arc_stats['evict_l2_cached']))
    prt_i1('L2 eligible evictions:', f_bytes(arc_stats['evict_l2_eligible']))
    prt_i2('L2 eligible MFU evictions:',
           f_perc(arc_stats['evict_l2_eligible_mfu'],
                  arc_stats['evict_l2_eligible']),
           f_bytes(arc_stats['evict_l2_eligible_mfu']))
    prt_i2('L2 eligible MRU evictions:',
           f_perc(arc_stats['evict_l2_eligible_mru'],
                  arc_stats['evict_l2_eligible']),
           f_bytes(arc_stats['evict_l2_eligible_mru']))
    prt_i1('L2 ineligible evictions:',
           f_bytes(arc_stats['evict_l2_ineligible']))
    print()


def section_archits(kstats_dict):
    """Print information on how the caches are accessed ("arc hits").
    """

    arc_stats = isolate_section('arcstats', kstats_dict)
    all_accesses = int(arc_stats['hits'])+int(arc_stats['misses'])
    actual_hits = int(arc_stats['mfu_hits'])+int(arc_stats['mru_hits'])

    prt_1('ARC total accesses (hits + misses):', f_hits(all_accesses))
    ta_todo = (('Cache hit ratio:', arc_stats['hits']),
               ('Cache miss ratio:', arc_stats['misses']),
               ('Actual hit ratio (MFU + MRU hits):', actual_hits))

    for title, value in ta_todo:
        prt_i2(title, f_perc(value, all_accesses), f_hits(value))

    dd_total = int(arc_stats['demand_data_hits']) +\
        int(arc_stats['demand_data_misses'])
    prt_i2('Data demand efficiency:',
           f_perc(arc_stats['demand_data_hits'], dd_total),
           f_hits(dd_total))

    dp_total = int(arc_stats['prefetch_data_hits']) +\
        int(arc_stats['prefetch_data_misses'])
    prt_i2('Data prefetch efficiency:',
           f_perc(arc_stats['prefetch_data_hits'], dp_total),
           f_hits(dp_total))

    known_hits = int(arc_stats['mfu_hits']) +\
        int(arc_stats['mru_hits']) +\
        int(arc_stats['mfu_ghost_hits']) +\
        int(arc_stats['mru_ghost_hits'])

    anon_hits = int(arc_stats['hits'])-known_hits

    print()
    print('Cache hits by cache type:')
    cl_todo = (('Most frequently used (MFU):', arc_stats['mfu_hits']),
               ('Most recently used (MRU):', arc_stats['mru_hits']),
               ('Most frequently used (MFU) ghost:',
                arc_stats['mfu_ghost_hits']),
               ('Most recently used (MRU) ghost:',
                arc_stats['mru_ghost_hits']))

    for title, value in cl_todo:
        prt_i2(title, f_perc(value, arc_stats['hits']), f_hits(value))

    # For some reason, anon_hits can turn negative, which is weird. Until we
    # have figured out why this happens, we just hide the problem, following
    # the behavior of the original arc_summary.
    if anon_hits >= 0:
        prt_i2('Anonymously used:',
               f_perc(anon_hits, arc_stats['hits']), f_hits(anon_hits))

    print()
    print('Cache hits by data type:')
    dt_todo = (('Demand data:', arc_stats['demand_data_hits']),
               ('Prefetch data:', arc_stats['prefetch_data_hits']),
               ('Demand metadata:', arc_stats['demand_metadata_hits']),
               ('Prefetch metadata:',
                arc_stats['prefetch_metadata_hits']))

    for title, value in dt_todo:
        prt_i2(title, f_perc(value, arc_stats['hits']), f_hits(value))

    print()
    print('Cache misses by data type:')
    dm_todo = (('Demand data:', arc_stats['demand_data_misses']),
               ('Prefetch data:',
                arc_stats['prefetch_data_misses']),
               ('Demand metadata:', arc_stats['demand_metadata_misses']),
               ('Prefetch metadata:',
                arc_stats['prefetch_metadata_misses']))

    for title, value in dm_todo:
        prt_i2(title, f_perc(value, arc_stats['misses']), f_hits(value))

    print()


def section_dmu(kstats_dict):
    """Collect information on the DMU"""

    zfetch_stats = isolate_section('zfetchstats', kstats_dict)

    zfetch_access_total = int(zfetch_stats['hits'])+int(zfetch_stats['misses'])

    prt_1('DMU prefetch efficiency:', f_hits(zfetch_access_total))
    prt_i2('Hit ratio:', f_perc(zfetch_stats['hits'], zfetch_access_total),
           f_hits(zfetch_stats['hits']))
    prt_i2('Miss ratio:', f_perc(zfetch_stats['misses'], zfetch_access_total),
           f_hits(zfetch_stats['misses']))
    print()


def section_l2arc(kstats_dict):
    """Collect information on L2ARC device if present. If not, tell user
    that we're skipping the section.
    """

    # The L2ARC statistics live in the same section as the normal ARC stuff
    arc_stats = isolate_section('arcstats', kstats_dict)

    if arc_stats['l2_size'] == '0':
        print('L2ARC not detected, skipping section\n')
        return

    l2_errors = int(arc_stats['l2_writes_error']) +\
        int(arc_stats['l2_cksum_bad']) +\
        int(arc_stats['l2_io_error'])

    l2_access_total = int(arc_stats['l2_hits'])+int(arc_stats['l2_misses'])
    health = 'HEALTHY'

    if l2_errors > 0:
        health = 'DEGRADED'

    prt_1('L2ARC status:', health)

    l2_todo = (('Low memory aborts:', 'l2_abort_lowmem'),
               ('Free on write:', 'l2_free_on_write'),
               ('R/W clashes:', 'l2_rw_clash'),
               ('Bad checksums:', 'l2_cksum_bad'),
               ('I/O errors:', 'l2_io_error'))

    for title, value in l2_todo:
        prt_i1(title, f_hits(arc_stats[value]))

    print()
    prt_1('L2ARC size (adaptive):', f_bytes(arc_stats['l2_size']))
    prt_i2('Compressed:', f_perc(arc_stats['l2_asize'], arc_stats['l2_size']),
           f_bytes(arc_stats['l2_asize']))
    prt_i2('Header size:',
           f_perc(arc_stats['l2_hdr_size'], arc_stats['l2_size']),
           f_bytes(arc_stats['l2_hdr_size']))
    prt_i2('MFU allocated size:',
           f_perc(arc_stats['l2_mfu_asize'], arc_stats['l2_asize']),
           f_bytes(arc_stats['l2_mfu_asize']))
    prt_i2('MRU allocated size:',
           f_perc(arc_stats['l2_mru_asize'], arc_stats['l2_asize']),
           f_bytes(arc_stats['l2_mru_asize']))
    prt_i2('Prefetch allocated size:',
           f_perc(arc_stats['l2_prefetch_asize'], arc_stats['l2_asize']),
           f_bytes(arc_stats['l2_prefetch_asize']))
    prt_i2('Data (buffer content) allocated size:',
           f_perc(arc_stats['l2_bufc_data_asize'], arc_stats['l2_asize']),
           f_bytes(arc_stats['l2_bufc_data_asize']))
    prt_i2('Metadata (buffer content) allocated size:',
           f_perc(arc_stats['l2_bufc_metadata_asize'], arc_stats['l2_asize']),
           f_bytes(arc_stats['l2_bufc_metadata_asize']))

    print()
    prt_1('L2ARC breakdown:', f_hits(l2_access_total))
    prt_i2('Hit ratio:',
           f_perc(arc_stats['l2_hits'], l2_access_total),
           f_hits(arc_stats['l2_hits']))
    prt_i2('Miss ratio:',
           f_perc(arc_stats['l2_misses'], l2_access_total),
           f_hits(arc_stats['l2_misses']))
    prt_i1('Feeds:', f_hits(arc_stats['l2_feeds']))

    print()
    print('L2ARC writes:')

    if arc_stats['l2_writes_done'] != arc_stats['l2_writes_sent']:
        prt_i2('Writes sent:', 'FAULTED', f_hits(arc_stats['l2_writes_sent']))
        prt_i2('Done ratio:',
               f_perc(arc_stats['l2_writes_done'],
                      arc_stats['l2_writes_sent']),
               f_hits(arc_stats['l2_writes_done']))
        prt_i2('Error ratio:',
               f_perc(arc_stats['l2_writes_error'],
                      arc_stats['l2_writes_sent']),
               f_hits(arc_stats['l2_writes_error']))
    else:
        prt_i2('Writes sent:', '100 %', f_hits(arc_stats['l2_writes_sent']))

    print()
    print('L2ARC evicts:')
    prt_i1('Lock retries:', f_hits(arc_stats['l2_evict_lock_retry']))
    prt_i1('Upon reading:', f_hits(arc_stats['l2_evict_reading']))
    print()


def section_spl(*_):
    """Print the SPL parameters, if requested with alternative format
    and/or descriptions. This does not use kstats.
    """

    if sys.platform.startswith('freebsd'):
        # No SPL support in FreeBSD
        return

    spls = get_spl_params()
    keylist = sorted(spls.keys())
    print('Solaris Porting Layer (SPL):')

    if ARGS.desc:
        descriptions = get_descriptions('spl')

    for key in keylist:
        value = spls[key]

        if ARGS.desc:
            try:
                print(INDENT+'#', descriptions[key])
            except KeyError:
                print(INDENT+'# (No description found)')  # paranoid

        print(format_raw_line(key, value))

    print()


def section_tunables(*_):
    """Print the tunables, if requested with alternative format and/or
    descriptions. This does not use kstats.
    """

    tunables = get_tunable_params()
    keylist = sorted(tunables.keys())
    print('Tunables:')

    if ARGS.desc:
        descriptions = get_descriptions('zfs')

    for key in keylist:
        value = tunables[key]

        if ARGS.desc:
            try:
                print(INDENT+'#', descriptions[key])
            except KeyError:
                print(INDENT+'# (No description found)')  # paranoid

        print(format_raw_line(key, value))

    print()


def section_vdev(kstats_dict):
    """Collect information on VDEV caches"""

    # Currently [Nov 2017] the VDEV cache is disabled, because it is actually
    # harmful. When this is the case, we just skip the whole entry. See
    # https://github.com/openzfs/zfs/blob/master/module/zfs/vdev_cache.c
    # for details
    tunables = get_vdev_params()

    if tunables[VDEV_CACHE_SIZE] == '0':
        print('VDEV cache disabled, skipping section\n')
        return

    vdev_stats = isolate_section('vdev_cache_stats', kstats_dict)

    vdev_cache_total = int(vdev_stats['hits']) +\
        int(vdev_stats['misses']) +\
        int(vdev_stats['delegations'])

    prt_1('VDEV cache summary:', f_hits(vdev_cache_total))
    prt_i2('Hit ratio:', f_perc(vdev_stats['hits'], vdev_cache_total),
           f_hits(vdev_stats['hits']))
    prt_i2('Miss ratio:', f_perc(vdev_stats['misses'], vdev_cache_total),
           f_hits(vdev_stats['misses']))
    prt_i2('Delegations:', f_perc(vdev_stats['delegations'], vdev_cache_total),
           f_hits(vdev_stats['delegations']))
    print()


def section_zil(kstats_dict):
    """Collect information on the ZFS Intent Log. Some of the information
    taken from https://github.com/openzfs/zfs/blob/master/include/sys/zil.h
    """

    zil_stats = isolate_section('zil', kstats_dict)

    prt_1('ZIL committed transactions:',
          f_hits(zil_stats['zil_itx_count']))
    prt_i1('Commit requests:', f_hits(zil_stats['zil_commit_count']))
    prt_i1('Flushes to stable storage:',
           f_hits(zil_stats['zil_commit_writer_count']))
    prt_i2('Transactions to SLOG storage pool:',
           f_bytes(zil_stats['zil_itx_metaslab_slog_bytes']),
           f_hits(zil_stats['zil_itx_metaslab_slog_count']))
    prt_i2('Transactions to non-SLOG storage pool:',
           f_bytes(zil_stats['zil_itx_metaslab_normal_bytes']),
           f_hits(zil_stats['zil_itx_metaslab_normal_count']))
    print()


section_calls = {'arc': section_arc,
                 'archits': section_archits,
                 'dmu': section_dmu,
                 'l2arc': section_l2arc,
                 'spl': section_spl,
                 'tunables': section_tunables,
                 'vdev': section_vdev,
                 'zil': section_zil}


def main():
    """Run program. The options to draw a graph and to print all data raw are
    treated separately because they come with their own call.
    """

    kstats = get_kstats()

    if ARGS.graph:
        draw_graph(kstats)
        sys.exit(0)

    print_header()

    if ARGS.raw:
        print_raw(kstats)

    elif ARGS.section:

        try:
            section_calls[ARGS.section](kstats)
        except KeyError:
            print('Error: Section "{0}" unknown'.format(ARGS.section))
            sys.exit(1)

    elif ARGS.page:
        print('WARNING: Pages are deprecated, please use "--section"\n')

        pages_to_calls = {1: 'arc',
                          2: 'archits',
                          3: 'l2arc',
                          4: 'dmu',
                          5: 'vdev',
                          6: 'tunables'}

        try:
            call = pages_to_calls[ARGS.page]
        except KeyError:
            print('Error: Page "{0}" not supported'.format(ARGS.page))
            sys.exit(1)
        else:
            section_calls[call](kstats)

    else:
        # If no parameters were given, we print all sections. We might want to
        # change the sequence by hand
        calls = sorted(section_calls.keys())

        for section in calls:
            section_calls[section](kstats)

    sys.exit(0)


if __name__ == '__main__':
    main()
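
The section_calls dictionary used by main() is a dispatch table: a section name maps to the function that prints it. A minimal sketch of the pattern follows, using hypothetical handlers (show_arc, show_zil) and made-up data in place of the real section functions. Unlike main(), the sketch does the dictionary lookup on its own inside the try block; that is a design choice of the sketch, so that a KeyError raised inside a handler is not reported as an unknown section.

```python
def show_arc(stats):
    """Hypothetical handler, stands in for section_arc."""
    return 'ARC size: {0}'.format(stats['size'])


def show_zil(stats):
    """Hypothetical handler, stands in for section_zil."""
    return 'ZIL commits: {0}'.format(stats['commits'])


# Dispatch table: section name -> function that handles it
handlers = {'arc': show_arc,
            'zil': show_zil}


def dispatch(section, stats):
    """Look up the handler first, then call it outside the try block."""
    try:
        handler = handlers[section]
    except KeyError:
        return 'Error: Section "{0}" unknown'.format(section)

    return handler(stats)


sample_stats = {'size': '4096', 'commits': '17'}   # made-up values
print(dispatch('arc', sample_stats))
print(dispatch('nope', sample_stats))
```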
-1
@@ -1 +0,0 @@
arcstat
+1 -5
@@ -1,5 +1 @@
include $(top_srcdir)/config/Substfiles.am dist_bin_SCRIPTS = arcstat.py
bin_SCRIPTS = arcstat
SUBSTFILES += $(bin_SCRIPTS)
+66 -158
@@ -1,25 +1,20 @@
#!/usr/bin/env @PYTHON_SHEBANG@ #!/usr/bin/python
# #
# Print out ZFS ARC Statistics exported via kstat(1) # Print out ZFS ARC Statistics exported via kstat(1)
# For a definition of fields, or usage, use arcstat -v # For a definition of fields, or usage, use arctstat.pl -v
# #
# This script was originally a fork of the original arcstat.pl (0.1) # This script is a fork of the original arcstat.pl (0.1) by
# by Neelakanth Nadgir, originally published on his Sun blog on # Neelakanth Nadgir, originally published on his Sun blog on
# 09/18/2007 # 09/18/2007
# http://blogs.sun.com/realneel/entry/zfs_arc_statistics # http://blogs.sun.com/realneel/entry/zfs_arc_statistics
# #
# A new version aimed to improve upon the original by adding features # This version aims to improve upon the original by adding features
# and fixing bugs as needed. This version was maintained by Mike # and fixing bugs as needed. This version is maintained by
# Harsch and was hosted in a public open source repository: # Mike Harsch and is hosted in a public open source repository:
# http://github.com/mharsch/arcstat # http://github.com/mharsch/arcstat
# #
# but has since moved to the illumos-gate repository. # Comments, Questions, or Suggestions are always welcome.
# # Contact the maintainer at ( mike at harschsystems dot com )
# This Python port was written by John Hixson for FreeNAS, introduced
# in commit e2c29f:
# https://github.com/freenas/freenas
#
# and has been improved by many people since.
# #
# CDDL HEADER START # CDDL HEADER START
# #
@@ -47,8 +42,7 @@
# @hdr is the array of fields that needs to be printed, so we # @hdr is the array of fields that needs to be printed, so we
# just iterate over this array and print the values using our pretty printer. # just iterate over this array and print the values using our pretty printer.
# #
# This script must remain compatible with Python 2.6+ and Python 3.4+.
#
import sys import sys
import time import time
@@ -56,16 +50,16 @@ import getopt
import re import re
import copy import copy
from decimal import Decimal
from signal import signal, SIGINT, SIGWINCH, SIG_DFL from signal import signal, SIGINT, SIGWINCH, SIG_DFL
cols = { cols = {
# HDR: [Size, Scale, Description] # HDR: [Size, Scale, Description]
"time": [8, -1, "Time"], "time": [8, -1, "Time"],
"hits": [4, 1000, "ARC reads per second"], "hits": [4, 1000, "ARC reads per second"],
"miss": [4, 1000, "ARC misses per second"], "miss": [4, 1000, "ARC misses per second"],
"read": [4, 1000, "Total ARC accesses per second"], "read": [4, 1000, "Total ARC accesses per second"],
"hit%": [4, 100, "ARC hit percentage"], "hit%": [4, 100, "ARC Hit percentage"],
"miss%": [5, 100, "ARC miss percentage"], "miss%": [5, 100, "ARC miss percentage"],
"dhit": [4, 1000, "Demand hits per second"], "dhit": [4, 1000, "Demand hits per second"],
"dmis": [4, 1000, "Demand misses per second"], "dmis": [4, 1000, "Demand misses per second"],
@@ -77,23 +71,16 @@ cols = {
"pm%": [3, 100, "Prefetch miss percentage"], "pm%": [3, 100, "Prefetch miss percentage"],
"mhit": [4, 1000, "Metadata hits per second"], "mhit": [4, 1000, "Metadata hits per second"],
"mmis": [4, 1000, "Metadata misses per second"], "mmis": [4, 1000, "Metadata misses per second"],
"mread": [5, 1000, "Metadata accesses per second"], "mread": [4, 1000, "Metadata accesses per second"],
"mh%": [3, 100, "Metadata hit percentage"], "mh%": [3, 100, "Metadata hit percentage"],
"mm%": [3, 100, "Metadata miss percentage"], "mm%": [3, 100, "Metadata miss percentage"],
"arcsz": [5, 1024, "ARC size"], "arcsz": [5, 1024, "ARC Size"],
"size": [4, 1024, "ARC size"], "c": [4, 1024, "ARC Target Size"],
"c": [4, 1024, "ARC target size"], "mfu": [4, 1000, "MFU List hits per second"],
"mfu": [4, 1000, "MFU list hits per second"], "mru": [4, 1000, "MRU List hits per second"],
"mru": [4, 1000, "MRU list hits per second"], "mfug": [4, 1000, "MFU Ghost List hits per second"],
"mfug": [4, 1000, "MFU ghost list hits per second"], "mrug": [4, 1000, "MRU Ghost List hits per second"],
"mrug": [4, 1000, "MRU ghost list hits per second"],
"eskip": [5, 1000, "evict_skip per second"], "eskip": [5, 1000, "evict_skip per second"],
"el2skip": [7, 1000, "evict skip, due to l2 writes, per second"],
"el2cach": [7, 1024, "Size of L2 cached evictions per second"],
"el2el": [5, 1024, "Size of L2 eligible evictions per second"],
"el2mfu": [6, 1024, "Size of L2 eligible MFU evictions per second"],
"el2mru": [6, 1024, "Size of L2 eligible MRU evictions per second"],
"el2inel": [7, 1024, "Size of L2 ineligible evictions per second"],
"mtxmis": [6, 1000, "mutex_miss per second"], "mtxmis": [6, 1000, "mutex_miss per second"],
"dread": [5, 1000, "Demand accesses per second"], "dread": [5, 1000, "Demand accesses per second"],
"pread": [5, 1000, "Prefetch accesses per second"], "pread": [5, 1000, "Prefetch accesses per second"],
@@ -102,29 +89,14 @@ cols = {
"l2read": [6, 1000, "Total L2ARC accesses per second"], "l2read": [6, 1000, "Total L2ARC accesses per second"],
"l2hit%": [6, 100, "L2ARC access hit percentage"], "l2hit%": [6, 100, "L2ARC access hit percentage"],
"l2miss%": [7, 100, "L2ARC access miss percentage"], "l2miss%": [7, 100, "L2ARC access miss percentage"],
"l2pref": [6, 1024, "L2ARC prefetch allocated size"],
"l2mfu": [5, 1024, "L2ARC MFU allocated size"],
"l2mru": [5, 1024, "L2ARC MRU allocated size"],
"l2data": [6, 1024, "L2ARC data allocated size"],
"l2meta": [6, 1024, "L2ARC metadata allocated size"],
"l2pref%": [7, 100, "L2ARC prefetch percentage"],
"l2mfu%": [6, 100, "L2ARC MFU percentage"],
"l2mru%": [6, 100, "L2ARC MRU percentage"],
"l2data%": [7, 100, "L2ARC data percentage"],
"l2meta%": [7, 100, "L2ARC metadata percentage"],
"l2asize": [7, 1024, "Actual (compressed) size of the L2ARC"], "l2asize": [7, 1024, "Actual (compressed) size of the L2ARC"],
"l2size": [6, 1024, "Size of the L2ARC"], "l2size": [6, 1024, "Size of the L2ARC"],
"l2bytes": [7, 1024, "Bytes read per second from the L2ARC"], "l2bytes": [7, 1024, "bytes read per second from the L2ARC"],
"grow": [4, 1000, "ARC grow disabled"],
"need": [4, 1024, "ARC reclaim need"],
"free": [4, 1024, "ARC free memory"],
"avail": [5, 1024, "ARC available memory"],
"waste": [5, 1024, "Wasted memory due to round up to pagesize"],
} }
v = {} v = {}
hdr = ["time", "read", "miss", "miss%", "dmis", "dm%", "pmis", "pm%", "mmis", hdr = ["time", "read", "miss", "miss%", "dmis", "dm%", "pmis", "pm%", "mmis",
"mm%", "size", "c", "avail"] "mm%", "arcsz", "c"]
xhdr = ["time", "mfu", "mru", "mfug", "mrug", "eskip", "mtxmis", "dread", xhdr = ["time", "mfu", "mru", "mfug", "mrug", "eskip", "mtxmis", "dread",
"pread", "read"] "pread", "read"]
sint = 1 # Default interval is 1 second sint = 1 # Default interval is 1 second
@@ -134,56 +106,12 @@ opfile = None
sep = " " # Default separator is 2 spaces sep = " " # Default separator is 2 spaces
version = "0.4" version = "0.4"
l2exist = False l2exist = False
cmd = ("Usage: arcstat [-havxp] [-f fields] [-o file] [-s string] [interval " cmd = ("Usage: arcstat.py [-hvx] [-f fields] [-o file] [-s string] [interval "
"[count]]\n") "[count]]\n")
cur = {} cur = {}
d = {} d = {}
out = None out = None
kstat = None kstat = None
pretty_print = True
if sys.platform.startswith('freebsd'):
# Requires py-sysctl on FreeBSD
import sysctl
def kstat_update():
global kstat
k = [ctl for ctl in sysctl.filter('kstat.zfs.misc.arcstats')
if ctl.type != sysctl.CTLTYPE_NODE]
if not k:
sys.exit(1)
kstat = {}
for s in k:
if not s:
continue
name, value = s.name, s.value
# Trims 'kstat.zfs.misc.arcstats' from the name
kstat[name[24:]] = int(value)
elif sys.platform.startswith('linux'):
def kstat_update():
global kstat
k = [line.strip() for line in open('/proc/spl/kstat/zfs/arcstats')]
if not k:
sys.exit(1)
del k[0:2]
kstat = {}
for s in k:
if not s:
continue
name, unused, value = s.split()
kstat[name] = int(value)
def detailed_usage(): def detailed_usage():
@@ -199,7 +127,6 @@ def detailed_usage():
def usage(): def usage():
sys.stderr.write("%s\n" % cmd) sys.stderr.write("%s\n" % cmd)
sys.stderr.write("\t -h : Print this help message\n") sys.stderr.write("\t -h : Print this help message\n")
sys.stderr.write("\t -a : Print all possible stats\n")
sys.stderr.write("\t -v : List all possible field headers and definitions" sys.stderr.write("\t -v : List all possible field headers and definitions"
"\n") "\n")
sys.stderr.write("\t -x : Print extended stats\n") sys.stderr.write("\t -x : Print extended stats\n")
@@ -207,17 +134,35 @@ def usage():
sys.stderr.write("\t -o : Redirect output to the specified file\n") sys.stderr.write("\t -o : Redirect output to the specified file\n")
sys.stderr.write("\t -s : Override default field separator with custom " sys.stderr.write("\t -s : Override default field separator with custom "
"character or string\n") "character or string\n")
sys.stderr.write("\t -p : Disable auto-scaling of numerical fields\n")
sys.stderr.write("\nExamples:\n") sys.stderr.write("\nExamples:\n")
sys.stderr.write("\tarcstat -o /tmp/a.log 2 10\n") sys.stderr.write("\tarcstat.py -o /tmp/a.log 2 10\n")
sys.stderr.write("\tarcstat -s \",\" -o /tmp/a.log 2 10\n") sys.stderr.write("\tarcstat.py -s \",\" -o /tmp/a.log 2 10\n")
sys.stderr.write("\tarcstat -v\n") sys.stderr.write("\tarcstat.py -v\n")
sys.stderr.write("\tarcstat -f time,hit%,dh%,ph%,mh% 1\n") sys.stderr.write("\tarcstat.py -f time,hit%,dh%,ph%,mh% 1\n")
sys.stderr.write("\n") sys.stderr.write("\n")
sys.exit(1) sys.exit(1)
def kstat_update():
global kstat
k = [line.strip() for line in open('/proc/spl/kstat/zfs/arcstats')]
if not k:
sys.exit(1)
del k[0:2]
kstat = {}
for s in k:
if not s:
continue
name, unused, value = s.split()
kstat[name] = Decimal(value)
def snap_stats(): def snap_stats():
global cur global cur
global kstat global kstat
@@ -248,7 +193,7 @@ def prettynum(sz, scale, num=0):
elif 0 < num < 1: elif 0 < num < 1:
num = 0 num = 0
while abs(num) > scale and index < 5: while num > scale and index < 5:
save = num save = num
num = num / scale num = num / scale
index += 1 index += 1
@@ -256,7 +201,7 @@ def prettynum(sz, scale, num=0):
if index == 0: if index == 0:
return "%*d" % (sz, num) return "%*d" % (sz, num)
if abs(save / scale) < 10: if (save / scale) < 10:
return "%*.1f%s" % (sz - 1, num, suffix[index]) return "%*.1f%s" % (sz - 1, num, suffix[index])
else: else:
return "%*d%s" % (sz - 1, num, suffix[index]) return "%*d%s" % (sz - 1, num, suffix[index])
@@ -266,14 +211,12 @@ def print_values():
global hdr global hdr
global sep global sep
global v global v
global pretty_print
if pretty_print: for col in hdr:
fmt = lambda col: prettynum(cols[col][0], cols[col][1], v[col]) sys.stdout.write("%s%s" % (
else: prettynum(cols[col][0], cols[col][1], v[col]),
fmt = lambda col: str(v[col]) sep
))
sys.stdout.write(sep.join(fmt(col) for col in hdr))
sys.stdout.write("\n") sys.stdout.write("\n")
sys.stdout.flush() sys.stdout.flush()
@@ -281,14 +224,9 @@ def print_values():
def print_header(): def print_header():
global hdr global hdr
global sep global sep
global pretty_print
if pretty_print: for col in hdr:
fmt = lambda col: "%*s" % (cols[col][0], col) sys.stdout.write("%*s%s" % (cols[col][0], col, sep))
else:
fmt = lambda col: col
sys.stdout.write(sep.join(fmt(col) for col in hdr))
sys.stdout.write("\n") sys.stdout.write("\n")
@@ -325,10 +263,8 @@ def init():
global sep global sep
global out global out
global l2exist global l2exist
global pretty_print
desired_cols = None desired_cols = None
aflag = False
xflag = False xflag = False
hflag = False hflag = False
vflag = False vflag = False
@@ -337,16 +273,14 @@ def init():
try: try:
opts, args = getopt.getopt( opts, args = getopt.getopt(
sys.argv[1:], sys.argv[1:],
"axo:hvs:f:p", "xo:hvs:f:",
[ [
"all",
"extended", "extended",
"outfile", "outfile",
"help", "help",
"verbose", "verbose",
"separator", "separator",
"columns", "columns"
"parsable"
] ]
) )
except getopt.error as msg: except getopt.error as msg:
@@ -355,8 +289,6 @@ def init():
opts = None opts = None
for opt, arg in opts: for opt, arg in opts:
if opt in ('-a', '--all'):
aflag = True
if opt in ('-x', '--extended'): if opt in ('-x', '--extended'):
xflag = True xflag = True
if opt in ('-o', '--outfile'): if opt in ('-o', '--outfile'):
@@ -372,13 +304,19 @@ def init():
if opt in ('-f', '--columns'): if opt in ('-f', '--columns'):
desired_cols = arg desired_cols = arg
i += 1 i += 1
if opt in ('-p', '--parsable'):
pretty_print = False
i += 1 i += 1
argv = sys.argv[i:] argv = sys.argv[i:]
sint = int(argv[0]) if argv else sint sint = Decimal(argv[0]) if argv else sint
count = int(argv[1]) if len(argv) > 1 else (0 if len(argv) > 0 else 1) count = int(argv[1]) if len(argv) > 1 else count
if len(argv) > 1:
sint = Decimal(argv[0])
count = int(argv[1])
elif len(argv) > 0:
sint = Decimal(argv[0])
count = 0
if hflag or (xflag and desired_cols): if hflag or (xflag and desired_cols):
usage() usage()
@@ -418,12 +356,6 @@ def init():
incompat) incompat)
usage() usage()
if aflag:
if l2exist:
hdr = cols.keys()
else:
hdr = [col for col in cols.keys() if not col.startswith("l2")]
if opfile: if opfile:
try: try:
out = open(opfile, "w") out = open(opfile, "w")
@@ -472,19 +404,12 @@ def calculate():
v["mm%"] = 100 - v["mh%"] if v["mread"] > 0 else 0 v["mm%"] = 100 - v["mh%"] if v["mread"] > 0 else 0
v["arcsz"] = cur["size"] v["arcsz"] = cur["size"]
v["size"] = cur["size"]
v["c"] = cur["c"] v["c"] = cur["c"]
v["mfu"] = d["mfu_hits"] / sint v["mfu"] = d["mfu_hits"] / sint
v["mru"] = d["mru_hits"] / sint v["mru"] = d["mru_hits"] / sint
v["mrug"] = d["mru_ghost_hits"] / sint v["mrug"] = d["mru_ghost_hits"] / sint
v["mfug"] = d["mfu_ghost_hits"] / sint v["mfug"] = d["mfu_ghost_hits"] / sint
v["eskip"] = d["evict_skip"] / sint v["eskip"] = d["evict_skip"] / sint
v["el2skip"] = d["evict_l2_skip"] / sint
v["el2cach"] = d["evict_l2_cached"] / sint
v["el2el"] = d["evict_l2_eligible"] / sint
v["el2mfu"] = d["evict_l2_eligible_mfu"] / sint
v["el2mru"] = d["evict_l2_eligible_mru"] / sint
v["el2inel"] = d["evict_l2_ineligible"] / sint
v["mtxmis"] = d["mutex_miss"] / sint v["mtxmis"] = d["mutex_miss"] / sint
if l2exist: if l2exist:
@@ -498,23 +423,6 @@ def calculate():
v["l2size"] = cur["l2_size"] v["l2size"] = cur["l2_size"]
v["l2bytes"] = d["l2_read_bytes"] / sint v["l2bytes"] = d["l2_read_bytes"] / sint
v["l2pref"] = cur["l2_prefetch_asize"]
v["l2mfu"] = cur["l2_mfu_asize"]
v["l2mru"] = cur["l2_mru_asize"]
v["l2data"] = cur["l2_bufc_data_asize"]
v["l2meta"] = cur["l2_bufc_metadata_asize"]
v["l2pref%"] = 100 * v["l2pref"] / v["l2asize"]
v["l2mfu%"] = 100 * v["l2mfu"] / v["l2asize"]
v["l2mru%"] = 100 * v["l2mru"] / v["l2asize"]
v["l2data%"] = 100 * v["l2data"] / v["l2asize"]
v["l2meta%"] = 100 * v["l2meta"] / v["l2asize"]
v["grow"] = 0 if cur["arc_no_grow"] else 1
v["need"] = cur["arc_need_free"]
v["free"] = cur["memory_free_bytes"]
v["avail"] = cur["memory_available_bytes"]
v["waste"] = cur["abd_chunk_waste_size"]
def main(): def main():
global sint global sint
-1
@@ -1 +0,0 @@
dbufstat
+1 -5
@@ -1,5 +1 @@
include $(top_srcdir)/config/Substfiles.am dist_bin_SCRIPTS = dbufstat.py
bin_SCRIPTS = dbufstat
SUBSTFILES += $(bin_SCRIPTS)
@@ -1,4 +1,4 @@
#!/usr/bin/env @PYTHON_SHEBANG@ #!/usr/bin/python
# #
# Print out statistics for all cached dmu buffers. This information # Print out statistics for all cached dmu buffers. This information
# is available through the dbufs kstat and may be post-processed as # is available through the dbufs kstat and may be post-processed as
@@ -27,17 +27,14 @@
# Copyright (C) 2013 Lawrence Livermore National Security, LLC. # Copyright (C) 2013 Lawrence Livermore National Security, LLC.
# Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER). # Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
# #
# This script must remain compatible with Python 2.6+ and Python 3.4+.
#
import sys import sys
import getopt import getopt
import errno import errno
import re
bhdr = ["pool", "objset", "object", "level", "blkid", "offset", "dbsize"] bhdr = ["pool", "objset", "object", "level", "blkid", "offset", "dbsize"]
bxhdr = ["pool", "objset", "object", "level", "blkid", "offset", "dbsize", bxhdr = ["pool", "objset", "object", "level", "blkid", "offset", "dbsize",
"meta", "state", "dbholds", "dbc", "list", "atype", "flags", "meta", "state", "dbholds", "list", "atype", "flags",
"count", "asize", "access", "mru", "gmru", "mfu", "gmfu", "l2", "count", "asize", "access", "mru", "gmru", "mfu", "gmfu", "l2",
"l2_dattr", "l2_asize", "l2_comp", "aholds", "dtype", "btype", "l2_dattr", "l2_asize", "l2_comp", "aholds", "dtype", "btype",
"data_bs", "meta_bs", "bsize", "lvls", "dholds", "blocks", "dsize"] "data_bs", "meta_bs", "bsize", "lvls", "dholds", "blocks", "dsize"]
@@ -48,7 +45,7 @@ dxhdr = ["pool", "objset", "object", "dtype", "btype", "data_bs", "meta_bs",
"bsize", "lvls", "dholds", "blocks", "dsize", "cached", "direct", "bsize", "lvls", "dholds", "blocks", "dsize", "cached", "direct",
"indirect", "bonus", "spill"] "indirect", "bonus", "spill"]
dincompat = ["level", "blkid", "offset", "dbsize", "meta", "state", "dbholds", dincompat = ["level", "blkid", "offset", "dbsize", "meta", "state", "dbholds",
"dbc", "list", "atype", "flags", "count", "asize", "access", "list", "atype", "flags", "count", "asize", "access",
"mru", "gmru", "mfu", "gmfu", "l2", "l2_dattr", "l2_asize", "mru", "gmru", "mfu", "gmfu", "l2", "l2_dattr", "l2_asize",
"l2_comp", "aholds"] "l2_comp", "aholds"]
@@ -56,7 +53,7 @@ thdr = ["pool", "objset", "dtype", "cached"]
txhdr = ["pool", "objset", "dtype", "cached", "direct", "indirect", txhdr = ["pool", "objset", "dtype", "cached", "direct", "indirect",
"bonus", "spill"] "bonus", "spill"]
tincompat = ["object", "level", "blkid", "offset", "dbsize", "meta", "state", tincompat = ["object", "level", "blkid", "offset", "dbsize", "meta", "state",
"dbc", "dbholds", "list", "atype", "flags", "count", "asize", "dbholds", "list", "atype", "flags", "count", "asize",
"access", "mru", "gmru", "mfu", "gmfu", "l2", "l2_dattr", "access", "mru", "gmru", "mfu", "gmfu", "l2", "l2_dattr",
"l2_asize", "l2_comp", "aholds", "btype", "data_bs", "meta_bs", "l2_asize", "l2_comp", "aholds", "btype", "data_bs", "meta_bs",
"bsize", "lvls", "dholds", "blocks", "dsize"] "bsize", "lvls", "dholds", "blocks", "dsize"]
@@ -73,10 +70,9 @@ cols = {
"meta": [4, -1, "is this buffer metadata?"], "meta": [4, -1, "is this buffer metadata?"],
"state": [5, -1, "state of buffer (read, cached, etc)"], "state": [5, -1, "state of buffer (read, cached, etc)"],
"dbholds": [7, 1000, "number of holds on buffer"], "dbholds": [7, 1000, "number of holds on buffer"],
"dbc": [3, -1, "in dbuf cache"],
"list": [4, -1, "which ARC list contains this buffer"], "list": [4, -1, "which ARC list contains this buffer"],
"atype": [7, -1, "ARC header type (data or metadata)"], "atype": [7, -1, "ARC header type (data or metadata)"],
"flags": [9, -1, "ARC read flags"], "flags": [8, -1, "ARC read flags"],
"count": [5, -1, "ARC data count"], "count": [5, -1, "ARC data count"],
"asize": [7, 1024, "size of this ARC buffer"], "asize": [7, 1024, "size of this ARC buffer"],
"access": [10, -1, "time this ARC buffer was last accessed"], "access": [10, -1, "time this ARC buffer was last accessed"],
@@ -108,30 +104,15 @@ cols = {
hdr = None hdr = None
xhdr = None xhdr = None
sep = " " # Default separator is 2 spaces sep = " " # Default separator is 2 spaces
cmd = ("Usage: dbufstat [-bdhnrtvx] [-i file] [-f fields] [-o file] " cmd = ("Usage: dbufstat.py [-bdhrtvx] [-i file] [-f fields] [-o file] "
"[-s string] [-F filter]\n") "[-s string]\n")
raw = 0 raw = 0
if sys.platform.startswith("freebsd"):
import io
# Requires py-sysctl on FreeBSD
import sysctl
def default_ifile():
dbufs = sysctl.filter("kstat.zfs.misc.dbufs")[0].value
sys.stdin = io.StringIO(dbufs)
return "-"
elif sys.platform.startswith("linux"):
def default_ifile():
return "/proc/spl/kstat/zfs/dbufs"
def print_incompat_helper(incompat): def print_incompat_helper(incompat):
cnt = 0 cnt = 0
for key in sorted(incompat): for key in sorted(incompat):
if cnt == 0: if cnt is 0:
sys.stderr.write("\t") sys.stderr.write("\t")
elif cnt > 8: elif cnt > 8:
sys.stderr.write(",\n\t") sys.stderr.write(",\n\t")
@@ -170,7 +151,6 @@ def usage():
sys.stderr.write("\t -b : Print table of information for each dbuf\n") sys.stderr.write("\t -b : Print table of information for each dbuf\n")
sys.stderr.write("\t -d : Print table of information for each dnode\n") sys.stderr.write("\t -d : Print table of information for each dnode\n")
sys.stderr.write("\t -h : Print this help message\n") sys.stderr.write("\t -h : Print this help message\n")
sys.stderr.write("\t -n : Exclude header from output\n")
sys.stderr.write("\t -r : Print raw values\n") sys.stderr.write("\t -r : Print raw values\n")
sys.stderr.write("\t -t : Print table of information for each dnode type" sys.stderr.write("\t -t : Print table of information for each dnode type"
"\n") "\n")
@@ -182,13 +162,11 @@ def usage():
sys.stderr.write("\t -o : Redirect output to the specified file\n") sys.stderr.write("\t -o : Redirect output to the specified file\n")
sys.stderr.write("\t -s : Override default field separator with custom " sys.stderr.write("\t -s : Override default field separator with custom "
"character or string\n") "character or string\n")
sys.stderr.write("\t -F : Filter output by value or regex\n")
sys.stderr.write("\nExamples:\n") sys.stderr.write("\nExamples:\n")
sys.stderr.write("\tdbufstat -d -o /tmp/d.log\n") sys.stderr.write("\tdbufstat.py -d -o /tmp/d.log\n")
sys.stderr.write("\tdbufstat -t -s \",\" -o /tmp/t.log\n") sys.stderr.write("\tdbufstat.py -t -s \",\" -o /tmp/t.log\n")
sys.stderr.write("\tdbufstat -v\n") sys.stderr.write("\tdbufstat.py -v\n")
sys.stderr.write("\tdbufstat -d -f pool,object,objset,dsize,cached\n") sys.stderr.write("\tdbufstat.py -d -f pool,object,objset,dsize,cached\n")
sys.stderr.write("\tdbufstat -bx -F dbc=1,objset=54,pool=testpool\n")
sys.stderr.write("\n") sys.stderr.write("\n")
sys.exit(1) sys.exit(1)
@@ -250,8 +228,7 @@ def print_header():
def get_typestring(t): def get_typestring(t):
ot_strings = [ type_strings = ["DMU_OT_NONE",
"DMU_OT_NONE",
# general: # general:
"DMU_OT_OBJECT_DIRECTORY", "DMU_OT_OBJECT_DIRECTORY",
"DMU_OT_OBJECT_ARRAY", "DMU_OT_OBJECT_ARRAY",
@@ -314,39 +291,15 @@ def get_typestring(t):
"DMU_OT_DEADLIST_HDR", "DMU_OT_DEADLIST_HDR",
"DMU_OT_DSL_CLONES", "DMU_OT_DSL_CLONES",
"DMU_OT_BPOBJ_SUBOBJ"] "DMU_OT_BPOBJ_SUBOBJ"]
otn_strings = {
0x80: "DMU_OTN_UINT8_DATA",
0xc0: "DMU_OTN_UINT8_METADATA",
0x81: "DMU_OTN_UINT16_DATA",
0xc1: "DMU_OTN_UINT16_METADATA",
0x82: "DMU_OTN_UINT32_DATA",
0xc2: "DMU_OTN_UINT32_METADATA",
0x83: "DMU_OTN_UINT64_DATA",
0xc3: "DMU_OTN_UINT64_METADATA",
0x84: "DMU_OTN_ZAP_DATA",
0xc4: "DMU_OTN_ZAP_METADATA",
0xa0: "DMU_OTN_UINT8_ENC_DATA",
0xe0: "DMU_OTN_UINT8_ENC_METADATA",
0xa1: "DMU_OTN_UINT16_ENC_DATA",
0xe1: "DMU_OTN_UINT16_ENC_METADATA",
0xa2: "DMU_OTN_UINT32_ENC_DATA",
0xe2: "DMU_OTN_UINT32_ENC_METADATA",
0xa3: "DMU_OTN_UINT64_ENC_DATA",
0xe3: "DMU_OTN_UINT64_ENC_METADATA",
0xa4: "DMU_OTN_ZAP_ENC_DATA",
0xe4: "DMU_OTN_ZAP_ENC_METADATA"}
# If "-rr" option is used, don't convert to string representation # If "-rr" option is used, don't convert to string representation
if raw > 1: if raw > 1:
return "%i" % t return "%i" % t
try: try:
if t < len(ot_strings): return type_strings[t]
return ot_strings[t] except IndexError:
else: return "%i" % t
return otn_strings[t]
except (IndexError, KeyError):
return "(UNKNOWN)"
def get_compstring(c): def get_compstring(c):
@@ -358,7 +311,7 @@ def get_compstring(c):
"ZIO_COMPRESS_GZIP_6", "ZIO_COMPRESS_GZIP_7", "ZIO_COMPRESS_GZIP_6", "ZIO_COMPRESS_GZIP_7",
"ZIO_COMPRESS_GZIP_8", "ZIO_COMPRESS_GZIP_9", "ZIO_COMPRESS_GZIP_8", "ZIO_COMPRESS_GZIP_9",
"ZIO_COMPRESS_ZLE", "ZIO_COMPRESS_LZ4", "ZIO_COMPRESS_ZLE", "ZIO_COMPRESS_LZ4",
"ZIO_COMPRESS_ZSTD", "ZIO_COMPRESS_FUNCTION"] "ZIO_COMPRESS_FUNCTION"]
# If "-rr" option is used, don't convert to string representation # If "-rr" option is used, don't convert to string representation
if raw > 1: if raw > 1:
@@ -431,32 +384,12 @@ def update_dict(d, k, line, labels):
return d return d
def skip_line(vals, filters): def print_dict(d):
''' print_header()
Determines if a line should be skipped during printing
based on a set of filters
'''
if len(filters) == 0:
return False
for key in vals:
if key in filters:
val = prettynum(cols[key][0], cols[key][1], vals[key]).strip()
# we want a full match here
if re.match("(?:" + filters[key] + r")\Z", val) is None:
return True
return False
def print_dict(d, filters, noheader):
if not noheader:
print_header()
for pool in list(d.keys()): for pool in list(d.keys()):
for objset in list(d[pool].keys()): for objset in list(d[pool].keys()):
for v in list(d[pool][objset].values()): for v in list(d[pool][objset].values()):
if not skip_line(v, filters): print_values(v)
print_values(v)
def dnodes_build_dict(filehandle): def dnodes_build_dict(filehandle):
@@ -497,7 +430,7 @@ def types_build_dict(filehandle):
return types return types
def buffers_print_all(filehandle, filters, noheader): def buffers_print_all(filehandle):
labels = dict() labels = dict()
# First 3 lines are header information, skip the first two # First 3 lines are header information, skip the first two
@@ -508,14 +441,11 @@ def buffers_print_all(filehandle, filters, noheader):
for i, v in enumerate(next(filehandle).split()): for i, v in enumerate(next(filehandle).split()):
labels[v] = i labels[v] = i
if not noheader: print_header()
print_header()
# The rest of the file is buffer information # The rest of the file is buffer information
for line in filehandle: for line in filehandle:
vals = parse_line(line.split(), labels) print_values(parse_line(line.split(), labels))
if not skip_line(vals, filters):
print_values(vals)
def main(): def main():
@@ -532,13 +462,11 @@ def main():
tflag = False tflag = False
vflag = False vflag = False
xflag = False xflag = False
nflag = False
filters = dict()
try: try:
opts, args = getopt.getopt( opts, args = getopt.getopt(
sys.argv[1:], sys.argv[1:],
"bdf:hi:o:rs:tvxF:n", "bdf:hi:o:rs:tvx",
[ [
"buffers", "buffers",
"dnodes", "dnodes",
@@ -549,8 +477,7 @@ def main():
"separator", "separator",
"types", "types",
"verbose", "verbose",
"extended", "extended"
"filter"
] ]
) )
except getopt.error: except getopt.error:
@@ -580,35 +507,6 @@ def main():
vflag = True vflag = True
if opt in ('-x', '--extended'): if opt in ('-x', '--extended'):
xflag = True xflag = True
if opt in ('-n', '--noheader'):
nflag = True
if opt in ('-F', '--filter'):
fils = [x.strip() for x in arg.split(",")]
for fil in fils:
f = [x.strip() for x in fil.split("=")]
if len(f) != 2:
sys.stderr.write("Invalid filter '%s'.\n" % fil)
sys.exit(1)
if f[0] not in cols:
sys.stderr.write("Invalid field '%s' in filter.\n" % f[0])
sys.exit(1)
if f[0] in filters:
sys.stderr.write("Field '%s' specified multiple times in "
"filter.\n" % f[0])
sys.exit(1)
try:
re.compile("(?:" + f[1] + r")\Z")
except re.error:
sys.stderr.write("Invalid regex for field '%s' in "
"filter.\n" % f[0])
sys.exit(1)
filters[f[0]] = f[1]
if hflag or (xflag and desired_cols): if hflag or (xflag and desired_cols):
usage() usage()
@@ -660,9 +558,9 @@ def main():
sys.exit(1) sys.exit(1)
if not ifile: if not ifile:
ifile = default_ifile() ifile = '/proc/spl/kstat/zfs/dbufs'
if ifile != "-": if ifile is not "-":
try: try:
tmp = open(ifile, "r") tmp = open(ifile, "r")
sys.stdin = tmp sys.stdin = tmp
@@ -671,13 +569,13 @@ def main():
sys.exit(1) sys.exit(1)
if bflag: if bflag:
buffers_print_all(sys.stdin, filters, nflag) buffers_print_all(sys.stdin)
if dflag: if dflag:
print_dict(dnodes_build_dict(sys.stdin), filters, nflag) print_dict(dnodes_build_dict(sys.stdin))
if tflag: if tflag:
print_dict(types_build_dict(sys.stdin), filters, nflag) print_dict(types_build_dict(sys.stdin))
if __name__ == '__main__': if __name__ == '__main__':
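Note on the `-F` filter hunks above: `skip_line()` and the option parser both wrap each user pattern as `(?:pattern)\Z` before calling `re.match`, so a filter value has to match the whole formatted field, not just a prefix. A minimal sketch of that behaviour follows; `full_match` and `parse_filters` are hypothetical names used only for illustration, and `known_fields` stands in for the script's `cols` dictionary.

```python
import re


def full_match(pattern, value):
    """Return True only if pattern matches the whole of value."""
    # Same wrapping as in the diff: the non-capturing group keeps an
    # alternation such as "a|b" together, and \Z anchors at end of string.
    return re.match("(?:" + pattern + r")\Z", value) is not None


def parse_filters(spec, known_fields):
    """Parse "field=regex,field=regex" into a dict (hypothetical helper)."""
    filters = {}
    for item in spec.split(","):
        parts = [p.strip() for p in item.split("=")]
        if len(parts) != 2:
            raise ValueError("invalid filter %r" % item)
        field, pattern = parts
        if field not in known_fields:
            raise ValueError("invalid field %r in filter" % field)
        if field in filters:
            raise ValueError("field %r specified multiple times" % field)
        # Reject bad regexes up front; raises re.error on failure.
        re.compile("(?:" + pattern + r")\Z")
        filters[field] = pattern
    return filters


if __name__ == "__main__":
    fields = {"dbc", "objset", "pool"}
    print(parse_filters("dbc=1,objset=54,pool=testpool", fields))
    print(full_match("1", "10"))
```

As in the original parser, a pattern that itself contains `=` or `,` is rejected, because the filter string is split on those characters before the regex is compiled.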
-1
@@ -1 +0,0 @@
/fsck.zfs
-5
@@ -1,6 +1 @@
include $(top_srcdir)/config/Substfiles.am
include $(top_srcdir)/config/Shellcheck.am
dist_sbin_SCRIPTS = fsck.zfs dist_sbin_SCRIPTS = fsck.zfs
SUBSTFILES += $(dist_sbin_SCRIPTS)
+9
@@ -0,0 +1,9 @@
#!/bin/sh
#
# fsck.zfs: A fsck helper to accommodate distributions that expect
# to be able to execute a fsck on all filesystem types. Currently
# this script does nothing but it could be extended to act as a
# compatibility wrapper for 'zpool scrub'.
#
exit 0
-44
@@ -1,44 +0,0 @@
#!/bin/sh
#
# fsck.zfs: A fsck helper to accommodate distributions that expect
# to be able to execute a fsck on all filesystem types.
#
# This script simply bubbles up some already-known-about errors,
# see fsck.zfs(8)
#
if [ "$#" = "0" ]; then
echo "Usage: $0 [options] dataset…" >&2
exit 16
fi
ret=0
for dataset in "$@"; do
case "$dataset" in
-*)
continue
;;
*)
;;
esac
pool="${dataset%%/*}"
case "$(@sbindir@/zpool list -Ho health "$pool")" in
DEGRADED)
ret=$(( ret | 4 ))
;;
FAULTED)
awk '!/^([[:space:]]*#.*)?$/ && $1 == "'"$dataset"'" && $3 == "zfs" {exit 1}' /etc/fstab || \
ret=$(( ret | 8 ))
;;
"")
# Pool not found, error printed by zpool(8)
ret=$(( ret | 8 ))
;;
*)
;;
esac
done
exit "$ret"
+9 -7
@@ -1,5 +1,9 @@
include $(top_srcdir)/config/Rules.am include $(top_srcdir)/config/Rules.am
DEFAULT_INCLUDES += \
-I$(top_srcdir)/include \
-I$(top_srcdir)/lib/libspl/include
# #
# Ignore the prefix for the mount helper. It must be installed in /sbin/ # Ignore the prefix for the mount helper. It must be installed in /sbin/
# because this path is hardcoded in the mount(8) for security reasons. # because this path is hardcoded in the mount(8) for security reasons.
@@ -13,10 +17,8 @@ mount_zfs_SOURCES = \
mount_zfs.c mount_zfs.c
mount_zfs_LDADD = \ mount_zfs_LDADD = \
$(abs_top_builddir)/lib/libzfs/libzfs.la \ $(top_builddir)/lib/libnvpair/libnvpair.la \
$(abs_top_builddir)/lib/libzfs_core/libzfs_core.la \ $(top_builddir)/lib/libuutil/libuutil.la \
$(abs_top_builddir)/lib/libnvpair/libnvpair.la $(top_builddir)/lib/libzpool/libzpool.la \
$(top_builddir)/lib/libzfs/libzfs.la \
mount_zfs_LDADD += $(LTLIBINTL) $(top_builddir)/lib/libzfs_core/libzfs_core.la
include $(top_srcdir)/config/CppCheck.am
+310 -81
@@ -31,57 +31,256 @@
#include <sys/mntent.h> #include <sys/mntent.h>
#include <sys/stat.h> #include <sys/stat.h>
#include <libzfs.h> #include <libzfs.h>
#include <libzutil.h>
#include <locale.h> #include <locale.h>
#include <getopt.h> #include <getopt.h>
#include <fcntl.h> #include <fcntl.h>
#include <errno.h>
#define ZS_COMMENT 0x00000000 /* comment */ #define ZS_COMMENT 0x00000000 /* comment */
#define ZS_ZFSUTIL 0x00000001 /* caller is zfs(8) */ #define ZS_ZFSUTIL 0x00000001 /* caller is zfs(8) */
libzfs_handle_t *g_zfs; libzfs_handle_t *g_zfs;
typedef struct option_map {
const char *name;
unsigned long mntmask;
unsigned long zfsmask;
} option_map_t;
static const option_map_t option_map[] = {
/* Canonicalized filesystem independent options from mount(8) */
{ MNTOPT_NOAUTO, MS_COMMENT, ZS_COMMENT },
{ MNTOPT_DEFAULTS, MS_COMMENT, ZS_COMMENT },
{ MNTOPT_NODEVICES, MS_NODEV, ZS_COMMENT },
{ MNTOPT_DIRSYNC, MS_DIRSYNC, ZS_COMMENT },
{ MNTOPT_NOEXEC, MS_NOEXEC, ZS_COMMENT },
{ MNTOPT_GROUP, MS_GROUP, ZS_COMMENT },
{ MNTOPT_NETDEV, MS_COMMENT, ZS_COMMENT },
{ MNTOPT_NOFAIL, MS_COMMENT, ZS_COMMENT },
{ MNTOPT_NOSUID, MS_NOSUID, ZS_COMMENT },
{ MNTOPT_OWNER, MS_OWNER, ZS_COMMENT },
{ MNTOPT_REMOUNT, MS_REMOUNT, ZS_COMMENT },
{ MNTOPT_RO, MS_RDONLY, ZS_COMMENT },
{ MNTOPT_RW, MS_COMMENT, ZS_COMMENT },
{ MNTOPT_SYNC, MS_SYNCHRONOUS, ZS_COMMENT },
{ MNTOPT_USER, MS_USERS, ZS_COMMENT },
{ MNTOPT_USERS, MS_USERS, ZS_COMMENT },
/* acl flags passed with util-linux-2.24 mount command */
{ MNTOPT_ACL, MS_POSIXACL, ZS_COMMENT },
{ MNTOPT_NOACL, MS_COMMENT, ZS_COMMENT },
{ MNTOPT_POSIXACL, MS_POSIXACL, ZS_COMMENT },
#ifdef MS_NOATIME
{ MNTOPT_NOATIME, MS_NOATIME, ZS_COMMENT },
#endif
#ifdef MS_NODIRATIME
{ MNTOPT_NODIRATIME, MS_NODIRATIME, ZS_COMMENT },
#endif
#ifdef MS_RELATIME
{ MNTOPT_RELATIME, MS_RELATIME, ZS_COMMENT },
#endif
#ifdef MS_STRICTATIME
{ MNTOPT_STRICTATIME, MS_STRICTATIME, ZS_COMMENT },
#endif
#ifdef MS_LAZYTIME
{ MNTOPT_LAZYTIME, MS_LAZYTIME, ZS_COMMENT },
#endif
{ MNTOPT_CONTEXT, MS_COMMENT, ZS_COMMENT },
{ MNTOPT_FSCONTEXT, MS_COMMENT, ZS_COMMENT },
{ MNTOPT_DEFCONTEXT, MS_COMMENT, ZS_COMMENT },
{ MNTOPT_ROOTCONTEXT, MS_COMMENT, ZS_COMMENT },
#ifdef MS_I_VERSION
{ MNTOPT_IVERSION, MS_I_VERSION, ZS_COMMENT },
#endif
#ifdef MS_MANDLOCK
{ MNTOPT_NBMAND, MS_MANDLOCK, ZS_COMMENT },
#endif
/* Valid options not found in mount(8) */
{ MNTOPT_BIND, MS_BIND, ZS_COMMENT },
#ifdef MS_REC
{ MNTOPT_RBIND, MS_BIND|MS_REC, ZS_COMMENT },
#endif
{ MNTOPT_COMMENT, MS_COMMENT, ZS_COMMENT },
#ifdef MS_NOSUB
{ MNTOPT_NOSUB, MS_NOSUB, ZS_COMMENT },
#endif
#ifdef MS_SILENT
{ MNTOPT_QUIET, MS_SILENT, ZS_COMMENT },
#endif
/* Custom zfs options */
{ MNTOPT_XATTR, MS_COMMENT, ZS_COMMENT },
{ MNTOPT_NOXATTR, MS_COMMENT, ZS_COMMENT },
{ MNTOPT_ZFSUTIL, MS_COMMENT, ZS_ZFSUTIL },
{ NULL, 0, 0 } };
/* /*
* Opportunistically convert a target string into a pool name. If the * Break the mount option in to a name/value pair. The name is
* string does not represent a block device with a valid zfs label * validated against the option map and mount flags set accordingly.
* then it is passed through without modification.
*/ */
static void static int
parse_dataset(const char *target, char **dataset) parse_option(char *mntopt, unsigned long *mntflags,
unsigned long *zfsflags, int sloppy)
{ {
const option_map_t *opt;
char *ptr, *name, *value = NULL;
int error = 0;
name = strdup(mntopt);
if (name == NULL)
return (ENOMEM);
for (ptr = name; ptr && *ptr; ptr++) {
if (*ptr == '=') {
*ptr = '\0';
value = ptr+1;
VERIFY3P(value, !=, NULL);
break;
}
}
for (opt = option_map; opt->name != NULL; opt++) {
if (strncmp(name, opt->name, strlen(name)) == 0) {
*mntflags |= opt->mntmask;
*zfsflags |= opt->zfsmask;
error = 0;
goto out;
}
}
if (!sloppy)
error = ENOENT;
out:
/* If required, further processing of the value may be done here */
free(name);
return (error);
}
/*
* Translate the mount option string into MS_* mount flags for the
* kernel vfs. When sloppy is non-zero, unknown options will be ignored;
* otherwise they are considered fatal and are copied into badopt.
*/
static int
parse_options(char *mntopts, unsigned long *mntflags, unsigned long *zfsflags,
int sloppy, char *badopt, char *mtabopt)
{
int error = 0, quote = 0, flag = 0, count = 0;
char *ptr, *opt, *opts;
opts = strdup(mntopts);
if (opts == NULL)
return (ENOMEM);
*mntflags = 0;
opt = NULL;
/* /*
* Prior to util-linux 2.36.2, if a file or directory in the * Scan through all mount options which must be comma delimited.
* current working directory was named 'dataset' then mount(8) * We must be careful to notice regions which are double quoted
* would prepend the current working directory to the dataset. * and skip commas in these regions. Each option is then checked
* Check for it and strip the prepended path when it is added. * to determine if it is a known option.
*/ */
for (ptr = opts; ptr && !flag; ptr++) {
if (opt == NULL)
opt = ptr;
if (*ptr == '"')
quote = !quote;
if (quote)
continue;
if (*ptr == '\0')
flag = 1;
if ((*ptr == ',') || (*ptr == '\0')) {
*ptr = '\0';
error = parse_option(opt, mntflags, zfsflags, sloppy);
if (error) {
strcpy(badopt, opt);
goto out;
}
if (!(*mntflags & MS_REMOUNT) &&
!(*zfsflags & ZS_ZFSUTIL)) {
if (count > 0)
strlcat(mtabopt, ",", MNT_LINE_MAX);
strlcat(mtabopt, opt, MNT_LINE_MAX);
count++;
}
opt = NULL;
}
}
out:
free(opts);
return (error);
}
/*
* Return the pool/dataset to mount given the name passed to mount. This
* is expected to be of the form pool/dataset, however may also refer to
* a block device if that device contains a valid zfs label.
*/
static char *
parse_dataset(char *dataset)
{
char cwd[PATH_MAX]; char cwd[PATH_MAX];
if (getcwd(cwd, PATH_MAX) == NULL) { struct stat64 statbuf;
perror("getcwd"); int error;
return; int len;
/*
* We expect a pool/dataset to be provided, however if we're
* given a device which is a member of a zpool we attempt to
* extract the pool name stored in the label. Given the pool
* name we can mount the root dataset.
*/
error = stat64(dataset, &statbuf);
if (error == 0) {
nvlist_t *config;
char *name;
int fd;
fd = open(dataset, O_RDONLY);
if (fd < 0)
goto out;
error = zpool_read_label(fd, &config, NULL);
(void) close(fd);
if (error)
goto out;
error = nvlist_lookup_string(config,
ZPOOL_CONFIG_POOL_NAME, &name);
if (error) {
nvlist_free(config);
} else {
dataset = strdup(name);
nvlist_free(config);
return (dataset);
}
} }
int len = strlen(cwd); out:
if (strncmp(cwd, target, len) == 0) /*
target += len; * If a file or directory in your current working directory is
* named 'dataset' then mount(8) will prepend your current working
* directory to the dataset. There is no way to prevent this
* behavior so we simply check for it and strip the prepended
* path when it is added.
*/
if (getcwd(cwd, PATH_MAX) == NULL)
return (dataset);
/* Assume pool/dataset is more likely */ len = strlen(cwd);
strlcpy(*dataset, target, PATH_MAX);
int fd = open(target, O_RDONLY | O_CLOEXEC); /* Do not add one when cwd already ends in a trailing '/' */
if (fd < 0) if (strncmp(cwd, dataset, len) == 0)
return; return (dataset + len + (cwd[len-1] != '/'));
nvlist_t *cfg = NULL; return (dataset);
if (zpool_read_label(fd, &cfg, NULL) == 0) {
char *nm = NULL;
if (!nvlist_lookup_string(cfg, ZPOOL_CONFIG_POOL_NAME, &nm))
strlcpy(*dataset, nm, PATH_MAX);
nvlist_free(cfg);
}
if (close(fd))
perror("close");
} }
/* /*
@@ -125,8 +324,8 @@ mtab_update(char *dataset, char *mntpoint, char *type, char *mntopts)
if (!fp) { if (!fp) {
(void) fprintf(stderr, gettext( (void) fprintf(stderr, gettext(
"filesystem '%s' was mounted, but /etc/mtab " "filesystem '%s' was mounted, but /etc/mtab "
"could not be opened due to error: %s\n"), "could not be opened due to error %d\n"),
dataset, strerror(errno)); dataset, errno);
return (MOUNT_FILEIO); return (MOUNT_FILEIO);
} }
@@ -134,8 +333,8 @@ mtab_update(char *dataset, char *mntpoint, char *type, char *mntopts)
if (error) { if (error) {
(void) fprintf(stderr, gettext( (void) fprintf(stderr, gettext(
"filesystem '%s' was mounted, but /etc/mtab " "filesystem '%s' was mounted, but /etc/mtab "
"could not be updated due to error: %s\n"), "could not be updated due to error %d\n"),
dataset, strerror(errno)); dataset, errno);
return (MOUNT_FILEIO); return (MOUNT_FILEIO);
} }
@@ -144,6 +343,34 @@ mtab_update(char *dataset, char *mntpoint, char *type, char *mntopts)
return (MOUNT_SUCCESS); return (MOUNT_SUCCESS);
} }
static void
append_mntopt(const char *name, const char *val, char *mntopts,
char *mtabopt, boolean_t quote)
{
char tmp[MNT_LINE_MAX];
snprintf(tmp, MNT_LINE_MAX, quote ? ",%s=\"%s\"" : ",%s=%s", name, val);
if (mntopts)
strlcat(mntopts, tmp, MNT_LINE_MAX);
if (mtabopt)
strlcat(mtabopt, tmp, MNT_LINE_MAX);
}
static void
zfs_selinux_setcontext(zfs_handle_t *zhp, zfs_prop_t zpt, const char *name,
char *mntopts, char *mtabopt)
{
char context[ZFS_MAXPROPLEN];
if (zfs_prop_get(zhp, zpt, context, sizeof (context),
NULL, NULL, 0, B_FALSE) == 0) {
if (strcmp(context, "none") != 0)
append_mntopt(name, context, mntopts, mtabopt, B_TRUE);
}
}
int int
main(int argc, char **argv) main(int argc, char **argv)
{ {
@@ -154,13 +381,12 @@ main(int argc, char **argv)
char badopt[MNT_LINE_MAX] = { '\0' }; char badopt[MNT_LINE_MAX] = { '\0' };
char mtabopt[MNT_LINE_MAX] = { '\0' }; char mtabopt[MNT_LINE_MAX] = { '\0' };
char mntpoint[PATH_MAX]; char mntpoint[PATH_MAX];
char dataset[PATH_MAX], *pdataset = dataset; char *dataset;
unsigned long mntflags = 0, zfsflags = 0, remount = 0; unsigned long mntflags = 0, zfsflags = 0, remount = 0;
int sloppy = 0, fake = 0, verbose = 0, nomtab = 0, zfsutil = 0; int sloppy = 0, fake = 0, verbose = 0, nomtab = 0, zfsutil = 0;
int error, c; int error, c;
(void) setlocale(LC_ALL, ""); (void) setlocale(LC_ALL, "");
(void) setlocale(LC_NUMERIC, "C");
(void) textdomain(TEXT_DOMAIN); (void) textdomain(TEXT_DOMAIN);
opterr = 0; opterr = 0;
@@ -185,11 +411,10 @@ main(int argc, char **argv)
break; break;
case 'h': case 'h':
case '?': case '?':
if (optopt) (void) fprintf(stderr, gettext("Invalid option '%c'\n"),
(void) fprintf(stderr, optopt);
gettext("Invalid option '%c'\n"), optopt);
(void) fprintf(stderr, gettext("Usage: mount.zfs " (void) fprintf(stderr, gettext("Usage: mount.zfs "
"[-sfnvh] [-o options] <dataset> <mountpoint>\n")); "[-sfnv] [-o options] <dataset> <mountpoint>\n"));
return (MOUNT_USAGE); return (MOUNT_USAGE);
} }
} }
@@ -211,18 +436,18 @@ main(int argc, char **argv)
return (MOUNT_USAGE); return (MOUNT_USAGE);
} }
parse_dataset(argv[0], &pdataset); dataset = parse_dataset(argv[0]);
/* canonicalize the mount point */ /* canonicalize the mount point */
if (realpath(argv[1], mntpoint) == NULL) { if (realpath(argv[1], mntpoint) == NULL) {
(void) fprintf(stderr, gettext("filesystem '%s' cannot be " (void) fprintf(stderr, gettext("filesystem '%s' cannot be "
"mounted at '%s' due to canonicalization error: %s\n"), "mounted at '%s' due to canonicalization error %d.\n"),
dataset, argv[1], strerror(errno)); dataset, argv[1], errno);
return (MOUNT_SYSERR); return (MOUNT_SYSERR);
} }
/* validate mount options and set mntflags */ /* validate mount options and set mntflags */
error = zfs_parse_mount_options(mntopts, &mntflags, &zfsflags, sloppy, error = parse_options(mntopts, &mntflags, &zfsflags, sloppy,
badopt, mtabopt); badopt, mtabopt);
if (error) { if (error) {
switch (error) { switch (error) {
@@ -246,6 +471,13 @@ main(int argc, char **argv)
} }
} }
if (verbose)
(void) fprintf(stdout, gettext("mount.zfs:\n"
" dataset: \"%s\"\n mountpoint: \"%s\"\n"
" mountflags: 0x%lx\n zfsflags: 0x%lx\n"
" mountopts: \"%s\"\n mtabopts: \"%s\"\n"),
dataset, mntpoint, mntflags, zfsflags, mntopts, mtabopt);
if (mntflags & MS_REMOUNT) { if (mntflags & MS_REMOUNT) {
nomtab = 1; nomtab = 1;
remount = 1; remount = 1;
@@ -255,7 +487,7 @@ main(int argc, char **argv)
zfsutil = 1; zfsutil = 1;
if ((g_zfs = libzfs_init()) == NULL) { if ((g_zfs = libzfs_init()) == NULL) {
(void) fprintf(stderr, "%s\n", libzfs_error_init(errno)); (void) fprintf(stderr, "%s", libzfs_error_init(errno));
return (MOUNT_SYSERR); return (MOUNT_SYSERR);
} }
@@ -268,11 +500,33 @@ main(int argc, char **argv)
return (MOUNT_USAGE); return (MOUNT_USAGE);
} }
if (!zfsutil || sloppy || /*
libzfs_envvar_is_set("ZFS_MOUNT_HELPER")) { * Checks to see if the ZFS_PROP_SELINUX_CONTEXT exists
zfs_adjust_mount_options(zhp, mntpoint, mntopts, mtabopt); * if it does, create a tmp variable in case it's needed
* checks to see if the selinux context is set to the default
* if it is, allow the setting of the other context properties
* this is needed because the 'context' property overrides others
* if it is not the default, set the 'context' property
*/
if (zfs_prop_get(zhp, ZFS_PROP_SELINUX_CONTEXT, prop, sizeof (prop),
NULL, NULL, 0, B_FALSE) == 0) {
if (strcmp(prop, "none") == 0) {
zfs_selinux_setcontext(zhp, ZFS_PROP_SELINUX_FSCONTEXT,
MNTOPT_FSCONTEXT, mntopts, mtabopt);
zfs_selinux_setcontext(zhp, ZFS_PROP_SELINUX_DEFCONTEXT,
MNTOPT_DEFCONTEXT, mntopts, mtabopt);
zfs_selinux_setcontext(zhp,
ZFS_PROP_SELINUX_ROOTCONTEXT, MNTOPT_ROOTCONTEXT,
mntopts, mtabopt);
} else {
append_mntopt(MNTOPT_CONTEXT, prop,
mntopts, mtabopt, B_TRUE);
}
} }
/* A hint used to determine an auto-mounted snapshot mount point */
append_mntopt(MNTOPT_MNTPOINT, mntpoint, mntopts, NULL, B_FALSE);
/* treat all snapshots as legacy mount points */ /* treat all snapshots as legacy mount points */
if (zfs_get_type(zhp) == ZFS_TYPE_SNAPSHOT) if (zfs_get_type(zhp) == ZFS_TYPE_SNAPSHOT)
(void) strlcpy(prop, ZFS_MOUNTPOINT_LEGACY, ZFS_MAXPROPLEN); (void) strlcpy(prop, ZFS_MOUNTPOINT_LEGACY, ZFS_MAXPROPLEN);
@@ -289,11 +543,12 @@ main(int argc, char **argv)
if (zfs_version == 0) { if (zfs_version == 0) {
fprintf(stderr, gettext("unable to fetch " fprintf(stderr, gettext("unable to fetch "
"ZFS version for filesystem '%s'\n"), dataset); "ZFS version for filesystem '%s'\n"), dataset);
zfs_close(zhp);
libzfs_fini(g_zfs);
return (MOUNT_SYSERR); return (MOUNT_SYSERR);
} }
zfs_close(zhp);
libzfs_fini(g_zfs);
/* /*
* Legacy mount points may only be mounted using 'mount', never using * Legacy mount points may only be mounted using 'mount', never using
* 'zfs mount'. However, since 'zfs mount' actually invokes 'mount' * 'zfs mount'. However, since 'zfs mount' actually invokes 'mount'
@@ -311,8 +566,6 @@ main(int argc, char **argv)
"Use 'zfs set mountpoint=%s' or 'mount -t zfs %s %s'.\n" "Use 'zfs set mountpoint=%s' or 'mount -t zfs %s %s'.\n"
"See zfs(8) for more information.\n"), "See zfs(8) for more information.\n"),
dataset, mntpoint, dataset, mntpoint); dataset, mntpoint, dataset, mntpoint);
zfs_close(zhp);
libzfs_fini(g_zfs);
return (MOUNT_USAGE); return (MOUNT_USAGE);
} }
@@ -323,38 +576,14 @@ main(int argc, char **argv)
"Use 'zfs set mountpoint=%s' or 'zfs mount %s'.\n" "Use 'zfs set mountpoint=%s' or 'zfs mount %s'.\n"
"See zfs(8) for more information.\n"), "See zfs(8) for more information.\n"),
dataset, "legacy", dataset); dataset, "legacy", dataset);
zfs_close(zhp);
libzfs_fini(g_zfs);
return (MOUNT_USAGE); return (MOUNT_USAGE);
} }
if (verbose)
(void) fprintf(stdout, gettext("mount.zfs:\n"
" dataset: \"%s\"\n mountpoint: \"%s\"\n"
" mountflags: 0x%lx\n zfsflags: 0x%lx\n"
" mountopts: \"%s\"\n mtabopts: \"%s\"\n"),
dataset, mntpoint, mntflags, zfsflags, mntopts, mtabopt);
if (!fake) { if (!fake) {
if (zfsutil && !sloppy && error = mount(dataset, mntpoint, MNTTYPE_ZFS,
!libzfs_envvar_is_set("ZFS_MOUNT_HELPER")) { mntflags, mntopts);
error = zfs_mount_at(zhp, mntopts, mntflags, mntpoint);
if (error) {
(void) fprintf(stderr, "zfs_mount_at() failed: "
"%s", libzfs_error_description(g_zfs));
zfs_close(zhp);
libzfs_fini(g_zfs);
return (MOUNT_SYSERR);
}
} else {
error = mount(dataset, mntpoint, MNTTYPE_ZFS,
mntflags, mntopts);
}
} }
zfs_close(zhp);
libzfs_fini(g_zfs);
if (error) { if (error) {
switch (errno) { switch (errno) {
case ENOENT: case ENOENT:
@@ -389,7 +618,7 @@ main(int argc, char **argv)
"mount the filesystem again.\n"), dataset); "mount the filesystem again.\n"), dataset);
return (MOUNT_SYSERR); return (MOUNT_SYSERR);
} }
fallthrough; /* fallthru */
#endif #endif
default: default:
(void) fprintf(stderr, gettext("filesystem " (void) fprintf(stderr, gettext("filesystem "
+9 -9
@@ -1,10 +1,11 @@
include $(top_srcdir)/config/Rules.am include $(top_srcdir)/config/Rules.am
# Includes kernel code, generate warnings for large stack frames AM_CFLAGS += $(DEBUG_STACKFLAGS) $(FRAME_LARGER_THAN)
AM_CFLAGS += $(FRAME_LARGER_THAN) AM_CPPFLAGS += -DDEBUG
# Unconditionally enable ASSERTs DEFAULT_INCLUDES += \
AM_CPPFLAGS += -DDEBUG -UNDEBUG -DZFS_DEBUG -I$(top_srcdir)/include \
-I$(top_srcdir)/lib/libspl/include
bin_PROGRAMS = raidz_test bin_PROGRAMS = raidz_test
@@ -14,9 +15,8 @@ raidz_test_SOURCES = \
raidz_bench.c raidz_bench.c
raidz_test_LDADD = \ raidz_test_LDADD = \
$(abs_top_builddir)/lib/libzpool/libzpool.la \ $(top_builddir)/lib/libnvpair/libnvpair.la \
$(abs_top_builddir)/lib/libzfs_core/libzfs_core.la $(top_builddir)/lib/libuutil/libuutil.la \
$(top_builddir)/lib/libzpool/libzpool.la
raidz_test_LDADD += -lm raidz_test_LDADD += -lm -ldl
include $(top_srcdir)/config/CppCheck.am
+8 -23
@@ -31,6 +31,8 @@
#include <sys/vdev_raidz_impl.h> #include <sys/vdev_raidz_impl.h>
#include <stdio.h> #include <stdio.h>
#include <sys/time.h>
#include "raidz_test.h" #include "raidz_test.h"
#define GEN_BENCH_MEMORY (((uint64_t)1ULL)<<32) #define GEN_BENCH_MEMORY (((uint64_t)1ULL)<<32)
@@ -81,17 +83,8 @@ run_gen_bench_impl(const char *impl)
/* create suitable raidz_map */ /* create suitable raidz_map */
ncols = rto_opts.rto_dcols + fn + 1; ncols = rto_opts.rto_dcols + fn + 1;
zio_bench.io_size = 1ULL << ds; zio_bench.io_size = 1ULL << ds;
rm_bench = vdev_raidz_map_alloc(&zio_bench,
if (rto_opts.rto_expand) { BENCH_ASHIFT, ncols, fn+1);
rm_bench = vdev_raidz_map_alloc_expanded(
zio_bench.io_abd,
zio_bench.io_size, zio_bench.io_offset,
rto_opts.rto_ashift, ncols+1, ncols,
fn+1, rto_opts.rto_expand_offset);
} else {
rm_bench = vdev_raidz_map_alloc(&zio_bench,
BENCH_ASHIFT, ncols, fn+1);
}
/* estimate iteration count */ /* estimate iteration count */
iter_cnt = GEN_BENCH_MEMORY; iter_cnt = GEN_BENCH_MEMORY;
@@ -120,7 +113,7 @@ run_gen_bench_impl(const char *impl)
} }
} }
static void void
run_gen_bench(void) run_gen_bench(void)
{ {
char **impl_name; char **impl_name;
@@ -170,16 +163,8 @@ run_rec_bench_impl(const char *impl)
(1ULL << BENCH_ASHIFT)) (1ULL << BENCH_ASHIFT))
continue; continue;
if (rto_opts.rto_expand) { rm_bench = vdev_raidz_map_alloc(&zio_bench,
rm_bench = vdev_raidz_map_alloc_expanded( BENCH_ASHIFT, ncols, PARITY_PQR);
zio_bench.io_abd,
zio_bench.io_size, zio_bench.io_offset,
BENCH_ASHIFT, ncols+1, ncols,
PARITY_PQR, rto_opts.rto_expand_offset);
} else {
rm_bench = vdev_raidz_map_alloc(&zio_bench,
BENCH_ASHIFT, ncols, PARITY_PQR);
}
/* estimate iteration count */ /* estimate iteration count */
iter_cnt = (REC_BENCH_MEMORY); iter_cnt = (REC_BENCH_MEMORY);
@@ -212,7 +197,7 @@ run_rec_bench_impl(const char *impl)
} }
} }
static void void
run_rec_bench(void) run_rec_bench(void)
{ {
char **impl_name; char **impl_name;
+50 -287
@@ -77,20 +77,16 @@ static void print_opts(raidz_test_opts_t *opts, boolean_t force)
(void) fprintf(stdout, DBLSEP "Running with options:\n" (void) fprintf(stdout, DBLSEP "Running with options:\n"
" (-a) zio ashift : %zu\n" " (-a) zio ashift : %zu\n"
" (-o) zio offset : 1 << %zu\n" " (-o) zio offset : 1 << %zu\n"
" (-e) expanded map : %s\n"
" (-r) reflow offset : %llx\n"
" (-d) number of raidz data columns : %zu\n" " (-d) number of raidz data columns : %zu\n"
" (-s) size of DATA : 1 << %zu\n" " (-s) size of DATA : 1 << %zu\n"
" (-S) sweep parameters : %s \n" " (-S) sweep parameters : %s \n"
" (-v) verbose : %s \n\n", " (-v) verbose : %s \n\n",
opts->rto_ashift, /* -a */ opts->rto_ashift, /* -a */
ilog2(opts->rto_offset), /* -o */ ilog2(opts->rto_offset), /* -o */
opts->rto_expand ? "yes" : "no", /* -e */ opts->rto_dcols, /* -d */
(u_longlong_t)opts->rto_expand_offset, /* -r */ ilog2(opts->rto_dsize), /* -s */
opts->rto_dcols, /* -d */ opts->rto_sweep ? "yes" : "no", /* -S */
ilog2(opts->rto_dsize), /* -s */ verbose); /* -v */
opts->rto_sweep ? "yes" : "no", /* -S */
verbose); /* -v */
} }
} }
@@ -108,8 +104,6 @@ static void usage(boolean_t requested)
"\t[-S parameter sweep (default: %s)]\n" "\t[-S parameter sweep (default: %s)]\n"
"\t[-t timeout for parameter sweep test]\n" "\t[-t timeout for parameter sweep test]\n"
"\t[-B benchmark all raidz implementations]\n" "\t[-B benchmark all raidz implementations]\n"
"\t[-e use expanded raidz map (default: %s)]\n"
"\t[-r expanded raidz map reflow offset (default: %llx)]\n"
"\t[-v increase verbosity (default: %zu)]\n" "\t[-v increase verbosity (default: %zu)]\n"
"\t[-h (print help)]\n" "\t[-h (print help)]\n"
"\t[-T test the test, see if failure would be detected]\n" "\t[-T test the test, see if failure would be detected]\n"
@@ -120,8 +114,6 @@ static void usage(boolean_t requested)
o->rto_dcols, /* -d */ o->rto_dcols, /* -d */
ilog2(o->rto_dsize), /* -s */ ilog2(o->rto_dsize), /* -s */
rto_opts.rto_sweep ? "yes" : "no", /* -S */ rto_opts.rto_sweep ? "yes" : "no", /* -S */
rto_opts.rto_expand ? "yes" : "no", /* -e */
(u_longlong_t)o->rto_expand_offset, /* -r */
o->rto_v); /* -d */ o->rto_v); /* -d */
exit(requested ? 0 : 1); exit(requested ? 0 : 1);
@@ -136,7 +128,7 @@ static void process_options(int argc, char **argv)
bcopy(&rto_opts_defaults, o, sizeof (*o)); bcopy(&rto_opts_defaults, o, sizeof (*o));
while ((opt = getopt(argc, argv, "TDBSvha:er:o:d:s:t:")) != -1) { while ((opt = getopt(argc, argv, "TDBSvha:o:d:s:t:")) != -1) {
value = 0; value = 0;
switch (opt) { switch (opt) {
@@ -144,12 +136,6 @@ static void process_options(int argc, char **argv)
value = strtoull(optarg, NULL, 0); value = strtoull(optarg, NULL, 0);
o->rto_ashift = MIN(13, MAX(9, value)); o->rto_ashift = MIN(13, MAX(9, value));
break; break;
case 'e':
o->rto_expand = 1;
break;
case 'r':
o->rto_expand_offset = strtoull(optarg, NULL, 0);
break;
case 'o': case 'o':
value = strtoull(optarg, NULL, 0); value = strtoull(optarg, NULL, 0);
o->rto_offset = ((1ULL << MIN(12, value)) >> 9) << 9; o->rto_offset = ((1ULL << MIN(12, value)) >> 9) << 9;
@@ -193,34 +179,25 @@ static void process_options(int argc, char **argv)
} }
} }
#define DATA_COL(rr, i) ((rr)->rr_col[rr->rr_firstdatacol + (i)].rc_abd) #define DATA_COL(rm, i) ((rm)->rm_col[raidz_parity(rm) + (i)].rc_abd)
#define DATA_COL_SIZE(rr, i) ((rr)->rr_col[rr->rr_firstdatacol + (i)].rc_size) #define DATA_COL_SIZE(rm, i) ((rm)->rm_col[raidz_parity(rm) + (i)].rc_size)
#define CODE_COL(rr, i) ((rr)->rr_col[(i)].rc_abd) #define CODE_COL(rm, i) ((rm)->rm_col[(i)].rc_abd)
#define CODE_COL_SIZE(rr, i) ((rr)->rr_col[(i)].rc_size) #define CODE_COL_SIZE(rm, i) ((rm)->rm_col[(i)].rc_size)
static int static int
cmp_code(raidz_test_opts_t *opts, const raidz_map_t *rm, const int parity) cmp_code(raidz_test_opts_t *opts, const raidz_map_t *rm, const int parity)
{ {
int r, i, ret = 0; int i, ret = 0;
VERIFY(parity >= 1 && parity <= 3); VERIFY(parity >= 1 && parity <= 3);
for (r = 0; r < rm->rm_nrows; r++) { for (i = 0; i < parity; i++) {
raidz_row_t * const rr = rm->rm_row[r]; if (abd_cmp(CODE_COL(rm, i), CODE_COL(opts->rm_golden, i))
raidz_row_t * const rrg = opts->rm_golden->rm_row[r]; != 0) {
for (i = 0; i < parity; i++) { ret++;
if (CODE_COL_SIZE(rrg, i) == 0) { LOG_OPT(D_DEBUG, opts,
VERIFY0(CODE_COL_SIZE(rr, i)); "\nParity block [%d] different!\n", i);
continue;
}
if (abd_cmp(CODE_COL(rr, i),
CODE_COL(rrg, i)) != 0) {
ret++;
LOG_OPT(D_DEBUG, opts,
"\nParity block [%d] different!\n", i);
}
} }
} }
return (ret); return (ret);
@@ -229,26 +206,16 @@ cmp_code(raidz_test_opts_t *opts, const raidz_map_t *rm, const int parity)
static int static int
cmp_data(raidz_test_opts_t *opts, raidz_map_t *rm) cmp_data(raidz_test_opts_t *opts, raidz_map_t *rm)
{ {
int r, i, dcols, ret = 0; int i, ret = 0;
int dcols = opts->rm_golden->rm_cols - raidz_parity(opts->rm_golden);
for (r = 0; r < rm->rm_nrows; r++) { for (i = 0; i < dcols; i++) {
raidz_row_t *rr = rm->rm_row[r]; if (abd_cmp(DATA_COL(opts->rm_golden, i), DATA_COL(rm, i))
raidz_row_t *rrg = opts->rm_golden->rm_row[r]; != 0) {
dcols = opts->rm_golden->rm_row[0]->rr_cols - ret++;
raidz_parity(opts->rm_golden);
for (i = 0; i < dcols; i++) {
if (DATA_COL_SIZE(rrg, i) == 0) {
VERIFY0(DATA_COL_SIZE(rr, i));
continue;
}
if (abd_cmp(DATA_COL(rrg, i), LOG_OPT(D_DEBUG, opts,
DATA_COL(rr, i)) != 0) { "\nData block [%d] different!\n", i);
ret++;
LOG_OPT(D_DEBUG, opts,
"\nData block [%d] different!\n", i);
}
} }
} }
return (ret); return (ret);
@@ -269,13 +236,12 @@ init_rand(void *data, size_t size, void *private)
static void static void
corrupt_colums(raidz_map_t *rm, const int *tgts, const int cnt) corrupt_colums(raidz_map_t *rm, const int *tgts, const int cnt)
{ {
for (int r = 0; r < rm->rm_nrows; r++) { int i;
raidz_row_t *rr = rm->rm_row[r]; raidz_col_t *col;
for (int i = 0; i < cnt; i++) {
raidz_col_t *col = &rr->rr_col[tgts[i]]; for (i = 0; i < cnt; i++) {
abd_iterate_func(col->rc_abd, 0, col->rc_size, col = &rm->rm_col[tgts[i]];
init_rand, NULL); abd_iterate_func(col->rc_abd, 0, col->rc_size, init_rand, NULL);
}
} }
} }
@@ -322,22 +288,10 @@ init_raidz_golden_map(raidz_test_opts_t *opts, const int parity)
VERIFY0(vdev_raidz_impl_set("original")); VERIFY0(vdev_raidz_impl_set("original"));
if (opts->rto_expand) { opts->rm_golden = vdev_raidz_map_alloc(opts->zio_golden,
opts->rm_golden = opts->rto_ashift, total_ncols, parity);
vdev_raidz_map_alloc_expanded(opts->zio_golden->io_abd, rm_test = vdev_raidz_map_alloc(zio_test,
opts->zio_golden->io_size, opts->zio_golden->io_offset, opts->rto_ashift, total_ncols, parity);
opts->rto_ashift, total_ncols+1, total_ncols,
parity, opts->rto_expand_offset);
rm_test = vdev_raidz_map_alloc_expanded(zio_test->io_abd,
zio_test->io_size, zio_test->io_offset,
opts->rto_ashift, total_ncols+1, total_ncols,
parity, opts->rto_expand_offset);
} else {
opts->rm_golden = vdev_raidz_map_alloc(opts->zio_golden,
opts->rto_ashift, total_ncols, parity);
rm_test = vdev_raidz_map_alloc(zio_test,
opts->rto_ashift, total_ncols, parity);
}
VERIFY(opts->zio_golden); VERIFY(opts->zio_golden);
VERIFY(opts->rm_golden); VERIFY(opts->rm_golden);
@@ -358,187 +312,6 @@ init_raidz_golden_map(raidz_test_opts_t *opts, const int parity)
return (err); return (err);
} }
/*
* If reflow is not in progress, reflow_offset should be UINT64_MAX.
* For each row, if the row is entirely before reflow_offset, it will
* come from the new location. Otherwise this row will come from the
* old location. Therefore, rows that straddle the reflow_offset will
* come from the old location.
*
* NOTE: Until raidz expansion is implemented this function is only
* needed by raidz_test.c to the multi-row raid_map_t functionality.
*/
raidz_map_t *
vdev_raidz_map_alloc_expanded(abd_t *abd, uint64_t size, uint64_t offset,
uint64_t ashift, uint64_t physical_cols, uint64_t logical_cols,
uint64_t nparity, uint64_t reflow_offset)
{
/* The zio's size in units of the vdev's minimum sector size. */
uint64_t s = size >> ashift;
uint64_t q, r, bc, devidx, asize = 0, tot;
/*
* "Quotient": The number of data sectors for this stripe on all but
* the "big column" child vdevs that also contain "remainder" data.
* AKA "full rows"
*/
q = s / (logical_cols - nparity);
/*
* "Remainder": The number of partial stripe data sectors in this I/O.
* This will add a sector to some, but not all, child vdevs.
*/
r = s - q * (logical_cols - nparity);
/* The number of "big columns" - those which contain remainder data. */
bc = (r == 0 ? 0 : r + nparity);
/*
* The total number of data and parity sectors associated with
* this I/O.
*/
tot = s + nparity * (q + (r == 0 ? 0 : 1));
/* How many rows contain data (not skip) */
uint64_t rows = howmany(tot, logical_cols);
int cols = MIN(tot, logical_cols);
raidz_map_t *rm = kmem_zalloc(offsetof(raidz_map_t, rm_row[rows]),
KM_SLEEP);
rm->rm_nrows = rows;
for (uint64_t row = 0; row < rows; row++) {
raidz_row_t *rr = kmem_alloc(offsetof(raidz_row_t,
rr_col[cols]), KM_SLEEP);
rm->rm_row[row] = rr;
/* The starting RAIDZ (parent) vdev sector of the row. */
uint64_t b = (offset >> ashift) + row * logical_cols;
/*
* If we are in the middle of a reflow, and any part of this
* row has not been copied, then use the old location of
* this row.
*/
int row_phys_cols = physical_cols;
if (b + (logical_cols - nparity) > reflow_offset >> ashift)
row_phys_cols--;
/* starting child of this row */
uint64_t child_id = b % row_phys_cols;
/* The starting byte offset on each child vdev. */
uint64_t child_offset = (b / row_phys_cols) << ashift;
/*
* We set cols to the entire width of the block, even
* if this row is shorter. This is needed because parity
* generation (for Q and R) needs to know the entire width,
* because it treats the short row as though it was
* full-width (and the "phantom" sectors were zero-filled).
*
* Another approach to this would be to set cols shorter
* (to just the number of columns that we might do i/o to)
* and have another mechanism to tell the parity generation
* about the "entire width". Reconstruction (at least
* vdev_raidz_reconstruct_general()) would also need to
* know about the "entire width".
*/
rr->rr_cols = cols;
rr->rr_bigcols = bc;
rr->rr_missingdata = 0;
rr->rr_missingparity = 0;
rr->rr_firstdatacol = nparity;
rr->rr_abd_empty = NULL;
rr->rr_nempty = 0;
for (int c = 0; c < rr->rr_cols; c++, child_id++) {
if (child_id >= row_phys_cols) {
child_id -= row_phys_cols;
child_offset += 1ULL << ashift;
}
rr->rr_col[c].rc_devidx = child_id;
rr->rr_col[c].rc_offset = child_offset;
rr->rr_col[c].rc_orig_data = NULL;
rr->rr_col[c].rc_error = 0;
rr->rr_col[c].rc_tried = 0;
rr->rr_col[c].rc_skipped = 0;
rr->rr_col[c].rc_need_orig_restore = B_FALSE;
uint64_t dc = c - rr->rr_firstdatacol;
if (c < rr->rr_firstdatacol) {
rr->rr_col[c].rc_size = 1ULL << ashift;
rr->rr_col[c].rc_abd =
abd_alloc_linear(rr->rr_col[c].rc_size,
B_TRUE);
} else if (row == rows - 1 && bc != 0 && c >= bc) {
/*
* Past the end, this for parity generation.
*/
rr->rr_col[c].rc_size = 0;
rr->rr_col[c].rc_abd = NULL;
} else {
/*
* "data column" (col excluding parity)
* Add an ASCII art diagram here
*/
uint64_t off;
if (c < bc || r == 0) {
off = dc * rows + row;
} else {
off = r * rows +
(dc - r) * (rows - 1) + row;
}
rr->rr_col[c].rc_size = 1ULL << ashift;
rr->rr_col[c].rc_abd = abd_get_offset_struct(
&rr->rr_col[c].rc_abdstruct,
abd, off << ashift, 1 << ashift);
}
asize += rr->rr_col[c].rc_size;
}
/*
* If all data stored spans all columns, there's a danger that
* parity will always be on the same device and, since parity
* isn't read during normal operation, that that device's I/O
* bandwidth won't be used effectively. We therefore switch
* the parity every 1MB.
*
* ...at least that was, ostensibly, the theory. As a practical
* matter unless we juggle the parity between all devices
* evenly, we won't see any benefit. Further, occasional writes
* that aren't a multiple of the LCM of the number of children
* and the minimum stripe width are sufficient to avoid pessimal
* behavior. Unfortunately, this decision created an implicit
* on-disk format requirement that we need to support for all
* eternity, but only for single-parity RAID-Z.
*
* If we intend to skip a sector in the zeroth column for
* padding we must make sure to note this swap. We will never
* intend to skip the first column since at least one data and
* one parity column must appear in each row.
*/
if (rr->rr_firstdatacol == 1 && rr->rr_cols > 1 &&
(offset & (1ULL << 20))) {
ASSERT(rr->rr_cols >= 2);
ASSERT(rr->rr_col[0].rc_size == rr->rr_col[1].rc_size);
devidx = rr->rr_col[0].rc_devidx;
uint64_t o = rr->rr_col[0].rc_offset;
rr->rr_col[0].rc_devidx = rr->rr_col[1].rc_devidx;
rr->rr_col[0].rc_offset = rr->rr_col[1].rc_offset;
rr->rr_col[1].rc_devidx = devidx;
rr->rr_col[1].rc_offset = o;
}
}
ASSERT3U(asize, ==, tot << ashift);
/* init RAIDZ parity ops */
rm->rm_ops = vdev_raidz_math_get_ops();
return (rm);
}
static raidz_map_t * static raidz_map_t *
init_raidz_map(raidz_test_opts_t *opts, zio_t **zio, const int parity) init_raidz_map(raidz_test_opts_t *opts, zio_t **zio, const int parity)
{ {
@@ -557,15 +330,8 @@ init_raidz_map(raidz_test_opts_t *opts, zio_t **zio, const int parity)
(*zio)->io_abd = raidz_alloc(alloc_dsize); (*zio)->io_abd = raidz_alloc(alloc_dsize);
init_zio_abd(*zio); init_zio_abd(*zio);
if (opts->rto_expand) { rm = vdev_raidz_map_alloc(*zio, opts->rto_ashift,
rm = vdev_raidz_map_alloc_expanded((*zio)->io_abd, total_ncols, parity);
(*zio)->io_size, (*zio)->io_offset,
opts->rto_ashift, total_ncols+1, total_ncols,
parity, opts->rto_expand_offset);
} else {
rm = vdev_raidz_map_alloc(*zio, opts->rto_ashift,
total_ncols, parity);
}
VERIFY(rm); VERIFY(rm);
/* Make sure code columns are destroyed */ /* Make sure code columns are destroyed */
@@ -654,7 +420,7 @@ run_rec_check_impl(raidz_test_opts_t *opts, raidz_map_t *rm, const int fn)
if (fn < RAIDZ_REC_PQ) { if (fn < RAIDZ_REC_PQ) {
/* can reconstruct 1 failed data disk */ /* can reconstruct 1 failed data disk */
for (x0 = 0; x0 < opts->rto_dcols; x0++) { for (x0 = 0; x0 < opts->rto_dcols; x0++) {
if (x0 >= rm->rm_row[0]->rr_cols - raidz_parity(rm)) if (x0 >= rm->rm_cols - raidz_parity(rm))
continue; continue;
/* Check if should stop */ /* Check if should stop */
@@ -679,11 +445,10 @@ run_rec_check_impl(raidz_test_opts_t *opts, raidz_map_t *rm, const int fn)
} else if (fn < RAIDZ_REC_PQR) { } else if (fn < RAIDZ_REC_PQR) {
/* can reconstruct 2 failed data disk */ /* can reconstruct 2 failed data disk */
for (x0 = 0; x0 < opts->rto_dcols; x0++) { for (x0 = 0; x0 < opts->rto_dcols; x0++) {
if (x0 >= rm->rm_row[0]->rr_cols - raidz_parity(rm)) if (x0 >= rm->rm_cols - raidz_parity(rm))
continue; continue;
for (x1 = x0 + 1; x1 < opts->rto_dcols; x1++) { for (x1 = x0 + 1; x1 < opts->rto_dcols; x1++) {
if (x1 >= rm->rm_row[0]->rr_cols - if (x1 >= rm->rm_cols - raidz_parity(rm))
raidz_parity(rm))
continue; continue;
/* Check if should stop */ /* Check if should stop */
@@ -710,15 +475,14 @@ run_rec_check_impl(raidz_test_opts_t *opts, raidz_map_t *rm, const int fn)
} else { } else {
/* can reconstruct 3 failed data disk */ /* can reconstruct 3 failed data disk */
for (x0 = 0; x0 < opts->rto_dcols; x0++) { for (x0 = 0; x0 < opts->rto_dcols; x0++) {
if (x0 >= rm->rm_row[0]->rr_cols - raidz_parity(rm)) if (x0 >= rm->rm_cols - raidz_parity(rm))
continue; continue;
for (x1 = x0 + 1; x1 < opts->rto_dcols; x1++) { for (x1 = x0 + 1; x1 < opts->rto_dcols; x1++) {
if (x1 >= rm->rm_row[0]->rr_cols - if (x1 >= rm->rm_cols - raidz_parity(rm))
raidz_parity(rm))
continue; continue;
for (x2 = x1 + 1; x2 < opts->rto_dcols; x2++) { for (x2 = x1 + 1; x2 < opts->rto_dcols; x2++) {
if (x2 >= rm->rm_row[0]->rr_cols - if (x2 >=
raidz_parity(rm)) rm->rm_cols - raidz_parity(rm))
continue; continue;
/* Check if should stop */ /* Check if should stop */
@@ -936,12 +700,12 @@ run_sweep(void)
opts->rto_dcols = dcols_v[d]; opts->rto_dcols = dcols_v[d];
opts->rto_offset = (1 << ashift_v[a]) * rand(); opts->rto_offset = (1 << ashift_v[a]) * rand();
opts->rto_dsize = size_v[s]; opts->rto_dsize = size_v[s];
opts->rto_expand = rto_opts.rto_expand;
opts->rto_expand_offset = rto_opts.rto_expand_offset;
opts->rto_v = 0; /* be quiet */ opts->rto_v = 0; /* be quiet */
VERIFY3P(thread_create(NULL, 0, sweep_thread, (void *) opts, VERIFY3P(zk_thread_create(NULL, 0,
0, NULL, TS_RUN, defclsyspri), !=, NULL); (thread_func_t)sweep_thread,
(void *) opts, 0, NULL, TS_RUN, 0,
PTHREAD_CREATE_JOINABLE), !=, NULL);
} }
exit: exit:
@@ -970,7 +734,6 @@ exit:
return (sweep_state == SWEEP_ERROR ? SWEEP_ERROR : 0); return (sweep_state == SWEEP_ERROR ? SWEEP_ERROR : 0);
} }
int int
main(int argc, char **argv) main(int argc, char **argv)
{ {
@@ -996,7 +759,7 @@ main(int argc, char **argv)
process_options(argc, argv); process_options(argc, argv);
kernel_init(SPA_MODE_READ); kernel_init(FREAD);
/* setup random data because rand() is not reentrant */ /* setup random data because rand() is not reentrant */
rand_data = (int *)umem_alloc(SPA_MAXBLOCKSIZE, UMEM_NOFAIL); rand_data = (int *)umem_alloc(SPA_MAXBLOCKSIZE, UMEM_NOFAIL);
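Review note: the sector arithmetic at the top of `vdev_raidz_map_alloc_expanded()` in the hunk above (quotient, remainder, "big columns", total sectors, row count) can be checked in isolation. The following is a minimal sketch under stated assumptions, not code from the tree: the `raidz_geom_t` type and the `raidz_geom()` helper are hypothetical names introduced only for illustration, and the `howmany()` macro is expanded by hand as a ceiling division.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type for illustration; not part of the ZFS sources. */
typedef struct raidz_geom {
	uint64_t q;	/* "quotient": data sectors on every column (full rows) */
	uint64_t r;	/* "remainder": data sectors in the partial stripe */
	uint64_t bc;	/* "big columns": columns that hold remainder data */
	uint64_t tot;	/* total data + parity sectors for the I/O */
	uint64_t rows;	/* rows that contain data (not skip sectors) */
} raidz_geom_t;

/*
 * Mirrors the arithmetic shown in the diff: size is the I/O size in bytes,
 * ashift the log2 sector size, logical_cols the stripe width including
 * parity, nparity the number of parity columns.
 */
static raidz_geom_t
raidz_geom(uint64_t size, uint64_t ashift, uint64_t logical_cols,
    uint64_t nparity)
{
	raidz_geom_t g;
	uint64_t dcols, s;

	/* At least one data column is required. */
	assert(logical_cols > nparity);

	dcols = logical_cols - nparity;
	s = size >> ashift;	/* I/O size in sectors */

	g.q = s / dcols;
	g.r = s - g.q * dcols;
	g.bc = (g.r == 0 ? 0 : g.r + nparity);
	g.tot = s + nparity * (g.q + (g.r == 0 ? 0 : 1));
	/* howmany(tot, logical_cols), written out as a ceiling division */
	g.rows = (g.tot + logical_cols - 1) / logical_cols;

	return (g);
}
```

For example, a 7-sector write (3584 bytes at ashift 9) on a 5-wide double-parity layout has 3 data columns, so two full rows plus one remainder sector: `q = 2`, `r = 1`, `bc = 3`, `tot = 13`, `rows = 3`.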
+1 -9
@@ -38,21 +38,18 @@ static const char *raidz_impl_names[] = {
"avx512bw", "avx512bw",
"aarch64_neon", "aarch64_neon",
"aarch64_neonx2", "aarch64_neonx2",
"powerpc_altivec",
NULL NULL
}; };
typedef struct raidz_test_opts { typedef struct raidz_test_opts {
size_t rto_ashift; size_t rto_ashift;
uint64_t rto_offset; size_t rto_offset;
size_t rto_dcols; size_t rto_dcols;
size_t rto_dsize; size_t rto_dsize;
size_t rto_v; size_t rto_v;
size_t rto_sweep; size_t rto_sweep;
size_t rto_sweep_timeout; size_t rto_sweep_timeout;
size_t rto_benchmark; size_t rto_benchmark;
size_t rto_expand;
uint64_t rto_expand_offset;
size_t rto_sanity; size_t rto_sanity;
size_t rto_gdb; size_t rto_gdb;
@@ -71,8 +68,6 @@ static const raidz_test_opts_t rto_opts_defaults = {
.rto_v = 0, .rto_v = 0,
.rto_sweep = 0, .rto_sweep = 0,
.rto_benchmark = 0, .rto_benchmark = 0,
.rto_expand = 0,
.rto_expand_offset = -1ULL,
.rto_sanity = 0, .rto_sanity = 0,
.rto_gdb = 0, .rto_gdb = 0,
.rto_should_stop = B_FALSE .rto_should_stop = B_FALSE
@@ -117,7 +112,4 @@ void init_zio_abd(zio_t *zio);
void run_raidz_benchmark(void); void run_raidz_benchmark(void);
struct raidz_map *vdev_raidz_map_alloc_expanded(abd_t *, uint64_t, uint64_t,
uint64_t, uint64_t, uint64_t, uint64_t, uint64_t);
#endif /* RAIDZ_TEST_H */ #endif /* RAIDZ_TEST_H */
-2
@@ -1,3 +1 @@
include $(top_srcdir)/config/Shellcheck.am
dist_udev_SCRIPTS = vdev_id dist_udev_SCRIPTS = vdev_id
+129 -315
@@ -79,34 +79,6 @@
# channel 86:00.0 1 A # channel 86:00.0 1 A
# channel 86:00.0 0 B # channel 86:00.0 0 B
# #
# # Example vdev_id.conf - multipath / multijbod-daisychaining
# #
#
# multipath yes
# multijbod yes
#
# # PCI_ID HBA PORT CHANNEL NAME
# channel 85:00.0 1 A
# channel 85:00.0 0 B
# channel 86:00.0 1 A
# channel 86:00.0 0 B
# #
# # Example vdev_id.conf - multipath / mixed
# #
#
# multipath yes
# slot mix
#
# # PCI_ID HBA PORT CHANNEL NAME
# channel 85:00.0 3 A
# channel 85:00.0 2 B
# channel 86:00.0 3 A
# channel 86:00.0 2 B
# channel af:00.0 0 C
# channel af:00.0 1 C
# # # #
# # Example vdev_id.conf - alias # # Example vdev_id.conf - alias
# # # #
@@ -120,10 +92,9 @@ PATH=/bin:/sbin:/usr/bin:/usr/sbin
CONFIG=/etc/zfs/vdev_id.conf CONFIG=/etc/zfs/vdev_id.conf
PHYS_PER_PORT= PHYS_PER_PORT=
DEV= DEV=
MULTIPATH=
TOPOLOGY= TOPOLOGY=
BAY= BAY=
ENCL_ID=""
UNIQ_ENCL_ID=""
usage() { usage() {
cat << EOF cat << EOF
@@ -131,153 +102,71 @@ Usage: vdev_id [-h]
vdev_id <-d device> [-c config_file] [-p phys_per_port] vdev_id <-d device> [-c config_file] [-p phys_per_port]
[-g sas_direct|sas_switch|scsi] [-m] [-g sas_direct|sas_switch|scsi] [-m]
-c specify name of an alternative config file [default=$CONFIG] -c specify name of alernate config file [default=$CONFIG]
-d specify basename of device (i.e. sda) -d specify basename of device (i.e. sda)
-e Create enclose device symlinks only (/dev/by-enclosure) -e Create enclose device symlinks only (/dev/by-enclosure)
-g Storage network topology [default="$TOPOLOGY"] -g Storage network topology [default="$TOPOLOGY"]
-m Run in multipath mode -m Run in multipath mode
-j Run in multijbod mode
-p number of phy's per switch port [default=$PHYS_PER_PORT] -p number of phy's per switch port [default=$PHYS_PER_PORT]
-h show this summary -h show this summary
EOF EOF
exit 1 exit 0
# exit with error to avoid processing usage message by a udev rule
} }
map_slot() { map_slot() {
LINUX_SLOT=$1 local LINUX_SLOT=$1
CHANNEL=$2 local CHANNEL=$2
local MAPPED_SLOT=
MAPPED_SLOT=$(awk -v linux_slot="$LINUX_SLOT" -v channel="$CHANNEL" \ MAPPED_SLOT=`awk "\\$1 == \"slot\" && \\$2 == ${LINUX_SLOT} && \
'$1 == "slot" && $2 == linux_slot && \ \\$4 ~ /^${CHANNEL}$|^$/ { print \\$3; exit }" $CONFIG`
($4 ~ "^"channel"$" || $4 ~ /^$/) { print $3; exit}' $CONFIG)
if [ -z "$MAPPED_SLOT" ] ; then if [ -z "$MAPPED_SLOT" ] ; then
MAPPED_SLOT=$LINUX_SLOT MAPPED_SLOT=$LINUX_SLOT
fi fi
printf "%d" "${MAPPED_SLOT}" printf "%d" ${MAPPED_SLOT}
} }
map_channel() { map_channel() {
MAPPED_CHAN= local MAPPED_CHAN=
PCI_ID=$1 local PCI_ID=$1
PORT=$2 local PORT=$2
case $TOPOLOGY in case $TOPOLOGY in
"sas_switch") "sas_switch")
MAPPED_CHAN=$(awk -v port="$PORT" \ MAPPED_CHAN=`awk "\\$1 == \"channel\" && \\$2 == ${PORT} \
'$1 == "channel" && $2 == port \ { print \\$3; exit }" $CONFIG`
{ print $3; exit }' $CONFIG)
;; ;;
"sas_direct"|"scsi") "sas_direct"|"scsi")
MAPPED_CHAN=$(awk -v pciID="$PCI_ID" -v port="$PORT" \ MAPPED_CHAN=`awk "\\$1 == \"channel\" && \
'$1 == "channel" && $2 == pciID && $3 == port \ \\$2 == \"${PCI_ID}\" && \\$3 == ${PORT} \
{print $4}' $CONFIG) { print \\$4; exit }" $CONFIG`
;; ;;
esac esac
printf "%s" "${MAPPED_CHAN}" printf "%s" ${MAPPED_CHAN}
}
get_encl_id() {
set -- $(echo $1)
count=$#
i=1
while [ $i -le $count ] ; do
d=$(eval echo '$'{$i})
id=$(cat "/sys/class/enclosure/${d}/id")
ENCL_ID="${ENCL_ID} $id"
i=$((i + 1))
done
}
get_uniq_encl_id() {
for uuid in ${ENCL_ID}; do
found=0
for count in ${UNIQ_ENCL_ID}; do
if [ $count = $uuid ]; then
found=1
break
fi
done
if [ $found -eq 0 ]; then
UNIQ_ENCL_ID="${UNIQ_ENCL_ID} $uuid"
fi
done
}
# map_jbod explainer: The bsg driver knows the difference between a SAS
# expander and fanout expander. Use hostX instance along with top-level
# (whole enclosure) expander instances in /sys/class/enclosure and
# matching a field in an array of expanders, using the index of the
# matched array field as the enclosure instance, thereby making jbod IDs
# dynamic. Avoids reliance on high overhead userspace commands like
# multipath and lsscsi and instead uses existing sysfs data. $HOSTCHAN
# variable derived from devpath gymnastics in sas_handler() function.
map_jbod() {
DEVEXP=$(ls -l "/sys/block/$DEV/device/" | grep enclos | awk -F/ '{print $(NF-1) }')
DEV=$1
# Use "set --" to create index values (Arrays)
set -- $(ls -l /sys/class/enclosure | grep -v "^total" | awk '{print $9}')
# Get count of total elements
JBOD_COUNT=$#
JBOD_ITEM=$*
# Build JBODs (enclosure) id from sys/class/enclosure/<dev>/id
get_encl_id "$JBOD_ITEM"
# Different expander instances for each paths.
# Filter out and keep only unique id.
get_uniq_encl_id
# Identify final 'mapped jbod'
j=0
for count in ${UNIQ_ENCL_ID}; do
i=1
j=$((j + 1))
while [ $i -le $JBOD_COUNT ] ; do
d=$(eval echo '$'{$i})
id=$(cat "/sys/class/enclosure/${d}/id")
if [ "$d" = "$DEVEXP" ] && [ $id = $count ] ; then
MAPPED_JBOD=$j
break
fi
i=$((i + 1))
done
done
printf "%d" "${MAPPED_JBOD}"
} }
sas_handler() { sas_handler() {
if [ -z "$PHYS_PER_PORT" ] ; then if [ -z "$PHYS_PER_PORT" ] ; then
PHYS_PER_PORT=$(awk '$1 == "phys_per_port" \ PHYS_PER_PORT=`awk "\\$1 == \"phys_per_port\" \
{print $2; exit}' $CONFIG) {print \\$2; exit}" $CONFIG`
fi fi
PHYS_PER_PORT=${PHYS_PER_PORT:-4} PHYS_PER_PORT=${PHYS_PER_PORT:-4}
if ! echo $PHYS_PER_PORT | grep -q -E '^[0-9]+$' ; then
if ! echo "$PHYS_PER_PORT" | grep -q -E '^[0-9]+$' ; then
echo "Error: phys_per_port value $PHYS_PER_PORT is non-numeric" echo "Error: phys_per_port value $PHYS_PER_PORT is non-numeric"
exit 1 exit 1
fi fi
if [ -z "$MULTIPATH_MODE" ] ; then if [ -z "$MULTIPATH_MODE" ] ; then
MULTIPATH_MODE=$(awk '$1 == "multipath" \ MULTIPATH_MODE=`awk "\\$1 == \"multipath\" \
{print $2; exit}' $CONFIG) {print \\$2; exit}" $CONFIG`
fi
if [ -z "$MULTIJBOD_MODE" ] ; then
MULTIJBOD_MODE=$(awk '$1 == "multijbod" \
{print $2; exit}' $CONFIG)
fi fi
# Use first running component device if we're handling a dm-mpath device # Use first running component device if we're handling a dm-mpath device
if [ "$MULTIPATH_MODE" = "yes" ] ; then if [ "$MULTIPATH_MODE" = "yes" ] ; then
# If udev didn't tell us the UUID via DM_NAME, check /dev/mapper # If udev didn't tell us the UUID via DM_NAME, check /dev/mapper
if [ -z "$DM_NAME" ] ; then if [ -z "$DM_NAME" ] ; then
DM_NAME=$(ls -l --full-time /dev/mapper | DM_NAME=`ls -l --full-time /dev/mapper |
grep "$DEV"$ | awk '{print $9}') awk "/\/$DEV$/{print \\$9}"`
fi fi
# For raw disks udev exports DEVTYPE=partition when # For raw disks udev exports DEVTYPE=partition when
@@ -287,50 +176,28 @@ sas_handler() {
# we have to append the -part suffix directly in the # we have to append the -part suffix directly in the
# helper. # helper.
if [ "$DEVTYPE" != "partition" ] ; then if [ "$DEVTYPE" != "partition" ] ; then
# Match p[number], remove the 'p' and prepend "-part" PART=`echo $DM_NAME | awk -Fp '/p/{print "-part"$2}'`
PART=$(echo "$DM_NAME" |
awk 'match($0,/p[0-9]+$/) {print "-part"substr($0,RSTART+1,RLENGTH-1)}')
fi fi
# Strip off partition information. # Strip off partition information.
DM_NAME=$(echo "$DM_NAME" | sed 's/p[0-9][0-9]*$//') DM_NAME=`echo $DM_NAME | sed 's/p[0-9][0-9]*$//'`
if [ -z "$DM_NAME" ] ; then if [ -z "$DM_NAME" ] ; then
return return
fi fi
# Utilize DM device name to gather subordinate block devices # Get the raw scsi device name from multipath -ll. Strip off
# using sysfs to avoid userspace utilities # leading pipe symbols to make field numbering consistent.
DEV=`multipath -ll $DM_NAME |
# If our DEVNAME is something like /dev/dm-177, then we may be awk '/running/{gsub("^[|]"," "); print $3 ; exit}'`
# able to get our DMDEV from it.
DMDEV=$(echo $DEVNAME | sed 's;/dev/;;g')
if [ ! -e /sys/block/$DMDEV/slaves/* ] ; then
# It's not there, try looking in /dev/mapper
DMDEV=$(ls -l --full-time /dev/mapper | grep $DM_NAME |
awk '{gsub("../", " "); print $NF}')
fi
# Use sysfs pointers in /sys/block/dm-X/slaves because using
# userspace tools creates lots of overhead and should be avoided
# whenever possible. Use awk to isolate lowest instance of
# sd device member in dm device group regardless of string
# length.
DEV=$(ls "/sys/block/$DMDEV/slaves" | awk '
{ len=sprintf ("%20s",length($0)); gsub(/ /,0,str); a[NR]=len "_" $0; }
END {
asort(a)
print substr(a[1],22)
}')
if [ -z "$DEV" ] ; then if [ -z "$DEV" ] ; then
return return
fi fi
fi fi
if echo "$DEV" | grep -q ^/devices/ ; then if echo $DEV | grep -q ^/devices/ ; then
sys_path=$DEV sys_path=$DEV
else else
sys_path=$(udevadm info -q path -p "/sys/block/$DEV" 2>/dev/null) sys_path=`udevadm info -q path -p /sys/block/$DEV 2>/dev/null`
fi fi
# Use positional parameters as an ad-hoc array # Use positional parameters as an ad-hoc array
@@ -340,104 +207,84 @@ sas_handler() {
# Get path up to /sys/.../hostX # Get path up to /sys/.../hostX
i=1 i=1
while [ $i -le $num_dirs ] ; do
while [ $i -le "$num_dirs" ] ; do d=$(eval echo \${$i})
d=$(eval echo '$'{$i})
scsi_host_dir="$scsi_host_dir/$d" scsi_host_dir="$scsi_host_dir/$d"
echo "$d" | grep -q -E '^host[0-9]+$' && break echo $d | grep -q -E '^host[0-9]+$' && break
i=$((i + 1)) i=$(($i + 1))
done done
# Lets grab the SAS host channel number and save it for JBOD sorting later if [ $i = $num_dirs ] ; then
HOSTCHAN=$(echo "$d" | awk -F/ '{ gsub("host","",$NF); print $NF}')
if [ $i = "$num_dirs" ] ; then
return return
fi fi
PCI_ID=$(eval echo '$'{$((i -1))} | awk -F: '{print $2":"$3}') PCI_ID=$(eval echo \${$(($i -1))} | awk -F: '{print $2":"$3}')
# In sas_switch mode, the directory four levels beneath # In sas_switch mode, the directory four levels beneath
# /sys/.../hostX contains symlinks to phy devices that reveal # /sys/.../hostX contains symlinks to phy devices that reveal
# the switch port number. In sas_direct mode, the phy links one # the switch port number. In sas_direct mode, the phy links one
# directory down reveal the HBA port. # directory down reveal the HBA port.
port_dir=$scsi_host_dir port_dir=$scsi_host_dir
case $TOPOLOGY in case $TOPOLOGY in
"sas_switch") j=$((i + 4)) ;; "sas_switch") j=$(($i + 4)) ;;
"sas_direct") j=$((i + 1)) ;; "sas_direct") j=$(($i + 1)) ;;
esac esac
i=$((i + 1)) i=$(($i + 1))
while [ $i -le $j ] ; do while [ $i -le $j ] ; do
port_dir="$port_dir/$(eval echo '$'{$i})" port_dir="$port_dir/$(eval echo \${$i})"
i=$((i + 1)) i=$(($i + 1))
done done
PHY=$(ls -vd "$port_dir"/phy* 2>/dev/null | head -1 | awk -F: '{print $NF}') PHY=`ls -d $port_dir/phy* 2>/dev/null | head -1 | awk -F: '{print $NF}'`
if [ -z "$PHY" ] ; then if [ -z "$PHY" ] ; then
PHY=0 PHY=0
fi fi
PORT=$((PHY / PHYS_PER_PORT)) PORT=$(( $PHY / $PHYS_PER_PORT ))
# Look in /sys/.../sas_device/end_device-X for the bay_identifier # Look in /sys/.../sas_device/end_device-X for the bay_identifier
# attribute. # attribute.
end_device_dir=$port_dir end_device_dir=$port_dir
while [ $i -lt $num_dirs ] ; do
while [ $i -lt "$num_dirs" ] ; do d=$(eval echo \${$i})
d=$(eval echo '$'{$i})
end_device_dir="$end_device_dir/$d" end_device_dir="$end_device_dir/$d"
if echo "$d" | grep -q '^end_device' ; then if echo $d | grep -q '^end_device' ; then
end_device_dir="$end_device_dir/sas_device/$d" end_device_dir="$end_device_dir/sas_device/$d"
break break
fi fi
i=$((i + 1)) i=$(($i + 1))
done done
# Add 'mix' slot type for environments where dm-multipath devices
# include end-devices connected via SAS expanders or direct connection
# to SAS HBA. A mixed connectivity environment such as pool devices
# contained in a SAS JBOD and spare drives or log devices directly
# connected in a server backplane without expanders in the I/O path.
SLOT= SLOT=
case $BAY in case $BAY in
"bay") "bay")
SLOT=$(cat "$end_device_dir/bay_identifier" 2>/dev/null) SLOT=`cat $end_device_dir/bay_identifier 2>/dev/null`
;;
"mix")
if [ $(cat "$end_device_dir/bay_identifier" 2>/dev/null) ] ; then
SLOT=$(cat "$end_device_dir/bay_identifier" 2>/dev/null)
else
SLOT=$(cat "$end_device_dir/phy_identifier" 2>/dev/null)
fi
;; ;;
"phy") "phy")
SLOT=$(cat "$end_device_dir/phy_identifier" 2>/dev/null) SLOT=`cat $end_device_dir/phy_identifier 2>/dev/null`
;; ;;
"port") "port")
d=$(eval echo '$'{$i}) d=$(eval echo \${$i})
SLOT=$(echo "$d" | sed -e 's/^.*://') SLOT=`echo $d | sed -e 's/^.*://'`
;; ;;
"id") "id")
i=$((i + 1)) i=$(($i + 1))
d=$(eval echo '$'{$i}) d=$(eval echo \${$i})
SLOT=$(echo "$d" | sed -e 's/^.*://') SLOT=`echo $d | sed -e 's/^.*://'`
;; ;;
"lun") "lun")
i=$((i + 2)) i=$(($i + 2))
d=$(eval echo '$'{$i}) d=$(eval echo \${$i})
SLOT=$(echo "$d" | sed -e 's/^.*://') SLOT=`echo $d | sed -e 's/^.*://'`
;; ;;
"ses") "ses")
# look for this SAS path in all SCSI Enclosure Services # look for this SAS path in all SCSI Enclosure Services
# (SES) enclosures # (SES) enclosures
sas_address=$(cat "$end_device_dir/sas_address" 2>/dev/null) sas_address=`cat $end_device_dir/sas_address 2>/dev/null`
enclosures=$(lsscsi -g | \ enclosures=`lsscsi -g | \
sed -n -e '/enclosu/s/^.* \([^ ][^ ]*\) *$/\1/p') sed -n -e '/enclosu/s/^.* \([^ ][^ ]*\) *$/\1/p'`
for enclosure in $enclosures; do for enclosure in $enclosures; do
set -- $(sg_ses -p aes "$enclosure" | \ set -- $(sg_ses -p aes $enclosure | \
awk "/device slot number:/{slot=\$12} \ awk "/device slot number:/{slot=\$12} \
/SAS address: $sas_address/\ /SAS address: $sas_address/\
{print slot}") {print slot}")
@@ -452,55 +299,42 @@ sas_handler() {
return return
fi fi
if [ "$MULTIJBOD_MODE" = "yes" ] ; then CHAN=`map_channel $PCI_ID $PORT`
CHAN=$(map_channel "$PCI_ID" "$PORT") SLOT=`map_slot $SLOT $CHAN`
SLOT=$(map_slot "$SLOT" "$CHAN") if [ -z "$CHAN" ] ; then
JBOD=$(map_jbod "$DEV") return
if [ -z "$CHAN" ] ; then
return
fi
echo "${CHAN}"-"${JBOD}"-"${SLOT}${PART}"
else
CHAN=$(map_channel "$PCI_ID" "$PORT")
SLOT=$(map_slot "$SLOT" "$CHAN")
if [ -z "$CHAN" ] ; then
return
fi
echo "${CHAN}${SLOT}${PART}"
fi fi
echo ${CHAN}${SLOT}${PART}
} }
scsi_handler() { scsi_handler() {
if [ -z "$FIRST_BAY_NUMBER" ] ; then if [ -z "$FIRST_BAY_NUMBER" ] ; then
FIRST_BAY_NUMBER=$(awk '$1 == "first_bay_number" \ FIRST_BAY_NUMBER=`awk "\\$1 == \"first_bay_number\" \
{print $2; exit}' $CONFIG) {print \\$2; exit}" $CONFIG`
fi fi
FIRST_BAY_NUMBER=${FIRST_BAY_NUMBER:-0} FIRST_BAY_NUMBER=${FIRST_BAY_NUMBER:-0}
if [ -z "$PHYS_PER_PORT" ] ; then if [ -z "$PHYS_PER_PORT" ] ; then
PHYS_PER_PORT=$(awk '$1 == "phys_per_port" \ PHYS_PER_PORT=`awk "\\$1 == \"phys_per_port\" \
{print $2; exit}' $CONFIG) {print \\$2; exit}" $CONFIG`
fi fi
PHYS_PER_PORT=${PHYS_PER_PORT:-4} PHYS_PER_PORT=${PHYS_PER_PORT:-4}
if ! echo $PHYS_PER_PORT | grep -q -E '^[0-9]+$' ; then
if ! echo "$PHYS_PER_PORT" | grep -q -E '^[0-9]+$' ; then
echo "Error: phys_per_port value $PHYS_PER_PORT is non-numeric" echo "Error: phys_per_port value $PHYS_PER_PORT is non-numeric"
exit 1 exit 1
fi fi
if [ -z "$MULTIPATH_MODE" ] ; then if [ -z "$MULTIPATH_MODE" ] ; then
MULTIPATH_MODE=$(awk '$1 == "multipath" \ MULTIPATH_MODE=`awk "\\$1 == \"multipath\" \
{print $2; exit}' $CONFIG) {print \\$2; exit}" $CONFIG`
fi fi
# Use first running component device if we're handling a dm-mpath device # Use first running component device if we're handling a dm-mpath device
if [ "$MULTIPATH_MODE" = "yes" ] ; then if [ "$MULTIPATH_MODE" = "yes" ] ; then
# If udev didn't tell us the UUID via DM_NAME, check /dev/mapper # If udev didn't tell us the UUID via DM_NAME, check /dev/mapper
if [ -z "$DM_NAME" ] ; then if [ -z "$DM_NAME" ] ; then
DM_NAME=$(ls -l --full-time /dev/mapper | DM_NAME=`ls -l --full-time /dev/mapper |
grep "$DEV"$ | awk '{print $9}') awk "/\/$DEV$/{print \\$9}"`
fi fi
# For raw disks udev exports DEVTYPE=partition when # For raw disks udev exports DEVTYPE=partition when
@@ -510,30 +344,28 @@ scsi_handler() {
# we have to append the -part suffix directly in the # we have to append the -part suffix directly in the
# helper. # helper.
if [ "$DEVTYPE" != "partition" ] ; then if [ "$DEVTYPE" != "partition" ] ; then
# Match p[number], remove the 'p' and prepend "-part" PART=`echo $DM_NAME | awk -Fp '/p/{print "-part"$2}'`
PART=$(echo "$DM_NAME" |
awk 'match($0,/p[0-9]+$/) {print "-part"substr($0,RSTART+1,RLENGTH-1)}')
fi fi
# Strip off partition information. # Strip off partition information.
DM_NAME=$(echo "$DM_NAME" | sed 's/p[0-9][0-9]*$//') DM_NAME=`echo $DM_NAME | sed 's/p[0-9][0-9]*$//'`
if [ -z "$DM_NAME" ] ; then if [ -z "$DM_NAME" ] ; then
return return
fi fi
# Get the raw scsi device name from multipath -ll. Strip off # Get the raw scsi device name from multipath -ll. Strip off
# leading pipe symbols to make field numbering consistent. # leading pipe symbols to make field numbering consistent.
DEV=$(multipath -ll "$DM_NAME" | DEV=`multipath -ll $DM_NAME |
awk '/running/{gsub("^[|]"," "); print $3 ; exit}') awk '/running/{gsub("^[|]"," "); print $3 ; exit}'`
if [ -z "$DEV" ] ; then if [ -z "$DEV" ] ; then
return return
fi fi
fi fi
if echo "$DEV" | grep -q ^/devices/ ; then if echo $DEV | grep -q ^/devices/ ; then
sys_path=$DEV sys_path=$DEV
else else
sys_path=$(udevadm info -q path -p "/sys/block/$DEV" 2>/dev/null) sys_path=`udevadm info -q path -p /sys/block/$DEV 2>/dev/null`
fi fi
# expect sys_path like this, for example: # expect sys_path like this, for example:
@@ -546,47 +378,44 @@ scsi_handler() {
# Get path up to /sys/.../hostX # Get path up to /sys/.../hostX
i=1 i=1
while [ $i -le $num_dirs ] ; do
while [ $i -le "$num_dirs" ] ; do d=$(eval echo \${$i})
d=$(eval echo '$'{$i})
scsi_host_dir="$scsi_host_dir/$d" scsi_host_dir="$scsi_host_dir/$d"
echo $d | grep -q -E '^host[0-9]+$' && break
echo "$d" | grep -q -E '^host[0-9]+$' && break i=$(($i + 1))
i=$((i + 1))
done done
if [ $i = "$num_dirs" ] ; then if [ $i = $num_dirs ] ; then
return return
fi fi
PCI_ID=$(eval echo '$'{$((i -1))} | awk -F: '{print $2":"$3}') PCI_ID=$(eval echo \${$(($i -1))} | awk -F: '{print $2":"$3}')
# In scsi mode, the directory two levels beneath # In scsi mode, the directory two levels beneath
# /sys/.../hostX reveals the port and slot. # /sys/.../hostX reveals the port and slot.
port_dir=$scsi_host_dir port_dir=$scsi_host_dir
j=$((i + 2)) j=$(($i + 2))
i=$((i + 1)) i=$(($i + 1))
while [ $i -le $j ] ; do while [ $i -le $j ] ; do
port_dir="$port_dir/$(eval echo '$'{$i})" port_dir="$port_dir/$(eval echo \${$i})"
i=$((i + 1)) i=$(($i + 1))
done done
set -- $(echo "$port_dir" | sed -e 's/^.*:\([^:]*\):\([^:]*\)$/\1 \2/') set -- $(echo $port_dir | sed -e 's/^.*:\([^:]*\):\([^:]*\)$/\1 \2/')
PORT=$1 PORT=$1
SLOT=$(($2 + FIRST_BAY_NUMBER)) SLOT=$(($2 + $FIRST_BAY_NUMBER))
if [ -z "$SLOT" ] ; then if [ -z "$SLOT" ] ; then
return return
fi fi
CHAN=$(map_channel "$PCI_ID" "$PORT") CHAN=`map_channel $PCI_ID $PORT`
SLOT=$(map_slot "$SLOT" "$CHAN") SLOT=`map_slot $SLOT $CHAN`
if [ -z "$CHAN" ] ; then if [ -z "$CHAN" ] ; then
return return
fi fi
echo "${CHAN}${SLOT}${PART}" echo ${CHAN}${SLOT}${PART}
} }
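Both handlers above walk the components of the sysfs path through indirect positional-parameter lookups (`eval echo '$'{$i}`) until they reach the `hostX` directory. A minimal sketch of that walk follows. It assumes the path is split on "/" with `set --`; the helper name `host_dir` and the sample path are illustrative and not part of vdev_id.

```shell
# Hypothetical helper: print the sysfs path up to and including hostX.
host_dir() {
	# Split the path on "/" into positional parameters $1..$N.
	set -- $(echo "$1" | tr / ' ')
	num_dirs=$#
	dir=/sys
	i=1
	while [ $i -le "$num_dirs" ] ; do
		# eval expands to "echo ${N}", i.e. the i-th path component.
		d=$(eval echo '$'{$i})
		dir="$dir/$d"
		echo "$d" | grep -q -E '^host[0-9]+$' && break
		i=$((i + 1))
	done
	echo "$dir"
}

host_dir /devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/port-0:0
```

Because `eval` builds the braced form `${N}`, the lookup also works for the tenth and later components, where a bare `$10` would be read as `$1` followed by `0`.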
# Figure out the name for the enclosure symlink # Figure out the name for the enclosure symlink
@@ -596,10 +425,8 @@ enclosure_handler () {
# DEVPATH=/sys/devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/subsystem/devices/0:0:0:0/scsi_generic/sg0 # DEVPATH=/sys/devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/subsystem/devices/0:0:0:0/scsi_generic/sg0
# Get the enclosure ID ("0:0:0:0") # Get the enclosure ID ("0:0:0:0")
ENC="${DEVPATH%/*}" ENC=$(basename $(readlink -m "/sys/$DEVPATH/../.."))
ENC="${ENC%/*}" if [ ! -d /sys/class/enclosure/$ENC ] ; then
ENC="${ENC##*/}"
if [ ! -d "/sys/class/enclosure/$ENC" ] ; then
# Not an enclosure, bail out # Not an enclosure, bail out
return return
fi fi
@@ -607,26 +434,25 @@ enclosure_handler () {
# Get the long sysfs device path to our enclosure. Looks like: # Get the long sysfs device path to our enclosure. Looks like:
# /devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/port-0:0/ ... /enclosure/0:0:0:0 # /devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/port-0:0/ ... /enclosure/0:0:0:0
ENC_DEVICE=$(readlink "/sys/class/enclosure/$ENC") ENC_DEVICE=$(readlink /sys/class/enclosure/$ENC)
# Grab the full path to the hosts port dir: # Grab the full path to the hosts port dir:
# /devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/port-0:0 # /devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/port-0:0
PORT_DIR=$(echo "$ENC_DEVICE" | grep -Eo '.+host[0-9]+/port-[0-9]+:[0-9]+') PORT_DIR=$(echo $ENC_DEVICE | grep -Eo '.+host[0-9]+/port-[0-9]+:[0-9]+')
# Get the port number # Get the port number
PORT_ID=$(echo "$PORT_DIR" | grep -Eo "[0-9]+$") PORT_ID=$(echo $PORT_DIR | grep -Eo "[0-9]+$")
# The PCI directory is two directories up from the port directory # The PCI directory is two directories up from the port directory
# /sys/devices/pci0000:00/0000:00:03.0/0000:05:00.0 # /sys/devices/pci0000:00/0000:00:03.0/0000:05:00.0
PCI_ID_LONG="$(readlink -m "/sys/$PORT_DIR/../..")" PCI_ID_LONG=$(basename $(readlink -m "/sys/$PORT_DIR/../.."))
PCI_ID_LONG="${PCI_ID_LONG##*/}"
# Strip down the PCI address from 0000:05:00.0 to 05:00.0 # Strip down the PCI address from 0000:05:00.0 to 05:00.0
PCI_ID="${PCI_ID_LONG#[0-9]*:}" PCI_ID=$(echo "$PCI_ID_LONG" | sed -r 's/^[0-9]+://g')
# Name our device according to vdev_id.conf (like "L0" or "U1"). # Name our device according to vdev_id.conf (like "L0" or "U1").
NAME=$(awk "/channel/{if (\$1 == \"channel\" && \$2 == \"$PCI_ID\" && \ NAME=$(awk "/channel/{if (\$1 == \"channel\" && \$2 == \"$PCI_ID\" && \
\$3 == \"$PORT_ID\") {print \$4\$3}}" $CONFIG) \$3 == \"$PORT_ID\") {print \$4int(count[\$4])}; count[\$4]++}" $CONFIG)
echo "${NAME}" echo "${NAME}"
} }
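One column of enclosure_handler derives the enclosure ID and the short PCI address with shell parameter expansion, where the other calls `basename`, `readlink -m` and `sed`. A small sketch of the expansion-only form, using the sample values from the comments above as assumed inputs:

```shell
# Sample DEVPATH taken from the comment in enclosure_handler.
DEVPATH=/sys/devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/subsystem/devices/0:0:0:0/scsi_generic/sg0

# Drop the last two components, then keep only the final one.
ENC="${DEVPATH%/*}"    # .../devices/0:0:0:0/scsi_generic
ENC="${ENC%/*}"        # .../devices/0:0:0:0
ENC="${ENC##*/}"       # 0:0:0:0

# Strip the PCI domain: shortest leading match of "a digit ... up to a colon".
PCI_ID_LONG=0000:05:00.0
PCI_ID="${PCI_ID_LONG#[0-9]*:}"    # 05:00.0

echo "$ENC $PCI_ID"
```

The expansions are purely textual and start no subprocesses. Unlike the `readlink -m` form, they do not resolve symlinks, so the two variants agree only when the input path is already in the expected shape.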
@@ -661,12 +487,10 @@ alias_handler () {
# digits as partitions, causing alias creation to fail. This # digits as partitions, causing alias creation to fail. This
# ambiguity seems unavoidable, so devices using this facility # ambiguity seems unavoidable, so devices using this facility
# must not use such names. # must not use such names.
DM_PART= local DM_PART=
if echo "$DM_NAME" | grep -q -E 'p[0-9][0-9]*$' ; then if echo $DM_NAME | grep -q -E 'p[0-9][0-9]*$' ; then
if [ "$DEVTYPE" != "partition" ] ; then if [ "$DEVTYPE" != "partition" ] ; then
# Match p[number], remove the 'p' and prepend "-part" DM_PART=`echo $DM_NAME | awk -Fp '/p/{print "-part"$2}'`
DM_PART=$(echo "$DM_NAME" |
awk 'match($0,/p[0-9]+$/) {print "-part"substr($0,RSTART+1,RLENGTH-1)}')
fi fi
fi fi
@@ -674,25 +498,21 @@ alias_handler () {
for link in $DEVLINKS ; do for link in $DEVLINKS ; do
# Remove partition information to match key of top-level device. # Remove partition information to match key of top-level device.
if [ -n "$DM_PART" ] ; then if [ -n "$DM_PART" ] ; then
link=$(echo "$link" | sed 's/p[0-9][0-9]*$//') link=`echo $link | sed 's/p[0-9][0-9]*$//'`
fi fi
# Check both the fully qualified and the base name of link. # Check both the fully qualified and the base name of link.
for l in $link ${link##*/} ; do for l in $link `basename $link` ; do
if [ ! -z "$l" ]; then alias=`awk "\\$1 == \"alias\" && \\$3 == \"${l}\" \
alias=$(awk -v var="$l" '($1 == "alias") && \ { print \\$2; exit }" $CONFIG`
($3 == var) \ if [ -n "$alias" ] ; then
{ print $2; exit }' $CONFIG) echo ${alias}${DM_PART}
if [ -n "$alias" ] ; then return
echo "${alias}${DM_PART}"
return
fi
fi fi
done done
done done
} }
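The two columns compute the `-part` suffix differently: one anchors on a trailing `p[0-9]+` with awk's `match()`, the other splits the whole name on every `p`. The sketch below wraps each variant in a function so they can be compared. The function names and device names are made up for illustration.

```shell
# Variant using match(): only a trailing "p<digits>" is treated as a partition.
part_suffix_match() {
	echo "$1" | awk 'match($0,/p[0-9]+$/) {print "-part"substr($0,RSTART+1,RLENGTH-1)}'
}

# Variant using -Fp: the name is split on every "p" and field 2 is printed.
part_suffix_split() {
	echo "$1" | awk -Fp '/p/{print "-part"$2}'
}

for name in dm0p2 mpathap1 ; do
	echo "$name: match=$(part_suffix_match "$name") split=$(part_suffix_split "$name")"
done
```

For a name with a single `p`, such as `dm0p2`, both variants print `-part2`. For `mpathap1` the split variant prints `-partatha`, because the first `p` in the name already ends field 1, while the match variant prints `-part1`.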
# main while getopts 'c:d:eg:mp:h' OPTION; do
while getopts 'c:d:eg:jmp:h' OPTION; do
case ${OPTION} in case ${OPTION} in
c) c)
CONFIG=${OPTARG} CONFIG=${OPTARG}
@@ -705,9 +525,7 @@ while getopts 'c:d:eg:jmp:h' OPTION; do
# create the enclosure device symlinks only. We also need # create the enclosure device symlinks only. We also need
# "enclosure_symlinks yes" set in vdev_id.config to actually create the # "enclosure_symlinks yes" set in vdev_id.config to actually create the
# symlink. # symlink.
ENCLOSURE_MODE=$(awk '{if ($1 == "enclosure_symlinks") \ ENCLOSURE_MODE=$(awk '{if ($1 == "enclosure_symlinks") print $2}' $CONFIG)
print $2}' "$CONFIG")
if [ "$ENCLOSURE_MODE" != "yes" ] ; then if [ "$ENCLOSURE_MODE" != "yes" ] ; then
exit 0 exit 0
fi fi
@@ -718,9 +536,6 @@ while getopts 'c:d:eg:jmp:h' OPTION; do
p) p)
PHYS_PER_PORT=${OPTARG} PHYS_PER_PORT=${OPTARG}
;; ;;
j)
MULTIJBOD_MODE=yes
;;
m) m)
MULTIPATH_MODE=yes MULTIPATH_MODE=yes
;; ;;
@@ -730,35 +545,34 @@ while getopts 'c:d:eg:jmp:h' OPTION; do
esac esac
done done
if [ ! -r "$CONFIG" ] ; then if [ ! -r $CONFIG ] ; then
echo "Error: Config file \"$CONFIG\" not found" exit 0
exit 1
fi fi
if [ -z "$DEV" ] && [ -z "$ENCLOSURE_MODE" ] ; then if [ -z "$DEV" -a -z "$ENCLOSURE_MODE" ] ; then
echo "Error: missing required option -d" echo "Error: missing required option -d"
exit 1 exit 1
fi fi
if [ -z "$TOPOLOGY" ] ; then if [ -z "$TOPOLOGY" ] ; then
TOPOLOGY=$(awk '($1 == "topology") {print $2; exit}' "$CONFIG") TOPOLOGY=`awk "\\$1 == \"topology\" {print \\$2; exit}" $CONFIG`
fi fi
if [ -z "$BAY" ] ; then if [ -z "$BAY" ] ; then
BAY=$(awk '($1 == "slot") {print $2; exit}' "$CONFIG") BAY=`awk "\\$1 == \"slot\" {print \\$2; exit}" $CONFIG`
fi fi
TOPOLOGY=${TOPOLOGY:-sas_direct} TOPOLOGY=${TOPOLOGY:-sas_direct}
# Should we create /dev/by-enclosure symlinks? # Should we create /dev/by-enclosure symlinks?
if [ "$ENCLOSURE_MODE" = "yes" ] && [ "$TOPOLOGY" = "sas_direct" ] ; then if [ "$ENCLOSURE_MODE" = "yes" -a "$TOPOLOGY" = "sas_direct" ] ; then
ID_ENCLOSURE=$(enclosure_handler) ID_ENCLOSURE=$(enclosure_handler)
if [ -z "$ID_ENCLOSURE" ] ; then if [ -z "$ID_ENCLOSURE" ] ; then
exit 0 exit 0
fi fi
# Just create the symlinks to the enclosure devices and then exit. # Just create the symlinks to the enclosure devices and then exit.
ENCLOSURE_PREFIX=$(awk '/enclosure_symlinks_prefix/{print $2}' "$CONFIG") ENCLOSURE_PREFIX=$(awk '/enclosure_symlinks_prefix/{print $2}' $CONFIG)
if [ -z "$ENCLOSURE_PREFIX" ] ; then if [ -z "$ENCLOSURE_PREFIX" ] ; then
ENCLOSURE_PREFIX="enc" ENCLOSURE_PREFIX="enc"
fi fi
@@ -768,16 +582,16 @@ if [ "$ENCLOSURE_MODE" = "yes" ] && [ "$TOPOLOGY" = "sas_direct" ] ; then
fi fi
# First check if an alias was defined for this device. # First check if an alias was defined for this device.
ID_VDEV=$(alias_handler) ID_VDEV=`alias_handler`
if [ -z "$ID_VDEV" ] ; then if [ -z "$ID_VDEV" ] ; then
BAY=${BAY:-bay} BAY=${BAY:-bay}
case $TOPOLOGY in case $TOPOLOGY in
sas_direct|sas_switch) sas_direct|sas_switch)
ID_VDEV=$(sas_handler) ID_VDEV=`sas_handler`
;; ;;
scsi) scsi)
ID_VDEV=$(scsi_handler) ID_VDEV=`scsi_handler`
;; ;;
*) *)
echo "Error: unknown topology $TOPOLOGY" echo "Error: unknown topology $TOPOLOGY"
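The configuration lookups in this file all follow one pattern: print the second field of the first line whose first field equals a keyword. A minimal sketch of that pattern with a single-quoted awk program and the keyword passed through `-v`, which avoids the `\\$1` escaping needed inside backticks and double quotes. The helper name `conf_get` and the sample config contents are assumptions made for illustration.

```shell
# Hypothetical helper: conf_get KEY FILE prints the value of the first
# "KEY value" line in FILE, or nothing if the key is absent.
conf_get() {
	awk -v key="$1" '$1 == key {print $2; exit}' "$2"
}

# Throwaway config file with made-up contents.
CONFIG=$(mktemp)
printf 'multipath no\ntopology sas_switch\nphys_per_port 2\n' > "$CONFIG"

TOPOLOGY=$(conf_get topology "$CONFIG")
BAY=$(conf_get slot "$CONFIG")
echo "topology=${TOPOLOGY:-sas_direct} slot=${BAY:-bay}"

rm -f "$CONFIG"
```

A missing key yields an empty string, so the `${VAR:-default}` fallbacks used in the script still apply.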
+11 -9
@@ -1,18 +1,20 @@
include $(top_srcdir)/config/Rules.am include $(top_srcdir)/config/Rules.am
# Unconditionally enable debugging for zdb AM_CPPFLAGS += -DDEBUG
AM_CPPFLAGS += -DDEBUG -UNDEBUG -DZFS_DEBUG
DEFAULT_INCLUDES += \
-I$(top_srcdir)/include \
-I$(top_srcdir)/lib/libspl/include
sbin_PROGRAMS = zdb sbin_PROGRAMS = zdb
zdb_SOURCES = \ zdb_SOURCES = \
zdb.c \ zdb.c \
zdb_il.c \ zdb_il.c
zdb.h
zdb_LDADD = \ zdb_LDADD = \
$(abs_top_builddir)/lib/libzpool/libzpool.la \ $(top_builddir)/lib/libnvpair/libnvpair.la \
$(abs_top_builddir)/lib/libzfs_core/libzfs_core.la \ $(top_builddir)/lib/libuutil/libuutil.la \
$(abs_top_builddir)/lib/libnvpair/libnvpair.la $(top_builddir)/lib/libzpool/libzpool.la \
$(top_builddir)/lib/libzfs/libzfs.la \
include $(top_srcdir)/config/CppCheck.am $(top_builddir)/lib/libzfs_core/libzfs_core.la
+571 -4878
File diff suppressed because it is too large
+70 -97
@@ -25,7 +25,7 @@
*/ */
/* /*
* Copyright (c) 2013, 2017 by Delphix. All rights reserved. * Copyright (c) 2013, 2016 by Delphix. All rights reserved.
*/ */
/* /*
@@ -42,14 +42,11 @@
#include <sys/resource.h> #include <sys/resource.h>
#include <sys/zil.h> #include <sys/zil.h>
#include <sys/zil_impl.h> #include <sys/zil_impl.h>
#include <sys/spa_impl.h>
#include <sys/abd.h> #include <sys/abd.h>
#include "zdb.h"
extern uint8_t dump_opt[256]; extern uint8_t dump_opt[256];
static char tab_prefix[4] = "\t\t\t"; static char prefix[4] = "\t\t\t";
static void static void
print_log_bp(const blkptr_t *bp, const char *prefix) print_log_bp(const blkptr_t *bp, const char *prefix)
@@ -62,9 +59,8 @@ print_log_bp(const blkptr_t *bp, const char *prefix)
/* ARGSUSED */ /* ARGSUSED */
static void static void
zil_prt_rec_create(zilog_t *zilog, int txtype, const void *arg) zil_prt_rec_create(zilog_t *zilog, int txtype, lr_create_t *lr)
{ {
const lr_create_t *lr = arg;
time_t crtime = lr->lr_crtime[0]; time_t crtime = lr->lr_crtime[0];
char *name, *link; char *name, *link;
lr_attr_t *lrattr; lr_attr_t *lrattr;
@@ -79,55 +75,49 @@ zil_prt_rec_create(zilog_t *zilog, int txtype, const void *arg)
if (txtype == TX_SYMLINK) { if (txtype == TX_SYMLINK) {
link = name + strlen(name) + 1; link = name + strlen(name) + 1;
(void) printf("%s%s -> %s\n", tab_prefix, name, link); (void) printf("%s%s -> %s\n", prefix, name, link);
} else if (txtype != TX_MKXATTR) { } else if (txtype != TX_MKXATTR) {
(void) printf("%s%s\n", tab_prefix, name); (void) printf("%s%s\n", prefix, name);
} }
(void) printf("%s%s", tab_prefix, ctime(&crtime)); (void) printf("%s%s", prefix, ctime(&crtime));
(void) printf("%sdoid %llu, foid %llu, slots %llu, mode %llo\n", (void) printf("%sdoid %llu, foid %llu, slots %llu, mode %llo\n", prefix,
tab_prefix, (u_longlong_t)lr->lr_doid, (u_longlong_t)lr->lr_doid,
(u_longlong_t)LR_FOID_GET_OBJ(lr->lr_foid), (u_longlong_t)LR_FOID_GET_OBJ(lr->lr_foid),
(u_longlong_t)LR_FOID_GET_SLOTS(lr->lr_foid), (u_longlong_t)LR_FOID_GET_SLOTS(lr->lr_foid),
(longlong_t)lr->lr_mode); (longlong_t)lr->lr_mode);
(void) printf("%suid %llu, gid %llu, gen %llu, rdev 0x%llx\n", (void) printf("%suid %llu, gid %llu, gen %llu, rdev 0x%llx\n", prefix,
tab_prefix,
(u_longlong_t)lr->lr_uid, (u_longlong_t)lr->lr_gid, (u_longlong_t)lr->lr_uid, (u_longlong_t)lr->lr_gid,
(u_longlong_t)lr->lr_gen, (u_longlong_t)lr->lr_rdev); (u_longlong_t)lr->lr_gen, (u_longlong_t)lr->lr_rdev);
} }
/* ARGSUSED */ /* ARGSUSED */
static void static void
zil_prt_rec_remove(zilog_t *zilog, int txtype, const void *arg) zil_prt_rec_remove(zilog_t *zilog, int txtype, lr_remove_t *lr)
{ {
const lr_remove_t *lr = arg; (void) printf("%sdoid %llu, name %s\n", prefix,
(void) printf("%sdoid %llu, name %s\n", tab_prefix,
(u_longlong_t)lr->lr_doid, (char *)(lr + 1)); (u_longlong_t)lr->lr_doid, (char *)(lr + 1));
} }
/* ARGSUSED */ /* ARGSUSED */
static void static void
zil_prt_rec_link(zilog_t *zilog, int txtype, const void *arg) zil_prt_rec_link(zilog_t *zilog, int txtype, lr_link_t *lr)
{ {
const lr_link_t *lr = arg; (void) printf("%sdoid %llu, link_obj %llu, name %s\n", prefix,
(void) printf("%sdoid %llu, link_obj %llu, name %s\n", tab_prefix,
(u_longlong_t)lr->lr_doid, (u_longlong_t)lr->lr_link_obj, (u_longlong_t)lr->lr_doid, (u_longlong_t)lr->lr_link_obj,
(char *)(lr + 1)); (char *)(lr + 1));
} }
/* ARGSUSED */ /* ARGSUSED */
static void static void
zil_prt_rec_rename(zilog_t *zilog, int txtype, const void *arg) zil_prt_rec_rename(zilog_t *zilog, int txtype, lr_rename_t *lr)
{ {
const lr_rename_t *lr = arg;
char *snm = (char *)(lr + 1); char *snm = (char *)(lr + 1);
char *tnm = snm + strlen(snm) + 1; char *tnm = snm + strlen(snm) + 1;
(void) printf("%ssdoid %llu, tdoid %llu\n", tab_prefix, (void) printf("%ssdoid %llu, tdoid %llu\n", prefix,
(u_longlong_t)lr->lr_sdoid, (u_longlong_t)lr->lr_tdoid); (u_longlong_t)lr->lr_sdoid, (u_longlong_t)lr->lr_tdoid);
(void) printf("%ssrc %s tgt %s\n", tab_prefix, snm, tnm); (void) printf("%ssrc %s tgt %s\n", prefix, snm, tnm);
} }
/* ARGSUSED */ /* ARGSUSED */
@@ -135,8 +125,9 @@ static int
zil_prt_rec_write_cb(void *data, size_t len, void *unused) zil_prt_rec_write_cb(void *data, size_t len, void *unused)
{ {
char *cdata = data; char *cdata = data;
int i;
for (size_t i = 0; i < len; i++) { for (i = 0; i < len; i++) {
if (isprint(*cdata)) if (isprint(*cdata))
(void) printf("%c ", *cdata); (void) printf("%c ", *cdata);
else else
@@ -148,16 +139,15 @@ zil_prt_rec_write_cb(void *data, size_t len, void *unused)
/* ARGSUSED */ /* ARGSUSED */
static void static void
zil_prt_rec_write(zilog_t *zilog, int txtype, const void *arg) zil_prt_rec_write(zilog_t *zilog, int txtype, lr_write_t *lr)
{ {
const lr_write_t *lr = arg;
abd_t *data; abd_t *data;
const blkptr_t *bp = &lr->lr_blkptr; blkptr_t *bp = &lr->lr_blkptr;
zbookmark_phys_t zb; zbookmark_phys_t zb;
int verbose = MAX(dump_opt['d'], dump_opt['i']); int verbose = MAX(dump_opt['d'], dump_opt['i']);
int error; int error;
(void) printf("%sfoid %llu, offset %llx, length %llx\n", tab_prefix, (void) printf("%sfoid %llu, offset %llx, length %llx\n", prefix,
(u_longlong_t)lr->lr_foid, (u_longlong_t)lr->lr_offset, (u_longlong_t)lr->lr_foid, (u_longlong_t)lr->lr_offset,
(u_longlong_t)lr->lr_length); (u_longlong_t)lr->lr_length);
@@ -165,21 +155,20 @@ zil_prt_rec_write(zilog_t *zilog, int txtype, const void *arg)
return; return;
if (lr->lr_common.lrc_reclen == sizeof (lr_write_t)) { if (lr->lr_common.lrc_reclen == sizeof (lr_write_t)) {
(void) printf("%shas blkptr, %s\n", tab_prefix, (void) printf("%shas blkptr, %s\n", prefix,
!BP_IS_HOLE(bp) && !BP_IS_HOLE(bp) &&
bp->blk_birth >= spa_min_claim_txg(zilog->zl_spa) ? bp->blk_birth >= spa_first_txg(zilog->zl_spa) ?
"will claim" : "won't claim"); "will claim" : "won't claim");
print_log_bp(bp, tab_prefix); print_log_bp(bp, prefix);
if (BP_IS_HOLE(bp)) { if (BP_IS_HOLE(bp)) {
(void) printf("\t\t\tLSIZE 0x%llx\n", (void) printf("\t\t\tLSIZE 0x%llx\n",
(u_longlong_t)BP_GET_LSIZE(bp)); (u_longlong_t)BP_GET_LSIZE(bp));
(void) printf("%s<hole>\n", tab_prefix); (void) printf("%s<hole>\n", prefix);
return; return;
} }
if (bp->blk_birth < zilog->zl_header->zh_claim_txg) { if (bp->blk_birth < zilog->zl_header->zh_claim_txg) {
(void) printf("%s<block already committed>\n", (void) printf("%s<block already committed>\n", prefix);
tab_prefix);
return; return;
} }
@@ -199,7 +188,7 @@ zil_prt_rec_write(zilog_t *zilog, int txtype, const void *arg)
abd_copy_from_buf(data, lr + 1, lr->lr_length); abd_copy_from_buf(data, lr + 1, lr->lr_length);
} }
(void) printf("%s", tab_prefix); (void) printf("%s", prefix);
(void) abd_iterate_func(data, (void) abd_iterate_func(data,
0, MIN(lr->lr_length, (verbose < 6 ? 20 : SPA_MAXBLOCKSIZE)), 0, MIN(lr->lr_length, (verbose < 6 ? 20 : SPA_MAXBLOCKSIZE)),
zil_prt_rec_write_cb, NULL); zil_prt_rec_write_cb, NULL);
@@ -211,55 +200,52 @@ out:
/* ARGSUSED */ /* ARGSUSED */
static void static void
zil_prt_rec_truncate(zilog_t *zilog, int txtype, const void *arg) zil_prt_rec_truncate(zilog_t *zilog, int txtype, lr_truncate_t *lr)
{ {
const lr_truncate_t *lr = arg; (void) printf("%sfoid %llu, offset 0x%llx, length 0x%llx\n", prefix,
(void) printf("%sfoid %llu, offset 0x%llx, length 0x%llx\n", tab_prefix,
(u_longlong_t)lr->lr_foid, (longlong_t)lr->lr_offset, (u_longlong_t)lr->lr_foid, (longlong_t)lr->lr_offset,
(u_longlong_t)lr->lr_length); (u_longlong_t)lr->lr_length);
} }
/* ARGSUSED */ /* ARGSUSED */
static void static void
zil_prt_rec_setattr(zilog_t *zilog, int txtype, const void *arg) zil_prt_rec_setattr(zilog_t *zilog, int txtype, lr_setattr_t *lr)
{ {
const lr_setattr_t *lr = arg;
time_t atime = (time_t)lr->lr_atime[0]; time_t atime = (time_t)lr->lr_atime[0];
time_t mtime = (time_t)lr->lr_mtime[0]; time_t mtime = (time_t)lr->lr_mtime[0];
(void) printf("%sfoid %llu, mask 0x%llx\n", tab_prefix, (void) printf("%sfoid %llu, mask 0x%llx\n", prefix,
(u_longlong_t)lr->lr_foid, (u_longlong_t)lr->lr_mask); (u_longlong_t)lr->lr_foid, (u_longlong_t)lr->lr_mask);
if (lr->lr_mask & AT_MODE) { if (lr->lr_mask & AT_MODE) {
(void) printf("%sAT_MODE %llo\n", tab_prefix, (void) printf("%sAT_MODE %llo\n", prefix,
(longlong_t)lr->lr_mode); (longlong_t)lr->lr_mode);
} }
if (lr->lr_mask & AT_UID) { if (lr->lr_mask & AT_UID) {
(void) printf("%sAT_UID %llu\n", tab_prefix, (void) printf("%sAT_UID %llu\n", prefix,
(u_longlong_t)lr->lr_uid); (u_longlong_t)lr->lr_uid);
} }
if (lr->lr_mask & AT_GID) { if (lr->lr_mask & AT_GID) {
(void) printf("%sAT_GID %llu\n", tab_prefix, (void) printf("%sAT_GID %llu\n", prefix,
(u_longlong_t)lr->lr_gid); (u_longlong_t)lr->lr_gid);
} }
if (lr->lr_mask & AT_SIZE) { if (lr->lr_mask & AT_SIZE) {
(void) printf("%sAT_SIZE %llu\n", tab_prefix, (void) printf("%sAT_SIZE %llu\n", prefix,
(u_longlong_t)lr->lr_size); (u_longlong_t)lr->lr_size);
} }
if (lr->lr_mask & AT_ATIME) { if (lr->lr_mask & AT_ATIME) {
(void) printf("%sAT_ATIME %llu.%09llu %s", tab_prefix, (void) printf("%sAT_ATIME %llu.%09llu %s", prefix,
(u_longlong_t)lr->lr_atime[0], (u_longlong_t)lr->lr_atime[0],
(u_longlong_t)lr->lr_atime[1], (u_longlong_t)lr->lr_atime[1],
ctime(&atime)); ctime(&atime));
} }
if (lr->lr_mask & AT_MTIME) { if (lr->lr_mask & AT_MTIME) {
(void) printf("%sAT_MTIME %llu.%09llu %s", tab_prefix, (void) printf("%sAT_MTIME %llu.%09llu %s", prefix,
(u_longlong_t)lr->lr_mtime[0], (u_longlong_t)lr->lr_mtime[0],
(u_longlong_t)lr->lr_mtime[1], (u_longlong_t)lr->lr_mtime[1],
ctime(&mtime)); ctime(&mtime));
@@ -268,48 +254,46 @@ zil_prt_rec_setattr(zilog_t *zilog, int txtype, const void *arg)
/* ARGSUSED */ /* ARGSUSED */
static void static void
zil_prt_rec_acl(zilog_t *zilog, int txtype, const void *arg) zil_prt_rec_acl(zilog_t *zilog, int txtype, lr_acl_t *lr)
{ {
const lr_acl_t *lr = arg; (void) printf("%sfoid %llu, aclcnt %llu\n", prefix,
(void) printf("%sfoid %llu, aclcnt %llu\n", tab_prefix,
(u_longlong_t)lr->lr_foid, (u_longlong_t)lr->lr_aclcnt); (u_longlong_t)lr->lr_foid, (u_longlong_t)lr->lr_aclcnt);
} }
typedef void (*zil_prt_rec_func_t)(zilog_t *, int, const void *); typedef void (*zil_prt_rec_func_t)(zilog_t *, int, void *);
typedef struct zil_rec_info { typedef struct zil_rec_info {
zil_prt_rec_func_t zri_print; zil_prt_rec_func_t zri_print;
const char *zri_name; char *zri_name;
uint64_t zri_count; uint64_t zri_count;
} zil_rec_info_t; } zil_rec_info_t;
static zil_rec_info_t zil_rec_info[TX_MAX_TYPE] = { static zil_rec_info_t zil_rec_info[TX_MAX_TYPE] = {
{.zri_print = NULL, .zri_name = "Total "}, { NULL, "Total " },
{.zri_print = zil_prt_rec_create, .zri_name = "TX_CREATE "}, { (zil_prt_rec_func_t)zil_prt_rec_create, "TX_CREATE " },
{.zri_print = zil_prt_rec_create, .zri_name = "TX_MKDIR "}, { (zil_prt_rec_func_t)zil_prt_rec_create, "TX_MKDIR " },
{.zri_print = zil_prt_rec_create, .zri_name = "TX_MKXATTR "}, { (zil_prt_rec_func_t)zil_prt_rec_create, "TX_MKXATTR " },
{.zri_print = zil_prt_rec_create, .zri_name = "TX_SYMLINK "}, { (zil_prt_rec_func_t)zil_prt_rec_create, "TX_SYMLINK " },
{.zri_print = zil_prt_rec_remove, .zri_name = "TX_REMOVE "}, { (zil_prt_rec_func_t)zil_prt_rec_remove, "TX_REMOVE " },
{.zri_print = zil_prt_rec_remove, .zri_name = "TX_RMDIR "}, { (zil_prt_rec_func_t)zil_prt_rec_remove, "TX_RMDIR " },
{.zri_print = zil_prt_rec_link, .zri_name = "TX_LINK "}, { (zil_prt_rec_func_t)zil_prt_rec_link, "TX_LINK " },
{.zri_print = zil_prt_rec_rename, .zri_name = "TX_RENAME "}, { (zil_prt_rec_func_t)zil_prt_rec_rename, "TX_RENAME " },
{.zri_print = zil_prt_rec_write, .zri_name = "TX_WRITE "}, { (zil_prt_rec_func_t)zil_prt_rec_write, "TX_WRITE " },
{.zri_print = zil_prt_rec_truncate, .zri_name = "TX_TRUNCATE "}, { (zil_prt_rec_func_t)zil_prt_rec_truncate, "TX_TRUNCATE " },
{.zri_print = zil_prt_rec_setattr, .zri_name = "TX_SETATTR "}, { (zil_prt_rec_func_t)zil_prt_rec_setattr, "TX_SETATTR " },
{.zri_print = zil_prt_rec_acl, .zri_name = "TX_ACL_V0 "}, { (zil_prt_rec_func_t)zil_prt_rec_acl, "TX_ACL_V0 " },
{.zri_print = zil_prt_rec_acl, .zri_name = "TX_ACL_ACL "}, { (zil_prt_rec_func_t)zil_prt_rec_acl, "TX_ACL_ACL " },
{.zri_print = zil_prt_rec_create, .zri_name = "TX_CREATE_ACL "}, { (zil_prt_rec_func_t)zil_prt_rec_create, "TX_CREATE_ACL " },
{.zri_print = zil_prt_rec_create, .zri_name = "TX_CREATE_ATTR "}, { (zil_prt_rec_func_t)zil_prt_rec_create, "TX_CREATE_ATTR " },
{.zri_print = zil_prt_rec_create, .zri_name = "TX_CREATE_ACL_ATTR "}, { (zil_prt_rec_func_t)zil_prt_rec_create, "TX_CREATE_ACL_ATTR " },
{.zri_print = zil_prt_rec_create, .zri_name = "TX_MKDIR_ACL "}, { (zil_prt_rec_func_t)zil_prt_rec_create, "TX_MKDIR_ACL " },
{.zri_print = zil_prt_rec_create, .zri_name = "TX_MKDIR_ATTR "}, { (zil_prt_rec_func_t)zil_prt_rec_create, "TX_MKDIR_ATTR " },
{.zri_print = zil_prt_rec_create, .zri_name = "TX_MKDIR_ACL_ATTR "}, { (zil_prt_rec_func_t)zil_prt_rec_create, "TX_MKDIR_ACL_ATTR " },
{.zri_print = zil_prt_rec_write, .zri_name = "TX_WRITE2 "}, { (zil_prt_rec_func_t)zil_prt_rec_write, "TX_WRITE2 " },
}; };
/* ARGSUSED */ /* ARGSUSED */
static int static int
print_log_record(zilog_t *zilog, const lr_t *lr, void *arg, uint64_t claim_txg) print_log_record(zilog_t *zilog, lr_t *lr, void *arg, uint64_t claim_txg)
{ {
int txtype; int txtype;
int verbose = MAX(dump_opt['d'], dump_opt['i']); int verbose = MAX(dump_opt['d'], dump_opt['i']);
@@ -327,13 +311,8 @@ print_log_record(zilog_t *zilog, const lr_t *lr, void *arg, uint64_t claim_txg)
(u_longlong_t)lr->lrc_txg, (u_longlong_t)lr->lrc_txg,
(u_longlong_t)lr->lrc_seq); (u_longlong_t)lr->lrc_seq);
if (txtype && verbose >= 3) { if (txtype && verbose >= 3)
if (!zilog->zl_os->os_encrypted) { zil_rec_info[txtype].zri_print(zilog, txtype, lr);
zil_rec_info[txtype].zri_print(zilog, txtype, lr);
} else {
(void) printf("%s(encrypted)\n", tab_prefix);
}
}
zil_rec_info[txtype].zri_count++; zil_rec_info[txtype].zri_count++;
zil_rec_info[0].zri_count++; zil_rec_info[0].zri_count++;
@@ -343,12 +322,11 @@ print_log_record(zilog_t *zilog, const lr_t *lr, void *arg, uint64_t claim_txg)
/* ARGSUSED */ /* ARGSUSED */
static int static int
print_log_block(zilog_t *zilog, const blkptr_t *bp, void *arg, print_log_block(zilog_t *zilog, blkptr_t *bp, void *arg, uint64_t claim_txg)
uint64_t claim_txg)
{ {
char blkbuf[BP_SPRINTF_LEN + 10]; char blkbuf[BP_SPRINTF_LEN + 10];
int verbose = MAX(dump_opt['d'], dump_opt['i']); int verbose = MAX(dump_opt['d'], dump_opt['i']);
const char *claim; char *claim;
if (verbose <= 3) if (verbose <= 3)
return (0); return (0);
@@ -363,7 +341,7 @@ print_log_block(zilog_t *zilog, const blkptr_t *bp, void *arg,
if (claim_txg != 0) if (claim_txg != 0)
claim = "already claimed"; claim = "already claimed";
else if (bp->blk_birth >= spa_min_claim_txg(zilog->zl_spa)) else if (bp->blk_birth >= spa_first_txg(zilog->zl_spa))
claim = "will claim"; claim = "will claim";
else else
claim = "won't claim"; claim = "won't claim";
@@ -377,7 +355,7 @@ print_log_block(zilog_t *zilog, const blkptr_t *bp, void *arg,
static void static void
print_log_stats(int verbose) print_log_stats(int verbose)
{ {
unsigned i, w, p10; int i, w, p10;
if (verbose > 3) if (verbose > 3)
(void) printf("\n"); (void) printf("\n");
@@ -418,15 +396,10 @@ dump_intent_log(zilog_t *zilog)
for (i = 0; i < TX_MAX_TYPE; i++) for (i = 0; i < TX_MAX_TYPE; i++)
zil_rec_info[i].zri_count = 0; zil_rec_info[i].zri_count = 0;
/* see comment in zil_claim() or zil_check_log_chain() */
if (zilog->zl_spa->spa_uberblock.ub_checkpoint_txg != 0 &&
zh->zh_claim_txg == 0)
return;
if (verbose >= 2) { if (verbose >= 2) {
(void) printf("\n"); (void) printf("\n");
(void) zil_parse(zilog, print_log_block, print_log_record, NULL, (void) zil_parse(zilog, print_log_block, print_log_record, NULL,
zh->zh_claim_txg, B_FALSE); zh->zh_claim_txg);
print_log_stats(verbose); print_log_stats(verbose);
} }
} }
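The claim classification in print_log_block() above reduces to a three-way decision. A minimal sketch follows, assuming a hypothetical helper `claim_status()` that is not part of zdb; its `min_claim_txg` parameter stands in for whichever of `spa_min_claim_txg()` or `spa_first_txg()` the given side of the diff calls.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not in zdb): classify a log block the same way
 * print_log_block() picks its "claim" string.  min_claim_txg is an
 * assumption standing in for spa_min_claim_txg() or spa_first_txg().
 */
static const char *
claim_status(uint64_t claim_txg, uint64_t blk_birth, uint64_t min_claim_txg)
{
	if (claim_txg != 0)
		return ("already claimed");
	else if (blk_birth >= min_claim_txg)
		return ("will claim");
	else
		return ("won't claim");
}
```

The order of the checks matters: a non-zero claim_txg wins regardless of the block's birth txg.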
+53 -12
@@ -1,10 +1,10 @@
include $(top_srcdir)/config/Rules.am include $(top_srcdir)/config/Rules.am
include $(top_srcdir)/config/Shellcheck.am
AM_CFLAGS += $(LIBUDEV_CFLAGS) $(LIBUUID_CFLAGS) DEFAULT_INCLUDES += \
-I$(top_srcdir)/include \
-I$(top_srcdir)/lib/libspl/include
SUBDIRS = zed.d EXTRA_DIST = zed.d/README
SHELLCHECKDIRS = $(SUBDIRS)
sbin_PROGRAMS = zed sbin_PROGRAMS = zed
@@ -40,14 +40,55 @@ FMA_SRC = \
zed_SOURCES = $(ZED_SRC) $(FMA_SRC) zed_SOURCES = $(ZED_SRC) $(FMA_SRC)
zed_LDADD = \ zed_LDADD = \
$(abs_top_builddir)/lib/libzfs/libzfs.la \ $(top_builddir)/lib/libavl/libavl.la \
$(abs_top_builddir)/lib/libzfs_core/libzfs_core.la \ $(top_builddir)/lib/libnvpair/libnvpair.la \
$(abs_top_builddir)/lib/libnvpair/libnvpair.la \ $(top_builddir)/lib/libspl/libspl.la \
$(abs_top_builddir)/lib/libuutil/libuutil.la $(top_builddir)/lib/libuutil/libuutil.la \
$(top_builddir)/lib/libzpool/libzpool.la \
$(top_builddir)/lib/libzfs/libzfs.la \
$(top_builddir)/lib/libzfs_core/libzfs_core.la
zed_LDADD += -lrt $(LIBATOMIC_LIBS) $(LIBUDEV_LIBS) $(LIBUUID_LIBS) zed_LDFLAGS = -lrt -pthread
zed_LDFLAGS = -pthread
EXTRA_DIST = agents/README.md zedconfdir = $(sysconfdir)/zfs/zed.d
include $(top_srcdir)/config/CppCheck.am dist_zedconf_DATA = \
zed.d/zed-functions.sh \
zed.d/zed.rc
zedexecdir = $(libexecdir)/zfs/zed.d
dist_zedexec_SCRIPTS = \
zed.d/all-debug.sh \
zed.d/all-syslog.sh \
zed.d/data-notify.sh \
zed.d/generic-notify.sh \
zed.d/resilver_finish-notify.sh \
zed.d/scrub_finish-notify.sh \
zed.d/statechange-led.sh \
zed.d/statechange-notify.sh \
zed.d/vdev_clear-led.sh \
zed.d/vdev_attach-led.sh \
zed.d/pool_import-led.sh \
zed.d/resilver_finish-start-scrub.sh
zedconfdefaults = \
all-syslog.sh \
data-notify.sh \
resilver_finish-notify.sh \
scrub_finish-notify.sh \
statechange-led.sh \
statechange-notify.sh \
vdev_clear-led.sh \
vdev_attach-led.sh \
pool_import-led.sh \
resilver_finish-start-scrub.sh
install-data-hook:
$(MKDIR_P) "$(DESTDIR)$(zedconfdir)"
for f in $(zedconfdefaults); do \
test -f "$(DESTDIR)$(zedconfdir)/$${f}" -o \
-L "$(DESTDIR)$(zedconfdir)/$${f}" || \
ln -s "$(zedexecdir)/$${f}" "$(DESTDIR)$(zedconfdir)"; \
done
chmod 0600 "$(DESTDIR)$(zedconfdir)/zed.rc"
+1 -2
@@ -25,7 +25,7 @@
*/ */
/* /*
* This file implements the minimal FMD module API required to support the * This file imlements the minimal FMD module API required to support the
* fault logic modules in ZED. This support includes module registration, * fault logic modules in ZED. This support includes module registration,
* memory allocation, module property accessors, basic case management, * memory allocation, module property accessors, basic case management,
* one-shot timers and SERD engines. * one-shot timers and SERD engines.
@@ -599,7 +599,6 @@ fmd_timer_install(fmd_hdl_t *hdl, void *arg, fmd_event_t *ep, hrtime_t delta)
sev.sigev_notify_function = _timer_notify; sev.sigev_notify_function = _timer_notify;
sev.sigev_notify_attributes = NULL; sev.sigev_notify_attributes = NULL;
sev.sigev_value.sival_ptr = ftp; sev.sigev_value.sival_ptr = ftp;
sev.sigev_signo = 0;
timer_create(CLOCK_REALTIME, &sev, &ftp->ft_tid); timer_create(CLOCK_REALTIME, &sev, &ftp->ft_tid);
timer_settime(ftp->ft_tid, 0, &its, NULL); timer_settime(ftp->ft_tid, 0, &its, NULL);
+1 -1
@@ -281,7 +281,7 @@ fmd_serd_eng_empty(fmd_serd_eng_t *sgp)
void void
fmd_serd_eng_reset(fmd_serd_eng_t *sgp) fmd_serd_eng_reset(fmd_serd_eng_t *sgp)
{ {
serd_log_msg(" SERD Engine: resetting %s", sgp->sg_name); serd_log_msg(" SERD Engine: reseting %s", sgp->sg_name);
while (sgp->sg_count != 0) while (sgp->sg_count != 0)
fmd_serd_eng_discard(sgp, list_head(&sgp->sg_list)); fmd_serd_eng_discard(sgp, list_head(&sgp->sg_list));
+41 -105
@@ -12,8 +12,6 @@
/* /*
* Copyright (c) 2016, Intel Corporation. * Copyright (c) 2016, Intel Corporation.
* Copyright (c) 2018, loli10K <ezomori.nozomu@gmail.com>
* Copyright (c) 2021 Hewlett Packard Enterprise Development LP
*/ */
#include <libnvpair.h> #include <libnvpair.h>
@@ -55,25 +53,13 @@ pthread_t g_agents_tid;
libzfs_handle_t *g_zfs_hdl; libzfs_handle_t *g_zfs_hdl;
/* guid search data */ /* guid search data */
typedef enum device_type {
DEVICE_TYPE_L2ARC, /* l2arc device */
DEVICE_TYPE_SPARE, /* spare device */
DEVICE_TYPE_PRIMARY /* any primary pool storage device */
} device_type_t;
typedef struct guid_search { typedef struct guid_search {
uint64_t gs_pool_guid; uint64_t gs_pool_guid;
uint64_t gs_vdev_guid; uint64_t gs_vdev_guid;
char *gs_devid; char *gs_devid;
device_type_t gs_vdev_type;
uint64_t gs_vdev_expandtime; /* vdev expansion time */
} guid_search_t; } guid_search_t;
/* static void
* Walks the vdev tree recursively looking for a matching devid.
* Returns B_TRUE as soon as a matching device is found, B_FALSE otherwise.
*/
static boolean_t
zfs_agent_iter_vdev(zpool_handle_t *zhp, nvlist_t *nvl, void *arg) zfs_agent_iter_vdev(zpool_handle_t *zhp, nvlist_t *nvl, void *arg)
{ {
guid_search_t *gsp = arg; guid_search_t *gsp = arg;
@@ -86,48 +72,19 @@ zfs_agent_iter_vdev(zpool_handle_t *zhp, nvlist_t *nvl, void *arg)
*/ */
if (nvlist_lookup_nvlist_array(nvl, ZPOOL_CONFIG_CHILDREN, if (nvlist_lookup_nvlist_array(nvl, ZPOOL_CONFIG_CHILDREN,
&child, &children) == 0) { &child, &children) == 0) {
for (c = 0; c < children; c++) { for (c = 0; c < children; c++)
if (zfs_agent_iter_vdev(zhp, child[c], gsp)) { zfs_agent_iter_vdev(zhp, child[c], gsp);
gsp->gs_vdev_type = DEVICE_TYPE_PRIMARY; return;
return (B_TRUE);
}
}
} }
/* /*
* Iterate over any spares and cache devices * On a devid match, grab the vdev guid
*/ */
if (nvlist_lookup_nvlist_array(nvl, ZPOOL_CONFIG_SPARES, if ((gsp->gs_vdev_guid == 0) &&
&child, &children) == 0) {
for (c = 0; c < children; c++) {
if (zfs_agent_iter_vdev(zhp, child[c], gsp)) {
gsp->gs_vdev_type = DEVICE_TYPE_L2ARC;
return (B_TRUE);
}
}
}
if (nvlist_lookup_nvlist_array(nvl, ZPOOL_CONFIG_L2CACHE,
&child, &children) == 0) {
for (c = 0; c < children; c++) {
if (zfs_agent_iter_vdev(zhp, child[c], gsp)) {
gsp->gs_vdev_type = DEVICE_TYPE_SPARE;
return (B_TRUE);
}
}
}
/*
* On a devid match, grab the vdev guid and expansion time, if any.
*/
if (gsp->gs_devid != NULL &&
(nvlist_lookup_string(nvl, ZPOOL_CONFIG_DEVID, &path) == 0) && (nvlist_lookup_string(nvl, ZPOOL_CONFIG_DEVID, &path) == 0) &&
(strcmp(gsp->gs_devid, path) == 0)) { (strcmp(gsp->gs_devid, path) == 0)) {
(void) nvlist_lookup_uint64(nvl, ZPOOL_CONFIG_GUID, (void) nvlist_lookup_uint64(nvl, ZPOOL_CONFIG_GUID,
&gsp->gs_vdev_guid); &gsp->gs_vdev_guid);
(void) nvlist_lookup_uint64(nvl, ZPOOL_CONFIG_EXPANSION_TIME,
&gsp->gs_vdev_expandtime);
return (B_TRUE);
} }
return (B_FALSE);
} }
static int static int
@@ -142,7 +99,7 @@ zfs_agent_iter_pool(zpool_handle_t *zhp, void *arg)
if ((config = zpool_get_config(zhp, NULL)) != NULL) { if ((config = zpool_get_config(zhp, NULL)) != NULL) {
if (nvlist_lookup_nvlist(config, ZPOOL_CONFIG_VDEV_TREE, if (nvlist_lookup_nvlist(config, ZPOOL_CONFIG_VDEV_TREE,
&nvl) == 0) { &nvl) == 0) {
(void) zfs_agent_iter_vdev(zhp, nvl, gsp); zfs_agent_iter_vdev(zhp, nvl, gsp);
} }
} }
/* /*
@@ -178,12 +135,10 @@ zfs_agent_post_event(const char *class, const char *subclass, nvlist_t *nvl)
} }
/* /*
* On Linux, we don't get the expected FM_RESOURCE_REMOVED ereport * On ZFS on Linux, we don't get the expected FM_RESOURCE_REMOVED
* from the vdev_disk layer after a hot unplug. Fortunately we do * ereport from vdev_disk layer after a hot unplug. Fortunately we
* get an EC_DEV_REMOVE from our disk monitor and it is a suitable * get a EC_DEV_REMOVE from our disk monitor and it is a suitable
* proxy so we remap it here for the benefit of the diagnosis engine. * proxy so we remap it here for the benefit of the diagnosis engine.
* Starting in OpenZFS 2.0, we do get FM_RESOURCE_REMOVED from the spa
* layer. Processing multiple FM_RESOURCE_REMOVED events is not harmful.
*/ */
if ((strcmp(class, EC_DEV_REMOVE) == 0) && if ((strcmp(class, EC_DEV_REMOVE) == 0) &&
(strcmp(subclass, ESC_DISK) == 0) && (strcmp(subclass, ESC_DISK) == 0) &&
@@ -193,8 +148,6 @@ zfs_agent_post_event(const char *class, const char *subclass, nvlist_t *nvl)
struct timeval tv; struct timeval tv;
int64_t tod[2]; int64_t tod[2];
uint64_t pool_guid = 0, vdev_guid = 0; uint64_t pool_guid = 0, vdev_guid = 0;
guid_search_t search = { 0 };
device_type_t devtype = DEVICE_TYPE_PRIMARY;
class = "resource.fs.zfs.removed"; class = "resource.fs.zfs.removed";
subclass = ""; subclass = "";
@@ -203,61 +156,30 @@ zfs_agent_post_event(const char *class, const char *subclass, nvlist_t *nvl)
(void) nvlist_lookup_uint64(nvl, ZFS_EV_POOL_GUID, &pool_guid); (void) nvlist_lookup_uint64(nvl, ZFS_EV_POOL_GUID, &pool_guid);
(void) nvlist_lookup_uint64(nvl, ZFS_EV_VDEV_GUID, &vdev_guid); (void) nvlist_lookup_uint64(nvl, ZFS_EV_VDEV_GUID, &vdev_guid);
(void) gettimeofday(&tv, NULL);
tod[0] = tv.tv_sec;
tod[1] = tv.tv_usec;
(void) nvlist_add_int64_array(payload, FM_EREPORT_TIME, tod, 2);
/* /*
* For multipath, spare and l2arc devices ZFS_EV_VDEV_GUID or * For multipath, ZFS_EV_VDEV_GUID is missing so find it.
* ZFS_EV_POOL_GUID may be missing so find them.
*/ */
if (pool_guid == 0 || vdev_guid == 0) { if (vdev_guid == 0) {
if ((nvlist_lookup_string(nvl, DEV_IDENTIFIER, guid_search_t search = { 0 };
&search.gs_devid) == 0) &&
(zpool_iter(g_zfs_hdl, zfs_agent_iter_pool, &search)
== 1)) {
if (pool_guid == 0)
pool_guid = search.gs_pool_guid;
if (vdev_guid == 0)
vdev_guid = search.gs_vdev_guid;
devtype = search.gs_vdev_type;
}
}
/* (void) nvlist_lookup_string(nvl, DEV_IDENTIFIER,
* We want to avoid reporting "remove" events coming from &search.gs_devid);
* libudev for VDEVs which were expanded recently (10s) and
* avoid activating spares in response to partitions being (void) zpool_iter(g_zfs_hdl, zfs_agent_iter_pool,
* deleted and created in rapid succession. &search);
*/ pool_guid = search.gs_pool_guid;
if (search.gs_vdev_expandtime != 0 && vdev_guid = search.gs_vdev_guid;
search.gs_vdev_expandtime + 10 > tv.tv_sec) {
zed_log_msg(LOG_INFO, "agent post event: ignoring '%s' "
"for recently expanded device '%s'", EC_DEV_REMOVE,
search.gs_devid);
goto out;
} }
(void) nvlist_add_uint64(payload, (void) nvlist_add_uint64(payload,
FM_EREPORT_PAYLOAD_ZFS_POOL_GUID, pool_guid); FM_EREPORT_PAYLOAD_ZFS_POOL_GUID, pool_guid);
(void) nvlist_add_uint64(payload, (void) nvlist_add_uint64(payload,
FM_EREPORT_PAYLOAD_ZFS_VDEV_GUID, vdev_guid); FM_EREPORT_PAYLOAD_ZFS_VDEV_GUID, vdev_guid);
switch (devtype) {
case DEVICE_TYPE_L2ARC: (void) gettimeofday(&tv, NULL);
(void) nvlist_add_string(payload, tod[0] = tv.tv_sec;
FM_EREPORT_PAYLOAD_ZFS_VDEV_TYPE, tod[1] = tv.tv_usec;
VDEV_TYPE_L2CACHE); (void) nvlist_add_int64_array(payload, FM_EREPORT_TIME, tod, 2);
break;
case DEVICE_TYPE_SPARE:
(void) nvlist_add_string(payload,
FM_EREPORT_PAYLOAD_ZFS_VDEV_TYPE, VDEV_TYPE_SPARE);
break;
case DEVICE_TYPE_PRIMARY:
(void) nvlist_add_string(payload,
FM_EREPORT_PAYLOAD_ZFS_VDEV_TYPE, VDEV_TYPE_DISK);
break;
}
zed_log_msg(LOG_INFO, "agent post event: mapping '%s' to '%s'", zed_log_msg(LOG_INFO, "agent post event: mapping '%s' to '%s'",
EC_DEV_REMOVE, class); EC_DEV_REMOVE, class);
@@ -271,7 +193,6 @@ zfs_agent_post_event(const char *class, const char *subclass, nvlist_t *nvl)
list_insert_tail(&agent_events, event); list_insert_tail(&agent_events, event);
(void) pthread_mutex_unlock(&agent_lock); (void) pthread_mutex_unlock(&agent_lock);
out:
(void) pthread_cond_signal(&agent_cond); (void) pthread_cond_signal(&agent_cond);
} }
@@ -392,7 +313,6 @@ zfs_agent_init(libzfs_handle_t *zfs_hdl)
list_destroy(&agent_events); list_destroy(&agent_events);
zed_log_die("Failed to initialize agents"); zed_log_die("Failed to initialize agents");
} }
pthread_setname_np(g_agents_tid, "agents");
} }
void void
@@ -430,3 +350,19 @@ zfs_agent_fini(void)
g_zfs_hdl = NULL; g_zfs_hdl = NULL;
} }
/*
* In ZED context, all the FMA agents run in the same thread
* and do not require a unique libzfs instance. Modules should
* use these stubs.
*/
libzfs_handle_t *
__libzfs_init(void)
{
return (g_zfs_hdl);
}
void
__libzfs_fini(libzfs_handle_t *hdl)
{
}
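One side of the zfs_agent_iter_vdev() diff above returns boolean_t so the recursive walk can stop at the first devid match, while the other returns void and always visits every child. A minimal sketch of the early-exit form, assuming a hypothetical `node_t` tree in place of the vdev nvlist (the names `node_t` and `find_devid` are illustrative only):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a vdev nvlist: an optional devid plus children. */
typedef struct node {
	const char *devid;		/* NULL for interior nodes */
	unsigned long long guid;
	struct node **children;
	size_t nchildren;
} node_t;

/*
 * Depth-first search.  Returns true as soon as a matching devid is found
 * and stores that node's guid in *guid_out; returns false otherwise and
 * leaves *guid_out untouched.
 */
static bool
find_devid(const node_t *n, const char *devid, unsigned long long *guid_out)
{
	for (size_t c = 0; c < n->nchildren; c++) {
		if (find_devid(n->children[c], devid, guid_out))
			return (true);
	}

	if (n->devid != NULL && strcmp(n->devid, devid) == 0) {
		*guid_out = n->guid;
		return (true);
	}

	return (false);
}
```

Returning a status lets the caller record which subtree the match came from, which is what the boolean variant in the diff does when it assigns gs_vdev_type after a successful recursive call.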
+7
@@ -39,6 +39,13 @@ extern int zfs_slm_init(void);
extern void zfs_slm_fini(void); extern void zfs_slm_fini(void);
extern void zfs_slm_event(const char *, const char *, nvlist_t *); extern void zfs_slm_event(const char *, const char *, nvlist_t *);
/*
* In ZED context, all the FMA agents run in the same thread
* and do not require a unique libzfs instance.
*/
extern libzfs_handle_t *__libzfs_init(void);
extern void __libzfs_fini(libzfs_handle_t *);
#ifdef __cplusplus #ifdef __cplusplus
} }
#endif #endif
+73 -32
@@ -26,7 +26,6 @@
*/ */
#include <stddef.h> #include <stddef.h>
#include <string.h>
#include <strings.h> #include <strings.h>
#include <libuutil.h> #include <libuutil.h>
#include <libzfs.h> #include <libzfs.h>
@@ -35,7 +34,6 @@
#include <sys/fs/zfs.h> #include <sys/fs/zfs.h>
#include <sys/fm/protocol.h> #include <sys/fm/protocol.h>
#include <sys/fm/fs/zfs.h> #include <sys/fm/fs/zfs.h>
#include <sys/zio.h>
#include "zfs_agents.h" #include "zfs_agents.h"
#include "fmd_api.h" #include "fmd_api.h"
@@ -169,12 +167,14 @@ zfs_case_unserialize(fmd_hdl_t *hdl, fmd_case_t *cp)
static void static void
zfs_mark_vdev(uint64_t pool_guid, nvlist_t *vd, er_timeval_t *loaded) zfs_mark_vdev(uint64_t pool_guid, nvlist_t *vd, er_timeval_t *loaded)
{ {
uint64_t vdev_guid = 0; uint64_t vdev_guid;
uint_t c, children; uint_t c, children;
nvlist_t **child; nvlist_t **child;
zfs_case_t *zcp; zfs_case_t *zcp;
int ret;
(void) nvlist_lookup_uint64(vd, ZPOOL_CONFIG_GUID, &vdev_guid); ret = nvlist_lookup_uint64(vd, ZPOOL_CONFIG_GUID, &vdev_guid);
assert(ret == 0);
/* /*
* Mark any cases associated with this (pool, vdev) pair. * Mark any cases associated with this (pool, vdev) pair.
@@ -253,10 +253,7 @@ zfs_mark_pool(zpool_handle_t *zhp, void *unused)
} }
ret = nvlist_lookup_nvlist(config, ZPOOL_CONFIG_VDEV_TREE, &vd); ret = nvlist_lookup_nvlist(config, ZPOOL_CONFIG_VDEV_TREE, &vd);
if (ret) { assert(ret == 0);
zpool_close(zhp);
return (-1);
}
zfs_mark_vdev(pool_guid, vd, &loaded); zfs_mark_vdev(pool_guid, vd, &loaded);
@@ -380,6 +377,11 @@ zfs_case_solve(fmd_hdl_t *hdl, zfs_case_t *zcp, const char *faultname,
nvlist_t *detector, *fault; nvlist_t *detector, *fault;
boolean_t serialize; boolean_t serialize;
nvlist_t *fru = NULL; nvlist_t *fru = NULL;
#ifdef HAVE_LIBTOPO
nvlist_t *fmri;
topo_hdl_t *thp;
int err;
#endif
fmd_hdl_debug(hdl, "solving fault '%s'", faultname); fmd_hdl_debug(hdl, "solving fault '%s'", faultname);
/* /*
@@ -398,6 +400,64 @@ zfs_case_solve(fmd_hdl_t *hdl, zfs_case_t *zcp, const char *faultname,
zcp->zc_data.zc_vdev_guid); zcp->zc_data.zc_vdev_guid);
} }
#ifdef HAVE_LIBTOPO
/*
* We also want to make sure that the detector (pool or vdev) properly
* reflects the diagnosed state, when the fault corresponds to internal
* ZFS state (i.e. not checksum or I/O error-induced). Otherwise, a
* device which was unavailable early in boot (because the driver/file
* wasn't available) and is now healthy will be mis-diagnosed.
*/
if (!fmd_nvl_fmri_present(hdl, detector) ||
(checkunusable && !fmd_nvl_fmri_unusable(hdl, detector))) {
fmd_case_close(hdl, zcp->zc_case);
nvlist_free(detector);
return;
}
fru = NULL;
if (zcp->zc_fru != NULL &&
(thp = fmd_hdl_topo_hold(hdl, TOPO_VERSION)) != NULL) {
/*
* If the vdev had an associated FRU, then get the FRU nvlist
* from the topo handle and use that in the suspect list. We
* explicitly lookup the FRU because the fmri reported from the
* kernel may not have up to date details about the disk itself
* (serial, part, etc).
*/
if (topo_fmri_str2nvl(thp, zcp->zc_fru, &fmri, &err) == 0) {
libzfs_handle_t *zhdl = fmd_hdl_getspecific(hdl);
/*
* If the disk is part of the system chassis, but the
* FRU indicates a different chassis ID than our
* current system, then ignore the error. This
* indicates that the device was part of another
* cluster head, and for obvious reasons cannot be
* imported on this system.
*/
if (libzfs_fru_notself(zhdl, zcp->zc_fru)) {
fmd_case_close(hdl, zcp->zc_case);
nvlist_free(fmri);
fmd_hdl_topo_rele(hdl, thp);
nvlist_free(detector);
return;
}
/*
* If the device is no longer present on the system, or
* topo_fmri_fru() fails for other reasons, then fall
* back to the fmri specified in the vdev.
*/
if (topo_fmri_fru(thp, fmri, &fru, &err) != 0)
fru = fmd_nvl_dup(hdl, fmri, FMD_SLEEP);
nvlist_free(fmri);
}
fmd_hdl_topo_rele(hdl, thp);
}
#endif
fault = fmd_nvl_create_fault(hdl, faultname, 100, detector, fault = fmd_nvl_create_fault(hdl, faultname, 100, detector,
fru, detector); fru, detector);
fmd_case_add_suspect(hdl, zcp->zc_case, fault); fmd_case_add_suspect(hdl, zcp->zc_case, fault);
@@ -774,8 +834,6 @@ zfs_fm_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl, const char *class)
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_PROBE_FAILURE))) { ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_PROBE_FAILURE))) {
char *failmode = NULL; char *failmode = NULL;
boolean_t checkremove = B_FALSE; boolean_t checkremove = B_FALSE;
uint32_t pri = 0;
int32_t flags = 0;
/* /*
* If this is a checksum or I/O error, then toss it into the * If this is a checksum or I/O error, then toss it into the
@@ -798,23 +856,6 @@ zfs_fm_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl, const char *class)
checkremove = B_TRUE; checkremove = B_TRUE;
} else if (fmd_nvl_class_match(hdl, nvl, } else if (fmd_nvl_class_match(hdl, nvl,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_CHECKSUM))) { ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_CHECKSUM))) {
/*
* We ignore ereports for checksum errors generated by
* scrub/resilver I/O to avoid potentially further
* degrading the pool while it's being repaired.
*/
if (((nvlist_lookup_uint32(nvl,
FM_EREPORT_PAYLOAD_ZFS_ZIO_PRIORITY, &pri) == 0) &&
(pri == ZIO_PRIORITY_SCRUB ||
pri == ZIO_PRIORITY_REBUILD)) ||
((nvlist_lookup_int32(nvl,
FM_EREPORT_PAYLOAD_ZFS_ZIO_FLAGS, &flags) == 0) &&
(flags & (ZIO_FLAG_SCRUB | ZIO_FLAG_RESILVER)))) {
fmd_hdl_debug(hdl, "ignoring '%s' for "
"scrub/resilver I/O", class);
return;
}
if (zcp->zc_data.zc_serd_checksum[0] == '\0') { if (zcp->zc_data.zc_serd_checksum[0] == '\0') {
zfs_serd_name(zcp->zc_data.zc_serd_checksum, zfs_serd_name(zcp->zc_data.zc_serd_checksum,
pool_guid, vdev_guid, "checksum"); pool_guid, vdev_guid, "checksum");
@@ -941,27 +982,27 @@ _zfs_diagnosis_init(fmd_hdl_t *hdl)
{ {
libzfs_handle_t *zhdl; libzfs_handle_t *zhdl;
if ((zhdl = libzfs_init()) == NULL) if ((zhdl = __libzfs_init()) == NULL)
return; return;
if ((zfs_case_pool = uu_list_pool_create("zfs_case_pool", if ((zfs_case_pool = uu_list_pool_create("zfs_case_pool",
sizeof (zfs_case_t), offsetof(zfs_case_t, zc_node), sizeof (zfs_case_t), offsetof(zfs_case_t, zc_node),
NULL, UU_LIST_POOL_DEBUG)) == NULL) { NULL, UU_LIST_POOL_DEBUG)) == NULL) {
libzfs_fini(zhdl); __libzfs_fini(zhdl);
return; return;
} }
if ((zfs_cases = uu_list_create(zfs_case_pool, NULL, if ((zfs_cases = uu_list_create(zfs_case_pool, NULL,
UU_LIST_DEBUG)) == NULL) { UU_LIST_DEBUG)) == NULL) {
uu_list_pool_destroy(zfs_case_pool); uu_list_pool_destroy(zfs_case_pool);
libzfs_fini(zhdl); __libzfs_fini(zhdl);
return; return;
} }
if (fmd_hdl_register(hdl, FMD_API_VERSION, &fmd_info) != 0) { if (fmd_hdl_register(hdl, FMD_API_VERSION, &fmd_info) != 0) {
uu_list_destroy(zfs_cases); uu_list_destroy(zfs_cases);
uu_list_pool_destroy(zfs_case_pool); uu_list_pool_destroy(zfs_case_pool);
libzfs_fini(zhdl); __libzfs_fini(zhdl);
return; return;
} }
@@ -997,5 +1038,5 @@ _zfs_diagnosis_fini(fmd_hdl_t *hdl)
uu_list_pool_destroy(zfs_case_pool); uu_list_pool_destroy(zfs_case_pool);
zhdl = fmd_hdl_getspecific(hdl); zhdl = fmd_hdl_getspecific(hdl);
libzfs_fini(zhdl); __libzfs_fini(zhdl);
} }
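The checksum branch of zfs_fm_recv() on one side of this diff drops ereports produced by scrub or resilver I/O before they reach the SERD engine. A minimal sketch of that predicate follows. The constants are placeholders chosen for illustration (the real ones are defined in sys/zio.h), and `is_repair_io()` is a hypothetical name; the `have_*` flags model whether the corresponding nvlist lookup succeeded.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder values for illustration only; not the real ZIO constants. */
enum { PRI_SCRUB = 1, PRI_REBUILD = 2 };
#define	FLAG_SCRUB	(1 << 0)
#define	FLAG_RESILVER	(1 << 1)

/*
 * Hypothetical predicate: true when an ereport should be ignored because
 * it came from scrub/resilver I/O.  A field only counts if its lookup
 * succeeded (have_pri / have_flags).
 */
static bool
is_repair_io(bool have_pri, uint32_t pri, bool have_flags, int32_t flags)
{
	if (have_pri && (pri == PRI_SCRUB || pri == PRI_REBUILD))
		return (true);
	if (have_flags && (flags & (FLAG_SCRUB | FLAG_RESILVER)) != 0)
		return (true);
	return (false);
}
```

Either condition alone is enough to skip the ereport, matching the logical OR of the two lookups in the diff.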
+101 -470
@@ -23,7 +23,6 @@
* Copyright (c) 2012 by Delphix. All rights reserved. * Copyright (c) 2012 by Delphix. All rights reserved.
* Copyright 2014 Nexenta Systems, Inc. All rights reserved. * Copyright 2014 Nexenta Systems, Inc. All rights reserved.
* Copyright (c) 2016, 2017, Intel Corporation. * Copyright (c) 2016, 2017, Intel Corporation.
* Copyright (c) 2017 Open-E, Inc. All Rights Reserved.
*/ */
/* /*
@@ -63,14 +62,17 @@
* If the device could not be replaced, then the second online attempt will * If the device could not be replaced, then the second online attempt will
* trigger the FMA fault that we skipped earlier. * trigger the FMA fault that we skipped earlier.
* *
* On Linux udev provides a disk insert for both the disk and the partition. * ZFS on Linux porting notes:
* In lieu of a thread pool, just spawn a thread on demmand.
* Linux udev provides a disk insert for both the disk and the partition
*
*/ */
#include <ctype.h> #include <ctype.h>
#include <devid.h>
#include <fcntl.h> #include <fcntl.h>
#include <libnvpair.h> #include <libnvpair.h>
#include <libzfs.h> #include <libzfs.h>
#include <libzutil.h>
#include <limits.h> #include <limits.h>
#include <stddef.h> #include <stddef.h>
#include <stdlib.h> #include <stdlib.h>
@@ -80,10 +82,8 @@
#include <sys/sunddi.h> #include <sys/sunddi.h>
#include <sys/sysevent/eventdefs.h> #include <sys/sysevent/eventdefs.h>
#include <sys/sysevent/dev.h> #include <sys/sysevent/dev.h>
#include <thread_pool.h>
#include <pthread.h> #include <pthread.h>
#include <unistd.h> #include <unistd.h>
#include <errno.h>
#include "zfs_agents.h" #include "zfs_agents.h"
#include "../zed_log.h" #include "../zed_log.h"
@@ -96,12 +96,12 @@ typedef void (*zfs_process_func_t)(zpool_handle_t *, nvlist_t *, boolean_t);
libzfs_handle_t *g_zfshdl; libzfs_handle_t *g_zfshdl;
list_t g_pool_list; /* list of unavailable pools at initialization */ list_t g_pool_list; /* list of unavailable pools at initialization */
list_t g_device_list; /* list of disks with asynchronous label request */ list_t g_device_list; /* list of disks with asynchronous label request */
tpool_t *g_tpool;
boolean_t g_enumeration_done; boolean_t g_enumeration_done;
pthread_t g_zfs_tid; /* zfs_enum_pools() thread */ pthread_t g_zfs_tid;
typedef struct unavailpool { typedef struct unavailpool {
zpool_handle_t *uap_zhp; zpool_handle_t *uap_zhp;
pthread_t uap_enable_tid; /* dataset enable thread if activated */
list_node_t uap_node; list_node_t uap_node;
} unavailpool_t; } unavailpool_t;
@@ -134,6 +134,7 @@ zfs_unavail_pool(zpool_handle_t *zhp, void *data)
unavailpool_t *uap; unavailpool_t *uap;
uap = malloc(sizeof (unavailpool_t)); uap = malloc(sizeof (unavailpool_t));
uap->uap_zhp = zhp; uap->uap_zhp = zhp;
uap->uap_enable_tid = 0;
list_insert_tail((list_t *)data, uap); list_insert_tail((list_t *)data, uap);
} else { } else {
zpool_close(zhp); zpool_close(zhp);
@@ -154,7 +155,7 @@ zfs_unavail_pool(zpool_handle_t *zhp, void *data)
* 1. physical match with no fs, no partition * 1. physical match with no fs, no partition
* tag it top, partition disk * tag it top, partition disk
* *
* 2. physical match again, see partition and tag * 2. physical match again, see partion and tag
* *
*/ */
@@ -183,14 +184,14 @@ zfs_process_add(zpool_handle_t *zhp, nvlist_t *vdev, boolean_t labeled)
nvlist_t *nvroot, *newvd; nvlist_t *nvroot, *newvd;
pendingdev_t *device; pendingdev_t *device;
uint64_t wholedisk = 0ULL; uint64_t wholedisk = 0ULL;
uint64_t offline = 0ULL, faulted = 0ULL; uint64_t offline = 0ULL;
uint64_t guid = 0ULL; uint64_t guid = 0ULL;
char *physpath = NULL, *new_devid = NULL, *enc_sysfs_path = NULL; char *physpath = NULL, *new_devid = NULL, *enc_sysfs_path = NULL;
char rawpath[PATH_MAX], fullpath[PATH_MAX]; char rawpath[PATH_MAX], fullpath[PATH_MAX];
char devpath[PATH_MAX]; char devpath[PATH_MAX];
int ret; int ret;
boolean_t is_sd = B_FALSE; int is_dm = 0;
boolean_t is_mpath_wholedisk = B_FALSE; int is_sd = 0;
uint_t c; uint_t c;
vdev_stat_t *vs; vdev_stat_t *vs;
@@ -211,73 +212,15 @@ zfs_process_add(zpool_handle_t *zhp, nvlist_t *vdev, boolean_t labeled)
&enc_sysfs_path); &enc_sysfs_path);
(void) nvlist_lookup_uint64(vdev, ZPOOL_CONFIG_WHOLE_DISK, &wholedisk); (void) nvlist_lookup_uint64(vdev, ZPOOL_CONFIG_WHOLE_DISK, &wholedisk);
(void) nvlist_lookup_uint64(vdev, ZPOOL_CONFIG_OFFLINE, &offline); (void) nvlist_lookup_uint64(vdev, ZPOOL_CONFIG_OFFLINE, &offline);
(void) nvlist_lookup_uint64(vdev, ZPOOL_CONFIG_FAULTED, &faulted);
(void) nvlist_lookup_uint64(vdev, ZPOOL_CONFIG_GUID, &guid); (void) nvlist_lookup_uint64(vdev, ZPOOL_CONFIG_GUID, &guid);
/* if (offline)
* Special case: return; /* don't intervene if it was taken offline */
*
* We've seen times where a disk won't have a ZPOOL_CONFIG_PHYS_PATH
* entry in their config. For example, on this force-faulted disk:
*
* children[0]:
* type: 'disk'
* id: 0
* guid: 14309659774640089719
* path: '/dev/disk/by-vdev/L28'
* whole_disk: 0
* DTL: 654
* create_txg: 4
* com.delphix:vdev_zap_leaf: 1161
* faulted: 1
* aux_state: 'external'
* children[1]:
* type: 'disk'
* id: 1
* guid: 16002508084177980912
* path: '/dev/disk/by-vdev/L29'
* devid: 'dm-uuid-mpath-35000c500a61d68a3'
* phys_path: 'L29'
* vdev_enc_sysfs_path: '/sys/class/enclosure/0:0:1:0/SLOT 30 32'
* whole_disk: 0
* DTL: 1028
* create_txg: 4
* com.delphix:vdev_zap_leaf: 131
*
* If the disk's path is a /dev/disk/by-vdev/ path, then we can infer
* the ZPOOL_CONFIG_PHYS_PATH from the by-vdev disk name.
*/
if (physpath == NULL && path != NULL) {
/* If path begins with "/dev/disk/by-vdev/" ... */
if (strncmp(path, DEV_BYVDEV_PATH,
strlen(DEV_BYVDEV_PATH)) == 0) {
/* Set physpath to the char after "/dev/disk/by-vdev" */
physpath = &path[strlen(DEV_BYVDEV_PATH)];
}
}
/* is_dm = zfs_dev_is_dm(path);
* We don't want to autoreplace offlined disks. However, we do want to
* replace force-faulted disks (`zpool offline -f`). Force-faulted
* disks have both offline=1 and faulted=1 in the nvlist.
*/
if (offline && !faulted) {
zed_log_msg(LOG_INFO, "%s: %s is offline, skip autoreplace",
__func__, path);
return;
}
is_mpath_wholedisk = is_mpath_whole_disk(path);
zed_log_msg(LOG_INFO, "zfs_process_add: pool '%s' vdev '%s', phys '%s'" zed_log_msg(LOG_INFO, "zfs_process_add: pool '%s' vdev '%s', phys '%s'"
" %s blank disk, %s mpath blank disk, %s labeled, enc sysfs '%s', " " wholedisk %d, dm %d (%llu)", zpool_get_name(zhp), path,
"(guid %llu)", physpath ? physpath : "NULL", wholedisk, is_dm,
zpool_get_name(zhp), path,
physpath ? physpath : "NULL",
wholedisk ? "is" : "not",
is_mpath_wholedisk? "is" : "not",
labeled ? "is" : "not",
enc_sysfs_path,
(long long unsigned int)guid); (long long unsigned int)guid);
/* /*
@@ -311,9 +254,8 @@ zfs_process_add(zpool_handle_t *zhp, nvlist_t *vdev, boolean_t labeled)
ZFS_ONLINE_CHECKREMOVE | ZFS_ONLINE_UNSPARE, &newstate) == 0 && ZFS_ONLINE_CHECKREMOVE | ZFS_ONLINE_UNSPARE, &newstate) == 0 &&
(newstate == VDEV_STATE_HEALTHY || (newstate == VDEV_STATE_HEALTHY ||
newstate == VDEV_STATE_DEGRADED)) { newstate == VDEV_STATE_DEGRADED)) {
zed_log_msg(LOG_INFO, zed_log_msg(LOG_INFO, " zpool_vdev_online: vdev %s is %s",
" zpool_vdev_online: vdev '%s' ('%s') is " fullpath, (newstate == VDEV_STATE_HEALTHY) ?
"%s", fullpath, physpath, (newstate == VDEV_STATE_HEALTHY) ?
"HEALTHY" : "DEGRADED"); "HEALTHY" : "DEGRADED");
return; return;
} }
@@ -323,19 +265,18 @@ zfs_process_add(zpool_handle_t *zhp, nvlist_t *vdev, boolean_t labeled)
* testing) * testing)
*/ */
if (physpath != NULL && strcmp("scsidebug", physpath) == 0) if (physpath != NULL && strcmp("scsidebug", physpath) == 0)
is_sd = B_TRUE; is_sd = 1;
/* /*
* If the pool doesn't have the autoreplace property set, then use * If the pool doesn't have the autoreplace property set, then use
* vdev online to trigger a FMA fault by posting an ereport. * vdev online to trigger a FMA fault by posting an ereport.
*/ */
if (!zpool_get_prop_int(zhp, ZPOOL_PROP_AUTOREPLACE, NULL) || if (!zpool_get_prop_int(zhp, ZPOOL_PROP_AUTOREPLACE, NULL) ||
!(wholedisk || is_mpath_wholedisk) || (physpath == NULL)) { !(wholedisk || is_dm) || (physpath == NULL)) {
(void) zpool_vdev_online(zhp, fullpath, ZFS_ONLINE_FORCEFAULT, (void) zpool_vdev_online(zhp, fullpath, ZFS_ONLINE_FORCEFAULT,
&newstate); &newstate);
zed_log_msg(LOG_INFO, "Pool's autoreplace is not enabled or " zed_log_msg(LOG_INFO, "Pool's autoreplace is not enabled or "
"not a blank disk for '%s' ('%s')", fullpath, "not a whole disk for '%s'", fullpath);
physpath);
return; return;
} }
@@ -347,7 +288,7 @@ zfs_process_add(zpool_handle_t *zhp, nvlist_t *vdev, boolean_t labeled)
(void) snprintf(rawpath, sizeof (rawpath), "%s%s", (void) snprintf(rawpath, sizeof (rawpath), "%s%s",
is_sd ? DEV_BYVDEV_PATH : DEV_BYPATH_PATH, physpath); is_sd ? DEV_BYVDEV_PATH : DEV_BYPATH_PATH, physpath);
if (realpath(rawpath, devpath) == NULL && !is_mpath_wholedisk) { if (realpath(rawpath, devpath) == NULL && !is_dm) {
zed_log_msg(LOG_INFO, " realpath: %s failed (%s)", zed_log_msg(LOG_INFO, " realpath: %s failed (%s)",
rawpath, strerror(errno)); rawpath, strerror(errno));
@@ -363,14 +304,12 @@ zfs_process_add(zpool_handle_t *zhp, nvlist_t *vdev, boolean_t labeled)
if ((vs->vs_state != VDEV_STATE_DEGRADED) && if ((vs->vs_state != VDEV_STATE_DEGRADED) &&
(vs->vs_state != VDEV_STATE_FAULTED) && (vs->vs_state != VDEV_STATE_FAULTED) &&
(vs->vs_state != VDEV_STATE_CANT_OPEN)) { (vs->vs_state != VDEV_STATE_CANT_OPEN)) {
zed_log_msg(LOG_INFO, " not autoreplacing since disk isn't in "
"a bad state (currently %d)", vs->vs_state);
return; return;
} }
nvlist_lookup_string(vdev, "new_devid", &new_devid); nvlist_lookup_string(vdev, "new_devid", &new_devid);
if (is_mpath_wholedisk) { if (is_dm) {
/* Don't label device mapper or multipath disks. */ /* Don't label device mapper or multipath disks. */
} else if (!labeled) { } else if (!labeled) {
/* /*
@@ -487,25 +426,9 @@ zfs_process_add(zpool_handle_t *zhp, nvlist_t *vdev, boolean_t labeled)
nvlist_free(newvd); nvlist_free(newvd);
/* /*
* Wait for udev to verify the links exist, then auto-replace * auto replace a leaf disk at same physical location
* the leaf disk at same physical location.
*/ */
if (zpool_label_disk_wait(path, 3000) != 0) { ret = zpool_vdev_attach(zhp, fullpath, path, nvroot, B_TRUE);
zed_log_msg(LOG_WARNING, "zfs_mod: expected replacement "
"disk %s is missing", path);
nvlist_free(nvroot);
return;
}
/*
* Prefer sequential resilvering when supported (mirrors and dRAID),
* otherwise fallback to a traditional healing resilver.
*/
ret = zpool_vdev_attach(zhp, fullpath, path, nvroot, B_TRUE, B_TRUE);
if (ret != 0) {
ret = zpool_vdev_attach(zhp, fullpath, path, nvroot,
B_TRUE, B_FALSE);
}
zed_log_msg(LOG_INFO, " zpool_vdev_replace: %s with %s (%s)", zed_log_msg(LOG_INFO, " zpool_vdev_replace: %s with %s (%s)",
fullpath, path, (ret == 0) ? "no errors" : fullpath, path, (ret == 0) ? "no errors" :
@@ -525,7 +448,6 @@ typedef struct dev_data {
boolean_t dd_islabeled; boolean_t dd_islabeled;
uint64_t dd_pool_guid; uint64_t dd_pool_guid;
uint64_t dd_vdev_guid; uint64_t dd_vdev_guid;
uint64_t dd_new_vdev_guid;
const char *dd_new_devid; const char *dd_new_devid;
} dev_data_t; } dev_data_t;
@@ -536,7 +458,6 @@ zfs_iter_vdev(zpool_handle_t *zhp, nvlist_t *nvl, void *data)
char *path = NULL; char *path = NULL;
uint_t c, children; uint_t c, children;
nvlist_t **child; nvlist_t **child;
uint64_t guid = 0;
/* /*
* First iterate over any children. * First iterate over any children.
@@ -545,33 +466,23 @@ zfs_iter_vdev(zpool_handle_t *zhp, nvlist_t *nvl, void *data)
&child, &children) == 0) { &child, &children) == 0) {
for (c = 0; c < children; c++) for (c = 0; c < children; c++)
zfs_iter_vdev(zhp, child[c], data); zfs_iter_vdev(zhp, child[c], data);
} return;
/*
* Iterate over any spares and cache devices
*/
if (nvlist_lookup_nvlist_array(nvl, ZPOOL_CONFIG_SPARES,
&child, &children) == 0) {
for (c = 0; c < children; c++)
zfs_iter_vdev(zhp, child[c], data);
}
if (nvlist_lookup_nvlist_array(nvl, ZPOOL_CONFIG_L2CACHE,
&child, &children) == 0) {
for (c = 0; c < children; c++)
zfs_iter_vdev(zhp, child[c], data);
} }
/* once a vdev was matched and processed there is nothing left to do */ /* once a vdev was matched and processed there is nothing left to do */
if (dp->dd_found) if (dp->dd_found)
return; return;
(void) nvlist_lookup_uint64(nvl, ZPOOL_CONFIG_GUID, &guid);
/* /*
* Match by GUID if available otherwise fallback to devid or physical * Match by GUID if available otherwise fallback to devid or physical
*/ */
if (dp->dd_vdev_guid != 0) { if (dp->dd_vdev_guid != 0) {
if (guid != dp->dd_vdev_guid) uint64_t guid;
if (nvlist_lookup_uint64(nvl, ZPOOL_CONFIG_GUID,
&guid) != 0 || guid != dp->dd_vdev_guid) {
return; return;
}
zed_log_msg(LOG_INFO, " zfs_iter_vdev: matched on %llu", guid); zed_log_msg(LOG_INFO, " zfs_iter_vdev: matched on %llu", guid);
dp->dd_found = B_TRUE; dp->dd_found = B_TRUE;
@@ -581,25 +492,10 @@ zfs_iter_vdev(zpool_handle_t *zhp, nvlist_t *nvl, void *data)
* illumos, substring matching is not required to accommodate * illumos, substring matching is not required to accommodate
* the partition suffix. An exact match will be present in * the partition suffix. An exact match will be present in
* the dp->dd_compare value. * the dp->dd_compare value.
* If the attached disk already contains a vdev GUID, it means
* the disk is not clean. In such a scenario, the physical path
* would be a match that makes the disk faulted when trying to
* online it. So, we would only want to proceed if either GUID
* matches with the last attached disk or the disk is in clean
* state.
*/ */
if (nvlist_lookup_string(nvl, dp->dd_prop, &path) != 0 || if (nvlist_lookup_string(nvl, dp->dd_prop, &path) != 0 ||
strcmp(dp->dd_compare, path) != 0) { strcmp(dp->dd_compare, path) != 0)
zed_log_msg(LOG_INFO, " %s: no match (%s != vdev %s)",
__func__, dp->dd_compare, path);
return; return;
}
if (dp->dd_new_vdev_guid != 0 && dp->dd_new_vdev_guid != guid) {
zed_log_msg(LOG_INFO, " %s: no match (GUID:%llu"
" != vdev GUID:%llu)", __func__,
dp->dd_new_vdev_guid, guid);
return;
}
zed_log_msg(LOG_INFO, " zfs_iter_vdev: matched %s on %s", zed_log_msg(LOG_INFO, " zfs_iter_vdev: matched %s on %s",
dp->dd_prop, path); dp->dd_prop, path);
@@ -615,14 +511,19 @@ zfs_iter_vdev(zpool_handle_t *zhp, nvlist_t *nvl, void *data)
(dp->dd_func)(zhp, nvl, dp->dd_islabeled); (dp->dd_func)(zhp, nvl, dp->dd_islabeled);
} }
static void static void *
zfs_enable_ds(void *arg) zfs_enable_ds(void *arg)
{ {
unavailpool_t *pool = (unavailpool_t *)arg; unavailpool_t *pool = (unavailpool_t *)arg;
assert(pool->uap_enable_tid = pthread_self());
(void) zpool_enable_datasets(pool->uap_zhp, NULL, 0); (void) zpool_enable_datasets(pool->uap_zhp, NULL, 0);
zpool_close(pool->uap_zhp); zpool_close(pool->uap_zhp);
free(pool); pool->uap_zhp = NULL;
/* Note: zfs_slm_fini() will cleanup this pool entry on exit */
return (NULL);
} }
static int static int
@@ -647,8 +548,6 @@ zfs_iter_pool(zpool_handle_t *zhp, void *data)
ZPOOL_CONFIG_VDEV_TREE, &nvl); ZPOOL_CONFIG_VDEV_TREE, &nvl);
zfs_iter_vdev(zhp, nvl, data); zfs_iter_vdev(zhp, nvl, data);
} }
} else {
zed_log_msg(LOG_INFO, "%s: no config\n", __func__);
} }
/* /*
@@ -659,13 +558,15 @@ zfs_iter_pool(zpool_handle_t *zhp, void *data)
for (pool = list_head(&g_pool_list); pool != NULL; for (pool = list_head(&g_pool_list); pool != NULL;
pool = list_next(&g_pool_list, pool)) { pool = list_next(&g_pool_list, pool)) {
if (pool->uap_enable_tid != 0)
continue; /* entry already processed */
if (strcmp(zpool_get_name(zhp), if (strcmp(zpool_get_name(zhp),
zpool_get_name(pool->uap_zhp))) zpool_get_name(pool->uap_zhp)))
continue; continue;
if (zfs_toplevel_state(zhp) >= VDEV_STATE_DEGRADED) { if (zfs_toplevel_state(zhp) >= VDEV_STATE_DEGRADED) {
list_remove(&g_pool_list, pool); /* send to a background thread; keep on list */
(void) tpool_dispatch(g_tpool, zfs_enable_ds, (void) pthread_create(&pool->uap_enable_tid,
pool); NULL, zfs_enable_ds, pool);
break; break;
} }
} }
@@ -681,7 +582,7 @@ zfs_iter_pool(zpool_handle_t *zhp, void *data)
*/ */
static boolean_t static boolean_t
devphys_iter(const char *physical, const char *devid, zfs_process_func_t func, devphys_iter(const char *physical, const char *devid, zfs_process_func_t func,
boolean_t is_slice, uint64_t new_vdev_guid) boolean_t is_slice)
{ {
dev_data_t data = { 0 }; dev_data_t data = { 0 };
@@ -691,73 +592,6 @@ devphys_iter(const char *physical, const char *devid, zfs_process_func_t func,
data.dd_found = B_FALSE; data.dd_found = B_FALSE;
data.dd_islabeled = is_slice; data.dd_islabeled = is_slice;
data.dd_new_devid = devid; /* used by auto replace code */ data.dd_new_devid = devid; /* used by auto replace code */
data.dd_new_vdev_guid = new_vdev_guid;
(void) zpool_iter(g_zfshdl, zfs_iter_pool, &data);
return (data.dd_found);
}
/*
* Given a device identifier, find any vdevs with a matching by-vdev
* path. Normally we shouldn't need this as the comparison would be
* made earlier in the devphys_iter(). For example, if we were replacing
* /dev/disk/by-vdev/L28, normally devphys_iter() would match the
* ZPOOL_CONFIG_PHYS_PATH of "L28" from the old disk config to "L28"
* of the new disk config. However, we've seen cases where
* ZPOOL_CONFIG_PHYS_PATH was not in the config for the old disk. Here's
* an example of a real 2-disk mirror pool where one disk was force
* faulted:
*
* com.delphix:vdev_zap_top: 129
* children[0]:
* type: 'disk'
* id: 0
* guid: 14309659774640089719
* path: '/dev/disk/by-vdev/L28'
* whole_disk: 0
* DTL: 654
* create_txg: 4
* com.delphix:vdev_zap_leaf: 1161
* faulted: 1
* aux_state: 'external'
* children[1]:
* type: 'disk'
* id: 1
* guid: 16002508084177980912
* path: '/dev/disk/by-vdev/L29'
* devid: 'dm-uuid-mpath-35000c500a61d68a3'
* phys_path: 'L29'
* vdev_enc_sysfs_path: '/sys/class/enclosure/0:0:1:0/SLOT 30 32'
* whole_disk: 0
* DTL: 1028
* create_txg: 4
* com.delphix:vdev_zap_leaf: 131
*
* So in the case above, the only thing we could compare is the path.
*
* We can do this because we assume by-vdev paths are authoritative as physical
* paths. We could not assume this for normal paths like /dev/sda since the
* physical location /dev/sda points to could change over time.
*/
static boolean_t
by_vdev_path_iter(const char *by_vdev_path, const char *devid,
zfs_process_func_t func, boolean_t is_slice)
{
dev_data_t data = { 0 };
data.dd_compare = by_vdev_path;
data.dd_func = func;
data.dd_prop = ZPOOL_CONFIG_PATH;
data.dd_found = B_FALSE;
data.dd_islabeled = is_slice;
data.dd_new_devid = devid;
if (strncmp(by_vdev_path, DEV_BYVDEV_PATH,
strlen(DEV_BYVDEV_PATH)) != 0) {
/* by_vdev_path doesn't start with "/dev/disk/by-vdev/" */
return (B_FALSE);
}
(void) zpool_iter(g_zfshdl, zfs_iter_pool, &data); (void) zpool_iter(g_zfshdl, zfs_iter_pool, &data);
@@ -785,27 +619,6 @@ devid_iter(const char *devid, zfs_process_func_t func, boolean_t is_slice)
return (data.dd_found); return (data.dd_found);
} }
/*
* Given a device guid, find any vdevs with a matching guid.
*/
static boolean_t
guid_iter(uint64_t pool_guid, uint64_t vdev_guid, const char *devid,
zfs_process_func_t func, boolean_t is_slice)
{
dev_data_t data = { 0 };
data.dd_func = func;
data.dd_found = B_FALSE;
data.dd_pool_guid = pool_guid;
data.dd_vdev_guid = vdev_guid;
data.dd_islabeled = is_slice;
data.dd_new_devid = devid;
(void) zpool_iter(g_zfshdl, zfs_iter_pool, &data);
return (data.dd_found);
}
/* /*
* Handle a EC_DEV_ADD.ESC_DISK event. * Handle a EC_DEV_ADD.ESC_DISK event.
* *
@@ -828,21 +641,16 @@ guid_iter(uint64_t pool_guid, uint64_t vdev_guid, const char *devid,
static int static int
zfs_deliver_add(nvlist_t *nvl, boolean_t is_lofi) zfs_deliver_add(nvlist_t *nvl, boolean_t is_lofi)
{ {
char *devpath = NULL, *devid = NULL; char *devpath = NULL, *devid;
uint64_t pool_guid = 0, vdev_guid = 0;
boolean_t is_slice; boolean_t is_slice;
/* /*
* Expecting a devid string and an optional physical location and guid * Expecting a devid string and an optional physical location
*/ */
if (nvlist_lookup_string(nvl, DEV_IDENTIFIER, &devid) != 0) { if (nvlist_lookup_string(nvl, DEV_IDENTIFIER, &devid) != 0)
zed_log_msg(LOG_INFO, "%s: no dev identifier\n", __func__);
return (-1); return (-1);
}
(void) nvlist_lookup_string(nvl, DEV_PHYS_PATH, &devpath); (void) nvlist_lookup_string(nvl, DEV_PHYS_PATH, &devpath);
(void) nvlist_lookup_uint64(nvl, ZFS_EV_POOL_GUID, &pool_guid);
(void) nvlist_lookup_uint64(nvl, ZFS_EV_VDEV_GUID, &vdev_guid);
is_slice = (nvlist_lookup_boolean(nvl, DEV_IS_PART) == 0); is_slice = (nvlist_lookup_boolean(nvl, DEV_IS_PART) == 0);
@@ -850,31 +658,15 @@ zfs_deliver_add(nvlist_t *nvl, boolean_t is_lofi)
devid, devpath ? devpath : "NULL", is_slice); devid, devpath ? devpath : "NULL", is_slice);
/* /*
* Iterate over all vdevs looking for a match in the following order: * Iterate over all vdevs looking for a match in the folllowing order:
* 1. ZPOOL_CONFIG_DEVID (identifies the unique disk) * 1. ZPOOL_CONFIG_DEVID (identifies the unique disk)
* 2. ZPOOL_CONFIG_PHYS_PATH (identifies disk physical location). * 2. ZPOOL_CONFIG_PHYS_PATH (identifies disk physical location).
* 3. ZPOOL_CONFIG_GUID (identifies unique vdev). *
* 4. ZPOOL_CONFIG_PATH for /dev/disk/by-vdev devices only (since * For disks, we only want to pay attention to vdevs marked as whole
* by-vdev paths represent physical paths). * disks or are a multipath device.
*/ */
if (devid_iter(devid, zfs_process_add, is_slice)) if (!devid_iter(devid, zfs_process_add, is_slice) && devpath != NULL)
return (0); (void) devphys_iter(devpath, devid, zfs_process_add, is_slice);
if (devpath != NULL && devphys_iter(devpath, devid, zfs_process_add,
is_slice, vdev_guid))
return (0);
if (vdev_guid != 0)
(void) guid_iter(pool_guid, vdev_guid, devid, zfs_process_add,
is_slice);
if (devpath != NULL) {
/* Can we match a /dev/disk/by-vdev/ path? */
char by_vdev_path[MAXPATHLEN];
snprintf(by_vdev_path, sizeof (by_vdev_path),
"/dev/disk/by-vdev/%s", devpath);
if (by_vdev_path_iter(by_vdev_path, devid, zfs_process_add,
is_slice))
return (0);
}
return (0); return (0);
} }
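Reviewer note: both sides of the hunk above try ZPOOL_CONFIG_DEVID first and fall back to ZPOOL_CONFIG_PHYS_PATH; one side then continues to the GUID and by-vdev path lookups. The ordered-fallback idea common to both can be sketched as follows. This is a minimal illustration, not the zed code: the sketch_vdev_t type and the sketch_* functions are hypothetical and use plain C strings in place of nvlists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a vdev config entry. */
typedef struct sketch_vdev {
	const char *devid;	/* unique disk identifier, may be NULL */
	const char *phys_path;	/* physical location, may be NULL */
} sketch_vdev_t;

/* Return true when both strings are present and equal. */
static bool
sketch_str_match(const char *a, const char *b)
{
	return (a != NULL && b != NULL && strcmp(a, b) == 0);
}

/*
 * Return the index of the first vdev whose devid matches. If no devid
 * matches, fall back to the physical path. Returns -1 when neither
 * pass finds a match.
 */
static int
sketch_find_vdev(const sketch_vdev_t *vdevs, size_t n,
    const char *devid, const char *phys_path)
{
	size_t i;

	assert(vdevs != NULL || n == 0);

	/* Pass 1: the devid identifies the unique disk. */
	for (i = 0; i < n; i++) {
		if (sketch_str_match(vdevs[i].devid, devid))
			return ((int)i);
	}

	/* Pass 2: the physical path identifies the slot. */
	for (i = 0; i < n; i++) {
		if (sketch_str_match(vdevs[i].phys_path, phys_path))
			return ((int)i);
	}

	return (-1);
}
```

Running the passes in sequence rather than in a single loop keeps the priority explicit: a physical-path match is only considered after every vdev has failed the devid comparison.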
@@ -906,88 +698,13 @@ zfs_deliver_check(nvlist_t *nvl)
return (0); return (0);
} }
/*
* Given a path to a vdev, lookup the vdev's physical size from its
* config nvlist.
*
* Returns the vdev's physical size in bytes on success, 0 on error.
*/
static uint64_t
vdev_size_from_config(zpool_handle_t *zhp, const char *vdev_path)
{
nvlist_t *nvl = NULL;
boolean_t avail_spare, l2cache, log;
vdev_stat_t *vs = NULL;
uint_t c;
nvl = zpool_find_vdev(zhp, vdev_path, &avail_spare, &l2cache, &log);
if (!nvl)
return (0);
verify(nvlist_lookup_uint64_array(nvl, ZPOOL_CONFIG_VDEV_STATS,
(uint64_t **)&vs, &c) == 0);
if (!vs) {
zed_log_msg(LOG_INFO, "%s: no nvlist for '%s'", __func__,
vdev_path);
return (0);
}
return (vs->vs_pspace);
}
/*
* Given a path to a vdev, lookup if the vdev is a "whole disk" in the
* config nvlist. "whole disk" means that ZFS was passed a whole disk
* at pool creation time, which it partitioned up and has full control over.
* Thus a partition with wholedisk=1 set tells us that zfs created the
* partition at creation time. A partition without whole disk set would have
* been created by externally (like with fdisk) and passed to ZFS.
*
* Returns the whole disk value (either 0 or 1).
*/
static uint64_t
vdev_whole_disk_from_config(zpool_handle_t *zhp, const char *vdev_path)
{
nvlist_t *nvl = NULL;
boolean_t avail_spare, l2cache, log;
uint64_t wholedisk = 0;
nvl = zpool_find_vdev(zhp, vdev_path, &avail_spare, &l2cache, &log);
if (!nvl)
return (0);
(void) nvlist_lookup_uint64(nvl, ZPOOL_CONFIG_WHOLE_DISK, &wholedisk);
return (wholedisk);
}
/*
* If the device size grew more than 1% then return true.
*/
#define DEVICE_GREW(oldsize, newsize) \
((newsize > oldsize) && \
((newsize / (newsize - oldsize)) <= 100))
static int static int
zfsdle_vdev_online(zpool_handle_t *zhp, void *data) zfsdle_vdev_online(zpool_handle_t *zhp, void *data)
{ {
char *devname = data;
boolean_t avail_spare, l2cache; boolean_t avail_spare, l2cache;
nvlist_t *udev_nvl = data; vdev_state_t newstate;
nvlist_t *tgt; nvlist_t *tgt;
int error;
char *tmp_devname, devname[MAXPATHLEN] = "";
uint64_t guid;
if (nvlist_lookup_uint64(udev_nvl, ZFS_EV_VDEV_GUID, &guid) == 0) {
sprintf(devname, "%llu", (u_longlong_t)guid);
} else if (nvlist_lookup_string(udev_nvl, DEV_PHYS_PATH,
&tmp_devname) == 0) {
strlcpy(devname, tmp_devname, MAXPATHLEN);
zfs_append_partition(devname, MAXPATHLEN);
} else {
zed_log_msg(LOG_INFO, "%s: no guid or physpath", __func__);
}
zed_log_msg(LOG_INFO, "zfsdle_vdev_online: searching for '%s' in '%s'", zed_log_msg(LOG_INFO, "zfsdle_vdev_online: searching for '%s' in '%s'",
devname, zpool_get_name(zhp)); devname, zpool_get_name(zhp));
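Reviewer note: the DEVICE_GREW() macro in the hunk above uses integer division, so its threshold is roughly "grew by at least 1% of the new size" rather than 1% of the old size. The sketch below restates the macro as a function and works through the boundary values. The sketch_* names are hypothetical and exist only for illustration; the arithmetic is copied from the macro as shown in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Function form of DEVICE_GREW(): true when newsize exceeds oldsize and
 * the integer quotient newsize / (newsize - oldsize) is at most 100.
 */
static bool
sketch_device_grew(uint64_t oldsize, uint64_t newsize)
{
	if (newsize <= oldsize)
		return (false);
	return ((newsize / (newsize - oldsize)) <= 100);
}

/*
 * Hypothetical helper mirroring the two cases described in the comment:
 * compare the configured size with the device size, and for whole disks
 * also with the parent device size.
 */
static bool
sketch_should_autoexpand(uint64_t conf_size, uint64_t dev_size,
    uint64_t parent_size, bool wholedisk)
{
	return (sketch_device_grew(conf_size, dev_size) ||
	    (wholedisk && sketch_device_grew(conf_size, parent_size)));
}

/* Worked boundary cases for the integer arithmetic. */
static void
sketch_device_grew_examples(void)
{
	assert(!sketch_device_grew(1000, 1000));	/* unchanged */
	assert(!sketch_device_grew(1000, 900));		/* shrank */
	assert(!sketch_device_grew(1000, 1010));	/* 1010 / 10 = 101 */
	assert(sketch_device_grew(1000, 1011));		/* 1011 / 11 = 91 */
	assert(sketch_device_grew(990, 1000));		/* 1000 / 10 = 100 */
}
```

Checking newsize <= oldsize first also keeps the unsigned subtraction from wrapping and avoids a division by zero, which the macro achieves through the short-circuit &&.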
@@ -995,119 +712,40 @@ zfsdle_vdev_online(zpool_handle_t *zhp, void *data)
if ((tgt = zpool_find_vdev_by_physpath(zhp, devname, if ((tgt = zpool_find_vdev_by_physpath(zhp, devname,
&avail_spare, &l2cache, NULL)) != NULL) { &avail_spare, &l2cache, NULL)) != NULL) {
char *path, fullpath[MAXPATHLEN]; char *path, fullpath[MAXPATHLEN];
uint64_t wholedisk = 0; uint64_t wholedisk = 0ULL;
error = nvlist_lookup_string(tgt, ZPOOL_CONFIG_PATH, &path); verify(nvlist_lookup_string(tgt, ZPOOL_CONFIG_PATH,
if (error) { &path) == 0);
zpool_close(zhp); verify(nvlist_lookup_uint64(tgt, ZPOOL_CONFIG_WHOLE_DISK,
return (0); &wholedisk) == 0);
}
(void) nvlist_lookup_uint64(tgt, ZPOOL_CONFIG_WHOLE_DISK,
&wholedisk);
(void) strlcpy(fullpath, path, sizeof (fullpath));
if (wholedisk) { if (wholedisk) {
path = strrchr(path, '/'); char *spath = zfs_strip_partition(fullpath);
if (path != NULL) { if (!spath) {
path = zfs_strip_partition(path + 1); zed_log_msg(LOG_INFO, "%s: Can't alloc",
if (path == NULL) { __func__);
zpool_close(zhp);
return (0);
}
} else {
zpool_close(zhp);
return (0); return (0);
} }
(void) strlcpy(fullpath, path, sizeof (fullpath)); (void) strlcpy(fullpath, spath, sizeof (fullpath));
free(path); free(spath);
/* /*
* We need to reopen the pool associated with this * We need to reopen the pool associated with this
* device so that the kernel can update the size of * device so that the kernel can update the size
* the expanded device. When expanding there is no * of the expanded device.
* need to restart the scrub from the beginning.
*/ */
boolean_t scrub_restart = B_FALSE; (void) zpool_reopen(zhp);
(void) zpool_reopen_one(zhp, &scrub_restart);
} else {
(void) strlcpy(fullpath, path, sizeof (fullpath));
} }
if (zpool_get_prop_int(zhp, ZPOOL_PROP_AUTOEXPAND, NULL)) { if (zpool_get_prop_int(zhp, ZPOOL_PROP_AUTOEXPAND, NULL)) {
vdev_state_t newstate; zed_log_msg(LOG_INFO, "zfsdle_vdev_online: setting "
"device '%s' to ONLINE state in pool '%s'",
if (zpool_get_state(zhp) != POOL_STATE_UNAVAIL) { fullpath, zpool_get_name(zhp));
/* if (zpool_get_state(zhp) != POOL_STATE_UNAVAIL)
* If this disk size has not changed, then (void) zpool_vdev_online(zhp, fullpath, 0,
* there's no need to do an autoexpand. To &newstate);
* check we look at the disk's size in its
* config, and compare it to the disk size
* that udev is reporting.
*/
uint64_t udev_size = 0, conf_size = 0,
wholedisk = 0, udev_parent_size = 0;
/*
* Get the size of our disk that udev is
* reporting.
*/
if (nvlist_lookup_uint64(udev_nvl, DEV_SIZE,
&udev_size) != 0) {
udev_size = 0;
}
/*
* Get the size of our disk's parent device
* from udev (where sda1's parent is sda).
*/
if (nvlist_lookup_uint64(udev_nvl,
DEV_PARENT_SIZE, &udev_parent_size) != 0) {
udev_parent_size = 0;
}
conf_size = vdev_size_from_config(zhp,
fullpath);
wholedisk = vdev_whole_disk_from_config(zhp,
fullpath);
/*
* Only attempt an autoexpand if the vdev size
* changed. There are two different cases
* to consider.
*
* 1. wholedisk=1
* If you do a 'zpool create' on a whole disk
* (like /dev/sda), then zfs will create
* partitions on the disk (like /dev/sda1). In
* that case, wholedisk=1 will be set in the
* partition's nvlist config. So zed will need
* to see if your parent device (/dev/sda)
* expanded in size, and if so, then attempt
* the autoexpand.
*
* 2. wholedisk=0
* If you do a 'zpool create' on an existing
* partition, or a device that doesn't allow
* partitions, then wholedisk=0, and you will
* simply need to check if the device itself
* expanded in size.
*/
if (DEVICE_GREW(conf_size, udev_size) ||
(wholedisk && DEVICE_GREW(conf_size,
udev_parent_size))) {
error = zpool_vdev_online(zhp, fullpath,
0, &newstate);
zed_log_msg(LOG_INFO,
"%s: autoexpanding '%s' from %llu"
" to %llu bytes in pool '%s': %d",
__func__, fullpath, conf_size,
MAX(udev_size, udev_parent_size),
zpool_get_name(zhp), error);
}
}
} }
zpool_close(zhp); zpool_close(zhp);
return (1); return (1);
@@ -1117,33 +755,23 @@ zfsdle_vdev_online(zpool_handle_t *zhp, void *data)
} }
/* /*
* This function handles the ESC_DEV_DLE device change event. Use the * This function handles the ESC_DEV_DLE event.
* provided vdev guid when looking up a disk or partition, when the guid
* is not present assume the entire disk is owned by ZFS and append the
* expected -part1 partition information then lookup by physical path.
*/ */
static int static int
zfs_deliver_dle(nvlist_t *nvl) zfs_deliver_dle(nvlist_t *nvl)
{ {
char *devname, name[MAXPATHLEN]; char *devname;
uint64_t guid;
if (nvlist_lookup_uint64(nvl, ZFS_EV_VDEV_GUID, &guid) == 0) { if (nvlist_lookup_string(nvl, DEV_PHYS_PATH, &devname) != 0) {
sprintf(name, "%llu", (u_longlong_t)guid); zed_log_msg(LOG_INFO, "zfs_deliver_dle: no physpath");
} else if (nvlist_lookup_string(nvl, DEV_PHYS_PATH, &devname) == 0) { return (-1);
strlcpy(name, devname, MAXPATHLEN);
zfs_append_partition(name, MAXPATHLEN);
} else {
sprintf(name, "unknown");
zed_log_msg(LOG_INFO, "zfs_deliver_dle: no guid or physpath");
} }
if (zpool_iter(g_zfshdl, zfsdle_vdev_online, nvl) != 1) { if (zpool_iter(g_zfshdl, zfsdle_vdev_online, devname) != 1) {
zed_log_msg(LOG_INFO, "zfs_deliver_dle: device '%s' not " zed_log_msg(LOG_INFO, "zfs_deliver_dle: device '%s' not "
"found", name); "found", devname);
return (1); return (1);
} }
return (0); return (0);
} }
@@ -1221,12 +849,12 @@ zfs_enum_pools(void *arg)
* *
* sent messages from zevents or udev monitor * sent messages from zevents or udev monitor
* *
* For now, each agent has its own libzfs instance * For now, each agent has it's own libzfs instance
*/ */
int int
zfs_slm_init(void) zfs_slm_init()
{ {
if ((g_zfshdl = libzfs_init()) == NULL) if ((g_zfshdl = __libzfs_init()) == NULL)
return (-1); return (-1);
/* /*
@@ -1238,11 +866,10 @@ zfs_slm_init(void)
if (pthread_create(&g_zfs_tid, NULL, zfs_enum_pools, NULL) != 0) { if (pthread_create(&g_zfs_tid, NULL, zfs_enum_pools, NULL) != 0) {
list_destroy(&g_pool_list); list_destroy(&g_pool_list);
libzfs_fini(g_zfshdl); __libzfs_fini(g_zfshdl);
return (-1); return (-1);
} }
pthread_setname_np(g_zfs_tid, "enum-pools");
list_create(&g_device_list, sizeof (struct pendingdev), list_create(&g_device_list, sizeof (struct pendingdev),
offsetof(struct pendingdev, pd_node)); offsetof(struct pendingdev, pd_node));
@@ -1250,22 +877,26 @@ zfs_slm_init(void)
} }
void void
zfs_slm_fini(void) zfs_slm_fini()
{ {
unavailpool_t *pool; unavailpool_t *pool;
pendingdev_t *device; pendingdev_t *device;
/* wait for zfs_enum_pools thread to complete */ /* wait for zfs_enum_pools thread to complete */
(void) pthread_join(g_zfs_tid, NULL); (void) pthread_join(g_zfs_tid, NULL);
/* destroy the thread pool */
if (g_tpool != NULL) {
tpool_wait(g_tpool);
tpool_destroy(g_tpool);
}
while ((pool = (list_head(&g_pool_list))) != NULL) { while ((pool = (list_head(&g_pool_list))) != NULL) {
/*
* each pool entry has two possibilities
* 1. was made available (so wait for zfs_enable_ds thread)
* 2. still unavailable (just close the pool)
*/
if (pool->uap_enable_tid)
(void) pthread_join(pool->uap_enable_tid, NULL);
else if (pool->uap_zhp != NULL)
zpool_close(pool->uap_zhp);
list_remove(&g_pool_list, pool); list_remove(&g_pool_list, pool);
zpool_close(pool->uap_zhp);
free(pool); free(pool);
} }
list_destroy(&g_pool_list); list_destroy(&g_pool_list);
@@ -1276,7 +907,7 @@ zfs_slm_fini(void)
} }
list_destroy(&g_device_list); list_destroy(&g_device_list);
libzfs_fini(g_zfshdl); __libzfs_fini(g_zfshdl);
} }
void void
+159 -82
@@ -22,7 +22,6 @@
* Copyright (c) 2006, 2010, Oracle and/or its affiliates. All rights reserved. * Copyright (c) 2006, 2010, Oracle and/or its affiliates. All rights reserved.
* *
* Copyright (c) 2016, Intel Corporation. * Copyright (c) 2016, Intel Corporation.
* Copyright (c) 2018, loli10K <ezomori.nozomu@gmail.com>
*/ */
/* /*
@@ -40,7 +39,6 @@
#include <sys/fm/fs/zfs.h> #include <sys/fm/fs/zfs.h>
#include <libzfs.h> #include <libzfs.h>
#include <string.h> #include <string.h>
#include <libgen.h>
#include "zfs_agents.h" #include "zfs_agents.h"
#include "fmd_api.h" #include "fmd_api.h"
@@ -73,6 +71,7 @@ zfs_retire_clear_data(fmd_hdl_t *hdl, zfs_retire_data_t *zdp)
*/ */
typedef struct find_cbdata { typedef struct find_cbdata {
uint64_t cb_guid; uint64_t cb_guid;
const char *cb_fru;
zpool_handle_t *cb_zhp; zpool_handle_t *cb_zhp;
nvlist_t *cb_vdev; nvlist_t *cb_vdev;
} find_cbdata_t; } find_cbdata_t;
@@ -96,18 +95,26 @@ find_pool(zpool_handle_t *zhp, void *data)
* Find a vdev within a tree with a matching GUID. * Find a vdev within a tree with a matching GUID.
*/ */
static nvlist_t * static nvlist_t *
find_vdev(libzfs_handle_t *zhdl, nvlist_t *nv, uint64_t search_guid) find_vdev(libzfs_handle_t *zhdl, nvlist_t *nv, const char *search_fru,
uint64_t search_guid)
{ {
uint64_t guid; uint64_t guid;
nvlist_t **child; nvlist_t **child;
uint_t c, children; uint_t c, children;
nvlist_t *ret; nvlist_t *ret;
char *fru;
if (nvlist_lookup_uint64(nv, ZPOOL_CONFIG_GUID, &guid) == 0 && if (search_fru != NULL) {
guid == search_guid) { if (nvlist_lookup_string(nv, ZPOOL_CONFIG_FRU, &fru) == 0 &&
fmd_hdl_debug(fmd_module_hdl("zfs-retire"), libzfs_fru_compare(zhdl, fru, search_fru))
"matched vdev %llu", guid); return (nv);
return (nv); } else {
if (nvlist_lookup_uint64(nv, ZPOOL_CONFIG_GUID, &guid) == 0 &&
guid == search_guid) {
fmd_hdl_debug(fmd_module_hdl("zfs-retire"),
"matched vdev %llu", guid);
return (nv);
}
} }
if (nvlist_lookup_nvlist_array(nv, ZPOOL_CONFIG_CHILDREN, if (nvlist_lookup_nvlist_array(nv, ZPOOL_CONFIG_CHILDREN,
@@ -115,7 +122,8 @@ find_vdev(libzfs_handle_t *zhdl, nvlist_t *nv, uint64_t search_guid)
return (NULL); return (NULL);
for (c = 0; c < children; c++) { for (c = 0; c < children; c++) {
if ((ret = find_vdev(zhdl, child[c], search_guid)) != NULL) if ((ret = find_vdev(zhdl, child[c], search_fru,
search_guid)) != NULL)
return (ret); return (ret);
} }
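Reviewer note: find_vdev() is a depth-first search over the vdev tree, checking the current node and then recursing into each child array. A minimal sketch of the GUID branch of that search follows, assuming a simplified node type; sketch_node_t and sketch_find_by_guid() are hypothetical and stand in for the nvlist-based config, and the FRU comparison is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a vdev config node. */
typedef struct sketch_node {
	uint64_t guid;
	const struct sketch_node **children;	/* NULL when nchildren == 0 */
	size_t nchildren;
} sketch_node_t;

/*
 * Depth-first search: return the first node whose guid equals
 * search_guid, or NULL when no node in the subtree matches.
 */
static const sketch_node_t *
sketch_find_by_guid(const sketch_node_t *node, uint64_t search_guid)
{
	size_t c;

	assert(node != NULL);

	if (node->guid == search_guid)
		return (node);

	for (c = 0; c < node->nchildren; c++) {
		const sketch_node_t *ret =
		    sketch_find_by_guid(node->children[c], search_guid);
		if (ret != NULL)
			return (ret);
	}

	return (NULL);
}
```

In the real function each additional child array (for example the L2 cache devices, or the spares on one side of this diff) is a further loop of the same shape.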
@@ -124,16 +132,8 @@ find_vdev(libzfs_handle_t *zhdl, nvlist_t *nv, uint64_t search_guid)
return (NULL); return (NULL);
for (c = 0; c < children; c++) { for (c = 0; c < children; c++) {
if ((ret = find_vdev(zhdl, child[c], search_guid)) != NULL) if ((ret = find_vdev(zhdl, child[c], search_fru,
return (ret); search_guid)) != NULL)
}
if (nvlist_lookup_nvlist_array(nv, ZPOOL_CONFIG_SPARES,
&child, &children) != 0)
return (NULL);
for (c = 0; c < children; c++) {
if ((ret = find_vdev(zhdl, child[c], search_guid)) != NULL)
return (ret); return (ret);
} }
@@ -167,7 +167,8 @@ find_by_guid(libzfs_handle_t *zhdl, uint64_t pool_guid, uint64_t vdev_guid,
} }
if (vdev_guid != 0) { if (vdev_guid != 0) {
if ((*vdevp = find_vdev(zhdl, nvroot, vdev_guid)) == NULL) { if ((*vdevp = find_vdev(zhdl, nvroot, NULL,
vdev_guid)) == NULL) {
zpool_close(zhp); zpool_close(zhp);
return (NULL); return (NULL);
} }
@@ -176,37 +177,72 @@ find_by_guid(libzfs_handle_t *zhdl, uint64_t pool_guid, uint64_t vdev_guid,
return (zhp); return (zhp);
} }
#ifdef HAVE_LIBTOPO
static int
search_pool(zpool_handle_t *zhp, void *data)
{
find_cbdata_t *cbp = data;
nvlist_t *config;
nvlist_t *nvroot;
config = zpool_get_config(zhp, NULL);
if (nvlist_lookup_nvlist(config, ZPOOL_CONFIG_VDEV_TREE,
&nvroot) != 0) {
zpool_close(zhp);
return (0);
}
if ((cbp->cb_vdev = find_vdev(zpool_get_handle(zhp), nvroot,
cbp->cb_fru, 0)) != NULL) {
cbp->cb_zhp = zhp;
return (1);
}
zpool_close(zhp);
return (0);
}
/*
* Given a FRU FMRI, find the matching pool and vdev.
*/
static zpool_handle_t *
find_by_fru(libzfs_handle_t *zhdl, const char *fru, nvlist_t **vdevp)
{
find_cbdata_t cb;
cb.cb_fru = fru;
cb.cb_zhp = NULL;
if (zpool_iter(zhdl, search_pool, &cb) != 1)
return (NULL);
*vdevp = cb.cb_vdev;
return (cb.cb_zhp);
}
#endif /* HAVE_LIBTOPO */
/* /*
* Given a vdev, attempt to replace it with every known spare until one * Given a vdev, attempt to replace it with every known spare until one
* succeeds or we run out of devices to try. * succeeds.
* Return whether we were successful or not in replacing the device.
*/ */
static boolean_t static void
replace_with_spare(fmd_hdl_t *hdl, zpool_handle_t *zhp, nvlist_t *vdev) replace_with_spare(fmd_hdl_t *hdl, zpool_handle_t *zhp, nvlist_t *vdev)
{ {
nvlist_t *config, *nvroot, *replacement; nvlist_t *config, *nvroot, *replacement;
nvlist_t **spares; nvlist_t **spares;
uint_t s, nspares; uint_t s, nspares;
char *dev_name; char *dev_name;
zprop_source_t source;
int ashift;
config = zpool_get_config(zhp, NULL); config = zpool_get_config(zhp, NULL);
if (nvlist_lookup_nvlist(config, ZPOOL_CONFIG_VDEV_TREE, if (nvlist_lookup_nvlist(config, ZPOOL_CONFIG_VDEV_TREE,
&nvroot) != 0) &nvroot) != 0)
return (B_FALSE); return;
/* /*
* Find out if there are any hot spares available in the pool. * Find out if there are any hot spares available in the pool.
*/ */
if (nvlist_lookup_nvlist_array(nvroot, ZPOOL_CONFIG_SPARES, if (nvlist_lookup_nvlist_array(nvroot, ZPOOL_CONFIG_SPARES,
&spares, &nspares) != 0) &spares, &nspares) != 0)
return (B_FALSE); return;
/*
* lookup "ashift" pool property, we may need it for the replacement
*/
ashift = zpool_get_prop_int(zhp, ZPOOL_PROP_ASHIFT, &source);
replacement = fmd_nvl_alloc(hdl, FMD_SLEEP); replacement = fmd_nvl_alloc(hdl, FMD_SLEEP);
@@ -220,23 +256,12 @@ replace_with_spare(fmd_hdl_t *hdl, zpool_handle_t *zhp, nvlist_t *vdev)
* replace it. * replace it.
*/ */
for (s = 0; s < nspares; s++) { for (s = 0; s < nspares; s++) {
boolean_t rebuild = B_FALSE; char *spare_name;
char *spare_name, *type;
if (nvlist_lookup_string(spares[s], ZPOOL_CONFIG_PATH, if (nvlist_lookup_string(spares[s], ZPOOL_CONFIG_PATH,
&spare_name) != 0) &spare_name) != 0)
continue; continue;
/* prefer sequential resilvering for distributed spares */
if ((nvlist_lookup_string(spares[s], ZPOOL_CONFIG_TYPE,
&type) == 0) && strcmp(type, VDEV_TYPE_DRAID_SPARE) == 0)
rebuild = B_TRUE;
/* if set, add the "ashift" pool property to the spare nvlist */
if (source != ZPROP_SRC_DEFAULT)
(void) nvlist_add_uint64(spares[s],
ZPOOL_CONFIG_ASHIFT, ashift);
(void) nvlist_add_nvlist_array(replacement, (void) nvlist_add_nvlist_array(replacement,
ZPOOL_CONFIG_CHILDREN, &spares[s], 1); ZPOOL_CONFIG_CHILDREN, &spares[s], 1);
@@ -244,17 +269,12 @@ replace_with_spare(fmd_hdl_t *hdl, zpool_handle_t *zhp, nvlist_t *vdev)
dev_name, basename(spare_name)); dev_name, basename(spare_name));
if (zpool_vdev_attach(zhp, dev_name, spare_name, if (zpool_vdev_attach(zhp, dev_name, spare_name,
replacement, B_TRUE, rebuild) == 0) { replacement, B_TRUE) == 0)
free(dev_name); break;
nvlist_free(replacement);
return (B_TRUE);
}
} }
free(dev_name); free(dev_name);
nvlist_free(replacement); nvlist_free(replacement);
return (B_FALSE);
} }
/* /*
@@ -269,6 +289,10 @@ zfs_vdev_repair(fmd_hdl_t *hdl, nvlist_t *nvl)
zfs_retire_data_t *zdp = fmd_hdl_getspecific(hdl); zfs_retire_data_t *zdp = fmd_hdl_getspecific(hdl);
zfs_retire_repaired_t *zrp; zfs_retire_repaired_t *zrp;
uint64_t pool_guid, vdev_guid; uint64_t pool_guid, vdev_guid;
#ifdef HAVE_LIBTOPO
nvlist_t *asru;
#endif
if (nvlist_lookup_uint64(nvl, FM_EREPORT_PAYLOAD_ZFS_POOL_GUID, if (nvlist_lookup_uint64(nvl, FM_EREPORT_PAYLOAD_ZFS_POOL_GUID,
&pool_guid) != 0 || nvlist_lookup_uint64(nvl, &pool_guid) != 0 || nvlist_lookup_uint64(nvl,
FM_EREPORT_PAYLOAD_ZFS_VDEV_GUID, &vdev_guid) != 0) FM_EREPORT_PAYLOAD_ZFS_VDEV_GUID, &vdev_guid) != 0)
@@ -291,6 +315,47 @@ zfs_vdev_repair(fmd_hdl_t *hdl, nvlist_t *nvl)
return; return;
} }
#ifdef HAVE_LIBTOPO
asru = fmd_nvl_alloc(hdl, FMD_SLEEP);
(void) nvlist_add_uint8(asru, FM_VERSION, ZFS_SCHEME_VERSION0);
(void) nvlist_add_string(asru, FM_FMRI_SCHEME, FM_FMRI_SCHEME_ZFS);
(void) nvlist_add_uint64(asru, FM_FMRI_ZFS_POOL, pool_guid);
(void) nvlist_add_uint64(asru, FM_FMRI_ZFS_VDEV, vdev_guid);
/*
* We explicitly check for the unusable state here to make sure we
* aren't responding to a transient state change. As part of opening a
* vdev, it's possible to see the 'statechange' event, only to be
* followed by a vdev failure later. If we don't check the current
* state of the vdev (or pool) before marking it repaired, then we risk
* generating spurious repair events followed immediately by the same
* diagnosis.
*
* This assumes that the ZFS scheme code associated unusable (i.e.
* isolated) with its own definition of faulty state. In the case of a
* DEGRADED leaf vdev (due to checksum errors), this is not the case.
* This works, however, because the transient state change is not
* posted in this case. This could be made more explicit by not
* relying on the scheme's unusable callback and instead directly
* checking the vdev state, where we could correctly account for
* DEGRADED state.
*/
if (!fmd_nvl_fmri_unusable(hdl, asru) && fmd_nvl_fmri_has_fault(hdl,
asru, FMD_HAS_FAULT_ASRU, NULL)) {
topo_hdl_t *thp;
char *fmri = NULL;
int err;
thp = fmd_hdl_topo_hold(hdl, TOPO_VERSION);
if (topo_fmri_nvl2str(thp, asru, &fmri, &err) == 0)
(void) fmd_repair_asru(hdl, fmri);
fmd_hdl_topo_rele(hdl, thp);
topo_hdl_strfree(thp, fmri);
}
nvlist_free(asru);
#endif
zrp = fmd_hdl_alloc(hdl, sizeof (zfs_retire_repaired_t), FMD_SLEEP); zrp = fmd_hdl_alloc(hdl, sizeof (zfs_retire_repaired_t), FMD_SLEEP);
zrp->zrr_next = zdp->zrd_repaired; zrp->zrr_next = zdp->zrd_repaired;
zrp->zrr_pool = pool_guid; zrp->zrr_pool = pool_guid;
@@ -326,19 +391,11 @@ zfs_retire_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl,
fmd_hdl_debug(hdl, "zfs_retire_recv: '%s'", class); fmd_hdl_debug(hdl, "zfs_retire_recv: '%s'", class);
nvlist_lookup_uint64(nvl, FM_EREPORT_PAYLOAD_ZFS_VDEV_STATE, &state);
/* /*
* If this is a resource notifying us of device removal then simply * If this is a resource notifying us of device removal, then simply
* check for an available spare and continue unless the device is a * check for an available spare and continue.
* l2arc vdev, in which case we just offline it.
*/ */
if (strcmp(class, "resource.fs.zfs.removed") == 0 || if (strcmp(class, "resource.fs.zfs.removed") == 0) {
(strcmp(class, "resource.fs.zfs.statechange") == 0 &&
(state == VDEV_STATE_REMOVED || state == VDEV_STATE_FAULTED))) {
char *devtype;
char *devname;
if (nvlist_lookup_uint64(nvl, FM_EREPORT_PAYLOAD_ZFS_POOL_GUID, if (nvlist_lookup_uint64(nvl, FM_EREPORT_PAYLOAD_ZFS_POOL_GUID,
&pool_guid) != 0 || &pool_guid) != 0 ||
nvlist_lookup_uint64(nvl, FM_EREPORT_PAYLOAD_ZFS_VDEV_GUID, nvlist_lookup_uint64(nvl, FM_EREPORT_PAYLOAD_ZFS_VDEV_GUID,
@@ -349,20 +406,8 @@ zfs_retire_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl,
&vdev)) == NULL) &vdev)) == NULL)
return; return;
devname = zpool_vdev_name(NULL, zhp, vdev, B_FALSE); if (fmd_prop_get_int32(hdl, "spare_on_remove"))
replace_with_spare(hdl, zhp, vdev);
/* Can't replace l2arc with a spare: offline the device */
if (nvlist_lookup_string(nvl, FM_EREPORT_PAYLOAD_ZFS_VDEV_TYPE,
&devtype) == 0 && strcmp(devtype, VDEV_TYPE_L2CACHE) == 0) {
fmd_hdl_debug(hdl, "zpool_vdev_offline '%s'", devname);
zpool_vdev_offline(zhp, devname, B_TRUE);
} else if (!fmd_prop_get_int32(hdl, "spare_on_remove") ||
replace_with_spare(hdl, zhp, vdev) == B_FALSE) {
/* Could not handle with spare */
fmd_hdl_debug(hdl, "no spare for '%s'", devname);
}
free(devname);
zpool_close(zhp); zpool_close(zhp);
return; return;
} }
@@ -371,11 +416,12 @@ zfs_retire_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl,
return; return;
/* /*
* Note: on Linux statechange events are more than just * Note: on zfsonlinux statechange events are more than just
* healthy ones so we need to confirm the actual state value. * healthy ones so we need to confirm the actual state value.
*/ */
if (strcmp(class, "resource.fs.zfs.statechange") == 0 && if (strcmp(class, "resource.fs.zfs.statechange") == 0 &&
state == VDEV_STATE_HEALTHY) { nvlist_lookup_uint64(nvl, FM_EREPORT_PAYLOAD_ZFS_VDEV_STATE,
&state) == 0 && state == VDEV_STATE_HEALTHY) {
zfs_vdev_repair(hdl, nvl); zfs_vdev_repair(hdl, nvl);
return; return;
} }
@@ -431,7 +477,39 @@ zfs_retire_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl,
} }
if (is_disk) { if (is_disk) {
#ifdef HAVE_LIBTOPO
/*
* This is a disk fault. Lookup the FRU, convert it to
* an FMRI string, and attempt to find a matching vdev.
*/
if (nvlist_lookup_nvlist(fault, FM_FAULT_FRU,
&fru) != 0 ||
nvlist_lookup_string(fru, FM_FMRI_SCHEME,
&scheme) != 0)
continue;
if (strcmp(scheme, FM_FMRI_SCHEME_HC) != 0)
continue;
thp = fmd_hdl_topo_hold(hdl, TOPO_VERSION);
if (topo_fmri_nvl2str(thp, fru, &fmri, &err) != 0) {
fmd_hdl_topo_rele(hdl, thp);
continue;
}
zhp = find_by_fru(zhdl, fmri, &vdev);
topo_hdl_strfree(thp, fmri);
fmd_hdl_topo_rele(hdl, thp);
if (zhp == NULL)
continue;
(void) nvlist_lookup_uint64(vdev,
ZPOOL_CONFIG_GUID, &vdev_guid);
aux = VDEV_AUX_EXTERNAL;
#else
continue; continue;
#endif
} else { } else {
/* /*
* This is a ZFS fault. Lookup the resource, and * This is a ZFS fault. Lookup the resource, and
@@ -505,8 +583,7 @@ zfs_retire_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl,
/* /*
* Attempt to substitute a hot spare. * Attempt to substitute a hot spare.
*/ */
(void) replace_with_spare(hdl, zhp, vdev); replace_with_spare(hdl, zhp, vdev);
zpool_close(zhp); zpool_close(zhp);
} }
@@ -538,7 +615,7 @@ _zfs_retire_init(fmd_hdl_t *hdl)
zfs_retire_data_t *zdp; zfs_retire_data_t *zdp;
libzfs_handle_t *zhdl; libzfs_handle_t *zhdl;
if ((zhdl = libzfs_init()) == NULL) if ((zhdl = __libzfs_init()) == NULL)
return; return;
if (fmd_hdl_register(hdl, FMD_API_VERSION, &fmd_info) != 0) { if (fmd_hdl_register(hdl, FMD_API_VERSION, &fmd_info) != 0) {
@@ -559,7 +636,7 @@ _zfs_retire_fini(fmd_hdl_t *hdl)
if (zdp != NULL) { if (zdp != NULL) {
zfs_retire_clear_data(hdl, zdp); zfs_retire_clear_data(hdl, zdp);
libzfs_fini(zdp->zrd_hdl); __libzfs_fini(zdp->zrd_hdl);
fmd_hdl_free(hdl, zdp, sizeof (zfs_retire_data_t)); fmd_hdl_free(hdl, zdp, sizeof (zfs_retire_data_t));
} }
} }
+23 -50
@@ -1,9 +1,9 @@
/* /*
* This file is part of the ZFS Event Daemon (ZED). * This file is part of the ZFS Event Daemon (ZED)
* * for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049). * Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC. * Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the OpenZFS git commit log for authoritative copyright attribution. * Refer to the ZoL git commit log for authoritative copyright attribution.
* *
* The contents of this file are subject to the terms of the * The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0). * Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -60,8 +60,8 @@ _setup_sig_handlers(void)
zed_log_die("Failed to initialize sigset"); zed_log_die("Failed to initialize sigset");
sa.sa_flags = SA_RESTART; sa.sa_flags = SA_RESTART;
sa.sa_handler = SIG_IGN; sa.sa_handler = SIG_IGN;
if (sigaction(SIGPIPE, &sa, NULL) < 0) if (sigaction(SIGPIPE, &sa, NULL) < 0)
zed_log_die("Failed to ignore SIGPIPE"); zed_log_die("Failed to ignore SIGPIPE");
@@ -75,10 +75,6 @@ _setup_sig_handlers(void)
sa.sa_handler = _hup_handler; sa.sa_handler = _hup_handler;
if (sigaction(SIGHUP, &sa, NULL) < 0) if (sigaction(SIGHUP, &sa, NULL) < 0)
zed_log_die("Failed to register SIGHUP handler"); zed_log_die("Failed to register SIGHUP handler");
(void) sigaddset(&sa.sa_mask, SIGCHLD);
if (pthread_sigmask(SIG_BLOCK, &sa.sa_mask, NULL) < 0)
zed_log_die("Failed to block SIGCHLD");
} }
/* /*
@@ -216,20 +212,22 @@ _finish_daemonize(void)
int int
main(int argc, char *argv[]) main(int argc, char *argv[])
{ {
struct zed_conf zcp; struct zed_conf *zcp;
uint64_t saved_eid; uint64_t saved_eid;
int64_t saved_etime[2]; int64_t saved_etime[2];
zed_log_init(argv[0]); zed_log_init(argv[0]);
zed_log_stderr_open(LOG_NOTICE); zed_log_stderr_open(LOG_NOTICE);
zed_conf_init(&zcp); zcp = zed_conf_create();
zed_conf_parse_opts(&zcp, argc, argv); zed_conf_parse_opts(zcp, argc, argv);
if (zcp.do_verbose) if (zcp->do_verbose)
zed_log_stderr_open(LOG_INFO); zed_log_stderr_open(LOG_INFO);
if (geteuid() != 0) if (geteuid() != 0)
zed_log_die("Must be run as root"); zed_log_die("Must be run as root");
zed_conf_parse_file(zcp);
zed_file_close_from(STDERR_FILENO + 1); zed_file_close_from(STDERR_FILENO + 1);
(void) umask(0); (void) umask(0);
@@ -237,72 +235,47 @@ main(int argc, char *argv[])
if (chdir("/") < 0) if (chdir("/") < 0)
zed_log_die("Failed to change to root directory"); zed_log_die("Failed to change to root directory");
if (zed_conf_scan_dir(&zcp) < 0) if (zed_conf_scan_dir(zcp) < 0)
exit(EXIT_FAILURE); exit(EXIT_FAILURE);
if (!zcp.do_foreground) { if (!zcp->do_foreground) {
_start_daemonize(); _start_daemonize();
zed_log_syslog_open(LOG_DAEMON); zed_log_syslog_open(LOG_DAEMON);
} }
_setup_sig_handlers(); _setup_sig_handlers();
if (zcp.do_memlock) if (zcp->do_memlock)
_lock_memory(); _lock_memory();
if ((zed_conf_write_pid(&zcp) < 0) && (!zcp.do_force)) if ((zed_conf_write_pid(zcp) < 0) && (!zcp->do_force))
exit(EXIT_FAILURE); exit(EXIT_FAILURE);
if (!zcp.do_foreground) if (!zcp->do_foreground)
_finish_daemonize(); _finish_daemonize();
zed_log_msg(LOG_NOTICE, zed_log_msg(LOG_NOTICE,
"ZFS Event Daemon %s-%s (PID %d)", "ZFS Event Daemon %s-%s (PID %d)",
ZFS_META_VERSION, ZFS_META_RELEASE, (int)getpid()); ZFS_META_VERSION, ZFS_META_RELEASE, (int)getpid());
if (zed_conf_open_state(&zcp) < 0) if (zed_conf_open_state(zcp) < 0)
exit(EXIT_FAILURE); exit(EXIT_FAILURE);
if (zed_conf_read_state(&zcp, &saved_eid, saved_etime) < 0) if (zed_conf_read_state(zcp, &saved_eid, saved_etime) < 0)
exit(EXIT_FAILURE); exit(EXIT_FAILURE);
idle: zed_event_init(zcp);
/* zed_event_seek(zcp, saved_eid, saved_etime);
* If -I is specified, attempt to open /dev/zfs repeatedly until
* successful.
*/
do {
if (!zed_event_init(&zcp))
break;
/* Wait for some time and try again. tunable? */
sleep(30);
} while (!_got_exit && zcp.do_idle);
if (_got_exit)
goto out;
zed_event_seek(&zcp, saved_eid, saved_etime);
while (!_got_exit) { while (!_got_exit) {
int rv;
if (_got_hup) { if (_got_hup) {
_got_hup = 0; _got_hup = 0;
(void) zed_conf_scan_dir(&zcp); (void) zed_conf_scan_dir(zcp);
} }
rv = zed_event_service(&zcp); zed_event_service(zcp);
/* ENODEV: When kernel module is unloaded (osx) */
if (rv != 0)
break;
} }
zed_log_msg(LOG_NOTICE, "Exiting"); zed_log_msg(LOG_NOTICE, "Exiting");
zed_event_fini(&zcp); zed_event_fini(zcp);
zed_conf_destroy(zcp);
if (zcp.do_idle && !_got_exit)
goto idle;
out:
zed_conf_destroy(&zcp);
zed_log_fini(); zed_log_fini();
exit(EXIT_SUCCESS); exit(EXIT_SUCCESS);
} }
-1
@@ -1 +0,0 @@
history_event-zfs-list-cacher.sh
-57
@@ -1,57 +0,0 @@
include $(top_srcdir)/config/Rules.am
include $(top_srcdir)/config/Substfiles.am
include $(top_srcdir)/config/Shellcheck.am
EXTRA_DIST += README
zedconfdir = $(sysconfdir)/zfs/zed.d
dist_zedconf_DATA = \
zed-functions.sh \
zed.rc
zedexecdir = $(zfsexecdir)/zed.d
dist_zedexec_SCRIPTS = \
all-debug.sh \
all-syslog.sh \
data-notify.sh \
generic-notify.sh \
resilver_finish-notify.sh \
scrub_finish-notify.sh \
statechange-led.sh \
statechange-notify.sh \
vdev_clear-led.sh \
vdev_attach-led.sh \
pool_import-led.sh \
resilver_finish-start-scrub.sh \
trim_finish-notify.sh
nodist_zedexec_SCRIPTS = history_event-zfs-list-cacher.sh
SUBSTFILES += $(nodist_zedexec_SCRIPTS)
zedconfdefaults = \
all-syslog.sh \
data-notify.sh \
history_event-zfs-list-cacher.sh \
resilver_finish-notify.sh \
scrub_finish-notify.sh \
statechange-led.sh \
statechange-notify.sh \
vdev_clear-led.sh \
vdev_attach-led.sh \
pool_import-led.sh \
resilver_finish-start-scrub.sh
install-data-hook:
$(MKDIR_P) "$(DESTDIR)$(zedconfdir)"
for f in $(zedconfdefaults); do \
test -f "$(DESTDIR)$(zedconfdir)/$${f}" -o \
-L "$(DESTDIR)$(zedconfdir)/$${f}" || \
ln -s "$(zedexecdir)/$${f}" "$(DESTDIR)$(zedconfdir)"; \
done
chmod 0600 "$(DESTDIR)$(zedconfdir)/zed.rc"
# False positive: 1>&"${ZED_FLOCK_FD}" looks suspiciously similar to a >&filename bash extension
CHECKBASHISMS_IGNORE = -e 'should be >word 2>&1' -e '&"$${ZED_FLOCK_FD}"'
+10 -6
@@ -12,11 +12,15 @@
zed_exit_if_ignoring_this_event zed_exit_if_ignoring_this_event
zed_lock "${ZED_DEBUG_LOG}" lockfile="$(basename -- "${ZED_DEBUG_LOG}").lock"
{
printenv | sort
echo
} 1>&"${ZED_FLOCK_FD}"
zed_unlock "${ZED_DEBUG_LOG}"
umask 077
zed_lock "${lockfile}"
exec >> "${ZED_DEBUG_LOG}"
printenv | sort
echo
exec >&-
zed_unlock "${lockfile}"
exit 0 exit 0
+4 -41
@@ -1,51 +1,14 @@
#!/bin/sh #!/bin/sh
#
# Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
# Copyright (c) 2020 by Delphix. All rights reserved.
#
# #
# Log the zevent via syslog. # Log the zevent via syslog.
#
[ -f "${ZED_ZEDLET_DIR}/zed.rc" ] && . "${ZED_ZEDLET_DIR}/zed.rc" [ -f "${ZED_ZEDLET_DIR}/zed.rc" ] && . "${ZED_ZEDLET_DIR}/zed.rc"
. "${ZED_ZEDLET_DIR}/zed-functions.sh" . "${ZED_ZEDLET_DIR}/zed-functions.sh"
zed_exit_if_ignoring_this_event zed_exit_if_ignoring_this_event
# build a string of name=value pairs for this event zed_log_msg "eid=${ZEVENT_EID}" "class=${ZEVENT_SUBCLASS}" \
msg="eid=${ZEVENT_EID} class=${ZEVENT_SUBCLASS}" "${ZEVENT_POOL_GUID:+"pool_guid=${ZEVENT_POOL_GUID}"}" \
"${ZEVENT_VDEV_PATH:+"vdev_path=${ZEVENT_VDEV_PATH}"}" \
if [ "${ZED_SYSLOG_DISPLAY_GUIDS}" = "1" ]; then "${ZEVENT_VDEV_STATE_STR:+"vdev_state=${ZEVENT_VDEV_STATE_STR}"}"
[ -n "${ZEVENT_POOL_GUID}" ] && msg="${msg} pool_guid=${ZEVENT_POOL_GUID}"
[ -n "${ZEVENT_VDEV_GUID}" ] && msg="${msg} vdev_guid=${ZEVENT_VDEV_GUID}"
else
[ -n "${ZEVENT_POOL}" ] && msg="${msg} pool='${ZEVENT_POOL}'"
[ -n "${ZEVENT_VDEV_PATH}" ] && msg="${msg} vdev=${ZEVENT_VDEV_PATH##*/}"
fi
# log pool state if state is anything other than 'ACTIVE'
[ -n "${ZEVENT_POOL_STATE_STR}" ] && [ "$ZEVENT_POOL_STATE" -ne 0 ] && \
msg="${msg} pool_state=${ZEVENT_POOL_STATE_STR}"
# Log the following payload nvpairs if they are present
[ -n "${ZEVENT_VDEV_STATE_STR}" ] && msg="${msg} vdev_state=${ZEVENT_VDEV_STATE_STR}"
[ -n "${ZEVENT_CKSUM_ALGORITHM}" ] && msg="${msg} algorithm=${ZEVENT_CKSUM_ALGORITHM}"
[ -n "${ZEVENT_ZIO_SIZE}" ] && msg="${msg} size=${ZEVENT_ZIO_SIZE}"
[ -n "${ZEVENT_ZIO_OFFSET}" ] && msg="${msg} offset=${ZEVENT_ZIO_OFFSET}"
[ -n "${ZEVENT_ZIO_PRIORITY}" ] && msg="${msg} priority=${ZEVENT_ZIO_PRIORITY}"
[ -n "${ZEVENT_ZIO_ERR}" ] && msg="${msg} err=${ZEVENT_ZIO_ERR}"
[ -n "${ZEVENT_ZIO_FLAGS}" ] && msg="${msg} flags=$(printf '0x%x' "${ZEVENT_ZIO_FLAGS}")"
# log delays that are >= 10 milisec
[ -n "${ZEVENT_ZIO_DELAY}" ] && [ "$ZEVENT_ZIO_DELAY" -gt 10000000 ] && \
msg="${msg} delay=$((ZEVENT_ZIO_DELAY / 1000000))ms"
# list the bookmark data together
# shellcheck disable=SC2153
[ -n "${ZEVENT_ZIO_OBJSET}" ] && \
msg="${msg} bookmark=${ZEVENT_ZIO_OBJSET}:${ZEVENT_ZIO_OBJECT}:${ZEVENT_ZIO_LEVEL}:${ZEVENT_ZIO_BLKID}"
zed_log_msg "${msg}"
exit 0 exit 0
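Review note: the right-hand side of this hunk passes each optional field to `zed_log_msg` as a `${VAR:+...}` expansion, so a field disappears when its variable is unset or empty. Below is a minimal sketch of that idiom. `build_msg` is a hypothetical helper written for illustration, not a function from `zed-functions.sh`, and the event values are invented.

```shell
#!/bin/sh
# Hypothetical helper (not part of ZED): join the non-empty arguments
# with single spaces, so optional fields drop out of the message.
build_msg()
{
    msg=""
    for field in "$@"; do
        # "${VAR:+text}" expands to "" when VAR is unset or empty; skip those.
        [ -n "${field}" ] || continue
        msg="${msg:+${msg} }${field}"
    done
    printf '%s\n' "${msg}"
}

# Invented example values; a real zedlet receives these from the environment.
ZEVENT_EID=42
ZEVENT_SUBCLASS=statechange
ZEVENT_POOL_GUID=
ZEVENT_VDEV_PATH=/dev/sda1

build_msg "eid=${ZEVENT_EID}" "class=${ZEVENT_SUBCLASS}" \
    "${ZEVENT_POOL_GUID:+pool_guid=${ZEVENT_POOL_GUID}}" \
    "${ZEVENT_VDEV_PATH:+vdev_path=${ZEVENT_VDEV_PATH}}"
```

The left-hand side reaches the same result with a chain of `[ -n ... ] && msg=...` tests, which is longer but leaves room for per-field formatting such as the hexadecimal flags value.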
+1 -1
@@ -25,7 +25,7 @@ zed_rate_limit "${rate_limit_tag}" || exit 3
umask 077 umask 077
note_subject="ZFS ${ZEVENT_SUBCLASS} error for ${ZEVENT_POOL} on $(hostname)" note_subject="ZFS ${ZEVENT_SUBCLASS} error for ${ZEVENT_POOL} on $(hostname)"
note_pathname="$(mktemp)" note_pathname="${TMPDIR:="/tmp"}/$(basename -- "$0").${ZEVENT_EID}.$$"
{ {
echo "ZFS has detected a data error:" echo "ZFS has detected a data error:"
echo echo
+2 -2
@@ -23,7 +23,7 @@
# Rate-limit the notification based in part on the filename. # Rate-limit the notification based in part on the filename.
# #
rate_limit_tag="${ZEVENT_POOL};${ZEVENT_SUBCLASS};${0##*/}" rate_limit_tag="${ZEVENT_POOL};${ZEVENT_SUBCLASS};$(basename -- "$0")"
rate_limit_interval="${ZED_NOTIFY_INTERVAL_SECS}" rate_limit_interval="${ZED_NOTIFY_INTERVAL_SECS}"
zed_rate_limit "${rate_limit_tag}" "${rate_limit_interval}" || exit 3 zed_rate_limit "${rate_limit_tag}" "${rate_limit_interval}" || exit 3
@@ -31,7 +31,7 @@ umask 077
pool_str="${ZEVENT_POOL:+" for ${ZEVENT_POOL}"}" pool_str="${ZEVENT_POOL:+" for ${ZEVENT_POOL}"}"
host_str=" on $(hostname)" host_str=" on $(hostname)"
note_subject="ZFS ${ZEVENT_SUBCLASS} event${pool_str}${host_str}" note_subject="ZFS ${ZEVENT_SUBCLASS} event${pool_str}${host_str}"
note_pathname="$(mktemp)" note_pathname="${TMPDIR:="/tmp"}/$(basename -- "$0").${ZEVENT_EID}.$$"
{ {
echo "ZFS has posted the following event:" echo "ZFS has posted the following event:"
echo echo
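Review note: several hunks in this comparison swap between `${0##*/}` and `$(basename -- "$0")` when deriving the script name for the rate-limit tag and log prefix. For ordinary script paths the two forms give the same result; the parameter expansion just avoids spawning a process. A small sketch follows, using hypothetical function names and an example path chosen for illustration.

```shell
#!/bin/sh
# Hypothetical wrappers around the two ways of taking the last path component.
name_by_expansion()
{
    # Strip the longest prefix ending in "/" using shell parameter expansion.
    printf '%s\n' "${1##*/}"
}

name_by_basename()
{
    # "--" ends option parsing, so a path starting with "-" is still a path.
    basename -- "$1"
}

# Example path only; any zedlet path without a trailing slash behaves alike.
path="/etc/zfs/zed.d/generic-notify.sh"
name_by_expansion "${path}"
name_by_basename "${path}"
```

The two differ for inputs with a trailing slash: the expansion yields an empty string while `basename` drops the slash first. That case should not arise for `$0` in a zedlet.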
@@ -1,84 +0,0 @@
#!/bin/sh
#
# Track changes to enumerated pools for use in early-boot
set -ef
FSLIST="@sysconfdir@/zfs/zfs-list.cache/${ZEVENT_POOL}"
FSLIST_TMP="@runstatedir@/zfs-list.cache@${ZEVENT_POOL}"
# If the pool specific cache file is not writeable, abort
[ -w "${FSLIST}" ] || exit 0
[ -f "${ZED_ZEDLET_DIR}/zed.rc" ] && . "${ZED_ZEDLET_DIR}/zed.rc"
. "${ZED_ZEDLET_DIR}/zed-functions.sh"
[ "$ZEVENT_SUBCLASS" != "history_event" ] && exit 0
zed_check_cmd "${ZFS}" sort diff
# If we are acting on a snapshot, we have nothing to do
[ "${ZEVENT_HISTORY_DSNAME%@*}" = "${ZEVENT_HISTORY_DSNAME}" ] || exit 0
# We lock the output file to avoid simultaneous writes.
# If we run into trouble, log and drop the lock
abort_alter() {
zed_log_msg "Error updating zfs-list.cache for ${ZEVENT_POOL}!"
zed_unlock "${FSLIST}"
}
finished() {
zed_unlock "${FSLIST}"
trap - EXIT
exit 0
}
case "${ZEVENT_HISTORY_INTERNAL_NAME}" in
create|"finish receiving"|import|destroy|rename)
;;
export)
zed_lock "${FSLIST}"
trap abort_alter EXIT
echo > "${FSLIST}"
finished
;;
set|inherit)
# Only act if one of the tracked properties is altered.
case "${ZEVENT_HISTORY_INTERNAL_STR%%=*}" in
canmount|mountpoint|atime|relatime|devices|exec|readonly| \
setuid|nbmand|encroot|keylocation|org.openzfs.systemd:requires| \
org.openzfs.systemd:requires-mounts-for| \
org.openzfs.systemd:before|org.openzfs.systemd:after| \
org.openzfs.systemd:wanted-by|org.openzfs.systemd:required-by| \
org.openzfs.systemd:nofail|org.openzfs.systemd:ignore \
) ;;
*) exit 0 ;;
esac
;;
*)
# Ignore all other events.
exit 0
;;
esac
zed_lock "${FSLIST}"
trap abort_alter EXIT
PROPS="name,mountpoint,canmount,atime,relatime,devices,exec\
,readonly,setuid,nbmand,encroot,keylocation\
,org.openzfs.systemd:requires,org.openzfs.systemd:requires-mounts-for\
,org.openzfs.systemd:before,org.openzfs.systemd:after\
,org.openzfs.systemd:wanted-by,org.openzfs.systemd:required-by\
,org.openzfs.systemd:nofail,org.openzfs.systemd:ignore"
"${ZFS}" list -H -t filesystem -o $PROPS -r "${ZEVENT_POOL}" > "${FSLIST_TMP}"
# Sort the output so that it is stable
sort "${FSLIST_TMP}" -o "${FSLIST_TMP}"
# Don't modify the file if it hasn't changed
diff -q "${FSLIST_TMP}" "${FSLIST}" || cat "${FSLIST_TMP}" > "${FSLIST}"
rm -f "${FSLIST_TMP}"
finished
@@ -5,12 +5,10 @@
# Exit codes: # Exit codes:
# 1: Internal error # 1: Internal error
# 2: Script wasn't enabled in zed.rc # 2: Script wasn't enabled in zed.rc
# 3: Scrubs are automatically started for sequential resilvers
[ -f "${ZED_ZEDLET_DIR}/zed.rc" ] && . "${ZED_ZEDLET_DIR}/zed.rc" [ -f "${ZED_ZEDLET_DIR}/zed.rc" ] && . "${ZED_ZEDLET_DIR}/zed.rc"
. "${ZED_ZEDLET_DIR}/zed-functions.sh" . "${ZED_ZEDLET_DIR}/zed-functions.sh"
[ "${ZED_SCRUB_AFTER_RESILVER}" = "1" ] || exit 2 [ "${ZED_SCRUB_AFTER_RESILVER}" = "1" ] || exit 2
[ "${ZEVENT_RESILVER_TYPE}" != "sequential" ] || exit 3
[ -n "${ZEVENT_POOL}" ] || exit 1 [ -n "${ZEVENT_POOL}" ] || exit 1
[ -n "${ZEVENT_SUBCLASS}" ] || exit 1 [ -n "${ZEVENT_SUBCLASS}" ] || exit 1
zed_check_cmd "${ZPOOL}" || exit 1 zed_check_cmd "${ZPOOL}" || exit 1
+1 -1
@@ -41,7 +41,7 @@ fi
umask 077 umask 077
note_subject="ZFS ${ZEVENT_SUBCLASS} event for ${ZEVENT_POOL} on $(hostname)" note_subject="ZFS ${ZEVENT_SUBCLASS} event for ${ZEVENT_POOL} on $(hostname)"
note_pathname="$(mktemp)" note_pathname="${TMPDIR:="/tmp"}/$(basename -- "$0").${ZEVENT_EID}.$$"
{ {
echo "ZFS has finished a ${action}:" echo "ZFS has finished a ${action}:"
echo echo
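Review note: the notification scripts differ in how they pick `note_pathname`. One side calls `mktemp`, the other builds the name from `TMPDIR`, the script name, the event ID and the PID. The sketch below contrasts the two, assuming a `mktemp` utility is installed (it is common but not specified by POSIX). Both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reproduce the predictable pathname scheme.
predictable_note_path()
{
    # $1: script name, $2: event id, $3: process id
    printf '%s/%s.%s.%s\n' "${TMPDIR:-/tmp}" "$1" "$2" "$3"
}

# Hypothetical helper: create a private temporary file with mktemp,
# write one line to it, and print the resulting pathname.
# The body runs in a subshell so the umask change does not leak out.
write_note()
(
    umask 077
    path="$(mktemp)" || exit 1
    printf '%s\n' "$1" > "${path}"
    printf '%s\n' "${path}"
)

note_pathname="$(write_note "ZFS has finished a scrub:")" || exit 1
cat "${note_pathname}"
rm -f "${note_pathname}"
```

A name that another local user can predict can be created ahead of time in a shared directory, which is the usual argument for `mktemp`; the `umask 077` already present in the scripts restricts the permissions of the file in either case.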
+53 -116
@@ -1,26 +1,26 @@
#!/bin/sh #!/bin/sh
# #
# Turn off/on vdevs' enclosure fault LEDs when their pool's state changes. # Turn off/on the VDEV's enclosure fault LEDs when the pool's state changes.
# #
# Turn a vdev's fault LED on if it becomes FAULTED, DEGRADED or UNAVAIL. # Turn the VDEV's fault LED on if it becomes FAULTED, DEGRADED or UNAVAIL.
# Turn its LED off when it's back ONLINE again. # Turn the LED off when it's back ONLINE again.
# #
# This script run in two basic modes: # This script run in two basic modes:
# #
# 1. If $ZEVENT_VDEV_ENC_SYSFS_PATH and $ZEVENT_VDEV_STATE_STR are set, then # 1. If $ZEVENT_VDEV_ENC_SYSFS_PATH and $ZEVENT_VDEV_STATE_STR are set, then
# only set the LED for that particular vdev. This is the case for statechange # only set the LED for that particular VDEV. This is the case for statechange
# events and some vdev_* events. # events and some vdev_* events.
# #
# 2. If those vars are not set, then check the state of all vdevs in the pool # 2. If those vars are not set, then check the state of all VDEVs in the pool
# and set the LEDs accordingly. This is the case for pool_import events. # and set the LEDs accordingly. This is the case for pool_import events.
# #
# Note that this script requires that your enclosure be supported by the # Note that this script requires that your enclosure be supported by the
# Linux SCSI Enclosure services (SES) driver. The script will do nothing # Linux SCSI enclosure services (ses) driver. The script will do nothing
# if you have no enclosure, or if your enclosure isn't supported. # if you have no enclosure, or if your enclosure isn't supported.
# #
# Exit codes: # Exit codes:
# 0: enclosure led successfully set # 0: enclosure led successfully set
# 1: enclosure leds not available # 1: enclosure leds not not available
# 2: enclosure leds administratively disabled # 2: enclosure leds administratively disabled
# 3: The led sysfs path passed from ZFS does not exist # 3: The led sysfs path passed from ZFS does not exist
# 4: $ZPOOL not set # 4: $ZPOOL not set
@@ -29,8 +29,7 @@
[ -f "${ZED_ZEDLET_DIR}/zed.rc" ] && . "${ZED_ZEDLET_DIR}/zed.rc" [ -f "${ZED_ZEDLET_DIR}/zed.rc" ] && . "${ZED_ZEDLET_DIR}/zed.rc"
. "${ZED_ZEDLET_DIR}/zed-functions.sh" . "${ZED_ZEDLET_DIR}/zed-functions.sh"
if [ ! -d /sys/class/enclosure ] && [ ! -d /sys/bus/pci/slots ] ; then if [ ! -d /sys/class/enclosure ] ; then
# No JBOD enclosure or NVMe slots
exit 1 exit 1
fi fi
@@ -60,10 +59,6 @@ check_and_set_led()
file="$1" file="$1"
val="$2" val="$2"
if [ -z "$val" ]; then
return 0
fi
if [ ! -e "$file" ] ; then if [ ! -e "$file" ] ; then
return 3 return 3
fi fi
@@ -71,11 +66,11 @@ check_and_set_led()
# If another process is accessing the LED when we attempt to update it, # If another process is accessing the LED when we attempt to update it,
# the update will be lost so retry until the LED actually changes or we # the update will be lost so retry until the LED actually changes or we
# timeout. # timeout.
for _ in 1 2 3 4 5; do for _ in $(seq 1 5); do
# We want to check the current state first, since writing to the # We want to check the current state first, since writing to the
# 'fault' entry always causes a SES command, even if the # 'fault' entry always always causes a SES command, even if the
# current state is already what you want. # current state is already what you want.
read -r current < "${file}" current=$(cat "${file}")
# On some enclosures if you write 1 to fault, and read it back, # On some enclosures if you write 1 to fault, and read it back,
# it will return 2. Treat all non-zero values as 1 for # it will return 2. Treat all non-zero values as 1 for
@@ -90,84 +85,27 @@ check_and_set_led()
else else
break break
fi fi
done done
}
# Fault LEDs for JBODs and NVMe drives are handled a little differently.
#
# On JBODs the fault LED is called 'fault' and on a path like this:
#
# /sys/class/enclosure/0:0:1:0/SLOT 10/fault
#
# On NVMe it's called 'attention' and on a path like this:
#
# /sys/bus/pci/slot/0/attention
#
# This function returns the full path to the fault LED file for a given
# enclosure/slot directory.
#
path_to_led()
{
dir=$1
if [ -f "$dir/fault" ] ; then
echo "$dir/fault"
elif [ -f "$dir/attention" ] ; then
echo "$dir/attention"
fi
} }
state_to_val() state_to_val()
{ {
state="$1" state="$1"
case "$state" in if [ "$state" = "FAULTED" ] || [ "$state" = "DEGRADED" ] || \
FAULTED|DEGRADED|UNAVAIL) [ "$state" = "UNAVAIL" ] ; then
echo 1 echo 1
;; elif [ "$state" = "ONLINE" ] ; then
ONLINE) echo 0
echo 0 fi
;;
esac
} }
# process_pool ([pool])
# #
# Given a nvme name like 'nvme0n1', pass back its slot directory # Iterate through a pool (or pools) and set the VDEV's enclosure slot LEDs to
# like "/sys/bus/pci/slots/0" # the VDEV's state.
#
nvme_dev_to_slot()
{
dev="$1"
# Get the address "0000:01:00.0"
address=$(cat "/sys/class/block/$dev/device/address")
# For each /sys/bus/pci/slots subdir that is an actual number
# (rather than weird directories like "1-3/").
# shellcheck disable=SC2010
for i in $(ls /sys/bus/pci/slots/ | grep -E "^[0-9]+$") ; do
this_address=$(cat "/sys/bus/pci/slots/$i/address")
# The format of address is a little different between
# /sys/class/block/$dev/device/address and
# /sys/bus/pci/slots/
#
# address= "0000:01:00.0"
# this_address = "0000:01:00"
#
if echo "$address" | grep -Eq ^"$this_address" ; then
echo "/sys/bus/pci/slots/$i"
break
fi
done
}
# process_pool (pool)
#
# Iterate through a pool and set the vdevs' enclosure slot LEDs to
# those vdevs' state.
# #
# Arguments # Arguments
# pool: Pool name. # pool: Optional pool name. If not specified, iterate though all pools.
# #
# Return # Return
# 0 on success, 3 on missing sysfs path # 0 on success, 3 on missing sysfs path
@@ -175,27 +113,19 @@ nvme_dev_to_slot()
process_pool() process_pool()
{ {
pool="$1" pool="$1"
# The output will be the vdevs only (from "grep '/dev/'"):
#
# U45 ONLINE 0 0 0 /dev/sdk 0
# U46 ONLINE 0 0 0 /dev/sdm 0
# U47 ONLINE 0 0 0 /dev/sdn 0
# U50 ONLINE 0 0 0 /dev/sdbn 0
#
ZPOOL_SCRIPTS_AS_ROOT=1 $ZPOOL status -c upath,fault_led "$pool" | grep '/dev/' | (
rc=0 rc=0
while read -r vdev state _ _ _ therest; do
# Read out current LED value and path
# Get dev name (like 'sda')
dev=$(basename "$(echo "$therest" | awk '{print $(NF-1)}')")
vdev_enc_sysfs_path=$(realpath "/sys/class/block/$dev/device/enclosure_device"*)
if [ ! -d "$vdev_enc_sysfs_path" ] ; then
# This is not a JBOD disk, but it could be a PCI NVMe drive
vdev_enc_sysfs_path=$(nvme_dev_to_slot "$dev")
fi
current_val=$(echo "$therest" | awk '{print $NF}') # Lookup all the current LED values and paths in parallel
#shellcheck disable=SC2016
cmd='echo led_token=$(cat "$VDEV_ENC_SYSFS_PATH/fault"),"$VDEV_ENC_SYSFS_PATH",'
out=$($ZPOOL status -vc "$cmd" "$pool" | grep 'led_token=')
#shellcheck disable=SC2034
echo "$out" | while read -r vdev state read write chksum therest; do
# Read out current LED value and path
tmp=$(echo "$therest" | sed 's/^.*led_token=//g')
vdev_enc_sysfs_path=$(echo "$tmp" | awk -F ',' '{print $2}')
current_val=$(echo "$tmp" | awk -F ',' '{print $1}')
if [ "$current_val" != "0" ] ; then if [ "$current_val" != "0" ] ; then
current_val=1 current_val=1
@@ -206,33 +136,40 @@ process_pool()
continue continue
fi fi
led_path=$(path_to_led "$vdev_enc_sysfs_path") if [ ! -e "$vdev_enc_sysfs_path/fault" ] ; then
if [ ! -e "$led_path" ] ; then #shellcheck disable=SC2030
rc=3 rc=1
zed_log_msg "vdev $vdev '$led_path' doesn't exist" zed_log_msg "vdev $vdev '$file/fault' doesn't exist"
continue continue;
fi fi
val=$(state_to_val "$state") val=$(state_to_val "$state")
if [ "$current_val" = "$val" ] ; then if [ "$current_val" = "$val" ] ; then
# LED is already set correctly # LED is already set correctly
continue continue;
fi fi
if ! check_and_set_led "$led_path" "$val"; then if ! check_and_set_led "$vdev_enc_sysfs_path/fault" "$val"; then
rc=3 rc=1
fi fi
done done
exit "$rc"; )
#shellcheck disable=SC2031
if [ "$rc" = "0" ] ; then
return 0
else
# We didn't see a sysfs entry that we wanted to set
return 3
fi
} }
if [ -n "$ZEVENT_VDEV_ENC_SYSFS_PATH" ] && [ -n "$ZEVENT_VDEV_STATE_STR" ] ; then if [ ! -z "$ZEVENT_VDEV_ENC_SYSFS_PATH" ] && [ ! -z "$ZEVENT_VDEV_STATE_STR" ] ; then
# Got a statechange for an individual vdev # Got a statechange for an individual VDEV
val=$(state_to_val "$ZEVENT_VDEV_STATE_STR") val=$(state_to_val "$ZEVENT_VDEV_STATE_STR")
vdev=$(basename "$ZEVENT_VDEV_PATH") vdev=$(basename "$ZEVENT_VDEV_PATH")
ledpath=$(path_to_led "$ZEVENT_VDEV_ENC_SYSFS_PATH") check_and_set_led "$ZEVENT_VDEV_ENC_SYSFS_PATH/fault" "$val"
check_and_set_led "$ledpath" "$val"
else else
# Process the entire pool # Process the entire pool
poolname=$(zed_guid_to_pool "$ZEVENT_POOL_GUID") poolname=$(zed_guid_to_pool "$ZEVENT_POOL_GUID")
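Review note: both versions of this script follow the same decision path for each vdev: map the state to a desired LED value, normalize the value read back from sysfs, and write only when the two differ. The sketch below isolates that logic so it can be exercised without an enclosure. `led_action` is a hypothetical name, and it prints the decision instead of touching sysfs.

```shell
#!/bin/sh
# Map a vdev state string to the desired fault LED value.
# Prints nothing for states that have no LED mapping.
state_to_val()
{
    case "$1" in
        FAULTED|DEGRADED|UNAVAIL) echo 1 ;;
        ONLINE) echo 0 ;;
    esac
}

# Hypothetical helper: decide what to do for one vdev.
# $1: vdev state string, $2: value currently read from the LED file
led_action()
{
    want="$(state_to_val "$1")"
    current="$2"
    # Some enclosures read back 2 after 1 is written; treat non-zero as 1.
    if [ "${current}" != "0" ]; then
        current=1
    fi
    if [ -z "${want}" ]; then
        echo "ignore"
    elif [ "${want}" = "${current}" ]; then
        echo "unchanged"
    else
        echo "write ${want}"
    fi
}

led_action FAULTED 0
led_action ONLINE 2
led_action DEGRADED 2
```

Checking the current value first matters because, as the script's own comment says, every write to the `fault` entry issues an SES command even when the LED is already in the desired state.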
+5 -6
@@ -15,7 +15,7 @@
# Send notification in response to a fault induced statechange # Send notification in response to a fault induced statechange
# #
# ZEVENT_SUBCLASS: 'statechange' # ZEVENT_SUBCLASS: 'statechange'
# ZEVENT_VDEV_STATE_STR: 'DEGRADED', 'FAULTED', 'REMOVED', or 'UNAVAIL' # ZEVENT_VDEV_STATE_STR: 'DEGRADED', 'FAULTED' or 'REMOVED'
# #
# Exit codes: # Exit codes:
# 0: notification sent # 0: notification sent
@@ -31,14 +31,13 @@
if [ "${ZEVENT_VDEV_STATE_STR}" != "FAULTED" ] \ if [ "${ZEVENT_VDEV_STATE_STR}" != "FAULTED" ] \
&& [ "${ZEVENT_VDEV_STATE_STR}" != "DEGRADED" ] \ && [ "${ZEVENT_VDEV_STATE_STR}" != "DEGRADED" ] \
&& [ "${ZEVENT_VDEV_STATE_STR}" != "REMOVED" ] \ && [ "${ZEVENT_VDEV_STATE_STR}" != "REMOVED" ]; then
&& [ "${ZEVENT_VDEV_STATE_STR}" != "UNAVAIL" ]; then
exit 3 exit 3
fi fi
umask 077 umask 077
note_subject="ZFS device fault for pool ${ZEVENT_POOL} on $(hostname)" note_subject="ZFS device fault for pool ${ZEVENT_POOL_GUID} on $(hostname)"
note_pathname="$(mktemp)" note_pathname="${TMPDIR:="/tmp"}/$(basename -- "$0").${ZEVENT_EID}.$$"
{ {
if [ "${ZEVENT_VDEV_STATE_STR}" = "FAULTED" ] ; then if [ "${ZEVENT_VDEV_STATE_STR}" = "FAULTED" ] ; then
echo "The number of I/O errors associated with a ZFS device exceeded" echo "The number of I/O errors associated with a ZFS device exceeded"
@@ -65,7 +64,7 @@ note_pathname="$(mktemp)"
[ -n "${ZEVENT_VDEV_GUID}" ] && echo " vguid: ${ZEVENT_VDEV_GUID}" [ -n "${ZEVENT_VDEV_GUID}" ] && echo " vguid: ${ZEVENT_VDEV_GUID}"
[ -n "${ZEVENT_VDEV_DEVID}" ] && echo " devid: ${ZEVENT_VDEV_DEVID}" [ -n "${ZEVENT_VDEV_DEVID}" ] && echo " devid: ${ZEVENT_VDEV_DEVID}"
echo " pool: ${ZEVENT_POOL} (${ZEVENT_POOL_GUID})" echo " pool: ${ZEVENT_POOL_GUID}"
} > "${note_pathname}" } > "${note_pathname}"
-37
@@ -1,37 +0,0 @@
#!/bin/sh
#
# Send notification in response to a TRIM_FINISH. The event
# will be received for each vdev in the pool which was trimmed.
#
# Exit codes:
# 0: notification sent
# 1: notification failed
# 2: notification not configured
# 9: internal error
[ -f "${ZED_ZEDLET_DIR}/zed.rc" ] && . "${ZED_ZEDLET_DIR}/zed.rc"
. "${ZED_ZEDLET_DIR}/zed-functions.sh"
[ -n "${ZEVENT_POOL}" ] || exit 9
[ -n "${ZEVENT_SUBCLASS}" ] || exit 9
zed_check_cmd "${ZPOOL}" || exit 9
umask 077
note_subject="ZFS ${ZEVENT_SUBCLASS} event for ${ZEVENT_POOL} on $(hostname)"
note_pathname="$(mktemp)"
{
echo "ZFS has finished a trim:"
echo
echo " eid: ${ZEVENT_EID}"
echo " class: ${ZEVENT_SUBCLASS}"
echo " host: $(hostname)"
echo " time: ${ZEVENT_TIME_STRING}"
"${ZPOOL}" status -t "${ZEVENT_POOL}"
} > "${note_pathname}"
zed_notify "${note_subject}" "${note_pathname}"; rv=$?
rm -f "${note_pathname}"
exit "${rv}"
+18 -185
@@ -77,7 +77,7 @@ zed_log_msg()
zed_log_err() zed_log_err()
{ {
logger -p "${ZED_SYSLOG_PRIORITY}" -t "${ZED_SYSLOG_TAG}" -- "error:" \ logger -p "${ZED_SYSLOG_PRIORITY}" -t "${ZED_SYSLOG_TAG}" -- "error:" \
"${0##*/}:""${ZEVENT_EID:+" eid=${ZEVENT_EID}:"}" "$@" "$(basename -- "$0"):""${ZEVENT_EID:+" eid=${ZEVENT_EID}:"}" "$@"
} }
@@ -126,8 +126,10 @@ zed_lock()
# Obtain a lock on the file bound to the given file descriptor. # Obtain a lock on the file bound to the given file descriptor.
# #
eval "exec ${fd}>> '${lockfile}'" eval "exec ${fd}> '${lockfile}'"
if ! err="$(flock --exclusive "${fd}" 2>&1)"; then err="$(flock --exclusive "${fd}" 2>&1)"
# shellcheck disable=SC2181
if [ $? -ne 0 ]; then
zed_log_err "failed to lock \"${lockfile}\": ${err}" zed_log_err "failed to lock \"${lockfile}\": ${err}"
fi fi
@@ -163,7 +165,9 @@ zed_unlock()
fi fi
# Release the lock and close the file descriptor. # Release the lock and close the file descriptor.
if ! err="$(flock --unlock "${fd}" 2>&1)"; then err="$(flock --unlock "${fd}" 2>&1)"
# shellcheck disable=SC2181
if [ $? -ne 0 ]; then
zed_log_err "failed to unlock \"${lockfile}\": ${err}" zed_log_err "failed to unlock \"${lockfile}\": ${err}"
fi fi
eval "exec ${fd}>&-" eval "exec ${fd}>&-"
@@ -198,14 +202,6 @@ zed_notify()
[ "${rv}" -eq 0 ] && num_success=$((num_success + 1)) [ "${rv}" -eq 0 ] && num_success=$((num_success + 1))
[ "${rv}" -eq 1 ] && num_failure=$((num_failure + 1)) [ "${rv}" -eq 1 ] && num_failure=$((num_failure + 1))
zed_notify_slack_webhook "${subject}" "${pathname}"; rv=$?
[ "${rv}" -eq 0 ] && num_success=$((num_success + 1))
[ "${rv}" -eq 1 ] && num_failure=$((num_failure + 1))
zed_notify_pushover "${subject}" "${pathname}"; rv=$?
[ "${rv}" -eq 0 ] && num_success=$((num_success + 1))
[ "${rv}" -eq 1 ] && num_failure=$((num_failure + 1))
[ "${num_success}" -gt 0 ] && return 0 [ "${num_success}" -gt 0 ] && return 0
[ "${num_failure}" -gt 0 ] && return 1 [ "${num_failure}" -gt 0 ] && return 1
return 2 return 2
@@ -224,8 +220,6 @@ zed_notify()
# ZED_EMAIL_OPTS. This undergoes the following keyword substitutions: # ZED_EMAIL_OPTS. This undergoes the following keyword substitutions:
# - @ADDRESS@ is replaced with the space-delimited recipient email address(es) # - @ADDRESS@ is replaced with the space-delimited recipient email address(es)
# - @SUBJECT@ is replaced with the notification subject # - @SUBJECT@ is replaced with the notification subject
# If @SUBJECT@ was omited here, a "Subject: ..." header will be added to notification
#
# #
# Arguments # Arguments
# subject: notification subject # subject: notification subject
@@ -243,7 +237,7 @@ zed_notify()
# #
zed_notify_email() zed_notify_email()
{ {
local subject="${1:-"ZED notification"}" local subject="$1"
local pathname="${2:-"/dev/null"}" local pathname="${2:-"/dev/null"}"
: "${ZED_EMAIL_PROG:="mail"}" : "${ZED_EMAIL_PROG:="mail"}"
@@ -260,30 +254,19 @@ zed_notify_email()
[ -n "${subject}" ] || return 1 [ -n "${subject}" ] || return 1
if [ ! -r "${pathname}" ]; then if [ ! -r "${pathname}" ]; then
zed_log_err \ zed_log_err \
"${ZED_EMAIL_PROG##*/} cannot read \"${pathname}\"" "$(basename "${ZED_EMAIL_PROG}") cannot read \"${pathname}\""
return 1 return 1
fi fi
# construct cmdline options ZED_EMAIL_OPTS="$(echo "${ZED_EMAIL_OPTS}" \
ZED_EMAIL_OPTS_PARSED="$(echo "${ZED_EMAIL_OPTS}" \
| sed -e "s/@ADDRESS@/${ZED_EMAIL_ADDR}/g" \ | sed -e "s/@ADDRESS@/${ZED_EMAIL_ADDR}/g" \
-e "s/@SUBJECT@/${subject}/g")" -e "s/@SUBJECT@/${subject}/g")"
# pipe message to email prog # shellcheck disable=SC2086
# shellcheck disable=SC2086,SC2248 eval "${ZED_EMAIL_PROG}" ${ZED_EMAIL_OPTS} < "${pathname}" >/dev/null 2>&1
{
# no subject passed as option?
if [ "${ZED_EMAIL_OPTS%@SUBJECT@*}" = "${ZED_EMAIL_OPTS}" ] ; then
# inject subject header
printf "Subject: %s\n" "${subject}"
fi
# output message
cat "${pathname}"
} |
eval ${ZED_EMAIL_PROG} ${ZED_EMAIL_OPTS_PARSED} >/dev/null 2>&1
rv=$? rv=$?
if [ "${rv}" -ne 0 ]; then if [ "${rv}" -ne 0 ]; then
zed_log_err "${ZED_EMAIL_PROG##*/} exit=${rv}" zed_log_err "$(basename "${ZED_EMAIL_PROG}") exit=${rv}"
return 1 return 1
fi fi
return 0 return 0
@@ -376,158 +359,6 @@ zed_notify_pushbullet()
} }
# zed_notify_slack_webhook (subject, pathname)
#
# Notification via Slack Webhook <https://api.slack.com/incoming-webhooks>.
# The Webhook URL (ZED_SLACK_WEBHOOK_URL) identifies this client to the
# Slack channel.
#
# Requires awk, curl, and sed executables to be installed in the standard PATH.
#
# References
# https://api.slack.com/incoming-webhooks
#
# Arguments
# subject: notification subject
# pathname: pathname containing the notification message (OPTIONAL)
#
# Globals
# ZED_SLACK_WEBHOOK_URL
#
# Return
# 0: notification sent
# 1: notification failed
# 2: not configured
#
zed_notify_slack_webhook()
{
[ -n "${ZED_SLACK_WEBHOOK_URL}" ] || return 2
local subject="$1"
local pathname="${2:-"/dev/null"}"
local msg_body
local msg_tag
local msg_json
local msg_out
local msg_err
local url="${ZED_SLACK_WEBHOOK_URL}"
[ -n "${subject}" ] || return 1
if [ ! -r "${pathname}" ]; then
zed_log_err "slack webhook cannot read \"${pathname}\""
return 1
fi
zed_check_cmd "awk" "curl" "sed" || return 1
# Escape the following characters in the message body for JSON:
# newline, backslash, double quote, horizontal tab, vertical tab,
# and carriage return.
#
msg_body="$(awk '{ ORS="\\n" } { gsub(/\\/, "\\\\"); gsub(/"/, "\\\"");
gsub(/\t/, "\\t"); gsub(/\f/, "\\f"); gsub(/\r/, "\\r"); print }' \
"${pathname}")"
# Construct the JSON message for posting.
#
msg_json="$(printf '{"text": "*%s*\\n%s"}' "${subject}" "${msg_body}" )"
# Send the POST request and check for errors.
#
msg_out="$(curl -X POST "${url}" \
--header "Content-Type: application/json" --data-binary "${msg_json}" \
2>/dev/null)"; rv=$?
if [ "${rv}" -ne 0 ]; then
zed_log_err "curl exit=${rv}"
return 1
fi
msg_err="$(echo "${msg_out}" \
| sed -n -e 's/.*"error" *:.*"message" *: *"\([^"]*\)".*/\1/p')"
if [ -n "${msg_err}" ]; then
zed_log_err "slack webhook \"${msg_err}"\"
return 1
fi
return 0
}
# zed_notify_pushover (subject, pathname)
#
# Send a notification via Pushover <https://pushover.net/>.
# The access token (ZED_PUSHOVER_TOKEN) identifies this client to the
# Pushover server. The user token (ZED_PUSHOVER_USER) defines the user or
# group to which the notification will be sent.
#
# Requires curl and sed executables to be installed in the standard PATH.
#
# References
# https://pushover.net/api
#
# Arguments
# subject: notification subject
# pathname: pathname containing the notification message (OPTIONAL)
#
# Globals
# ZED_PUSHOVER_TOKEN
# ZED_PUSHOVER_USER
#
# Return
# 0: notification sent
# 1: notification failed
# 2: not configured
#
zed_notify_pushover()
{
local subject="$1"
local pathname="${2:-"/dev/null"}"
local msg_body
local msg_out
local msg_err
local url="https://api.pushover.net/1/messages.json"
[ -n "${ZED_PUSHOVER_TOKEN}" ] && [ -n "${ZED_PUSHOVER_USER}" ] || return 2
if [ ! -r "${pathname}" ]; then
zed_log_err "pushover cannot read \"${pathname}\""
return 1
fi
zed_check_cmd "curl" "sed" || return 1
# Read the message body in.
#
msg_body="$(cat "${pathname}")"
if [ -z "${msg_body}" ]
then
msg_body=$subject
subject=""
fi
# Send the POST request and check for errors.
#
msg_out="$( \
curl \
--form-string "token=${ZED_PUSHOVER_TOKEN}" \
--form-string "user=${ZED_PUSHOVER_USER}" \
--form-string "message=${msg_body}" \
--form-string "title=${subject}" \
"${url}" \
2>/dev/null \
)"; rv=$?
if [ "${rv}" -ne 0 ]; then
zed_log_err "curl exit=${rv}"
return 1
fi
msg_err="$(echo "${msg_out}" \
| sed -n -e 's/.*"errors" *:.*\[\(.*\)\].*/\1/p')"
if [ -n "${msg_err}" ]; then
zed_log_err "pushover \"${msg_err}"\"
return 1
fi
return 0
}
# zed_rate_limit (tag, [interval]) # zed_rate_limit (tag, [interval])
# #
# Check whether an event of a given type [tag] has already occurred within the # Check whether an event of a given type [tag] has already occurred within the
@@ -602,8 +433,10 @@ zed_guid_to_pool()
return return
fi fi
guid="$(printf "%u" "$1")" guid=$(printf "%llu" "$1")
$ZPOOL get -H -ovalue,name guid | awk '$1 == '"$guid"' {print $2; exit}' if [ ! -z "$guid" ] ; then
$ZPOOL get -H -ovalue,name guid | awk '$1=='"$guid"' {print $2}'
fi
} }
# zed_exit_if_ignoring_this_event # zed_exit_if_ignoring_this_event
+8 -40
@@ -13,9 +13,9 @@
# Email address of the zpool administrator for receipt of notifications; # Email address of the zpool administrator for receipt of notifications;
# multiple addresses can be specified if they are delimited by whitespace. # multiple addresses can be specified if they are delimited by whitespace.
# Email will only be sent if ZED_EMAIL_ADDR is defined. # Email will only be sent if ZED_EMAIL_ADDR is defined.
# Enabled by default; comment to disable. # Disabled by default; uncomment to enable.
# #
ZED_EMAIL_ADDR="root" #ZED_EMAIL_ADDR="root"
## ##
# Name or path of executable responsible for sending notifications via email; # Name or path of executable responsible for sending notifications via email;
@@ -30,7 +30,6 @@ ZED_EMAIL_ADDR="root"
# The string @SUBJECT@ will be replaced with the notification subject; # The string @SUBJECT@ will be replaced with the notification subject;
# this should be protected with quotes to prevent word-splitting. # this should be protected with quotes to prevent word-splitting.
# Email will only be sent if ZED_EMAIL_ADDR is defined. # Email will only be sent if ZED_EMAIL_ADDR is defined.
# If @SUBJECT@ was omited here, a "Subject: ..." header will be added to notification
# #
#ZED_EMAIL_OPTS="-s '@SUBJECT@' @ADDRESS@" #ZED_EMAIL_OPTS="-s '@SUBJECT@' @ADDRESS@"
@@ -53,9 +52,9 @@ ZED_EMAIL_ADDR="root"
## ##
# Send notifications for 'ereport.fs.zfs.data' events. # Send notifications for 'ereport.fs.zfs.data' events.
# Disabled by default, any non-empty value will enable the feature. # Disabled by default
# #
#ZED_NOTIFY_DATA= #ZED_NOTIFY_DATA=1
## ##
# Pushbullet access token. # Pushbullet access token.
@@ -75,31 +74,6 @@ ZED_EMAIL_ADDR="root"
# #
#ZED_PUSHBULLET_CHANNEL_TAG="" #ZED_PUSHBULLET_CHANNEL_TAG=""
##
# Slack Webhook URL.
# This allows posting to the given channel and includes an access token.
# <https://api.slack.com/incoming-webhooks>
# Disabled by default; uncomment to enable.
#
#ZED_SLACK_WEBHOOK_URL=""
##
# Pushover token.
# This defines the application from which the notification will be sent.
# <https://pushover.net/api#registration>
# Disabled by default; uncomment to enable.
# ZED_PUSHOVER_USER, below, must also be configured.
#
#ZED_PUSHOVER_TOKEN=""
##
# Pushover user key.
# This defines which user or group will receive Pushover notifications.
# <https://pushover.net/api#identifiers>
# Disabled by default; uncomment to enable.
# ZED_PUSHOVER_TOKEN, above, must also be configured.
#ZED_PUSHOVER_USER=""
## ##
# Default directory for zed state files. # Default directory for zed state files.
# #
@@ -107,15 +81,14 @@ ZED_EMAIL_ADDR="root"
## ##
# Turn on/off enclosure LEDs when drives get DEGRADED/FAULTED. This works for # Turn on/off enclosure LEDs when drives get DEGRADED/FAULTED. This works for
# device mapper and multipath devices as well. This works with JBOD enclosures # device mapper and multipath devices as well. Your enclosure must be
# and NVMe PCI drives (assuming they're supported by Linux in sysfs). # supported by the Linux SES driver for this to work.
# #
ZED_USE_ENCLOSURE_LEDS=1 ZED_USE_ENCLOSURE_LEDS=1
## ##
# Run a scrub after every resilver # Run a scrub after every resilver
# Disabled by default, 1 to enable and 0 to disable. #ZED_SCRUB_AFTER_RESILVER=1
#ZED_SCRUB_AFTER_RESILVER=0
## ##
# The syslog priority (e.g., specified as a "facility.level" pair). # The syslog priority (e.g., specified as a "facility.level" pair).
@@ -136,10 +109,5 @@ ZED_USE_ENCLOSURE_LEDS=1
# Otherwise, if ZED_SYSLOG_SUBCLASS_EXCLUDE is set, the # Otherwise, if ZED_SYSLOG_SUBCLASS_EXCLUDE is set, the
# matching subclasses are excluded from logging. # matching subclasses are excluded from logging.
#ZED_SYSLOG_SUBCLASS_INCLUDE="checksum|scrub_*|vdev.*" #ZED_SYSLOG_SUBCLASS_INCLUDE="checksum|scrub_*|vdev.*"
ZED_SYSLOG_SUBCLASS_EXCLUDE="history_event" #ZED_SYSLOG_SUBCLASS_EXCLUDE="statechange|config_*|history_event"
##
# Use GUIDs instead of names when logging pool and vdevs
# Disabled by default, 1 to enable and 0 to disable.
#ZED_SYSLOG_DISPLAY_GUIDS=1
+18 -3
@@ -1,9 +1,9 @@
/* /*
* This file is part of the ZFS Event Daemon (ZED). * This file is part of the ZFS Event Daemon (ZED)
* * for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049). * Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC. * Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the OpenZFS git commit log for authoritative copyright attribution. * Refer to the ZoL git commit log for authoritative copyright attribution.
* *
* The contents of this file are subject to the terms of the * The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0). * Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -15,6 +15,11 @@
#ifndef ZED_H #ifndef ZED_H
#define ZED_H #define ZED_H
/*
* Absolute path for the default zed configuration file.
*/
#define ZED_CONF_FILE SYSCONFDIR "/zfs/zed.conf"
/* /*
* Absolute path for the default zed pid file. * Absolute path for the default zed pid file.
*/ */
@@ -30,6 +35,16 @@
*/ */
#define ZED_ZEDLET_DIR SYSCONFDIR "/zfs/zed.d" #define ZED_ZEDLET_DIR SYSCONFDIR "/zfs/zed.d"
/*
* Reserved for future use.
*/
#define ZED_MAX_EVENTS 0
/*
* Reserved for future use.
*/
#define ZED_MIN_EVENTS 0
/* /*
* String prefix for ZED variables passed via environment variables. * String prefix for ZED variables passed via environment variables.
*/ */
+124 -100
@@ -1,9 +1,9 @@
/* /*
* This file is part of the ZFS Event Daemon (ZED). * This file is part of the ZFS Event Daemon (ZED)
* * for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049). * Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC. * Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the OpenZFS git commit log for authoritative copyright attribution. * Refer to the ZoL git commit log for authoritative copyright attribution.
* *
* The contents of this file are subject to the terms of the * The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0). * Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -22,7 +22,6 @@
#include <stdio.h> #include <stdio.h>
#include <stdlib.h> #include <stdlib.h>
#include <string.h> #include <string.h>
#include <sys/types.h>
#include <sys/stat.h> #include <sys/stat.h>
#include <sys/uio.h> #include <sys/uio.h>
#include <unistd.h> #include <unistd.h>
@@ -33,26 +32,43 @@
#include "zed_strings.h" #include "zed_strings.h"
/* /*
* Initialise the configuration with default values. * Return a new configuration with default values.
*/ */
void struct zed_conf *
zed_conf_init(struct zed_conf *zcp) zed_conf_create(void)
{ {
memset(zcp, 0, sizeof (*zcp)); struct zed_conf *zcp;
/* zcp->zfs_hdl opened in zed_event_init() */ zcp = calloc(1, sizeof (*zcp));
/* zcp->zedlets created in zed_conf_scan_dir() */ if (!zcp)
goto nomem;
zcp->pid_fd = -1; /* opened in zed_conf_write_pid() */ zcp->syslog_facility = LOG_DAEMON;
zcp->state_fd = -1; /* opened in zed_conf_open_state() */ zcp->min_events = ZED_MIN_EVENTS;
zcp->zevent_fd = -1; /* opened in zed_event_init() */ zcp->max_events = ZED_MAX_EVENTS;
zcp->pid_fd = -1;
zcp->zedlets = NULL; /* created via zed_conf_scan_dir() */
zcp->state_fd = -1; /* opened via zed_conf_open_state() */
zcp->zfs_hdl = NULL; /* opened via zed_event_init() */
zcp->zevent_fd = -1; /* opened via zed_event_init() */
zcp->max_jobs = 16; if (!(zcp->conf_file = strdup(ZED_CONF_FILE)))
goto nomem;
if (!(zcp->pid_file = strdup(ZED_PID_FILE)) || if (!(zcp->pid_file = strdup(ZED_PID_FILE)))
!(zcp->zedlet_dir = strdup(ZED_ZEDLET_DIR)) || goto nomem;
!(zcp->state_file = strdup(ZED_STATE_FILE)))
zed_log_die("Failed to create conf: %s", strerror(errno)); if (!(zcp->zedlet_dir = strdup(ZED_ZEDLET_DIR)))
goto nomem;
if (!(zcp->state_file = strdup(ZED_STATE_FILE)))
goto nomem;
return (zcp);
nomem:
zed_log_die("Failed to create conf: %s", strerror(errno));
return (NULL);
} }
/* /*
@@ -63,6 +79,9 @@ zed_conf_init(struct zed_conf *zcp)
void void
zed_conf_destroy(struct zed_conf *zcp) zed_conf_destroy(struct zed_conf *zcp)
{ {
if (!zcp)
return;
if (zcp->state_fd >= 0) { if (zcp->state_fd >= 0) {
if (close(zcp->state_fd) < 0) if (close(zcp->state_fd) < 0)
zed_log_msg(LOG_WARNING, zed_log_msg(LOG_WARNING,
@@ -83,6 +102,10 @@ zed_conf_destroy(struct zed_conf *zcp)
zcp->pid_file, strerror(errno)); zcp->pid_file, strerror(errno));
zcp->pid_fd = -1; zcp->pid_fd = -1;
} }
if (zcp->conf_file) {
free(zcp->conf_file);
zcp->conf_file = NULL;
}
if (zcp->pid_file) { if (zcp->pid_file) {
free(zcp->pid_file); free(zcp->pid_file);
zcp->pid_file = NULL; zcp->pid_file = NULL;
@@ -99,6 +122,7 @@ zed_conf_destroy(struct zed_conf *zcp)
zed_strings_destroy(zcp->zedlets); zed_strings_destroy(zcp->zedlets);
zcp->zedlets = NULL; zcp->zedlets = NULL;
} }
free(zcp);
} }
/* /*
@@ -108,52 +132,44 @@ zed_conf_destroy(struct zed_conf *zcp)
* otherwise, output to stderr and exit with a failure status. * otherwise, output to stderr and exit with a failure status.
*/ */
static void static void
_zed_conf_display_help(const char *prog, boolean_t got_err) _zed_conf_display_help(const char *prog, int got_err)
{ {
struct opt { const char *o, *d, *v; };
FILE *fp = got_err ? stderr : stdout; FILE *fp = got_err ? stderr : stdout;
int w1 = 4; /* width of leading whitespace */
struct opt *oo; int w2 = 8; /* width of L-justified option field */
struct opt iopts[] = {
{ .o = "-h", .d = "Display help" },
{ .o = "-L", .d = "Display license information" },
{ .o = "-V", .d = "Display version information" },
{},
};
struct opt nopts[] = {
{ .o = "-v", .d = "Be verbose" },
{ .o = "-f", .d = "Force daemon to run" },
{ .o = "-F", .d = "Run daemon in the foreground" },
{ .o = "-I",
.d = "Idle daemon until kernel module is (re)loaded" },
{ .o = "-M", .d = "Lock all pages in memory" },
{ .o = "-P", .d = "$PATH for ZED to use (only used by ZTS)" },
{ .o = "-Z", .d = "Zero state file" },
{},
};
struct opt vopts[] = {
{ .o = "-d DIR", .d = "Read enabled ZEDLETs from DIR.",
.v = ZED_ZEDLET_DIR },
{ .o = "-p FILE", .d = "Write daemon's PID to FILE.",
.v = ZED_PID_FILE },
{ .o = "-s FILE", .d = "Write daemon's state to FILE.",
.v = ZED_STATE_FILE },
{ .o = "-j JOBS", .d = "Start at most JOBS at once.",
.v = "16" },
{},
};
fprintf(fp, "Usage: %s [OPTION]...\n", (prog ? prog : "zed")); fprintf(fp, "Usage: %s [OPTION]...\n", (prog ? prog : "zed"));
fprintf(fp, "\n"); fprintf(fp, "\n");
for (oo = iopts; oo->o; ++oo) fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-h",
fprintf(fp, " %*s %s\n", -8, oo->o, oo->d); "Display help.");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-L",
"Display license information.");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-V",
"Display version information.");
fprintf(fp, "\n"); fprintf(fp, "\n");
for (oo = nopts; oo->o; ++oo) fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-v",
fprintf(fp, " %*s %s\n", -8, oo->o, oo->d); "Be verbose.");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-f",
"Force daemon to run.");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-F",
"Run daemon in the foreground.");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-M",
"Lock all pages in memory.");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-P",
"$PATH for ZED to use (only used by ZTS).");
fprintf(fp, "%*c%*s %s\n", w1, 0x20, -w2, "-Z",
"Zero state file.");
fprintf(fp, "\n"); fprintf(fp, "\n");
for (oo = vopts; oo->o; ++oo) #if 0
fprintf(fp, " %*s %s [%s]\n", -8, oo->o, oo->d, oo->v); fprintf(fp, "%*c%*s %s [%s]\n", w1, 0x20, -w2, "-c FILE",
"Read configuration from FILE.", ZED_CONF_FILE);
#endif
fprintf(fp, "%*c%*s %s [%s]\n", w1, 0x20, -w2, "-d DIR",
"Read enabled ZEDLETs from DIR.", ZED_ZEDLET_DIR);
fprintf(fp, "%*c%*s %s [%s]\n", w1, 0x20, -w2, "-p FILE",
"Write daemon's PID to FILE.", ZED_PID_FILE);
fprintf(fp, "%*c%*s %s [%s]\n", w1, 0x20, -w2, "-s FILE",
"Write daemon's state to FILE.", ZED_STATE_FILE);
fprintf(fp, "\n"); fprintf(fp, "\n");
exit(got_err ? EXIT_FAILURE : EXIT_SUCCESS); exit(got_err ? EXIT_FAILURE : EXIT_SUCCESS);
@@ -165,14 +181,20 @@ _zed_conf_display_help(const char *prog, boolean_t got_err)
static void static void
_zed_conf_display_license(void) _zed_conf_display_license(void)
{ {
printf( const char **pp;
"The ZFS Event Daemon (ZED) is distributed under the terms of the\n" const char *text[] = {
" Common Development and Distribution License (CDDL-1.0)\n" "The ZFS Event Daemon (ZED) is distributed under the terms of the",
" <http://opensource.org/licenses/CDDL-1.0>.\n" " Common Development and Distribution License (CDDL-1.0)",
"\n" " <http://opensource.org/licenses/CDDL-1.0>.",
"",
"Developed at Lawrence Livermore National Laboratory" "Developed at Lawrence Livermore National Laboratory"
" (LLNL-CODE-403049).\n" " (LLNL-CODE-403049).",
"\n"); "",
NULL
};
for (pp = text; *pp; pp++)
printf("%s\n", *pp);
exit(EXIT_SUCCESS); exit(EXIT_SUCCESS);
} }
@@ -207,19 +229,16 @@ _zed_conf_parse_path(char **resultp, const char *path)
if (path[0] == '/') { if (path[0] == '/') {
*resultp = strdup(path); *resultp = strdup(path);
} else if (!getcwd(buf, sizeof (buf))) {
zed_log_die("Failed to get current working dir: %s",
strerror(errno));
} else if (strlcat(buf, "/", sizeof (buf)) >= sizeof (buf)) {
zed_log_die("Failed to copy path: %s", strerror(ENAMETOOLONG));
} else if (strlcat(buf, path, sizeof (buf)) >= sizeof (buf)) {
zed_log_die("Failed to copy path: %s", strerror(ENAMETOOLONG));
} else { } else {
if (!getcwd(buf, sizeof (buf)))
zed_log_die("Failed to get current working dir: %s",
strerror(errno));
if (strlcat(buf, "/", sizeof (buf)) >= sizeof (buf) ||
strlcat(buf, path, sizeof (buf)) >= sizeof (buf))
zed_log_die("Failed to copy path: %s",
strerror(ENAMETOOLONG));
*resultp = strdup(buf); *resultp = strdup(buf);
} }
if (!*resultp) if (!*resultp)
zed_log_die("Failed to copy path: %s", strerror(ENOMEM)); zed_log_die("Failed to copy path: %s", strerror(ENOMEM));
} }
@@ -230,9 +249,8 @@ _zed_conf_parse_path(char **resultp, const char *path)
void void
zed_conf_parse_opts(struct zed_conf *zcp, int argc, char **argv) zed_conf_parse_opts(struct zed_conf *zcp, int argc, char **argv)
{ {
const char * const opts = ":hLVd:p:P:s:vfFMZIj:"; const char * const opts = ":hLVc:d:p:P:s:vfFMZ";
int opt; int opt;
unsigned long raw;
if (!zcp || !argv || !argv[0]) if (!zcp || !argv || !argv[0])
zed_log_die("Failed to parse options: Internal error"); zed_log_die("Failed to parse options: Internal error");
@@ -242,7 +260,7 @@ zed_conf_parse_opts(struct zed_conf *zcp, int argc, char **argv)
while ((opt = getopt(argc, argv, opts)) != -1) { while ((opt = getopt(argc, argv, opts)) != -1) {
switch (opt) { switch (opt) {
case 'h': case 'h':
_zed_conf_display_help(argv[0], B_FALSE); _zed_conf_display_help(argv[0], EXIT_SUCCESS);
break; break;
case 'L': case 'L':
_zed_conf_display_license(); _zed_conf_display_license();
@@ -250,12 +268,12 @@ zed_conf_parse_opts(struct zed_conf *zcp, int argc, char **argv)
case 'V': case 'V':
_zed_conf_display_version(); _zed_conf_display_version();
break; break;
case 'c':
_zed_conf_parse_path(&zcp->conf_file, optarg);
break;
case 'd': case 'd':
_zed_conf_parse_path(&zcp->zedlet_dir, optarg); _zed_conf_parse_path(&zcp->zedlet_dir, optarg);
break; break;
case 'I':
zcp->do_idle = 1;
break;
case 'p': case 'p':
_zed_conf_parse_path(&zcp->pid_file, optarg); _zed_conf_parse_path(&zcp->pid_file, optarg);
break; break;
@@ -280,30 +298,31 @@ zed_conf_parse_opts(struct zed_conf *zcp, int argc, char **argv)
case 'Z': case 'Z':
zcp->do_zero = 1; zcp->do_zero = 1;
break; break;
case 'j':
errno = 0;
raw = strtoul(optarg, NULL, 0);
if (errno == ERANGE || raw > INT16_MAX) {
zed_log_die("%lu is too many jobs", raw);
} if (raw == 0) {
zed_log_die("0 jobs makes no sense");
} else {
zcp->max_jobs = raw;
}
break;
case '?': case '?':
default: default:
if (optopt == '?') if (optopt == '?')
_zed_conf_display_help(argv[0], B_FALSE); _zed_conf_display_help(argv[0], EXIT_SUCCESS);
fprintf(stderr, "%s: Invalid option '-%c'\n\n", fprintf(stderr, "%s: %s '-%c'\n\n", argv[0],
argv[0], optopt); "Invalid option", optopt);
_zed_conf_display_help(argv[0], B_TRUE); _zed_conf_display_help(argv[0], EXIT_FAILURE);
break; break;
} }
} }
} }
/*
* Parse the configuration file into the configuration [zcp].
*
* FIXME: Not yet implemented.
*/
void
zed_conf_parse_file(struct zed_conf *zcp)
{
if (!zcp)
zed_log_die("Failed to parse config: %s", strerror(EINVAL));
}
/* /*
* Scan the [zcp] zedlet_dir for files to exec based on the event class. * Scan the [zcp] zedlet_dir for files to exec based on the event class.
* Files must be executable by user, but not writable by group or other. * Files must be executable by user, but not writable by group or other.
@@ -311,6 +330,8 @@ zed_conf_parse_opts(struct zed_conf *zcp, int argc, char **argv)
* *
* Return 0 on success with an updated set of zedlets, * Return 0 on success with an updated set of zedlets,
* or -1 on error with errno set. * or -1 on error with errno set.
*
* FIXME: Check if zedlet_dir and all parent dirs are secure.
*/ */
int int
zed_conf_scan_dir(struct zed_conf *zcp) zed_conf_scan_dir(struct zed_conf *zcp)
@@ -426,6 +447,8 @@ zed_conf_scan_dir(struct zed_conf *zcp)
int int
zed_conf_write_pid(struct zed_conf *zcp) zed_conf_write_pid(struct zed_conf *zcp)
{ {
const mode_t dirmode = S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH;
const mode_t filemode = S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH;
char buf[PATH_MAX]; char buf[PATH_MAX];
int n; int n;
char *p; char *p;
@@ -453,7 +476,7 @@ zed_conf_write_pid(struct zed_conf *zcp)
if (p) if (p)
*p = '\0'; *p = '\0';
if ((mkdirp(buf, 0755) < 0) && (errno != EEXIST)) { if ((mkdirp(buf, dirmode) < 0) && (errno != EEXIST)) {
zed_log_msg(LOG_ERR, "Failed to create directory \"%s\": %s", zed_log_msg(LOG_ERR, "Failed to create directory \"%s\": %s",
buf, strerror(errno)); buf, strerror(errno));
goto err; goto err;
@@ -463,7 +486,7 @@ zed_conf_write_pid(struct zed_conf *zcp)
*/ */
mask = umask(0); mask = umask(0);
umask(mask | 022); umask(mask | 022);
zcp->pid_fd = open(zcp->pid_file, O_RDWR | O_CREAT | O_CLOEXEC, 0644); zcp->pid_fd = open(zcp->pid_file, (O_RDWR | O_CREAT), filemode);
umask(mask); umask(mask);
if (zcp->pid_fd < 0) { if (zcp->pid_fd < 0) {
zed_log_msg(LOG_ERR, "Failed to open PID file \"%s\": %s", zed_log_msg(LOG_ERR, "Failed to open PID file \"%s\": %s",
@@ -500,7 +523,7 @@ zed_conf_write_pid(struct zed_conf *zcp)
errno = ERANGE; errno = ERANGE;
zed_log_msg(LOG_ERR, "Failed to write PID file \"%s\": %s", zed_log_msg(LOG_ERR, "Failed to write PID file \"%s\": %s",
zcp->pid_file, strerror(errno)); zcp->pid_file, strerror(errno));
} else if (write(zcp->pid_fd, buf, n) != n) { } else if (zed_file_write_n(zcp->pid_fd, buf, n) != n) {
zed_log_msg(LOG_ERR, "Failed to write PID file \"%s\": %s", zed_log_msg(LOG_ERR, "Failed to write PID file \"%s\": %s",
zcp->pid_file, strerror(errno)); zcp->pid_file, strerror(errno));
} else if (fdatasync(zcp->pid_fd) < 0) { } else if (fdatasync(zcp->pid_fd) < 0) {
@@ -528,6 +551,7 @@ int
zed_conf_open_state(struct zed_conf *zcp) zed_conf_open_state(struct zed_conf *zcp)
{ {
char dirbuf[PATH_MAX]; char dirbuf[PATH_MAX];
mode_t dirmode = S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH;
int n; int n;
char *p; char *p;
int rv; int rv;
@@ -549,7 +573,7 @@ zed_conf_open_state(struct zed_conf *zcp)
if (p) if (p)
*p = '\0'; *p = '\0';
if ((mkdirp(dirbuf, 0755) < 0) && (errno != EEXIST)) { if ((mkdirp(dirbuf, dirmode) < 0) && (errno != EEXIST)) {
zed_log_msg(LOG_WARNING, zed_log_msg(LOG_WARNING,
"Failed to create directory \"%s\": %s", "Failed to create directory \"%s\": %s",
dirbuf, strerror(errno)); dirbuf, strerror(errno));
@@ -567,7 +591,7 @@ zed_conf_open_state(struct zed_conf *zcp)
(void) unlink(zcp->state_file); (void) unlink(zcp->state_file);
zcp->state_fd = open(zcp->state_file, zcp->state_fd = open(zcp->state_file,
O_RDWR | O_CREAT | O_CLOEXEC, 0644); (O_RDWR | O_CREAT), (S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH));
if (zcp->state_fd < 0) { if (zcp->state_fd < 0) {
zed_log_msg(LOG_WARNING, "Failed to open state file \"%s\": %s", zed_log_msg(LOG_WARNING, "Failed to open state file \"%s\": %s",
zcp->state_file, strerror(errno)); zcp->state_file, strerror(errno));
+23 -20
@@ -1,9 +1,9 @@
/* /*
* This file is part of the ZFS Event Daemon (ZED). * This file is part of the ZFS Event Daemon (ZED)
* * for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049). * Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC. * Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the OpenZFS git commit log for authoritative copyright attribution. * Refer to the ZoL git commit log for authoritative copyright attribution.
* *
* The contents of this file are subject to the terms of the * The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0). * Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -20,39 +20,42 @@
#include "zed_strings.h" #include "zed_strings.h"
struct zed_conf { struct zed_conf {
unsigned do_force:1; /* true if force enabled */
unsigned do_foreground:1; /* true if run in foreground */
unsigned do_memlock:1; /* true if locking memory */
unsigned do_verbose:1; /* true if verbosity enabled */
unsigned do_zero:1; /* true if zeroing state */
int syslog_facility; /* syslog facility value */
int min_events; /* RESERVED FOR FUTURE USE */
int max_events; /* RESERVED FOR FUTURE USE */
char *conf_file; /* abs path to config file */
char *pid_file; /* abs path to pid file */ char *pid_file; /* abs path to pid file */
char *zedlet_dir; /* abs path to zedlet dir */
char *state_file; /* abs path to state file */
libzfs_handle_t *zfs_hdl; /* handle to libzfs */
zed_strings_t *zedlets; /* names of enabled zedlets */
char *path; /* custom $PATH for zedlets to use */
int pid_fd; /* fd to pid file for lock */ int pid_fd; /* fd to pid file for lock */
char *zedlet_dir; /* abs path to zedlet dir */
zed_strings_t *zedlets; /* names of enabled zedlets */
char *state_file; /* abs path to state file */
int state_fd; /* fd to state file */ int state_fd; /* fd to state file */
libzfs_handle_t *zfs_hdl; /* handle to libzfs */
int zevent_fd; /* fd for access to zevents */ int zevent_fd; /* fd for access to zevents */
char *path; /* custom $PATH for zedlets to use */
int16_t max_jobs; /* max zedlets to run at one time */
boolean_t do_force:1; /* true if force enabled */
boolean_t do_foreground:1; /* true if run in foreground */
boolean_t do_memlock:1; /* true if locking memory */
boolean_t do_verbose:1; /* true if verbosity enabled */
boolean_t do_zero:1; /* true if zeroing state */
boolean_t do_idle:1; /* true if idle enabled */
}; };
void zed_conf_init(struct zed_conf *zcp); struct zed_conf *zed_conf_create(void);
void zed_conf_destroy(struct zed_conf *zcp); void zed_conf_destroy(struct zed_conf *zcp);
void zed_conf_parse_opts(struct zed_conf *zcp, int argc, char **argv); void zed_conf_parse_opts(struct zed_conf *zcp, int argc, char **argv);
void zed_conf_parse_file(struct zed_conf *zcp);
int zed_conf_scan_dir(struct zed_conf *zcp); int zed_conf_scan_dir(struct zed_conf *zcp);
int zed_conf_write_pid(struct zed_conf *zcp); int zed_conf_write_pid(struct zed_conf *zcp);
int zed_conf_open_state(struct zed_conf *zcp); int zed_conf_open_state(struct zed_conf *zcp);
int zed_conf_read_state(struct zed_conf *zcp, uint64_t *eidp, int64_t etime[]); int zed_conf_read_state(struct zed_conf *zcp, uint64_t *eidp, int64_t etime[]);
int zed_conf_write_state(struct zed_conf *zcp, uint64_t eid, int64_t etime[]); int zed_conf_write_state(struct zed_conf *zcp, uint64_t eid, int64_t etime[]);
#endif /* !ZED_CONF_H */ #endif /* !ZED_CONF_H */
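Note on the two constructor styles declared above (`zed_conf_init()` filling a caller-provided struct, `zed_conf_create()` returning a heap allocation): the allocating variant in the diff funnels every failure through a single `nomem:` label. The sketch below shows that shape on a reduced struct. `struct demo_conf`, `demo_conf_create`, and `demo_conf_destroy` are hypothetical names for illustration only. The sketch also returns `NULL` on failure, whereas the source calls `zed_log_die()` at that point.

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct zed_conf with two owned strings. */
struct demo_conf {
	char *pid_file;
	char *state_file;
	int pid_fd;
};

static void
demo_conf_destroy(struct demo_conf *dcp)
{
	if (!dcp)
		return;
	free(dcp->pid_file);	/* free(NULL) is a no-op */
	free(dcp->state_file);
	free(dcp);
}

static struct demo_conf *
demo_conf_create(const char *pid_file, const char *state_file)
{
	struct demo_conf *dcp;

	assert(pid_file != NULL && state_file != NULL);

	dcp = calloc(1, sizeof (*dcp));
	if (!dcp)
		goto nomem;
	dcp->pid_fd = -1;	/* not opened yet */

	if (!(dcp->pid_file = strdup(pid_file)))
		goto nomem;
	if (!(dcp->state_file = strdup(state_file)))
		goto nomem;
	return (dcp);

nomem:
	demo_conf_destroy(dcp);	/* safe on NULL and on partial state */
	return (NULL);
}
```

Zero-filling with `calloc()` is what makes the shared error path safe: any string pointer not yet assigned is still `NULL`, so the destroy routine can run on a partially built object.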
+10 -64
@@ -21,7 +21,6 @@
#include <libnvpair.h> #include <libnvpair.h>
#include <libudev.h> #include <libudev.h>
#include <libzfs.h> #include <libzfs.h>
#include <libzutil.h>
#include <pthread.h> #include <pthread.h>
#include <stdlib.h> #include <stdlib.h>
#include <string.h> #include <string.h>
@@ -38,7 +37,7 @@
* A libudev monitor is established to monitor block device actions and pass * A libudev monitor is established to monitor block device actions and pass
* them on to internal ZED logic modules. Initially, zfs_mod.c is the only * them on to internal ZED logic modules. Initially, zfs_mod.c is the only
* consumer and is the Linux equivalent for the illumos syseventd ZFS SLM * consumer and is the Linux equivalent for the illumos syseventd ZFS SLM
* module responsible for handling disk events for ZFS. * module responsible for handeling disk events for ZFS.
*/ */
pthread_t g_mon_tid; pthread_t g_mon_tid;
@@ -72,14 +71,10 @@ zed_udev_event(const char *class, const char *subclass, nvlist_t *nvl)
zed_log_msg(LOG_INFO, "\t%s: %s", DEV_PATH, strval); zed_log_msg(LOG_INFO, "\t%s: %s", DEV_PATH, strval);
if (nvlist_lookup_string(nvl, DEV_IDENTIFIER, &strval) == 0) if (nvlist_lookup_string(nvl, DEV_IDENTIFIER, &strval) == 0)
zed_log_msg(LOG_INFO, "\t%s: %s", DEV_IDENTIFIER, strval); zed_log_msg(LOG_INFO, "\t%s: %s", DEV_IDENTIFIER, strval);
if (nvlist_lookup_boolean(nvl, DEV_IS_PART) == B_TRUE)
zed_log_msg(LOG_INFO, "\t%s: B_TRUE", DEV_IS_PART);
if (nvlist_lookup_string(nvl, DEV_PHYS_PATH, &strval) == 0) if (nvlist_lookup_string(nvl, DEV_PHYS_PATH, &strval) == 0)
zed_log_msg(LOG_INFO, "\t%s: %s", DEV_PHYS_PATH, strval); zed_log_msg(LOG_INFO, "\t%s: %s", DEV_PHYS_PATH, strval);
if (nvlist_lookup_uint64(nvl, DEV_SIZE, &numval) == 0) if (nvlist_lookup_uint64(nvl, DEV_SIZE, &numval) == 0)
zed_log_msg(LOG_INFO, "\t%s: %llu", DEV_SIZE, numval); zed_log_msg(LOG_INFO, "\t%s: %llu", DEV_SIZE, numval);
if (nvlist_lookup_uint64(nvl, DEV_PARENT_SIZE, &numval) == 0)
zed_log_msg(LOG_INFO, "\t%s: %llu", DEV_PARENT_SIZE, numval);
if (nvlist_lookup_uint64(nvl, ZFS_EV_POOL_GUID, &numval) == 0) if (nvlist_lookup_uint64(nvl, ZFS_EV_POOL_GUID, &numval) == 0)
zed_log_msg(LOG_INFO, "\t%s: %llu", ZFS_EV_POOL_GUID, numval); zed_log_msg(LOG_INFO, "\t%s: %llu", ZFS_EV_POOL_GUID, numval);
if (nvlist_lookup_uint64(nvl, ZFS_EV_VDEV_GUID, &numval) == 0) if (nvlist_lookup_uint64(nvl, ZFS_EV_VDEV_GUID, &numval) == 0)
@@ -132,20 +127,6 @@ dev_event_nvlist(struct udev_device *dev)
numval *= strtoull(value, NULL, 10); numval *= strtoull(value, NULL, 10);
(void) nvlist_add_uint64(nvl, DEV_SIZE, numval); (void) nvlist_add_uint64(nvl, DEV_SIZE, numval);
/*
* If the device has a parent, then get the parent block
* device's size as well. For example, /dev/sda1's parent
* is /dev/sda.
*/
struct udev_device *parent_dev = udev_device_get_parent(dev);
if ((value = udev_device_get_sysattr_value(parent_dev, "size"))
!= NULL) {
uint64_t numval = DEV_BSIZE;
numval *= strtoull(value, NULL, 10);
(void) nvlist_add_uint64(nvl, DEV_PARENT_SIZE, numval);
}
} }
/* /*
@@ -185,7 +166,7 @@ zed_udev_monitor(void *arg)
while (1) { while (1) {
struct udev_device *dev; struct udev_device *dev;
const char *action, *type, *part, *sectors; const char *action, *type, *part, *sectors;
const char *bus, *uuid, *devpath; const char *bus, *uuid;
const char *class, *subclass; const char *class, *subclass;
nvlist_t *nvl; nvlist_t *nvl;
boolean_t is_zfs = B_FALSE; boolean_t is_zfs = B_FALSE;
@@ -224,12 +205,6 @@ zed_udev_monitor(void *arg)
* if this is a disk and it is partitioned, then the * if this is a disk and it is partitioned, then the
* zfs label will reside in a DEVTYPE=partition and * zfs label will reside in a DEVTYPE=partition and
* we can skip passing this event * we can skip passing this event
*
* Special case: Blank disks are sometimes reported with
* an erroneous 'atari' partition, and should not be
* excluded from being used as an autoreplace disk:
*
* https://github.com/openzfs/zfs/issues/13497
*/ */
type = udev_device_get_property_value(dev, "DEVTYPE"); type = udev_device_get_property_value(dev, "DEVTYPE");
part = udev_device_get_property_value(dev, part = udev_device_get_property_value(dev,
@@ -237,23 +212,9 @@ zed_udev_monitor(void *arg)
if (type != NULL && type[0] != '\0' && if (type != NULL && type[0] != '\0' &&
strcmp(type, "disk") == 0 && strcmp(type, "disk") == 0 &&
part != NULL && part[0] != '\0') { part != NULL && part[0] != '\0') {
const char *devname = /* skip and wait for partition event */
udev_device_get_property_value(dev, "DEVNAME"); udev_device_unref(dev);
continue;
if (strcmp(part, "atari") == 0) {
zed_log_msg(LOG_INFO,
"%s: %s is reporting an atari partition, "
"but we're going to assume it's a false "
"positive and still use it (issue #13497)",
__func__, devname);
} else {
zed_log_msg(LOG_INFO,
"%s: skip %s since it has a %s partition "
"already", __func__, devname, part);
/* skip and wait for partition event */
udev_device_unref(dev);
continue;
}
} }
/* /*
@@ -265,11 +226,6 @@ zed_udev_monitor(void *arg)
sectors = udev_device_get_sysattr_value(dev, "size"); sectors = udev_device_get_sysattr_value(dev, "size");
if (sectors != NULL && if (sectors != NULL &&
strtoull(sectors, NULL, 10) < MINIMUM_SECTORS) { strtoull(sectors, NULL, 10) < MINIMUM_SECTORS) {
zed_log_msg(LOG_INFO,
"%s: %s sectors %s < %llu (minimum)",
__func__,
udev_device_get_property_value(dev, "DEVNAME"),
sectors, MINIMUM_SECTORS);
udev_device_unref(dev); udev_device_unref(dev);
continue; continue;
} }
@@ -279,19 +235,10 @@ zed_udev_monitor(void *arg)
* device id string is required in the message schema * device id string is required in the message schema
* for matching with vdevs. Preflight here for expected * for matching with vdevs. Preflight here for expected
* udev information. * udev information.
*
* Special case:
* NVMe devices don't have ID_BUS set (at least on RHEL 7-8),
* but they are valid for autoreplace. Add a special case for
* them by searching for "/nvme/" in the udev DEVPATH:
*
* DEVPATH=/devices/pci0000:00/0000:00:1e.0/nvme/nvme2/nvme2n1
*/ */
bus = udev_device_get_property_value(dev, "ID_BUS"); bus = udev_device_get_property_value(dev, "ID_BUS");
uuid = udev_device_get_property_value(dev, "DM_UUID"); uuid = udev_device_get_property_value(dev, "DM_UUID");
devpath = udev_device_get_devpath(dev); if (!is_zfs && (bus == NULL && uuid == NULL)) {
if (!is_zfs && (bus == NULL && uuid == NULL &&
strstr(devpath, "/nvme/") == NULL)) {
zed_log_msg(LOG_INFO, "zed_udev_monitor: %s no devid " zed_log_msg(LOG_INFO, "zed_udev_monitor: %s no devid "
"source", udev_device_get_devnode(dev)); "source", udev_device_get_devnode(dev));
udev_device_unref(dev); udev_device_unref(dev);
@@ -402,7 +349,7 @@ zed_udev_monitor(void *arg)
} }
int int
zed_disk_event_init(void) zed_disk_event_init()
{ {
int fd, fflags; int fd, fflags;
@@ -431,14 +378,13 @@ zed_disk_event_init(void)
return (-1); return (-1);
} }
pthread_setname_np(g_mon_tid, "udev monitor");
zed_log_msg(LOG_INFO, "zed_disk_event_init"); zed_log_msg(LOG_INFO, "zed_disk_event_init");
return (0); return (0);
} }
void void
zed_disk_event_fini(void) zed_disk_event_fini()
{ {
/* cancel monitor thread at recvmsg() */ /* cancel monitor thread at recvmsg() */
(void) pthread_cancel(g_mon_tid); (void) pthread_cancel(g_mon_tid);
@@ -456,13 +402,13 @@ zed_disk_event_fini(void)
#include "zed_disk_event.h" #include "zed_disk_event.h"
int int
zed_disk_event_init(void) zed_disk_event_init()
{ {
return (0); return (0);
} }
void void
zed_disk_event_fini(void) zed_disk_event_fini()
{ {
} }
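The whole-disk filter touched in the zed_udev_monitor() hunks above can be read as a small predicate. What follows is a minimal sketch, not the ZED code: `should_skip_whole_disk` is a hypothetical name, and the `allow_atari` flag is an assumption introduced only to contrast the side of the diff that tolerates a reported "atari" partition table with the side that always skips a partitioned disk.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: decide whether a udev event for a whole disk
 * should be skipped because a partition event is expected instead.
 * [type] mirrors the DEVTYPE property and [part] the partition-table
 * property read in the diff; either may be NULL.
 */
static bool
should_skip_whole_disk(const char *type, const char *part, bool allow_atari)
{
	/* Only whole-disk events are candidates for skipping. */
	if (type == NULL || type[0] == '\0' || strcmp(type, "disk") != 0)
		return (false);

	/* An unpartitioned disk is passed through. */
	if (part == NULL || part[0] == '\0')
		return (false);

	/* Optionally treat a reported "atari" table as a false positive. */
	if (allow_atari && strcmp(part, "atari") == 0)
		return (false);

	/* Partitioned disk: skip and wait for the partition event. */
	return (true);
}
```

With `allow_atari` set to false the predicate collapses to the shorter condition on the right-hand side of the diff.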
+39 -74
@@ -1,9 +1,9 @@
/* /*
* This file is part of the ZFS Event Daemon (ZED). * This file is part of the ZFS Event Daemon (ZED)
* * for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049). * Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC. * Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the OpenZFS git commit log for authoritative copyright attribution. * Refer to the ZoL git commit log for authoritative copyright attribution.
* *
* The contents of this file are subject to the terms of the * The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0). * Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -15,7 +15,7 @@
#include <ctype.h> #include <ctype.h>
#include <errno.h> #include <errno.h>
#include <fcntl.h> #include <fcntl.h>
#include <libzfs_core.h> #include <libzfs.h> /* FIXME: Replace with libzfs_core. */
#include <paths.h> #include <paths.h>
#include <stdarg.h> #include <stdarg.h>
#include <stdio.h> #include <stdio.h>
@@ -28,7 +28,6 @@
#include "zed.h" #include "zed.h"
#include "zed_conf.h" #include "zed_conf.h"
#include "zed_disk_event.h" #include "zed_disk_event.h"
#include "zed_event.h"
#include "zed_exec.h" #include "zed_exec.h"
#include "zed_file.h" #include "zed_file.h"
#include "zed_log.h" #include "zed_log.h"
@@ -41,36 +40,25 @@
/* /*
* Open the libzfs interface. * Open the libzfs interface.
*/ */
int void
zed_event_init(struct zed_conf *zcp) zed_event_init(struct zed_conf *zcp)
{ {
if (!zcp) if (!zcp)
zed_log_die("Failed zed_event_init: %s", strerror(EINVAL)); zed_log_die("Failed zed_event_init: %s", strerror(EINVAL));
zcp->zfs_hdl = libzfs_init(); zcp->zfs_hdl = libzfs_init();
if (!zcp->zfs_hdl) { if (!zcp->zfs_hdl)
if (zcp->do_idle)
return (-1);
zed_log_die("Failed to initialize libzfs"); zed_log_die("Failed to initialize libzfs");
}
zcp->zevent_fd = open(ZFS_DEV, O_RDWR | O_CLOEXEC); zcp->zevent_fd = open(ZFS_DEV, O_RDWR);
if (zcp->zevent_fd < 0) { if (zcp->zevent_fd < 0)
if (zcp->do_idle)
return (-1);
zed_log_die("Failed to open \"%s\": %s", zed_log_die("Failed to open \"%s\": %s",
ZFS_DEV, strerror(errno)); ZFS_DEV, strerror(errno));
}
zfs_agent_init(zcp->zfs_hdl); zfs_agent_init(zcp->zfs_hdl);
if (zed_disk_event_init() != 0) { if (zed_disk_event_init() != 0)
if (zcp->do_idle)
return (-1);
zed_log_die("Failed to initialize disk events"); zed_log_die("Failed to initialize disk events");
}
return (0);
} }
/* /*
@@ -96,47 +84,6 @@ zed_event_fini(struct zed_conf *zcp)
libzfs_fini(zcp->zfs_hdl); libzfs_fini(zcp->zfs_hdl);
zcp->zfs_hdl = NULL; zcp->zfs_hdl = NULL;
} }
zed_exec_fini();
}
static void
_bump_event_queue_length(void)
{
int zzlm = -1, wr;
char qlen_buf[12] = {0}; /* parameter is int => max "-2147483647\n" */
long int qlen;
zzlm = open("/sys/module/zfs/parameters/zfs_zevent_len_max", O_RDWR);
if (zzlm < 0)
goto done;
if (read(zzlm, qlen_buf, sizeof (qlen_buf)) < 0)
goto done;
qlen_buf[sizeof (qlen_buf) - 1] = '\0';
errno = 0;
qlen = strtol(qlen_buf, NULL, 10);
if (errno == ERANGE)
goto done;
if (qlen <= 0)
qlen = 512; /* default zfs_zevent_len_max value */
else
qlen *= 2;
if (qlen > INT_MAX)
qlen = INT_MAX;
wr = snprintf(qlen_buf, sizeof (qlen_buf), "%ld", qlen);
if (pwrite(zzlm, qlen_buf, wr, 0) < 0)
goto done;
zed_log_msg(LOG_WARNING, "Bumping queue length to %ld", qlen);
done:
if (zzlm > -1)
(void) close(zzlm);
} }
/* /*
@@ -177,7 +124,10 @@ zed_event_seek(struct zed_conf *zcp, uint64_t saved_eid, int64_t saved_etime[])
if (n_dropped > 0) { if (n_dropped > 0) {
zed_log_msg(LOG_WARNING, "Missed %d events", n_dropped); zed_log_msg(LOG_WARNING, "Missed %d events", n_dropped);
_bump_event_queue_length(); /*
* FIXME: Increase max size of event nvlist in
* /sys/module/zfs/parameters/zfs_zevent_len_max ?
*/
} }
if (nvlist_lookup_uint64(nvl, "eid", &eid) != 0) { if (nvlist_lookup_uint64(nvl, "eid", &eid) != 0) {
zed_log_msg(LOG_WARNING, "Failed to lookup zevent eid"); zed_log_msg(LOG_WARNING, "Failed to lookup zevent eid");
@@ -249,7 +199,7 @@ _zed_event_value_is_hex(const char *name)
* *
* All environment variables in [zsp] should be added through this function. * All environment variables in [zsp] should be added through this function.
*/ */
static __attribute__((format(printf, 5, 6))) int static int
_zed_event_add_var(uint64_t eid, zed_strings_t *zsp, _zed_event_add_var(uint64_t eid, zed_strings_t *zsp,
const char *prefix, const char *name, const char *fmt, ...) const char *prefix, const char *name, const char *fmt, ...)
{ {
@@ -624,6 +574,8 @@ _zed_event_add_string_array(uint64_t eid, zed_strings_t *zsp,
* Convert the nvpair [nvp] to a string which is added to the environment * Convert the nvpair [nvp] to a string which is added to the environment
* of the child process. * of the child process.
* Return 0 on success, -1 on error. * Return 0 on success, -1 on error.
*
* FIXME: Refactor with cmd/zpool/zpool_main.c:zpool_do_events_nvprint()?
*/ */
static void static void
_zed_event_add_nvpair(uint64_t eid, zed_strings_t *zsp, nvpair_t *nvp) _zed_event_add_nvpair(uint64_t eid, zed_strings_t *zsp, nvpair_t *nvp)
@@ -722,11 +674,23 @@ _zed_event_add_nvpair(uint64_t eid, zed_strings_t *zsp, nvpair_t *nvp)
_zed_event_add_var(eid, zsp, prefix, name, _zed_event_add_var(eid, zsp, prefix, name,
"%llu", (u_longlong_t)i64); "%llu", (u_longlong_t)i64);
break; break;
case DATA_TYPE_NVLIST:
_zed_event_add_var(eid, zsp, prefix, name,
"%s", "_NOT_IMPLEMENTED_"); /* FIXME */
break;
case DATA_TYPE_STRING: case DATA_TYPE_STRING:
(void) nvpair_value_string(nvp, &str); (void) nvpair_value_string(nvp, &str);
_zed_event_add_var(eid, zsp, prefix, name, _zed_event_add_var(eid, zsp, prefix, name,
"%s", (str ? str : "<NULL>")); "%s", (str ? str : "<NULL>"));
break; break;
case DATA_TYPE_BOOLEAN_ARRAY:
_zed_event_add_var(eid, zsp, prefix, name,
"%s", "_NOT_IMPLEMENTED_"); /* FIXME */
break;
case DATA_TYPE_BYTE_ARRAY:
_zed_event_add_var(eid, zsp, prefix, name,
"%s", "_NOT_IMPLEMENTED_"); /* FIXME */
break;
case DATA_TYPE_INT8_ARRAY: case DATA_TYPE_INT8_ARRAY:
_zed_event_add_int8_array(eid, zsp, prefix, nvp); _zed_event_add_int8_array(eid, zsp, prefix, nvp);
break; break;
@@ -754,11 +718,9 @@ _zed_event_add_nvpair(uint64_t eid, zed_strings_t *zsp, nvpair_t *nvp)
case DATA_TYPE_STRING_ARRAY: case DATA_TYPE_STRING_ARRAY:
_zed_event_add_string_array(eid, zsp, prefix, nvp); _zed_event_add_string_array(eid, zsp, prefix, nvp);
break; break;
case DATA_TYPE_NVLIST:
case DATA_TYPE_BOOLEAN_ARRAY:
case DATA_TYPE_BYTE_ARRAY:
case DATA_TYPE_NVLIST_ARRAY: case DATA_TYPE_NVLIST_ARRAY:
_zed_event_add_var(eid, zsp, prefix, name, "_NOT_IMPLEMENTED_"); _zed_event_add_var(eid, zsp, prefix, name,
"%s", "_NOT_IMPLEMENTED_"); /* FIXME */
break; break;
default: default:
errno = EINVAL; errno = EINVAL;
@@ -910,7 +872,7 @@ _zed_event_add_time_strings(uint64_t eid, zed_strings_t *zsp, int64_t etime[])
/* /*
* Service the next zevent, blocking until one is available. * Service the next zevent, blocking until one is available.
*/ */
int void
zed_event_service(struct zed_conf *zcp) zed_event_service(struct zed_conf *zcp)
{ {
nvlist_t *nvl; nvlist_t *nvl;
@@ -928,17 +890,20 @@ zed_event_service(struct zed_conf *zcp)
errno = EINVAL; errno = EINVAL;
zed_log_msg(LOG_ERR, "Failed to service zevent: %s", zed_log_msg(LOG_ERR, "Failed to service zevent: %s",
strerror(errno)); strerror(errno));
return (EINVAL); return;
} }
rv = zpool_events_next(zcp->zfs_hdl, &nvl, &n_dropped, ZEVENT_NONE, rv = zpool_events_next(zcp->zfs_hdl, &nvl, &n_dropped, ZEVENT_NONE,
zcp->zevent_fd); zcp->zevent_fd);
if ((rv != 0) || !nvl) if ((rv != 0) || !nvl)
return (errno); return;
if (n_dropped > 0) { if (n_dropped > 0) {
zed_log_msg(LOG_WARNING, "Missed %d events", n_dropped); zed_log_msg(LOG_WARNING, "Missed %d events", n_dropped);
_bump_event_queue_length(); /*
* FIXME: Increase max size of event nvlist in
* /sys/module/zfs/parameters/zfs_zevent_len_max ?
*/
} }
if (nvlist_lookup_uint64(nvl, "eid", &eid) != 0) { if (nvlist_lookup_uint64(nvl, "eid", &eid) != 0) {
zed_log_msg(LOG_WARNING, "Failed to lookup zevent eid"); zed_log_msg(LOG_WARNING, "Failed to lookup zevent eid");
@@ -976,12 +941,12 @@ zed_event_service(struct zed_conf *zcp)
_zed_event_add_time_strings(eid, zsp, etime); _zed_event_add_time_strings(eid, zsp, etime);
zed_exec_process(eid, class, subclass, zcp, zsp); zed_exec_process(eid, class, subclass,
zcp->zedlet_dir, zcp->zedlets, zsp, zcp->zevent_fd);
zed_conf_write_state(zcp, eid, etime); zed_conf_write_state(zcp, eid, etime);
zed_strings_destroy(zsp); zed_strings_destroy(zsp);
} }
nvlist_free(nvl); nvlist_free(nvl);
return (0);
} }
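The sizing rule inside _bump_event_queue_length() (fall back to the default when the value read is not positive, otherwise double it and clamp to INT_MAX) can be isolated from the sysfs I/O. The sketch below is an illustration under assumptions: `next_queue_len` and `DEFAULT_ZEVENT_LEN_MAX` are hypothetical names, and the overflow check is done before doubling, which differs slightly from the order used in the diff.

```c
#include <limits.h>

/* Default zfs_zevent_len_max value, as stated in the diff's comment. */
#define	DEFAULT_ZEVENT_LEN_MAX	512

/*
 * Hypothetical helper: compute the next queue length from the current
 * one. Non-positive input yields the default; otherwise the value is
 * doubled, saturating at INT_MAX because the parameter is an int.
 */
static long
next_queue_len(long qlen)
{
	if (qlen <= 0)
		return (DEFAULT_ZEVENT_LEN_MAX);
	if (qlen > INT_MAX / 2)
		return (INT_MAX);
	return (qlen * 2);
}
```

Checking against INT_MAX / 2 first keeps the multiplication from overflowing on platforms where long is 32 bits wide.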
+5 -5
@@ -1,9 +1,9 @@
/* /*
* This file is part of the ZFS Event Daemon (ZED). * This file is part of the ZFS Event Daemon (ZED)
* * for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049). * Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC. * Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the OpenZFS git commit log for authoritative copyright attribution. * Refer to the ZoL git commit log for authoritative copyright attribution.
* *
* The contents of this file are subject to the terms of the * The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0). * Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -17,13 +17,13 @@
#include <stdint.h> #include <stdint.h>
int zed_event_init(struct zed_conf *zcp); void zed_event_init(struct zed_conf *zcp);
void zed_event_fini(struct zed_conf *zcp); void zed_event_fini(struct zed_conf *zcp);
int zed_event_seek(struct zed_conf *zcp, uint64_t saved_eid, int zed_event_seek(struct zed_conf *zcp, uint64_t saved_eid,
int64_t saved_etime[]); int64_t saved_etime[]);
int zed_event_service(struct zed_conf *zcp); void zed_event_service(struct zed_conf *zcp);
#endif /* !ZED_EVENT_H */ #endif /* !ZED_EVENT_H */
+61 -200
@@ -1,9 +1,9 @@
/* /*
* This file is part of the ZFS Event Daemon (ZED). * This file is part of the ZFS Event Daemon (ZED)
* * for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049). * Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC. * Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the OpenZFS git commit log for authoritative copyright attribution. * Refer to the ZoL git commit log for authoritative copyright attribution.
* *
* The contents of this file are subject to the terms of the * The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0). * Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -18,55 +18,16 @@
#include <fcntl.h> #include <fcntl.h>
#include <stdlib.h> #include <stdlib.h>
#include <string.h> #include <string.h>
#include <stddef.h>
#include <sys/avl.h>
#include <sys/resource.h>
#include <sys/stat.h> #include <sys/stat.h>
#include <sys/wait.h> #include <sys/wait.h>
#include <time.h> #include <time.h>
#include <unistd.h> #include <unistd.h>
#include <pthread.h> #include "zed_file.h"
#include <signal.h>
#include "zed_exec.h"
#include "zed_log.h" #include "zed_log.h"
#include "zed_strings.h" #include "zed_strings.h"
#define ZEVENT_FILENO 3 #define ZEVENT_FILENO 3
struct launched_process_node {
avl_node_t node;
pid_t pid;
uint64_t eid;
char *name;
};
static int
_launched_process_node_compare(const void *x1, const void *x2)
{
pid_t p1;
pid_t p2;
assert(x1 != NULL);
assert(x2 != NULL);
p1 = ((const struct launched_process_node *) x1)->pid;
p2 = ((const struct launched_process_node *) x2)->pid;
if (p1 < p2)
return (-1);
else if (p1 == p2)
return (0);
else
return (1);
}
static pthread_t _reap_children_tid = (pthread_t)-1;
static volatile boolean_t _reap_children_stop;
static avl_tree_t _launched_processes;
static pthread_mutex_t _launched_processes_lock = PTHREAD_MUTEX_INITIALIZER;
static int16_t _launched_processes_limit;
/* /*
* Create an environment string array for passing to execve() using the * Create an environment string array for passing to execve() using the
* NAME=VALUE strings in container [zsp]. * NAME=VALUE strings in container [zsp].
@@ -117,26 +78,20 @@ _zed_exec_create_env(zed_strings_t *zsp)
*/ */
static void static void
_zed_exec_fork_child(uint64_t eid, const char *dir, const char *prog, _zed_exec_fork_child(uint64_t eid, const char *dir, const char *prog,
char *env[], int zfd, boolean_t in_foreground) char *env[], int zfd)
{ {
char path[PATH_MAX]; char path[PATH_MAX];
int n; int n;
pid_t pid; pid_t pid;
int fd; int fd;
struct launched_process_node *node; pid_t wpid;
sigset_t mask; int status;
struct timespec launch_timeout =
{ .tv_sec = 0, .tv_nsec = 200 * 1000 * 1000, };
assert(dir != NULL); assert(dir != NULL);
assert(prog != NULL); assert(prog != NULL);
assert(env != NULL); assert(env != NULL);
assert(zfd >= 0); assert(zfd >= 0);
while (__atomic_load_n(&_launched_processes_limit,
__ATOMIC_SEQ_CST) <= 0)
(void) nanosleep(&launch_timeout, NULL);
n = snprintf(path, sizeof (path), "%s/%s", dir, prog); n = snprintf(path, sizeof (path), "%s/%s", dir, prog);
if ((n < 0) || (n >= sizeof (path))) { if ((n < 0) || (n >= sizeof (path))) {
zed_log_msg(LOG_WARNING, zed_log_msg(LOG_WARNING,
@@ -144,179 +99,100 @@ _zed_exec_fork_child(uint64_t eid, const char *dir, const char *prog,
prog, eid, strerror(ENAMETOOLONG)); prog, eid, strerror(ENAMETOOLONG));
return; return;
} }
(void) pthread_mutex_lock(&_launched_processes_lock);
pid = fork(); pid = fork();
if (pid < 0) { if (pid < 0) {
(void) pthread_mutex_unlock(&_launched_processes_lock);
zed_log_msg(LOG_WARNING, zed_log_msg(LOG_WARNING,
"Failed to fork \"%s\" for eid=%llu: %s", "Failed to fork \"%s\" for eid=%llu: %s",
prog, eid, strerror(errno)); prog, eid, strerror(errno));
return; return;
} else if (pid == 0) { } else if (pid == 0) {
(void) sigemptyset(&mask);
(void) sigprocmask(SIG_SETMASK, &mask, NULL);
(void) umask(022); (void) umask(022);
if (in_foreground && /* we're already devnulled if daemonised */ if ((fd = open("/dev/null", O_RDWR)) != -1) {
(fd = open("/dev/null", O_RDWR | O_CLOEXEC)) != -1) {
(void) dup2(fd, STDIN_FILENO); (void) dup2(fd, STDIN_FILENO);
(void) dup2(fd, STDOUT_FILENO); (void) dup2(fd, STDOUT_FILENO);
(void) dup2(fd, STDERR_FILENO); (void) dup2(fd, STDERR_FILENO);
} }
(void) dup2(zfd, ZEVENT_FILENO); (void) dup2(zfd, ZEVENT_FILENO);
zed_file_close_from(ZEVENT_FILENO + 1);
execle(path, prog, NULL, env); execle(path, prog, NULL, env);
_exit(127); _exit(127);
} }
/* parent process */ /* parent process */
node = calloc(1, sizeof (*node));
if (node) {
node->pid = pid;
node->eid = eid;
node->name = strdup(prog);
avl_add(&_launched_processes, node);
}
(void) pthread_mutex_unlock(&_launched_processes_lock);
__atomic_sub_fetch(&_launched_processes_limit, 1, __ATOMIC_SEQ_CST);
zed_log_msg(LOG_INFO, "Invoking \"%s\" eid=%llu pid=%d", zed_log_msg(LOG_INFO, "Invoking \"%s\" eid=%llu pid=%d",
prog, eid, pid); prog, eid, pid);
}
static void /* FIXME: Timeout rogue child processes with sigalarm? */
_nop(int sig)
{}
static void * /*
_reap_children(void *arg) * Wait for child process using WNOHANG to limit
{ * the time spent waiting to 10 seconds (10,000ms).
struct launched_process_node node, *pnode; */
pid_t pid; for (n = 0; n < 1000; n++) {
int status; wpid = waitpid(pid, &status, WNOHANG);
struct rusage usage; if (wpid == (pid_t)-1) {
struct sigaction sa = {}; if (errno == EINTR)
continue;
zed_log_msg(LOG_WARNING,
"Failed to wait for \"%s\" eid=%llu pid=%d",
prog, eid, pid);
break;
} else if (wpid == 0) {
struct timespec t;
(void) sigfillset(&sa.sa_mask); /* child still running */
(void) sigdelset(&sa.sa_mask, SIGCHLD); t.tv_sec = 0;
(void) pthread_sigmask(SIG_SETMASK, &sa.sa_mask, NULL); t.tv_nsec = 10000000; /* 10ms */
(void) nanosleep(&t, NULL);
(void) sigemptyset(&sa.sa_mask); continue;
sa.sa_handler = _nop;
sa.sa_flags = SA_NOCLDSTOP;
(void) sigaction(SIGCHLD, &sa, NULL);
for (_reap_children_stop = B_FALSE; !_reap_children_stop; ) {
(void) pthread_mutex_lock(&_launched_processes_lock);
pid = wait4(0, &status, WNOHANG, &usage);
if (pid == 0 || pid == (pid_t)-1) {
(void) pthread_mutex_unlock(&_launched_processes_lock);
if (pid == 0 || errno == ECHILD)
pause();
else if (errno != EINTR)
zed_log_msg(LOG_WARNING,
"Failed to wait for children: %s",
strerror(errno));
} else {
memset(&node, 0, sizeof (node));
node.pid = pid;
pnode = avl_find(&_launched_processes, &node, NULL);
if (pnode) {
memcpy(&node, pnode, sizeof (node));
avl_remove(&_launched_processes, pnode);
free(pnode);
}
(void) pthread_mutex_unlock(&_launched_processes_lock);
__atomic_add_fetch(&_launched_processes_limit, 1,
__ATOMIC_SEQ_CST);
usage.ru_utime.tv_sec += usage.ru_stime.tv_sec;
usage.ru_utime.tv_usec += usage.ru_stime.tv_usec;
usage.ru_utime.tv_sec +=
usage.ru_utime.tv_usec / (1000 * 1000);
usage.ru_utime.tv_usec %= 1000 * 1000;
if (WIFEXITED(status)) {
zed_log_msg(LOG_INFO,
"Finished \"%s\" eid=%llu pid=%d "
"time=%llu.%06us exit=%d",
node.name, node.eid, pid,
(unsigned long long) usage.ru_utime.tv_sec,
(unsigned int) usage.ru_utime.tv_usec,
WEXITSTATUS(status));
} else if (WIFSIGNALED(status)) {
zed_log_msg(LOG_INFO,
"Finished \"%s\" eid=%llu pid=%d "
"time=%llu.%06us sig=%d/%s",
node.name, node.eid, pid,
(unsigned long long) usage.ru_utime.tv_sec,
(unsigned int) usage.ru_utime.tv_usec,
WTERMSIG(status),
strsignal(WTERMSIG(status)));
} else {
zed_log_msg(LOG_INFO,
"Finished \"%s\" eid=%llu pid=%d "
"time=%llu.%06us status=0x%X",
node.name, node.eid,
(unsigned long long) usage.ru_utime.tv_sec,
(unsigned int) usage.ru_utime.tv_usec,
(unsigned int) status);
}
free(node.name);
} }
if (WIFEXITED(status)) {
zed_log_msg(LOG_INFO,
"Finished \"%s\" eid=%llu pid=%d exit=%d",
prog, eid, pid, WEXITSTATUS(status));
} else if (WIFSIGNALED(status)) {
zed_log_msg(LOG_INFO,
"Finished \"%s\" eid=%llu pid=%d sig=%d/%s",
prog, eid, pid, WTERMSIG(status),
strsignal(WTERMSIG(status)));
} else {
zed_log_msg(LOG_INFO,
"Finished \"%s\" eid=%llu pid=%d status=0x%X",
prog, eid, (unsigned int) status);
}
break;
} }
return (NULL); /*
} * kill child process after 10 seconds
*/
void if (wpid == 0) {
zed_exec_fini(void) zed_log_msg(LOG_WARNING, "Killing hung \"%s\" pid=%d",
{ prog, pid);
struct launched_process_node *node; (void) kill(pid, SIGKILL);
void *ck = NULL;
if (_reap_children_tid == (pthread_t)-1)
return;
_reap_children_stop = B_TRUE;
(void) pthread_kill(_reap_children_tid, SIGCHLD);
(void) pthread_join(_reap_children_tid, NULL);
while ((node = avl_destroy_nodes(&_launched_processes, &ck)) != NULL) {
free(node->name);
free(node);
} }
avl_destroy(&_launched_processes);
(void) pthread_mutex_destroy(&_launched_processes_lock);
(void) pthread_mutex_init(&_launched_processes_lock, NULL);
_reap_children_tid = (pthread_t)-1;
} }
/* /*
* Process the event [eid] by synchronously invoking all zedlets with a * Process the event [eid] by synchronously invoking all zedlets with a
* matching class prefix. * matching class prefix.
* *
* Each executable in [zcp->zedlets] from the directory [zcp->zedlet_dir] * Each executable in [zedlets] from the directory [dir] is matched against
* is matched against the event's [class], [subclass], and the "all" class * the event's [class], [subclass], and the "all" class (which matches
* (which matches all events). * all events). Every zedlet with a matching class prefix is invoked.
* Every zedlet with a matching class prefix is invoked.
* The NAME=VALUE strings in [envs] will be passed to the zedlet as * The NAME=VALUE strings in [envs] will be passed to the zedlet as
* environment variables. * environment variables.
* *
* The file descriptor [zcp->zevent_fd] is the zevent_fd used to track the * The file descriptor [zfd] is the zevent_fd used to track the
* current cursor location within the zevent nvlist. * current cursor location within the zevent nvlist.
* *
* Return 0 on success, -1 on error. * Return 0 on success, -1 on error.
*/ */
int int
zed_exec_process(uint64_t eid, const char *class, const char *subclass, zed_exec_process(uint64_t eid, const char *class, const char *subclass,
struct zed_conf *zcp, zed_strings_t *envs) const char *dir, zed_strings_t *zedlets, zed_strings_t *envs, int zfd)
{ {
const char *class_strings[4]; const char *class_strings[4];
const char *allclass = "all"; const char *allclass = "all";
@@ -325,22 +201,9 @@ zed_exec_process(uint64_t eid, const char *class, const char *subclass,
char **e; char **e;
int n; int n;
if (!zcp->zedlet_dir || !zcp->zedlets || !envs || zcp->zevent_fd < 0) if (!dir || !zedlets || !envs || zfd < 0)
return (-1); return (-1);
if (_reap_children_tid == (pthread_t)-1) {
_launched_processes_limit = zcp->max_jobs;
if (pthread_create(&_reap_children_tid, NULL,
_reap_children, NULL) != 0)
return (-1);
pthread_setname_np(_reap_children_tid, "reap ZEDLETs");
avl_create(&_launched_processes, _launched_process_node_compare,
sizeof (struct launched_process_node),
offsetof(struct launched_process_node, node));
}
csp = class_strings; csp = class_strings;
if (class) if (class)
@@ -356,13 +219,11 @@ zed_exec_process(uint64_t eid, const char *class, const char *subclass,
e = _zed_exec_create_env(envs); e = _zed_exec_create_env(envs);
for (z = zed_strings_first(zcp->zedlets); z; for (z = zed_strings_first(zedlets); z; z = zed_strings_next(zedlets)) {
z = zed_strings_next(zcp->zedlets)) {
for (csp = class_strings; *csp; csp++) { for (csp = class_strings; *csp; csp++) {
n = strlen(*csp); n = strlen(*csp);
if ((strncmp(z, *csp, n) == 0) && !isalpha(z[n])) if ((strncmp(z, *csp, n) == 0) && !isalpha(z[n]))
_zed_exec_fork_child(eid, zcp->zedlet_dir, _zed_exec_fork_child(eid, dir, z, e, zfd);
z, e, zcp->zevent_fd, zcp->do_foreground);
} }
} }
free(e); free(e);
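The synchronous side of this diff waits for each zedlet by polling waitpid() with WNOHANG (1000 polls of 10 ms, about 10 seconds) and kills the child if it is still running. A self-contained sketch of that pattern follows. `wait_with_timeout` and its parameters are hypothetical, and unlike the code in the diff it also reaps the child after sending SIGKILL; that extra step is an assumption of this sketch, not something the diff shows.

```c
#define	_POSIX_C_SOURCE	200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: poll for [pid] up to [max_polls] times, sleeping
 * [poll_ns] nanoseconds between polls. Returns 0 and stores the wait
 * status in [status] if the child exited on its own, 1 if it was still
 * running and was killed, or -1 on a waitpid() error.
 */
static int
wait_with_timeout(pid_t pid, int max_polls, long poll_ns, int *status)
{
	struct timespec t = { .tv_sec = 0, .tv_nsec = poll_ns };
	int n;

	assert(pid > 0);
	assert(status != NULL);
	assert(poll_ns >= 0 && poll_ns < 1000000000L);

	for (n = 0; n < max_polls; n++) {
		pid_t wpid = waitpid(pid, status, WNOHANG);

		if (wpid == pid)
			return (0);		/* child finished */
		if (wpid == (pid_t)-1) {
			if (errno == EINTR)
				continue;	/* interrupted: poll again */
			return (-1);
		}
		/* wpid == 0: child still running */
		(void) nanosleep(&t, NULL);
	}

	/* Timed out: kill the child, then reap it so no zombie remains. */
	(void) kill(pid, SIGKILL);
	while (waitpid(pid, status, 0) == (pid_t)-1 && errno == EINTR)
		;
	return (1);
}
```

Calling `wait_with_timeout(pid, 1000, 10000000L, &status)` reproduces the 10 second budget described in the diff's comment. The asynchronous side of the diff avoids this per-child blocking by handing reaping to a separate thread.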
+5 -8
@@ -1,9 +1,9 @@
/* /*
* This file is part of the ZFS Event Daemon (ZED). * This file is part of the ZFS Event Daemon (ZED)
* * for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049). * Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC. * Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the OpenZFS git commit log for authoritative copyright attribution. * Refer to the ZoL git commit log for authoritative copyright attribution.
* *
* The contents of this file are subject to the terms of the * The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0). * Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -16,12 +16,9 @@
#define ZED_EXEC_H #define ZED_EXEC_H
#include <stdint.h> #include <stdint.h>
#include "zed_strings.h"
#include "zed_conf.h"
void zed_exec_fini(void);
int zed_exec_process(uint64_t eid, const char *class, const char *subclass, int zed_exec_process(uint64_t eid, const char *class, const char *subclass,
struct zed_conf *zcp, zed_strings_t *envs); const char *dir, zed_strings_t *zedlets, zed_strings_t *envs,
int zevent_fd);
#endif /* !ZED_EXEC_H */ #endif /* !ZED_EXEC_H */
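Both signatures of zed_exec_process() select zedlets with the same test: the zedlet file name must start with a class string, and the character right after that prefix must not be alphabetic. A minimal sketch of that test is given here. `zedlet_matches_class` is a hypothetical name, and the file names in the usage note are examples chosen for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: return true if [zedlet] begins with
 * [class_prefix] and the next character is not a letter, so that a
 * prefix of "all" matches "all-foo.sh" but not "allocate.sh".
 */
static bool
zedlet_matches_class(const char *zedlet, const char *class_prefix)
{
	size_t n;

	assert(zedlet != NULL);
	assert(class_prefix != NULL);

	n = strlen(class_prefix);
	/* zedlet[n] is at worst the terminating NUL when the prefix matched. */
	return (strncmp(zedlet, class_prefix, n) == 0 &&
	    !isalpha((unsigned char)zedlet[n]));
}
```

For example, `zedlet_matches_class("all-syslog.sh", "all")` is true, while `zedlet_matches_class("allocate.sh", "all")` is false because the prefix is followed by a letter. The cast to unsigned char is a defensive choice of this sketch; the diff passes the character to isalpha() directly.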
+99 -24
@@ -1,9 +1,9 @@
/* /*
* This file is part of the ZFS Event Daemon (ZED). * This file is part of the ZFS Event Daemon (ZED)
* * for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
* Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049). * Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
* Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC. * Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
* Refer to the OpenZFS git commit log for authoritative copyright attribution. * Refer to the ZoL git commit log for authoritative copyright attribution.
* *
* The contents of this file are subject to the terms of the * The contents of this file are subject to the terms of the
* Common Development and Distribution License Version 1.0 (CDDL-1.0). * Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -12,17 +12,72 @@
* You may not use this file except in compliance with the license. * You may not use this file except in compliance with the license.
*/ */
#include <dirent.h>
#include <errno.h> #include <errno.h>
#include <fcntl.h> #include <fcntl.h>
#include <limits.h> #include <limits.h>
#include <string.h> #include <string.h>
#include <sys/resource.h>
#include <sys/stat.h> #include <sys/stat.h>
#include <sys/types.h> #include <sys/types.h>
#include <unistd.h> #include <unistd.h>
#include "zed_file.h"
#include "zed_log.h" #include "zed_log.h"
/*
* Read up to [n] bytes from [fd] into [buf].
* Return the number of bytes read, 0 on EOF, or -1 on error.
*/
ssize_t
zed_file_read_n(int fd, void *buf, size_t n)
{
unsigned char *p;
size_t n_left;
ssize_t n_read;
p = buf;
n_left = n;
while (n_left > 0) {
if ((n_read = read(fd, p, n_left)) < 0) {
if (errno == EINTR)
continue;
else
return (-1);
} else if (n_read == 0) {
break;
}
n_left -= n_read;
p += n_read;
}
return (n - n_left);
}
/*
* Write [n] bytes from [buf] out to [fd].
* Return the number of bytes written, or -1 on error.
*/
ssize_t
zed_file_write_n(int fd, void *buf, size_t n)
{
const unsigned char *p;
size_t n_left;
ssize_t n_written;
p = buf;
n_left = n;
while (n_left > 0) {
if ((n_written = write(fd, p, n_left)) < 0) {
if (errno == EINTR)
continue;
else
return (-1);
}
n_left -= n_written;
p += n_written;
}
return (n);
}
/* /*
* Set an exclusive advisory lock on the open file descriptor [fd]. * Set an exclusive advisory lock on the open file descriptor [fd].
* Return 0 on success, 1 if a conflicting lock is held by another process, * Return 0 on success, 1 if a conflicting lock is held by another process,
@@ -104,13 +159,6 @@ zed_file_is_locked(int fd)
return (lock.l_pid); return (lock.l_pid);
} }
#if __APPLE__
#define PROC_SELF_FD "/dev/fd"
#else /* Linux-compatible layout */
#define PROC_SELF_FD "/proc/self/fd"
#endif
/* /*
* Close all open file descriptors greater than or equal to [lowfd]. * Close all open file descriptors greater than or equal to [lowfd].
* Any errors encountered while closing file descriptors are ignored. * Any errors encountered while closing file descriptors are ignored.
@@ -118,24 +166,51 @@ zed_file_is_locked(int fd)
 void
 zed_file_close_from(int lowfd)
 {
-	int errno_bak = errno;
-	int maxfd = 0;
+	const int maxfd_def = 256;
+	int errno_bak;
+	struct rlimit rl;
+	int maxfd;
 	int fd;
-	DIR *fddir;
-	struct dirent *fdent;
-	if ((fddir = opendir(PROC_SELF_FD)) != NULL) {
-		while ((fdent = readdir(fddir)) != NULL) {
-			fd = atoi(fdent->d_name);
-			if (fd > maxfd && fd != dirfd(fddir))
-				maxfd = fd;
-		}
-		(void) closedir(fddir);
+	errno_bak = errno;
+	if (getrlimit(RLIMIT_NOFILE, &rl) < 0) {
+		maxfd = maxfd_def;
+	} else if (rl.rlim_max == RLIM_INFINITY) {
+		maxfd = maxfd_def;
 	} else {
-		maxfd = sysconf(_SC_OPEN_MAX);
+		maxfd = rl.rlim_max;
 	}
 	for (fd = lowfd; fd < maxfd; fd++)
 		(void) close(fd);
 	errno = errno_bak;
 }
+/*
+ * Set the CLOEXEC flag on file descriptor [fd] so it will be automatically
+ * closed upon successful execution of one of the exec functions.
+ * Return 0 on success, or -1 on error.
+ *
+ * FIXME: No longer needed?
+ */
+int
+zed_file_close_on_exec(int fd)
+{
+	int flags;
+	if (fd < 0) {
+		errno = EBADF;
+		return (-1);
+	}
+	flags = fcntl(fd, F_GETFD);
+	if (flags == -1)
+		return (-1);
+	flags |= FD_CLOEXEC;
+	if (fcntl(fd, F_SETFD, flags) == -1)
+		return (-1);
+	return (0);
+}
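The close-on-exec pattern used by zed_file_close_on_exec() can be tried in isolation. The sketch below is illustrative only and is not part of ZED: the names example_set_cloexec() and example_has_cloexec() are hypothetical, and the only calls assumed are the POSIX fcntl() commands F_GETFD and F_SETFD. It reads the current descriptor flags, ORs in FD_CLOEXEC so any other flags survive, and provides a second helper to read the flag back.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of ZED): set FD_CLOEXEC on [fd] while
 * preserving any other descriptor flags.
 * Return 0 on success, or -1 on error with errno set.
 */
static int
example_set_cloexec(int fd)
{
	int flags;

	if (fd < 0) {
		errno = EBADF;
		return (-1);
	}
	flags = fcntl(fd, F_GETFD);
	if (flags == -1)
		return (-1);
	if (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1)
		return (-1);
	return (0);
}

/*
 * Hypothetical helper: report whether FD_CLOEXEC is set on [fd].
 * Return 1 if set, 0 if clear, or -1 on error.
 */
static int
example_has_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags == -1)
		return (-1);
	return ((flags & FD_CLOEXEC) != 0);
}
```

A caller would typically apply example_set_cloexec() to each descriptor that must not leak into a child process before calling one of the exec functions. Where the platform supports it, opening with O_CLOEXEC avoids the window between open() and fcntl().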
+9 -3
@@ -1,9 +1,9 @@
 /*
- * This file is part of the ZFS Event Daemon (ZED).
- *
+ * This file is part of the ZFS Event Daemon (ZED)
+ * for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
  * Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
  * Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
- * Refer to the OpenZFS git commit log for authoritative copyright attribution.
+ * Refer to the ZoL git commit log for authoritative copyright attribution.
  *
  * The contents of this file are subject to the terms of the
  * Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -18,6 +18,10 @@
 #include <sys/types.h>
 #include <unistd.h>
+ssize_t zed_file_read_n(int fd, void *buf, size_t n);
+ssize_t zed_file_write_n(int fd, void *buf, size_t n);
 int zed_file_lock(int fd);
 int zed_file_unlock(int fd);
@@ -26,4 +30,6 @@ pid_t zed_file_is_locked(int fd);
 void zed_file_close_from(int fd);
+int zed_file_close_on_exec(int fd);
 #endif /* !ZED_FILE_H */
+3 -3
@@ -1,9 +1,9 @@
 /*
- * This file is part of the ZFS Event Daemon (ZED).
- *
+ * This file is part of the ZFS Event Daemon (ZED)
+ * for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
  * Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
  * Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
- * Refer to the OpenZFS git commit log for authoritative copyright attribution.
+ * Refer to the ZoL git commit log for authoritative copyright attribution.
  *
  * The contents of this file are subject to the terms of the
  * Common Development and Distribution License Version 1.0 (CDDL-1.0).
+3 -3
@@ -1,9 +1,9 @@
 /*
- * This file is part of the ZFS Event Daemon (ZED).
- *
+ * This file is part of the ZFS Event Daemon (ZED)
+ * for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
  * Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
  * Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
- * Refer to the OpenZFS git commit log for authoritative copyright attribution.
+ * Refer to the ZoL git commit log for authoritative copyright attribution.
  *
  * The contents of this file are subject to the terms of the
  * Common Development and Distribution License Version 1.0 (CDDL-1.0).
+4 -4
@@ -1,9 +1,9 @@
 /*
- * This file is part of the ZFS Event Daemon (ZED).
- *
+ * This file is part of the ZFS Event Daemon (ZED)
+ * for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
  * Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
  * Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
- * Refer to the OpenZFS git commit log for authoritative copyright attribution.
+ * Refer to the ZoL git commit log for authoritative copyright attribution.
  *
  * The contents of this file are subject to the terms of the
  * Common Development and Distribution License Version 1.0 (CDDL-1.0).
@@ -108,7 +108,7 @@ _zed_strings_node_destroy(zed_strings_node_t *np)
  * If [key] is specified, it will be used to index the node; otherwise,
  * the string [val] will be used.
  */
-static zed_strings_node_t *
+zed_strings_node_t *
 _zed_strings_node_create(const char *key, const char *val)
 {
 	zed_strings_node_t *np;
+3 -3
@@ -1,9 +1,9 @@
 /*
- * This file is part of the ZFS Event Daemon (ZED).
- *
+ * This file is part of the ZFS Event Daemon (ZED)
+ * for ZFS on Linux (ZoL) <http://zfsonlinux.org/>.
  * Developed at Lawrence Livermore National Laboratory (LLNL-CODE-403049).
  * Copyright (C) 2013-2014 Lawrence Livermore National Security, LLC.
- * Refer to the OpenZFS git commit log for authoritative copyright attribution.
+ * Refer to the ZoL git commit log for authoritative copyright attribution.
  *
  * The contents of this file are subject to the terms of the
  * Common Development and Distribution License Version 1.0 (CDDL-1.0).
+11 -14
@@ -1,25 +1,22 @@
 include $(top_srcdir)/config/Rules.am
+DEFAULT_INCLUDES += \
+	-I$(top_srcdir)/include \
+	-I$(top_srcdir)/lib/libspl/include
 sbin_PROGRAMS = zfs
 zfs_SOURCES = \
 	zfs_iter.c \
 	zfs_iter.h \
 	zfs_main.c \
-	zfs_util.h \
-	zfs_project.c \
-	zfs_projectutil.h
+	zfs_util.h
 zfs_LDADD = \
-	$(abs_top_builddir)/lib/libzfs/libzfs.la \
-	$(abs_top_builddir)/lib/libzfs_core/libzfs_core.la \
-	$(abs_top_builddir)/lib/libnvpair/libnvpair.la \
-	$(abs_top_builddir)/lib/libuutil/libuutil.la
-zfs_LDADD += $(LTLIBINTL)
-if BUILD_FREEBSD
-zfs_LDADD += -lgeom -ljail
-endif
-include $(top_srcdir)/config/CppCheck.am
+	$(top_builddir)/lib/libnvpair/libnvpair.la \
+	$(top_builddir)/lib/libuutil/libuutil.la \
+	$(top_builddir)/lib/libzpool/libzpool.la \
+	$(top_builddir)/lib/libzfs/libzfs.la \
+	$(top_builddir)/lib/libzfs_core/libzfs_core.la
+zfs_LDFLAGS = -pthread
+6 -22
@@ -31,7 +31,6 @@
 #include <stddef.h>
 #include <stdio.h>
 #include <stdlib.h>
-#include <string.h>
 #include <strings.h>
 #include <libzfs.h>
@@ -134,31 +133,16 @@ zfs_callback(zfs_handle_t *zhp, void *data)
 	    ((cb->cb_flags & ZFS_ITER_DEPTH_LIMIT) == 0 ||
 	    cb->cb_depth < cb->cb_depth_limit)) {
 		cb->cb_depth++;
+		if (zfs_get_type(zhp) == ZFS_TYPE_FILESYSTEM)
-		/*
-		 * If we are not looking for filesystems, we don't need to
-		 * recurse into filesystems when we are at our depth limit.
-		 */
-		if ((cb->cb_depth < cb->cb_depth_limit ||
-		    (cb->cb_flags & ZFS_ITER_DEPTH_LIMIT) == 0 ||
-		    (cb->cb_types &
-		    (ZFS_TYPE_FILESYSTEM | ZFS_TYPE_VOLUME))) &&
-		    zfs_get_type(zhp) == ZFS_TYPE_FILESYSTEM) {
 			(void) zfs_iter_filesystems(zhp, zfs_callback, data);
-		}
 		if (((zfs_get_type(zhp) & (ZFS_TYPE_SNAPSHOT |
-		    ZFS_TYPE_BOOKMARK)) == 0) && include_snaps) {
+		    ZFS_TYPE_BOOKMARK)) == 0) && include_snaps)
 			(void) zfs_iter_snapshots(zhp,
-			    (cb->cb_flags & ZFS_ITER_SIMPLE) != 0,
-			    zfs_callback, data, 0, 0);
-		}
+			    (cb->cb_flags & ZFS_ITER_SIMPLE) != 0, zfs_callback,
+			    data);
 		if (((zfs_get_type(zhp) & (ZFS_TYPE_SNAPSHOT |
-		    ZFS_TYPE_BOOKMARK)) == 0) && include_bmarks) {
+		    ZFS_TYPE_BOOKMARK)) == 0) && include_bmarks)
 			(void) zfs_iter_bookmarks(zhp, zfs_callback, data);
-		}
 		cb->cb_depth--;
 	}
@@ -240,7 +224,7 @@ zfs_compare(const void *larg, const void *rarg, void *unused)
 		*rat = '\0';
 	ret = strcmp(lname, rname);
-	if (ret == 0 && (lat != NULL || rat != NULL)) {
+	if (ret == 0) {
 		/*
 		 * If we're comparing a dataset to one of its snapshots, we
 		 * always make the full dataset first.
+315 -1976
File diff suppressed because it is too large
-301
@@ -1,301 +0,0 @@
/*
* CDDL HEADER START
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License (the "License").
* You may not use this file except in compliance with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or http://www.opensolaris.org/os/licensing.
* See the License for the specific language governing permissions
* and limitations under the License.
*
* When distributing Covered Code, include this CDDL HEADER in each
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
* If applicable, add the following below this CDDL HEADER, with the
* fields enclosed by brackets "[]" replaced with your own identifying
* information: Portions Copyright [yyyy] [name of copyright owner]
*
* CDDL HEADER END
*/
/*
* Copyright (c) 2017, Intle Corporation. All rights reserved.
*/
#include <errno.h>
#include <getopt.h>
#include <stdio.h>
#include <stdlib.h>
#include <strings.h>
#include <unistd.h>
#include <fcntl.h>
#include <dirent.h>
#include <stddef.h>
#include <libintl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/list.h>
#include <sys/zfs_project.h>
#include "zfs_util.h"
#include "zfs_projectutil.h"
typedef struct zfs_project_item {
list_node_t zpi_list;
char zpi_name[0];
} zfs_project_item_t;
static void
zfs_project_item_alloc(list_t *head, const char *name)
{
zfs_project_item_t *zpi;
zpi = safe_malloc(sizeof (zfs_project_item_t) + strlen(name) + 1);
strcpy(zpi->zpi_name, name);
list_insert_tail(head, zpi);
}
static int
zfs_project_sanity_check(const char *name, zfs_project_control_t *zpc,
struct stat *st)
{
int ret;
ret = stat(name, st);
if (ret) {
(void) fprintf(stderr, gettext("failed to stat %s: %s\n"),
name, strerror(errno));
return (ret);
}
if (!S_ISREG(st->st_mode) && !S_ISDIR(st->st_mode)) {
(void) fprintf(stderr, gettext("only support project quota on "
"regular file or directory\n"));
return (-1);
}
if (!S_ISDIR(st->st_mode)) {
if (zpc->zpc_dironly) {
(void) fprintf(stderr, gettext(
"'-d' option on non-dir target %s\n"), name);
return (-1);
}
if (zpc->zpc_recursive) {
(void) fprintf(stderr, gettext(
"'-r' option on non-dir target %s\n"), name);
return (-1);
}
}
return (0);
}
static int
zfs_project_load_projid(const char *name, zfs_project_control_t *zpc)
{
zfsxattr_t fsx;
int ret, fd;
fd = open(name, O_RDONLY | O_NOCTTY);
if (fd < 0) {
(void) fprintf(stderr, gettext("failed to open %s: %s\n"),
name, strerror(errno));
return (fd);
}
ret = ioctl(fd, ZFS_IOC_FSGETXATTR, &fsx);
if (ret)
(void) fprintf(stderr,
gettext("failed to get xattr for %s: %s\n"),
name, strerror(errno));
else
zpc->zpc_expected_projid = fsx.fsx_projid;
close(fd);
return (ret);
}
static int
zfs_project_handle_one(const char *name, zfs_project_control_t *zpc)
{
zfsxattr_t fsx;
int ret, fd;
fd = open(name, O_RDONLY | O_NOCTTY);
if (fd < 0) {
if (errno == ENOENT && zpc->zpc_ignore_noent)
return (0);
(void) fprintf(stderr, gettext("failed to open %s: %s\n"),
name, strerror(errno));
return (fd);
}
ret = ioctl(fd, ZFS_IOC_FSGETXATTR, &fsx);
if (ret) {
(void) fprintf(stderr,
gettext("failed to get xattr for %s: %s\n"),
name, strerror(errno));
goto out;
}
switch (zpc->zpc_op) {
case ZFS_PROJECT_OP_LIST:
(void) printf("%5u %c %s\n", fsx.fsx_projid,
(fsx.fsx_xflags & ZFS_PROJINHERIT_FL) ? 'P' : '-', name);
goto out;
case ZFS_PROJECT_OP_CHECK:
if (fsx.fsx_projid == zpc->zpc_expected_projid &&
fsx.fsx_xflags & ZFS_PROJINHERIT_FL)
goto out;
if (!zpc->zpc_newline) {
char c = '\0';
(void) printf("%s%c", name, c);
goto out;
}
if (fsx.fsx_projid != zpc->zpc_expected_projid)
(void) printf("%s - project ID is not set properly "
"(%u/%u)\n", name, fsx.fsx_projid,
(uint32_t)zpc->zpc_expected_projid);
if (!(fsx.fsx_xflags & ZFS_PROJINHERIT_FL))
(void) printf("%s - project inherit flag is not set\n",
name);
goto out;
case ZFS_PROJECT_OP_CLEAR:
if (!(fsx.fsx_xflags & ZFS_PROJINHERIT_FL) &&
(zpc->zpc_keep_projid ||
fsx.fsx_projid == ZFS_DEFAULT_PROJID))
goto out;
fsx.fsx_xflags &= ~ZFS_PROJINHERIT_FL;
if (!zpc->zpc_keep_projid)
fsx.fsx_projid = ZFS_DEFAULT_PROJID;
break;
case ZFS_PROJECT_OP_SET:
if (fsx.fsx_projid == zpc->zpc_expected_projid &&
(!zpc->zpc_set_flag || fsx.fsx_xflags & ZFS_PROJINHERIT_FL))
goto out;
fsx.fsx_projid = zpc->zpc_expected_projid;
if (zpc->zpc_set_flag)
fsx.fsx_xflags |= ZFS_PROJINHERIT_FL;
break;
default:
ASSERT(0);
break;
}
ret = ioctl(fd, ZFS_IOC_FSSETXATTR, &fsx);
if (ret)
(void) fprintf(stderr,
gettext("failed to set xattr for %s: %s\n"),
name, strerror(errno));
out:
close(fd);
return (ret);
}
static int
zfs_project_handle_dir(const char *name, zfs_project_control_t *zpc,
list_t *head)
{
struct dirent *ent;
DIR *dir;
int ret = 0;
dir = opendir(name);
if (dir == NULL) {
if (errno == ENOENT && zpc->zpc_ignore_noent)
return (0);
ret = -errno;
(void) fprintf(stderr, gettext("failed to opendir %s: %s\n"),
name, strerror(errno));
return (ret);
}
/* Non-top item, ignore the case of being removed or renamed by race. */
zpc->zpc_ignore_noent = B_TRUE;
errno = 0;
while (!ret && (ent = readdir(dir)) != NULL) {
char *fullname;
/* skip "." and ".." */
if (strcmp(ent->d_name, ".") == 0 ||
strcmp(ent->d_name, "..") == 0)
continue;
if (strlen(ent->d_name) + strlen(name) + 1 >= PATH_MAX) {
errno = ENAMETOOLONG;
break;
}
if (asprintf(&fullname, "%s/%s", name, ent->d_name) == -1) {
errno = ENOMEM;
break;
}
ret = zfs_project_handle_one(fullname, zpc);
if (!ret && zpc->zpc_recursive && ent->d_type == DT_DIR)
zfs_project_item_alloc(head, fullname);
free(fullname);
}
if (errno && !ret) {
ret = -errno;
(void) fprintf(stderr, gettext("failed to readdir %s: %s\n"),
name, strerror(errno));
}
closedir(dir);
return (ret);
}
int
zfs_project_handle(const char *name, zfs_project_control_t *zpc)
{
zfs_project_item_t *zpi;
struct stat st;
list_t head;
int ret;
ret = zfs_project_sanity_check(name, zpc, &st);
if (ret)
return (ret);
if ((zpc->zpc_op == ZFS_PROJECT_OP_SET ||
zpc->zpc_op == ZFS_PROJECT_OP_CHECK) &&
zpc->zpc_expected_projid == ZFS_INVALID_PROJID) {
ret = zfs_project_load_projid(name, zpc);
if (ret)
return (ret);
}
zpc->zpc_ignore_noent = B_FALSE;
ret = zfs_project_handle_one(name, zpc);
if (ret || !S_ISDIR(st.st_mode) || zpc->zpc_dironly ||
(!zpc->zpc_recursive &&
zpc->zpc_op != ZFS_PROJECT_OP_LIST &&
zpc->zpc_op != ZFS_PROJECT_OP_CHECK))
return (ret);
list_create(&head, sizeof (zfs_project_item_t),
offsetof(zfs_project_item_t, zpi_list));
zfs_project_item_alloc(&head, name);
while ((zpi = list_remove_head(&head)) != NULL) {
if (!ret)
ret = zfs_project_handle_dir(zpi->zpi_name, zpc, &head);
free(zpi);
}
return (ret);
}
-49
@@ -1,49 +0,0 @@
/*
* CDDL HEADER START
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License (the "License").
* You may not use this file except in compliance with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or http://www.opensolaris.org/os/licensing.
* See the License for the specific language governing permissions
* and limitations under the License.
*
* When distributing Covered Code, include this CDDL HEADER in each
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
* If applicable, add the following below this CDDL HEADER, with the
* fields enclosed by brackets "[]" replaced with your own identifying
* information: Portions Copyright [yyyy] [name of copyright owner]
*
* CDDL HEADER END
*/
/*
* Copyright (c) 2017, Intel Corporation. All rights reserved.
*/
#ifndef _ZFS_PROJECTUTIL_H
#define _ZFS_PROJECTUTIL_H
typedef enum {
ZFS_PROJECT_OP_DEFAULT = 0,
ZFS_PROJECT_OP_LIST = 1,
ZFS_PROJECT_OP_CHECK = 2,
ZFS_PROJECT_OP_CLEAR = 3,
ZFS_PROJECT_OP_SET = 4,
} zfs_project_ops_t;
typedef struct zfs_project_control {
uint64_t zpc_expected_projid;
zfs_project_ops_t zpc_op;
boolean_t zpc_dironly;
boolean_t zpc_ignore_noent;
boolean_t zpc_keep_projid;
boolean_t zpc_newline;
boolean_t zpc_recursive;
boolean_t zpc_set_flag;
} zfs_project_control_t;
int zfs_project_handle(const char *name, zfs_project_control_t *zpc);
#endif /* _ZFS_PROJECTUTIL_H */
+1 -1
@@ -33,7 +33,7 @@ extern "C" {
 void * safe_malloc(size_t size);
 void nomem(void);
-extern libzfs_handle_t *g_zfs;
+libzfs_handle_t *g_zfs;
 #ifdef __cplusplus
 }

Some files were not shown because too many files have changed in this diff