Some work has been done lately to improve the debugability of the ZFS pool
load (and import) process. This includes:
7638 Refactor spa_load_impl into several functions
8961 SPA load/import should tell us why it failed
7277 zdb should be able to print zfs_dbgmsg's
To iterate on top of that, there's a few changes that were made to make the
import process more resilient and crash free. One of the first tasks during the
pool load process is to parse a config provided from userland that describes
what devices the pool is composed of. A vdev tree is generated from that config,
and then all the vdevs are opened.
The Meta Object Set (MOS) of the pool is accessed, and several metadata objects
that are necessary to load the pool are read. The exact configuration of the
pool is also stored inside the MOS. Since the configuration provided from
userland is external and might not accurately describe the vdev tree
of the pool at the txg that is being loaded, it cannot be relied upon to safely
operate the pool. For that reason, the configuration in the MOS is read early
on. In the past, the two configurations were compared together and if there was
a mismatch then the load process was aborted and an error was returned.
The latter was a good way to ensure a pool does not get corrupted, however it
made the pool load process needlessly fragile in cases where the vdev
configuration changed or the userland configuration was outdated. Since the MOS
is stored in 3 copies, the configuration provided by userland doesn't have to be
perfect in order to read its contents. Hence, a new approach has been adopted:
The pool is first opened with the untrusted userland configuration just so that
the real configuration can be read from the MOS. The trusted MOS configuration
is then used to generate a new vdev tree and the pool is re-opened.
When the pool is opened with an untrusted configuration, writes are disabled
to avoid accidentally damaging it. During reads, some sanity checks are
performed on block pointers to see if each DVA points to a known vdev;
when the configuration is untrusted, instead of panicking the system if those
checks fail we simply avoid issuing reads to the invalid DVAs.
This new two-step pool load process now allows rewinding pools accross
vdev tree changes such as device replacement, addition, etc. Loading a pool
from an external config file in a clustering environment also becomes much
safer now since the pool will import even if the config is outdated and didn't,
for instance, register a recent device addition.
With this code in place, it became relatively easy to implement a
long-sought-after feature: the ability to import a pool with missing top level
(i.e. non-redundant) devices. Note that since this almost guarantees some loss
of data, this feature is for now restricted to a read-only import.
Porting notes (ZTS):
* Fix 'make dist' target in zpool_import
* The maximum path length allowed by tar is 99 characters. Several
of the new test cases exceeded this limit resulting in them not
being included in the tarball. Shorten the names slightly.
* Set/get tunables using accessor functions.
* Get last synced txg via the "zfs_txg_history" mechanism.
* Clear zinject handlers in cleanup for import_cache_device_replaced
and import_rewind_device_replaced in order that the zpool can be
exported if there is an error.
* Increase FILESIZE to 8G in zfs-test.sh to allow for a larger
ext4 file system to be created on ZFS_DISK2. Also, there's
no need to partition ZFS_DISK2 at all. The partitioning had
already been disabled for multipath devices. Among other things,
the partitioning steals some space from the ext4 file system,
makes it difficult to accurately calculate the paramters to
parted and can make some of the tests fail.
* Increase FS_SIZE and FILE_SIZE in the zpool_import test
configuration now that FILESIZE is larger.
* Write more data in order that device evacuation take lonnger in
a couple tests.
* Use mkdir -p to avoid errors when the directory already exists.
* Remove use of sudo in import_rewind_config_changed.
Authored by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
OpenZFS-issue: https://illumos.org/issues/9075
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/619c0123Closes#7459
OpenZFS 7614 - zfs device evacuation/removal
OpenZFS 9064 - remove_mirror should wait for device removal to complete
This project allows top-level vdevs to be removed from the storage pool
with "zpool remove", reducing the total amount of storage in the pool.
This operation copies all allocated regions of the device to be removed
onto other devices, recording the mapping from old to new location.
After the removal is complete, read and free operations to the removed
(now "indirect") vdev must be remapped and performed at the new location
on disk. The indirect mapping table is kept in memory whenever the pool
is loaded, so there is minimal performance overhead when doing operations
on the indirect vdev.
The size of the in-memory mapping table will be reduced when its entries
become "obsolete" because they are no longer used by any block pointers
in the pool. An entry becomes obsolete when all the blocks that use
it are freed. An entry can also become obsolete when all the snapshots
that reference it are deleted, and the block pointers that reference it
have been "remapped" in all filesystems/zvols (and clones). Whenever an
indirect block is written, all the block pointers in it will be "remapped"
to their new (concrete) locations if possible. This process can be
accelerated by using the "zfs remap" command to proactively rewrite all
indirect blocks that reference indirect (removed) vdevs.
Note that when a device is removed, we do not verify the checksum of
the data that is copied. This makes the process much faster, but if it
were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be
possible to copy the wrong data, when we have the correct data on e.g.
the other side of the mirror.
At the moment, only mirrors and simple top-level vdevs can be removed
and no removal is allowed if any of the top-level vdevs are raidz.
Porting Notes:
* Avoid zero-sized kmem_alloc() in vdev_compact_children().
The device evacuation code adds a dependency that
vdev_compact_children() be able to properly empty the vdev_child
array by setting it to NULL and zeroing vdev_children. Under Linux,
kmem_alloc() and related functions return a sentinel pointer rather
than NULL for zero-sized allocations.
* Remove comment regarding "mpt" driver where zfs_remove_max_segment
is initialized to SPA_MAXBLOCKSIZE.
Change zfs_condense_indirect_commit_entry_delay_ticks to
zfs_condense_indirect_commit_entry_delay_ms for consistency with
most other tunables in which delays are specified in ms.
* ZTS changes:
Use set_tunable rather than mdb
Use zpool sync as appropriate
Use sync_pool instead of sync
Kill jobs during test_removal_with_operation to allow unmount/export
Don't add non-disk names such as "mirror" or "raidz" to $DISKS
Use $TEST_BASE_DIR instead of /tmp
Increase HZ from 100 to 1000 which is more common on Linux
removal_multiple_indirection.ksh
Reduce iterations in order to not time out on the code
coverage builders.
removal_resume_export:
Functionally, the test case is correct but there exists a race
where the kernel thread hasn't been fully started yet and is
not visible. Wait for up to 1 second for the removal thread
to be started before giving up on it. Also, increase the
amount of data copied in order that the removal not finish
before the export has a chance to fail.
* MMP compatibility, the concept of concrete versus non-concrete devices
has slightly changed the semantics of vdev_writeable(). Update
mmp_random_leaf_impl() accordingly.
* Updated dbuf_remap() to handle the org.zfsonlinux:large_dnode pool
feature which is not supported by OpenZFS.
* Added support for new vdev removal tracepoints.
* Test cases removal_with_zdb and removal_condense_export have been
intentionally disabled. When run manually they pass as intended,
but when running in the automated test environment they produce
unreliable results on the latest Fedora release.
They may work better once the upstream pool import refectoring is
merged into ZoL at which point they will be re-enabled.
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Alex Reece <alex@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Tim Chase <tim@chase2k.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Garrett D'Amore <garrett@damore.org>
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
OpenZFS-issue: https://www.illumos.org/issues/7614
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f539f1ebCloses#6900
When the pool is suspended, record whether it was due to an I/O error or
due to MMP writes failing to succeed within the required time.
Change spa_suspended from uint8_t to zio_suspend_reason_t to store the
reason.
When userspace queries pool status via spa_tryimport(), report the
reason the pool was suspended in a new key,
ZPOOL_CONFIG_SUSPENDED_REASON.
In libzfs, when interpreting the returned config nvlist, report
suspension due to MMP with a new pool status enum value,
ZPOOL_STATUS_IO_FAILURE_MMP.
In status_callback(), which generates and emits the message when 'zpool
status' is executed, add a case to print an appropriate message for the
new pool status enum value.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes#7296
get_format_prompt_string() and zpool_state_to_name() return
a string literal which is read-only, thus they should return
`const char*`.
zpool_get_prop_string() returns a non-const string after
successful nv-lookup, and returns a string literal otherwise.
Since this function is designed to be used for read-only purpose,
the return type should also be `const char*`.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@osnexus.com>
Closes#7285
Add in SMART self-test results to zpool status|iostat -c. This
works for both SAS and SATA drives.
Also, add plumbing to allow the 'smart' script to take smartctl
output from a directory of output text files instead of running
it against the vdevs.
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#7178
This adds the SMART attributes required to probe Samsung SSD and NVMe
(and possibly others) disks when using the "zpool status -c" command.
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Closes#7183Closes#7193
Authored by: Chris Williamson <chris.williamson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Ported-by: Don Brady <don.brady@delphix.com>
Ported-by: John Kennedy <john.kennedy@delphix.com>
OpenZFS-issue: https://www.illumos.org/issues/7431
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/dfc11533
Porting Notes:
* The CLI long option arguments for '-t' and '-m' don't parse on linux
* Switched from kmem_alloc to vmem_alloc in zcp_lua_alloc
* Lua implementation is built as its own module (zlua.ko)
* Lua headers consumed directly by zfs code moved to 'include/sys/lua/'
* There is no native setjmp/longjump available in stock Linux kernel.
Brought over implementations from illumos and FreeBSD
* The get_temporary_prop() was adapted due to VFS platform differences
* Use of inline functions in lua parser to reduce stack usage per C call
* Skip some ZFS Test Suite ZCP tests on sparc64 to avoid stack overflow
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: DHE <git@dehacked.net>
Closes#7133
The on-disk format for encrypted datasets protects not only
the encrypted and authenticated blocks themselves, but also
the order and interpretation of these blocks. In order to
make this work while maintaining the ability to do raw
sends, the indirect bps maintain a secure checksum of all
the MACs in the block below it along with a few other
fields that determine how the data is interpreted.
Unfortunately, the current on-disk format erroneously
includes some fields which are not portable and thus cannot
support raw sends. It is not possible to easily work around
this issue due to a separate and much smaller bug which
causes indirect blocks for encrypted dnodes to not be
compressed, which conflicts with the previous bug. In
addition, the current code generates incompatible on-disk
formats on big endian and little endian systems due to an
issue with how block pointers are authenticated. Finally,
raw send streams do not currently include dn_maxblkid when
sending both the metadnode and normal dnodes which are
needed in order to ensure that we are correctly maintaining
the portable objset MAC.
This patch zero's out the offending fields when computing
the bp MAC and ensures that these MACs are always
calculated in little endian order (regardless of the host
system's byte order). This patch also registers an errata
for the old on-disk format, which we detect by adding a
"version" field to newly created DSL Crypto Keys. We allow
datasets without a version (version 0) to only be mounted
for read so that they can easily be migrated. We also now
include dn_maxblkid in raw send streams to ensure the MAC
can be maintained correctly.
This patch also contains minor bug fixes and cleanups.
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes#6845Closes#6864Closes#7052
* Remove 'zfs snap' from zfs help message (OpenZFS sync)
* Update zfs(8) to suggest 'snap' can be used as an alias for 'snapshot'
* Enforce 80 columns limit in help messages
* Remove zfs_disable_dup_eviction from zfs-module-parameters(5)
* Expose zfs_scan_max_ext_gap as a kernel module parameter.
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes#7087
usr/src/uts/common/sys/fs/zfs.h
Change ZPROP_INVAL and ZPROP_CONT from macros to enum values. Clang
and GCC both prefer to use unsigned ints to store enums. That was
causing tautological comparison warnings (and likely eliminating
error handling code at compile time) whenever a zfs_prop_t or
zpool_prop_t was compared to ZPROP_INVAL or ZPROP_CONT. Making the
error flags be explicity enum values forces the enum types to be
signed.
ZPROP_INVAL was also compared against two different enum types. I
had to change its name to ZPOOL_PROP_INVAL whenever its compared to
a zpool_prop_t. There are still some places where ZPROP_INVAL or
ZPROP_CONT is compared to a plain int, in code that doesn't know
whether the int is storing a zfs_prop_t or a zpool_prop_t.
usr/src/uts/common/fs/zfs/spa.c
s/ZPROP_INVAL/ZPOOL_PROP_INVAL/
Authored by: Alan Somers <asomers@gmail.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Reviewed by: George Melikov <mail@gmelikov.ru>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/8652
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c2de80dc74Closes#7061
Resolved unused variable warnings observed after restricting
-Wno-unused-but-set-variable to only libzfs and libzpool.
Reviewed-by: DHE <git@dehacked.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#6941
When replacing a faulted device which was previously handled by a spare
multiple levels of nested interior VDEVs will be present in the pool
configuration; the following example illustrates one of the possible
situations:
NAME STATE READ WRITE CKSUM
testpool DEGRADED 0 0 0
raidz1-0 DEGRADED 0 0 0
spare-0 DEGRADED 0 0 0
replacing-0 DEGRADED 0 0 0
/var/tmp/fault-dev UNAVAIL 0 0 0 cannot open
/var/tmp/replace-dev ONLINE 0 0 0
/var/tmp/spare-dev1 ONLINE 0 0 0
/var/tmp/safe-dev ONLINE 0 0 0
spares
/var/tmp/spare-dev1 INUSE currently in use
This is safe and allowed, but get_replication() needs to handle this
situation gracefully to let zpool add new devices to the pool.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes#6678Closes#6996
Fix a segfault when running 'zpool iostat -v 1' while adding
a VDEV.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#6748Closes#6872
When the pool configuration contains a hole due to a previous device
removal ignore this top level vdev. Failure to do so will result in
the current configuration being assessed to have a non-uniform
replication level and the expected warning will be disabled.
The zpool_add_010_pos test case was extended to cover this scenario.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#6907Closes#6911
Currently, scrubs and resilvers can take an extremely
long time to complete. This is largely due to the fact
that zfs scans process pools in logical order, as
determined by each block's bookmark. This makes sense
from a simplicity perspective, but blocks in zfs are
often scattered randomly across disks, particularly
due to zfs's copy-on-write mechanisms.
This patch improves performance by splitting scrubs
and resilvers into a metadata scanning phase and an IO
issuing phase. The metadata scan reads through the
structure of the pool and gathers an in-memory queue
of I/Os, sorted by size and offset on disk. The issuing
phase will then issue the scrub I/Os as sequentially as
possible, greatly improving performance.
This patch also updates and cleans up some of the scan
code which has not been updated in several years.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Authored-by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Authored-by: Alek Pinchuk <apinchuk@datto.com>
Authored-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes#3625Closes#6256
Additionally add four new tests:
* zpool_events_clear: verify 'zpool events -c' functionality
* zpool_events_cliargs: verify command line options and arguments
* zpool_events_follow: verify 'zpool events -f'
* zpool_events_poolname: verify events filtering by pool name
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes#3285Closes#6762
Added -n flag to zpool reopen that allows a running scrub
operation to continue if there is a device with Dirty Time Log.
By default if a component device has a DTL and zpool reopen
is executed all running scan operations will be restarted.
Added functional tests for `zpool reopen`
Tests covers following scenarios:
* `zpool reopen` without arguments,
* `zpool reopen` with pool name as argument,
* `zpool reopen` while scrubbing,
* `zpool reopen -n` while scrubbing,
* `zpool reopen -n` while resilvering,
* `zpool reopen` with bad arguments.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Arkadiusz Bubała <arkadiusz.bubala@open-e.com>
Closes#6076Closes#6746
Currently the 480GB models of this disk do not use ashift=12 by
default. SSDSC2BW48 is also optimized for 4k blocks.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: adisbladis <adis@blad.is>
Closes#6774
CID 161388: Resource Leak (REASOURCE_LEAK)
Jump to errout so that file descriptor gets closed before returning
from function.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tobin C. Harding <me@tobin.cc>
Closes#6755
* PBKDF2 implementation changed to OpenSSL implementation.
* HKDF implementation moved to its own file and tests
added to ensure correctness.
* Removed libzfs's now unnecessary dependency on libzpool
and libicp.
* Ztest can now create and test encrypted datasets. This is
currently disabled until issue #6526 is resolved, but
otherwise functions as advertised.
* Several small bug fixes discovered after enabling ztest
to run on encrypted datasets.
* Fixed coverity defects added by the encryption patch.
* Updated man pages for encrypted send / receive behavior.
* Fixed a bug where encrypted datasets could receive
DRR_WRITE_EMBEDDED records.
* Minor code cleanups / consolidation.
Signed-off-by: Tom Caputi <tcaputi@datto.com>
This change incorporates three major pieces:
The first change is a keystore that manages wrapping
and encryption keys for encrypted datasets. These
commands mostly involve manipulating the new
DSL Crypto Key ZAP Objects that live in the MOS. Each
encrypted dataset has its own DSL Crypto Key that is
protected with a user's key. This level of indirection
allows users to change their keys without re-encrypting
their entire datasets. The change implements the new
subcommands "zfs load-key", "zfs unload-key" and
"zfs change-key" which allow the user to manage their
encryption keys and settings. In addition, several new
flags and properties have been added to allow dataset
creation and to make mounting and unmounting more
convenient.
The second piece of this patch provides the ability to
encrypt, decyrpt, and authenticate protected datasets.
Each object set maintains a Merkel tree of Message
Authentication Codes that protect the lower layers,
similarly to how checksums are maintained. This part
impacts the zio layer, which handles the actual
encryption and generation of MACs, as well as the ARC
and DMU, which need to be able to handle encrypted
buffers and protected data.
The last addition is the ability to do raw, encrypted
sends and receives. The idea here is to send raw
encrypted and compressed data and receive it exactly
as is on a backup system. This means that the dataset
on the receiving system is protected using the same
user key that is in use on the sending side. By doing
so, datasets can be efficiently backed up to an
untrusted system without fear of data being
compromised.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes#494Closes#5769
OpenZFS provides a library called tpool which implements thread
pools for user space applications. Porting this library means
the zpool utility no longer needs to borrow the kernel mutex and
taskq interfaces from libzpool. This code was updated to use
the tpool library which behaves in a very similar fashion.
Porting libtpool was relatively straight forward and minimal
modifications were needed. The core changes were:
* Fully convert the library to use pthreads.
* Updated signal handling.
* lmalloc/lfree converted to calloc/free
* Implemented portable pthread_attr_clone() function.
Finally, update the build system such that libzpool.so is no
longer linked in to zfs(8), zpool(8), etc. All that is required
is libzfs to which the zcommon soures were added (which is the way
it always should have been). Removing the libzpool dependency
resulted in several build issues which needed to be resolved.
* Moved zfeature support to module/zcommon/zfeature_common.c
* Moved ratelimiting to to module/zfs/zfs_ratelimit.c
* Moved get_system_hostid() to lib/libspl/gethostid.c
* Removed use of cmn_err() in zcommon source
* Removed dprintf_setup() call from zpool_main.c and zfs_main.c
* Removed highbit() and lowbit()
* Removed unnecessary library dependencies from Makefiles
* Removed fletcher-4 kstat in user space
* Added sha2 support explicitly to libzfs
* Added highbit64() and lowbit64() to zpool_util.c
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#6442
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Sen Haerens <sen@senhaerens.be>
Closes#6444Closes#6445
Turning the multihost property on requires that a hostid be set to allow
ZFS to determine when a foreign system is attemping to import a pool.
The error message instructing the user to set a hostid refers to
genhostid(1).
Genhostid(1) is not available on SUSE Linux. This commit adds a script
modeled after genhostid(1) for those users.
Zgenhostid checks for an /etc/hostid file; if it does not exist, it
creates one and stores a value. If the user has provided a hostid as an
argument, that value is used. Otherwise, a random hostid is generated
and stored.
This differs from the CENTOS 6/7 versions of genhostid, which overwrite
the /etc/hostid file even though their manpages state otherwise.
A man page for zgenhostid is added. The one for genhostid is in (1), but
I put zgenhostid in (8) because I believe it's more appropriate.
The mmp tests are modified to use zgenhostid to set the hostid instead
of using the spl_hostid module parameter. zgenhostid will not replace
an existing /etc/hostid file, so new mmp_clear_hostid calls are
required.
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes#6358Closes#6379
When replacing a disk with non-wholedisk spare, we shouldn't zero_label
it. The wholedisk case already skip it. In fact, zero_label function
will fail saying device busy because it's already opened exclusively,
but since there's no error checking, the replace command will succeed,
causing great confusion.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes#6369
zpool iostat/status -c is supposed to be restricted
by its search path, but currently isn't. To prevent
arbitrary scripts from being executed, disallow '/'
from commands.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes#6353Closes#6359
CID 165757: Control flow issues (MISSING_BREAK)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes#6348
Add multihost=on|off pool property to control MMP. When enabled
a new thread writes uberblocks to the last slot in each label, at a
set frequency, to indicate to other hosts the pool is actively imported.
These uberblocks are the last synced uberblock with an updated
timestamp. Property defaults to off.
During tryimport, find the "best" uberblock (newest txg and timestamp)
repeatedly, checking for change in the found uberblock. Include the
results of the activity test in the config returned by tryimport.
These results are reported to user in "zpool import".
Allow the user to control the period between MMP writes, and the
duration of the activity test on import, via a new module parameter
zfs_multihost_interval. The period is specified in milliseconds. The
activity test duration is calculated from this value, and from the
mmp_delay in the "best" uberblock found initially.
Add a kstat interface to export statistics about Multiple Modifier
Protection (MMP) updates. Include the last synced txg number, the
timestamp, the delay since the last MMP update, the VDEV GUID, the VDEV
label that received the last MMP update, and the VDEV path. Abbreviated
output below.
$ cat /proc/spl/kstat/zfs/mypool/multihost
31 0 0x01 10 880 105092382393521 105144180101111
txg timestamp mmp_delay vdev_guid vdev_label vdev_path
20468 261337 250274925 68396651780 3 /dev/sda
20468 261339 252023374 6267402363293 1 /dev/sdc
20468 261340 252000858 6698080955233 1 /dev/sdx
20468 261341 251980635 783892869810 2 /dev/sdy
20468 261342 253385953 8923255792467 3 /dev/sdd
20468 261344 253336622 042125143176 0 /dev/sdab
20468 261345 253310522 1200778101278 2 /dev/sde
20468 261346 253286429 0950576198362 2 /dev/sdt
20468 261347 253261545 96209817917 3 /dev/sds
20468 261349 253238188 8555725937673 3 /dev/sdb
Add a new tunable zfs_multihost_history to specify the number of MMP
updates to store history for. By default it is set to zero meaning that
no MMP statistics are stored.
When using ztest to generate activity, for automated tests of the MMP
function, some test functions interfere with the test. For example, the
pool is exported to run zdb and then imported again. Add a new ztest
function, "-M", to alter ztest behavior to prevent this.
Add new tests to verify the new functionality. Tests provided by
Giuseppe Di Natale.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Ned Bass <bass6@llnl.gov>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes#745Closes#6279
Currently, there is no way to pause a scrub. Pausing may
be useful when the pool is busy with other I/O to preserve
bandwidth.
This patch adds the ability to pause and resume scrubbing.
This is achieved by maintaining a persistent on-disk scrub state.
While the state is 'paused' we do not scrub any more blocks.
We do however perform regular scan housekeeping such as
freeing async destroyed and deadlist blocks while paused.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Reviewed-by: Serapheim Dimitropoulos <serapheimd@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alek Pinchuk <apinchuk@datto.com>
Closes#6167
This prints dashes instead of zeros for zero latency values in
'zpool iostat -p'. You'll get zero latencies reported when the
disk is idle, but technically a zero latency is invalid, since you
can't measure the latency of doing nothing.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#6210
Allow new members to be added to a pool mixing raidz and mirror vdevs
without giving -f, as long as they have matching redundancy. This case
was missed in #5915, which only handled zpool create.
Add zfstest zpool_add_010_pos.ksh, with test of zpool create
followed by zpool add of mixed raidz and mirror vdevs.
Add some more mixed raidz and mirror cases to zpool_create_006_pos.ksh.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Haakan Johansson <f96hajo@chalmers.se>
Issue #5915Closes#6181
Users can now provide their own scripts to be run
with 'zpool iostat/status -c'. User scripts should be
placed in ~/.zpool.d to be included in zpool's
default search path.
Provide a script which can be used with
'zpool iostat|status -c' that will return the type of
device (hdd, sdd, file).
Provide a script to get various values from smartctl
when using 'zpool iostat/status -c'.
Allow users to define the ZPOOL_SCRIPTS_PATH
environment variable which can be used to override
the default 'zpool iostat/status -c' search path.
Allow the ZPOOL_SCRIPTS_ENABLED environment
variable to enable or disable 'zpool status/iostat -c'
functionality.
Use the new smart script to provide the serial command.
Install /etc/sudoers.d/zfs file which contains the sudoer
rule for smartctl as a sample.
Allow 'zpool iostat/status -c' tests to run in tree.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes#6121Closes#6153
This addition will enable us to sync an open TXG to the main pool
on demand. The functionality is similar to 'sync(2)' but 'zpool sync'
will return when data has hit the main storage instead of potentially
just the ZIL as is the case with the 'sync(2)' cmd.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Alek Pinchuk <apinchuk@datto.com>
Closes#6122
This patch adds a '-f' option to 'zpool offline' to fault a vdev
instead of bringing it offline. Unlike the OFFLINE state, the
FAULTED state will trigger the FMA code, allowing for things like
autoreplace and triggering the slot fault LED. The -f faults
persist across imports, unless they were set with the temporary
(-t) flag. Both persistent and temporary faults can be cleared
with zpool clear.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#6094
Enable additional test cases, in most cases this required a few
minor modifications to the test scripts. In a few cases a real
bug was uncovered and fixed. And in a handful of cases where pools
are layered on pools the test case will be skipped until this is
supported. Details below for each test case.
* zpool_add_004_pos - Skip test on Linux until adding zvols to pools
is fully supported and deadlock free.
* zpool_add_005_pos.ksh - Skip dumpadm portion of the test which isn't
relevant for Linux. The find_vfstab_dev, find_mnttab_dev, and
save_dump_dev functions were updated accordingly for Linux. Add
O_EXCL to the in-use check to prevent the -f (force) option from
working for mounted filesystems and improve the resulting error.
* zpool_add_006_pos - Update test case such that it doesn't depend
on nested pools. Switch to truncate from mkfile to reduce space
requirements and speed up the test case.
* zpool_clear_001_pos - Speed up test case by filling filesystem to
25% capacity.
* zpool_create_002_pos, zpool_create_004_pos - Use sparse files for
file vdevs in order to avoid increasing the partition size.
* zpool_create_006_pos - 6ba1ce9 allows raidz+mirror configs with
similar redundancy. Updating the valid_args and forced_args cases.
* zpool_create_008_pos - Disable overlapping partition portion.
* zpool_create_011_neg - Fix to correctly create the extra partition.
Modified zpool_vdev.c to use fstat64_blk() wrapper which includes
the st_size even for block devices.
* zpool_create_012_neg - Updated to properly find swap devices.
* zpool_create_014_neg, zpool_create_015_neg - Updated to use
swap_setup() and swap_cleanup() wrappers which do the right thing
on Linux and Illumos. Removed '-n' option which succeeds under
Linux due to differences in the in-use checks.
* zpool_create_016_pos.ksh - Skipped test case isn't useful.
* zpool_create_020_pos - Added missing / to cleanup() function.
Remove cache file prior to test to ensure a clean environment
and avoid false positives.
* zpool_destroy_001_pos - Removed test case which creates a pool on
a zvol. This is more likely to deadlock under Linux and has never
been completely supported on any platform.
* zpool_destroy_002_pos - 'zpool destroy -f' is unsupported on Linux.
Mount point must not be busy in order to unmount them.
* zfs_destroy_001_pos - Handle EBUSY error which can occur with
volumes when racing with udev.
* zpool_expand_001_pos, zpool_expand_003_neg - Skip test on Linux
until adding zvols to pools is fully supported and deadlock free.
The test could be modified to use loop-back devices but it would
be preferable to use the test case as is for improved coverage.
* zpool_export_004_pos - Updated test case to such that it doesn't
depend on nested pools. Normal file vdev under /var/tmp are fine.
* zpool_import_all_001_pos - Updated to skip partition 1, which is
known as slice 2, on Illumos. This prevents overwriting the
default TESTPOOL which was causing the failure.
* zpool_import_002_pos, zpool_import_012_pos - No changes needed.
* zpool_remove_003_pos - No changes needed
* zpool_upgrade_002_pos, zpool_upgrade_004_pos - Root cause addressed
by upstream OpenZFS commit 3b7f360.
* zpool_upgrade_007_pos - Disabled in test case due to known failure.
Opened issue https://github.com/zfsonlinux/zfs/issues/6112
* zvol_misc_002_pos - Updated to to use ext2.
* zvol_misc_001_neg, zvol_misc_003_neg, zvol_misc_004_pos,
zvol_misc_005_neg, zvol_misc_006_pos - Moved to skip list, these
test case could be updated to use Linux's crash dump facility.
* zvol_swap_* - Updated to use swap_setup/swap_cleanup helpers.
File creation switched from /tmp to /var/tmp. Enabled minimal
useful tests for Linux, skip test cases which aren't applicable.
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #3484
Issue #5634
Issue #2437
Issue #5202
Issue #4034Closes#6095
CID 161638: Resource leak (RESOURCE_LEAK)
Ensure the string array in print_zpool_script_help
is freed in cases when there is an error.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes#6111
This commit allow higher ashift values (up to 16) in 'zpool create'
The ashift value was previously limited to 13 (8K block) in b41c990
because the limited number of uberblocks we could fit in the
statically sized (128K) vdev label ring buffer could prevent the
ability the safely roll back a pool to recover it.
Since b02fe35 the largest uberblock size we support is 8K: this
allow us to store a minimum number of 16 uberblocks in the vdev
label, even with higher ashift values.
Additionally change 'ashift' pool property behaviour: if set it will
be used as the default hint value in subsequent vdev operations
('zpool add', 'attach' and 'replace'). A custom ashift value can still
be specified from the command line, if desired.
Finally, fix a bug in add-o_ashift.ksh caused by a missing variable.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes#2024Closes#4205Closes#4740Closes#5763
* Add zfs_nicebytes() to print human-readable sizes
Some 'zfs', 'zpool' and 'zdb' output strings can be confusing to the
user when no units are specified. This add a new zfs_nicenum_format
"ZFS_NICENUM_BYTES" used to print bytes in their human-readable form.
Additionally, update some test cases to use machine-parsable 'zfs get'.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes#2414Closes#3185Closes#3594Closes#6032
This patch updates the "zpool status/iostat -c" commands to only run
"pre-baked" scripts from the /etc/zfs/zpool.d directory (or wherever
you install to). The scripts can only be run from -c as an unprivileged
user (unless the ZPOOL_SCRIPTS_AS_ROOT environment var is
set by root). This was done to encourage scripts to be written is such
a way that normal users can use them, and to be cautious. If your
script needs to run a privileged command, consider adding the
appropriate line in /etc/sudoers. See zpool(8) for an example of how
to do this.
The patch also allows the scripts to output custom column names. If
the script outputs a line like:
name=value
then "name" is used for the column name, and "value" is its value.
Multiple columns can be specified by outputting multiple lines. Column
names and values can have spaces. If the value is empty, a dash (-) is
printed instead.
After all the "name=value" lines are read (if any), zpool will take the
next the next line of output (if any) and print it without a column
header. After that, no more lines will be processed. This can be
useful for printing errors.
Lastly, this patch also disables the -c option with the latency and
request size histograms, since it produced awkward output and made the
code harder to maintain.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#5852
Be sure to invalidate a vdev's cache before performing
a zpool labelclear. There are cases where the cache is
stale because we did some operation that bypassed it,
and since we are doing an open with only O_RDWR, we
should invalidate it to be safe.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes#6009
Authored by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Porting Notes:
- Updated 'zpool labelclear' and 'zdb -l' such that they attempt
to find a vdev given solely its short name. This behavior is
consistent with the upstream OpenZFS code and the test cases
depend on it. The actual implementation differs slightly due
to device naming conventions on Linux.
- auto_online_001_pos, auto_replace_001_pos and add-o_ashift
test cases updated to expect failure when no label exists.
- read_efi_label() and zpool_label_disk_check() are read-only
operations and should use O_RDONLY at open time to enforce this.
- zpool_label_disk() and zpool_relabel_disk() write the partition
information using O_DIRECT an fsync() and page cache invalidation
to ensure a consistent view of the device.
- dump_label() in zdb should invalidate the page cache in order
to get the authoritative label from disk.
OpenZFS-issue: https://www.illumos.org/issues/6865
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c95076cCloses#5981
Allow a pool to be created with both raidz and mirror members,
without giving -f, as long as they have matching redundancy.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Haakan T Johansson <f96hajo@chalmers.se>
Closes#5915
df83110 added the ability to specify a custom "ashift" value from the command
line in 'zpool add' and 'zpool attach'. This commit adds additional checks to
the provided ashift to prevent invalid values from being used, which could
result in disastrous consequences for the whole pool.
Additionally provide ASHIFT_MAX and ASHIFT_MIN definitions in spa.h.
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes#5878
When a pool is suspended it's impossible to read the list
of damaged files from disk. This would result in a generic
misleading "insufficient permissions" error message.
Update zpool_get_errlog() to use the standard zpool error
logging functions to generate a useful error message. In
this case:
errors: List of errors unavailable: pool I/O is currently suspended
This patch does not address the related issue of potentially
not being able to resume a suspend pool when the underlying
device names have changed.
Additionally, remove the error handling from zfs_alloc()
in zpool_get_errlog() for readability since this function
can never fail.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#4031Closes#5731Closes#5907
- Pass $VDEV_ENC_SYSFS_PATH to 'zpool [iostat|status] -c' to include
enclosure LED sysfs path.
- Set LEDs correctly after import. This includes clearing any erroniously
set LEDs prior to the import, and setting the LED for any UNAVAIL drives.
- Include symlink for vdev_attach-led.sh in Makefile.am.
- Print the VDEV path in all-syslog.sh, and fix it so the pool GUID actually
prints.
Reviewed-by: Don Brady <don.brady@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#5716Closes#5751
Porting notes:
- Many changes were already made in ZoL (for ex. in d4ed66734).
Authored by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: George Melikov <mail@gmelikov.ru>
OpenZFS-issue: https://www.illumos.org/issues/6872
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/f83b46bCloses#5640
Porting Notes:
- Many of the fixes proposed by this patch were already applied.
In the cases where a different but equivalent fix was made the
code was updated with the OpenZFS version to minimize differences.
- The zpool_get_vdev_by_name() function was previously removed
by commit 235db0a.
Authored by: Igor Kozhukhov <ikozhukhov@gmail.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Haakan T Johansson <f96hajo@chalmers.se>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/6551
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/b327cd3Closes#5590
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Haakan T Johansson <f96hajo@chalmers.se>
Closes#5547Closes#5543
zpool iostat allows you to specify only certain vdevs to display.
Currently, if you run 'zpool iostat -c CMD vdev1 vdev2 ...'
on specific vdevs, it will actually run the command on *all* vdevs,
and just display the results for the vdevs you specify. This patch
corrects the behavior to only run the command on the specified vdevs,
and also enables the zpool_iostat_005_pos.ksh tests.
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#5443
Enable picky cstyle checks and resolve the new warnings. The vast
majority of the changes needed were to handle minor issues with
whitespace formatting. This patch contains no functional changes.
Non-whitespace changes are as follows:
* 8 times ; to { } in for/while loop
* fix missing ; in cmd/zed/agents/zfs_diagnosis.c
* comment (confim -> confirm)
* change endline , to ; in cmd/zpool/zpool_main.c
* a number of /* BEGIN CSTYLED */ /* END CSTYLED */ blocks
* /* CSTYLED */ markers
* change == 0 to !
* ulong to unsigned long in module/zfs/dsl_scan.c
* rearrangement of module_param lines in module/zfs/metaslab.c
* add { } block around statement after for_each_online_node
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Håkan Johansson <f96hajo@chalmers.se>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#5465
Do not force VDEV_NAME_TYPE_ID in max_width(), instead add it
in the relevant calls to max_width().
The first location of max_width() where VDEV_NAME_TYPE_ID is
now added in show_import() is followed by print_import_config() and
print_logs(). Both these print children vdev names that have been
retrieved using an explicit VDEV_NAME_TYPE_ID added.
The second location is in status_callback(). This is followed by
print_status_config(), print_logs(), print_l2cache(), and
print_spares(). For l2cache and spares it should not matter as there
are no mirror-X or raidz-X involved. print_status_config() as above
retrieves the name using explicit VDEV_NAME_TYPE_ID before
calling itself to print children.
The call of max_width() in get_namewidth() is not changed, as this is
used by zpool_do_iostat(), followed by print_iostat(), which does not
add VDEV_NAME_TYPE_ID.
Overall, we should consider adding VDEV_NAME_TYPE_ID to the
relevant name_flags / cb_name_flags fields, and remove the explicit
adding in called routines.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Haakan T Johansson <f96hajo@chalmers.se>
Closes#5401
This patch adds a command (-c) option to zpool status and zpool iostat. The
-c option allows you to run an arbitrary command on each vdev and display
the first line of output in zpool status/iostat. The environment vars
VDEV_PATH and VDEV_UPATH are set to the vdev's path and "underlying path"
before running the command. For device mapper, multipath, or partitioned
vdevs, VDEV_UPATH is the actual underlying /dev/sd* disk. This can be useful
if the command you're running requires a /dev/sd* device.
The patch also uses /sys/block/<dev>/slaves/ to lookup the underlying device
instead of using libdevmapper. This not only removes the libdevmapper
requirement at build time, but also allows you to resolve device mapper
devices without being root. This means that UDEV_UPATH get set correctly
when running zpool status/iostat as an unprivileged user.
Example:
$ zpool status -c 'echo I am $VDEV_PATH, $VDEV_UPATH'
NAME STATE READ WRITE CKSUM
mypool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
mpatha ONLINE 0 0 0 I am /dev/mapper/mpatha, /dev/sdc
sdb ONLINE 0 0 0 I am /dev/sdb1, /dev/sdb
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#5368
Sometimes it is desirable to specifically disable one or several
features directly on the 'zpool create' command line.
$ zpool create -o feature@<feature>=disabled ...
Original-patch-by: Turbo Fredriksson <turbo@bayour.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes#3460Closes#5142Closes#5324
First rename spare_cbdata_t cb -> spare_cb in print_status_config(),
to free up cb.
Using the structure removes the explicit parameters namewidth
and name_flags from several functions. Also use status_cbdata_t
for print_import_config(). This simplifies print_logs().
Remove the parameter 'verbose' for print_logs(). It does not really
mean verbose, it selected between the print_status_config and
print_import_config() paths. This selection is now done by
cb_print_config of spare_cbdata_t.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Håkan Johansson <f96hajo@chalmers.se>
Closes#5259
When array is passed as a parameter it degenerates into a
pointer so the sizeof(path) in is_shorthand_path() and always
get return value of 8, instead of the string length we want.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: GeLiXin <ge.lixin@zte.com.cn>
Closes#5198
1.coverity scan CID:147445 function zfs_do_send in zfs_main.c
Buffer not null terminated (BUFFER_SIZE_WARNING)
2.coverity scan CID:147443 function zfs_do_bookmark in zfs_main.c
Buffer not null terminated (BUFFER_SIZE_WARNING)
3.coverity scan CID:147660 function main in zinject.c
Passing string argv[0] of unknown size to strcpy
By the way, the leak of g_zfs is fixed.
4.coverity scan CID: 147442 function make_disks in zpool_vdev.c
Buffer not null terminated (BUFFER_SIZE_WARNING)
5.coverity scan CID: 147661 function main in dir_rd_update.c
passing string cp1 of unknown size to strcpy
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: luozhengzheng <luo.zhengzheng@zte.com.cn>
Closes#5130
This first phase brings over the ZFS SLM module, zfs_mod.c, to handle
auto operations in response to disk events. Disk event monitoring is
provided from libudev and generates the expected payload schema for
zfs_mod. This work leverages the recently added devid and phys_path
strings in the vdev label.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@intel.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#4673
Commit 519129f added support to multi-thread 'zpool import' for
the case where block devices are scanned for under /dev/. This
commit generalizes that logic and applies it to the case where
device names are acquired from libblkid.
The zpool_find_import_scan() and zpool_find_import_blkid()
functions create an AVL tree containing each device name. Each
entry in this tree is dispatched to a taskq where the function
zpool_open_func() validates the device by opening it and reading
the label. This may result in additional entries being added
to the tree and those device paths being verified.
This is largely how the upstream OpenZFS code behaves but due to
significant differences the non-Linux code has been dropped for
readability. Additionally, this code makes use of taskqs and
kmutexs which are normally not available to the command line tools.
Special care has been taken to allow their use in the import
functions.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes#4794
ZFS allows for specific permissions to be delegated to normal users
with the `zfs allow` and `zfs unallow` commands. In addition, non-
privileged users should be able to run all of the following commands:
* zpool [list | iostat | status | get]
* zfs [list | get]
Historically this functionality was not available on Linux. In order
to add it the secpolicy_* functions needed to be implemented and mapped
to the equivalent Linux capability. Only then could the permissions on
the `/dev/zfs` be relaxed and the internal ZFS permission checks used.
Even with this change some limitations remain. Under Linux only the
root user is allowed to modify the namespace (unless it's a private
namespace). This means the mount, mountpoint, canmount, unmount,
and remount delegations cannot be supported with the existing code. It
may be possible to add this functionality in the future.
This functionality was validated with the cli_user and delegation test
cases from the ZFS Test Suite. These tests exhaustively verify each
of the supported permissions which can be delegated and ensures only
an authorized user can perform it.
Two minor bug fixes were required for test-running.py. First, the
Timer() object cannot be safely created in a `try:` block when there
is an unconditional `finally` block which references it. Second,
when running as a normal user also check for scripts using the
both the .ksh and .sh suffixes.
Finally, existing users who are simulating delegations by setting
group permissions on the /dev/zfs device should revert that
customization when updating to a version with this change.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#362Closes#434Closes#4100Closes#4394Closes#4410Closes#4487
This is a purely cosmetical change, to consistently prefer one of
two (both acceptable) choises for the word parsable in documentation and
code. I don't really care which to use, but acording to wiktionary
https://en.wiktionary.org/wiki/parsable#English parsable is preferred.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#4682
Add argument format to print_one_column(), and use it to call
zfs_nicenum_format with, instead of just zfs_nicenum. Don't print "%"
for fragmentation or capacity percent values.
The calls to print_one_colum is made with ZFS_NICENUM_RAW if
cb->cb_literal (zpool list called with -p), and ZFS_NICENUM_1024 if not.
Also zpool_get_prop is modified to don't add "%" or "x" if literal.
Signed-off-by: Christer Ekholm <che@chrekh.se>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov
Closes#4657
3993 zpool(1M) and zfs(1M) should support -p for "list" and "get"
4700 "zpool get" doesn't support -H or -o options
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Ported by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/3993
OpenZFS-issue: https://www.illumos.org/issues/4700
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c58b352
Porting notes:
I removed ZoL's zpool_get_prop_literal() in favor of
zpool_get_prop(..., boolean_t literal) since that's what OpenZFS
uses. The functionality is the same.
5669 altroot not set in zpool create when specified with -o
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/5669
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c423721Closes#4594
6659 nvlist_free(NULL) is a no-op
Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: Marcel Telka <marcel@telka.sk>
Approved by: Robert Mustacchi <rm@joyent.com>
References:
https://www.illumos.org/issues/6659https://github.com/illumos/illumos-gate/commit/aab83bb
Ported-by: David Quigley <dpquigl@davequigley.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#4566
When ZFS partitions a block device it must wait for udev to create
both a device node and all the device symlinks. This process takes
a variable length of time and depends on factors such how many links
must be created, the complexity of the rules, etc. Complicating
the situation further it is not uncommon for udev to create and
then remove a link multiple times while processing the udev rules.
Given the above, the existing scheme of waiting for an expected
partition to appear by name isn't 100% reliable. At this point
udev may still remove and recreate think link resulting in the
kernel modules being unable to open the device.
In order to address this the zpool_label_disk_wait() function
has been updated to use libudev. Until the registered system
device acknowledges that it in fully initialized the function
will wait. Once fully initialized all device links are checked
and allowed to settle for 50ms. This makes it far more likely
that all the device nodes will exist when the kernel modules
need to open them.
For systems without libudev an alternate zpool_label_disk_wait()
was updated to include a settle time. In addition, the kernel
modules were updated to include retry logic for this ENOENT case.
Due to the improved checks in the utilities it is unlikely this
logic will be invoked. However, if the rare event it is needed
it will prevent a failure.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes#4523Closes#3708Closes#4077Closes#4144Closes#4214Closes#4517
When partitioning a device a name may be specified for each partition.
Internally zfs doesn't use this partition name for anything so it
has always just been set to "zfs".
However this isn't optimal because udev will create symlinks using
this name in /dev/disk/by-partlabel/. If the name isn't unique
then all the links cannot be created.
Therefore a random 64-bit value has been added to the partition
label, i.e "zfs-1234567890abcdef". Additional information could
be encoded here but since partitions may be reused that might
result in confusion and it was decided against.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes#4517
This is foundational work for ZED.
Updates a leaf vdev's persistent device strings on Linux platform
* only applies for a dedicated leaf vdev (aka whole disk)
* updated during pool create|add|attach|import
* used for matching device matching during auto-{online,expand,replace}
* stored in a leaf disk config label (i.e. alongside 'path' NVP)
* can opt-out using env var ZFS_VDEV_DEVID_OPT_OUT=YES
Some examples:
path: '/dev/sdb1'
devid: 'scsi-350000394a8ca4fbc-part1'
phys_path: 'pci-0000:04:00.0-sas-0x50000394a8ca4fbf-lun-0'
path: '/dev/mapper/mpatha'
devid: 'dm-uuid-mpath-35000c5006304de3f'
Signed-off-by: Don Brady <don.brady@intel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#2856Closes#3978Closes#4416
This issue was caused by calling `thread_init()` and `thread_fini()`
multiple times resulting in `kthread_key` being invalid. To resolve
the issue the explicit calls to `thread_init()` and `thread_fini()`
required by the `zpool` command have been moved in to the command.
Consumers such as `zdb` and `zhack` perform the same initialized
through `kernel_init()` and `kernel_fini()`.
Resolving this issue allows multiple additional test cases to
be enabled.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes#4331
When checking a whole disk to see if it can be safely added to
the pool a variety of checks are done. One of those checks is
to attempt to determine the partition information and scan all
the partitions for existing filesystems.
Since ZoL contains a EFI library this partition scanning is
easy to do for GPT partitioned disks. However, for non-GPT
partitioned disks (MBR/EBR) things are a bit harder. The lack of
a convenient library means non-GPT partitioned disks will not
have all their partitions checked. For this reason, the default
behavior was to require the force option. For example:
invalid vdev specification
use '-f' to override the following errors:
/dev/vdb does not contain an GPT label but it may contain partition
information in the MBR.
However in practice requiring the force option for this case is
counter-intuitively less safe. The reason is because only the first
error is returned. By passing the force option it will suppress
this first warning and potentially others you were not aware of.
Therefore this patch inverts the default behavior for non-GPT
formated disks (unformatted, MBR/EBR, etc). If no GPT table is
detected and there is no file system detected on the provided
block device. Then it will be assumed that block device is safe
to use.
Longer term it would be nice to see MBR/EBR scanning added to
the utilities. This should be fairly straight forward to do.
However these days it's somewhat less critical because Linux
defaults to GPT partition tables for devices 2TB or larger.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#2660Closes#2274
Historically libblkid support was detected as part of configure
and optionally enabled. This was done because at the time support
for detecting ZFS pool vdevs had just be added to libblkid and
those updated packages were not yet part of many distributions.
This is no longer the case and any reasonably current distribution
will ship a version of libblkid which can detect ZFS pool vdevs.
This patch makes libblkid mandatory at build time and libblkid
the preferred method of scanning for ZFS pools. For distributions
which include a modern version of libblkid there is no change in
behavior. Explicitly scanning the default search paths is still
supported and can be enabled with the '-s' command line option.
Additionally making libblkid mandatory means that the 'zpool create'
command can reliably detect if a specified device has an existing
non-ZFS filesystem (ext4, xfs) and print a warning.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#2448
Commit d2f3e29 introduced the -p option which outputs full paths
for vdevs to multiple zpool subcommands. When this was merged
there was no conflict for this flag letter. However it's certain
there will be a conflict with the -p (parsable) flag used by other
subcommands. Therefore, -p is being changed to -P to avoid this.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#4368
The following options have been added to the zpool add, iostat,
list, status, and split subcommands. The default behavior was
not modified, from zfs(8).
-g Display vdev GUIDs instead of the normal short
device names. These GUIDs can be used in-place of
device names for the zpool detach/off‐
line/remove/replace commands.
-L Display real paths for vdevs resolving all symbolic
links. This can be used to lookup the current block
device name regardless of the /dev/disk/ path used
to open it.
-p Display full paths for vdevs instead of only the
last component of the path. This can be used in
conjunction with the -L flag.
This behavior may also be enabled using the following environment
variables.
ZPOOL_VDEV_NAME_GUID
ZPOOL_VDEV_NAME_FOLLOW_LINKS
ZPOOL_VDEV_NAME_PATH
This change is based on worked originally started by Richard Yao
to add a -g option. Then extended by @ilovezfs to add a -L option
for openzfsonosx. Those changes have been merged, re-factored,
a -p option added and extended to all relevant zpool subcommands.
Original-patch-by: Richard Yao <ryao@gentoo.org>
Extended-by: ilovezfs <ilovezfs@icloud.com>
Extended-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: ilovezfs <ilovezfs@icloud.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#2011Closes#4341
5767 fix several problems with zfs test suite
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Approved by: Gordon Ross <gwr@nexenta.com>
References:
https://www.illumos.org/issues/5767https://github.com/illumos/illumos-gate/commit/52244c0
Porting Notes:
- Only the updates to zpool_main.c were kept because the ZFS test
suite is not currently part of the ZoL source tree. The test
suite itself should be updated to include the latest versions
of the tests once we're running it for every commit
- Fixes `zpool list` output.
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
5959 clean up per-dataset feature count code
Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
References:
https://www.illumos.org/issues/5959https://github.com/illumos/illumos-gate/commit/ca0cc39
Porting notes:
illumos code doesn't check for feature_get_refcount() returning
ENOTSUP (which means feature is disabled) in zdb. zfsonlinux added
a check in https://github.com/zfsonlinux/zfs/commit/784652c
due to #3468. The check was reintroduced here.
Ported-by: Witaut Bajaryn <vitaut.bayaryn@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3965
All other sections use a tab, which makes them easy to parse. Only the
"scan:" section had its continuation lines indented with four spaces.
This makes them consistent with the others.
Signed-off-by: Remy Blank <remy.blank@pobox.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes#3769
Build products from an out of tree build should be written
relative to the build directory. Sources should be referred
to by their locations in the source directory.
This is accomplished by adding the 'src' and 'obj' variables
for the module Makefile.am, using relative paths to reference
source files, and by setting VPATH when source files are not
co-located with the Makefile. This enables the following:
$ mkdir build
$ cd build
$ ../configure \
--with-spl=$HOME/src/git/spl/ \
--with-spl-obj=$HOME/src/git/spl/build
$ make -s
This change also has the advantage of resolving the following
warning which is generated by modern versions of automake.
Makefile.am:00: warning: source file 'xxx' is in a subdirectory,
Makefile.am:00: but option 'subdir-objects' is disabled
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1082
5118 When verifying or creating a storage pool, error messages
only show one device
Reviewed by: Adam Leventhal <adam.leventhal@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Boris Protopopov <boris.protopopov@me.com>
Approved by: Dan McDonald <danmcd@omniti.com>
References:
https://github.com/illumos/illumos-gate/commit/75fbdf9https://www.illumos.org/issues/5118
Ported-by: kernelOfTruth kerneloftruth@gmail.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3567
4966 zpool list iterator does not update output
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Garrett D'Amore <garrett@damore.org>
References:
https://github.com/illumos/illumos-gate/commit/cd67d23https://www.illumos.org/issues/4966
Ported-by: kernelOfTruth kerneloftruth@gmail.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3566
sysstat's iostat omits the first report when the -y option is used.
This patch adds that functionality and omits the first report with
statistics since system boot.
Signed-off-by: Hajo Möller <dasjoe@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3439
All fprintf() error messages are moved out of the libzfs_init()
library function where they never belonged in the first place. A
libzfs_error_init() function is added to provide useful error
messages for the most common causes of failure.
Additionally, in libzfs_run_process() the 'rc' variable was renamed
to 'error' for consistency with the rest of the code base.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>