ZFS allows for specific permissions to be delegated to normal users
with the `zfs allow` and `zfs unallow` commands. In addition, non-
privileged users should be able to run all of the following commands:
* zpool [list | iostat | status | get]
* zfs [list | get]
Historically this functionality was not available on Linux. In order
to add it the secpolicy_* functions needed to be implemented and mapped
to the equivalent Linux capability. Only then could the permissions on
the `/dev/zfs` be relaxed and the internal ZFS permission checks used.
Even with this change some limitations remain. Under Linux only the
root user is allowed to modify the namespace (unless it's a private
namespace). This means the mount, mountpoint, canmount, unmount,
and remount delegations cannot be supported with the existing code. It
may be possible to add this functionality in the future.
This functionality was validated with the cli_user and delegation test
cases from the ZFS Test Suite. These tests exhaustively verify each
of the supported permissions which can be delegated and ensures only
an authorized user can perform it.
Two minor bug fixes were required for test-running.py. First, the
Timer() object cannot be safely created in a `try:` block when there
is an unconditional `finally` block which references it. Second,
when running as a normal user also check for scripts using the
both the .ksh and .sh suffixes.
Finally, existing users who are simulating delegations by setting
group permissions on the /dev/zfs device should revert that
customization when updating to a version with this change.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#362Closes#434Closes#4100Closes#4394Closes#4410Closes#4487
zfsonlinux issue #2217 - zvol minor operations: check snapdev
property before traversing snapshots of a dataset
zfsonlinux issue #3681 - lock order inversion between zvol_open()
and dsl_pool_sync()...zvol_rename_minors()
Create a per-pool zvol taskq for asynchronous zvol tasks.
There are a few key design decisions to be aware of.
* Each taskq must be single threaded to ensure tasks are always
processed in the order in which they were dispatched.
* There is a taskq per-pool in order to keep the pools independent.
This way if one pool is suspended it will not impact another.
* The preferred location to dispatch a zvol minor task is a sync
task. In this context there is easy access to the spa_t and
minimal error handling is required because the sync task must
succeed.
Support for asynchronous zvol minor operations address issue #3681.
Signed-off-by: Boris Protopopov <boris.protopopov@actifio.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#2217Closes#3678Closes#3681
When loading the ZFS kernel modules they should not populate the
spa namespace using the cache file. This behavior isn't consistent
with other Linux kernel modules and we need to move away from it.
Removing this makes the whole startup process predictable with four
basic steps which are driven by the init system.
1) modprobe
2) zpool import
3) zfs mount
4) zfs share
This change also helps lay the groundwork for eventually removing
the kobj_* compatibility code on the kernel side. It may need to
be preserved in userspace because libzfs_init() depends on it.
This is why the conditional must be wrapped with an #ifdef _KERNEL.
Signed-off-by: Dan Swartzendruber <dswartz@druber.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#2820
Commit ba6a240 adjusted the behavior of 'zfs create -V'. The
caller is no longer guaranteed that udev will have finished
creating the /dev/ entries by the time to command exits. It
is therefore required that we explicitly block waiting for
udev to settle for this test to run reliably.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
When running zconfig.sh test 7 and 8 cause the following warning to
be printed to the console. It's caused because we're snapshoting a
mounted ext2 filesystem which is not in a 'clean' state. This is
to be expected since we have no guarentees about the on-disk
consistency of the filesystem.
EXT2-fs warning: mounting unchecked fs, running e2fsck is recommended
To silence the warning and preserve the intent of these test cases
they have been updated to unmount the filesystem prior to snapshoting
them. This ensures the ext2 filesystem is in a consistent state
when the snapshot is taken.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Closes#1972
One of the side effects of calling zvol_create_minors() in
zvol_init() is that all pools listed in the cache file will
be opened. Depending on the state and contents of your pool
this operation can take a considerable length of time.
Doing this at load time is undesirable because the kernel
is holding a global module lock. This prevents other modules
from loading and can serialize an otherwise parallel boot
process. Doing this after module inititialization also
reduces the chances of accidentally introducing a race
during module init.
To ensure that /dev/zvol/<pool>/<dataset> devices are
still automatically created after the module load completes
a udev rules has been added. When udev notices that the
/dev/zfs device has been create the 'zpool list' command
will be run. This then will cause all the pools listed
in the zpool.cache file to be opened.
Because this process in now driven asynchronously by udev
there is the risk of problems in downstream distributions.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #756
Issue #1020
Issue #1234
The new snapdev dataset property may be set to control the
visibility of zvol snapshot devices. By default this value
is set to 'hidden' which will prevent zvol snapshots from
appearing under /dev/zvol/ and /dev/<dataset>/. When set to
'visible' all zvol snapshots for the dataset will be visible.
This functionality was largely added because when automatic
snapshoting is enabled large numbers of read-only zvol snapshots
will be created. When creating these devices the kernel will
attempt to read their partition tables, and blkid will attempt
to identify any filesystems on those partitions. This leads
to a variety of issues:
1) The zvol partition tables will be read in the context of
the `modprobe zfs` for automatically imported pools. This
is undesirable and should be done asynchronously, but for
now reducing the number of visible devices helps.
2) Udev expects to be able to complete its work for a new
block devices fairly quickly. When many zvol devices are
added at the same time this is no longer be true. It can
lead to udev timeouts and missing /dev/zvol links.
3) Simply having lots of devices in /dev/ can be aukward from
a management standpoint. Hidding the devices your unlikely
to ever use helps with this. Any snapshot device which is
needed can be made visible by changing the snapdev property.
NOTE: This patch changes the default behavior for zvols which
was effectively 'snapdev=visible'.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1235Closes#945
Issue #956
Issue #756
Test 5, 6, 7, and 7 in zconfig.sh use /bin/ as a source of random
directories and files for their test. This has lead to unexpected
tests failures because the total size of /bin/ on the test system
isn't checked and it is entirely possible for it to be larger than
the target filesystem.
To resolve this issue we create a somewhat random collection of
files and directories in /var/tmp to use. On average we expect
about 5MB of data with the worst case being 20MB. This is large
enough to be interesting and small enough to always fit in the
default test datasets.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1113
Parted version 3.0 doesn't allow us to specify the start and end
percentages as 50% and 100% respectively. This results in:
Error: The location 100% is outside the device /dev/zd0
Therefore we change the syntax to 51% and -1 for end of device.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The recent zvol improvements have changed default suggested alignment
for zvols from 512b (default) to 8k (zvol blocksize). Because of this
the zconfig.sh tests which create paritions are now generating a
warning about non-optimal alignments.
This change updates the need zconfig.sh tests such that a partition
will be properly aligned. In the process, it shifts from using the
sfdisk utility to the parted utility to create partitions. It also
moves the creation of labels, partitions, and filesystems in to
generic functions in common.sh.in.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
When running the zconfig.sh, zpios-sanity.sh, and zfault.sh
from the installed packages the 90-zfs.rules can cause failures.
These will occur because the test suite assumes it has full
control over loading/unloading the module stack. If the stack
gets asynchronously loaded by the udev rule the test suite
will treat it as a failure. Resolve the issue by disabling
the offending rule during the tests and enabling it on exit.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
If a pool was not cleanly exported passing the -f flag may be required
at 'zpool import' time. Since this test is simply validating that the
pool can be successfully imported in the absense of the cache file
always pass the -f to ensure it succeeds. This failure was observed
under RHEL6.1.
This change should have occured when we commited the new udev
rules for zvols. Basically, the test script is just out of date.
We need to update it to use the /dev/zvol/ device names, and
to expect the more common -partN suffixes.
I added a udev_trigger() call in zconfig_partition() and
zconfig_zvol_device_stat() to ensure that all the udev rules have
run before. This ensures the devices are available to subsequent
commands and closes a small race.
Finally, I was forced added a small 'sleep 1' to test 10. I
was observing occassional failures in my VM due to the device
still claiming to be busy. Delaying betwen the various methods
of adding/removing a vdev avoids the issue.
Closes#207
When adding this functionality originally the options to only
run specific tests (-t), or conversely skip specific tests (-s)
were omitted from the usage page. This commit adds the missing
documentation.
The idea behind the '-c' flag is to cleanup everything from a
previous test run which might cause the test script to fail.
This should also include removing the previously loaded module.
This makes it a little easier to run 'zconfig.sh -c', however
remember this is a test script and it will take all of your
other zpools offline for the purposes of the test. This notion
has also been extended to the default 'make check' behavior.
This test performs a sanity check of the zpool add and remove commands. It
tests adding and removing both a cache disk and a log disk to and from a zpool.
Usage of both a shorthand device path and a full path is covered. The test
uses a scsi_debug device as the disk to be added and removed. This is done so
that zpool will see it as a whole disk and partition it, which it does not
currently done for loopback devices. We want to verify that the manipulation
done to whole disks paths to hide the parition information does not break the
add/remove interface.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Eleven new zpool configurations were added to allow testing of various
failure cases. The first 5 zpool configurations leverage the 'faulty'
md device type which allow us to simuluate IO errors at the block layer.
The last 6 zpool configurations leverage the scsi_debug module provided
by modern kernels. This device allows you to create virtual scsi
devices which are backed by a ram disk. With this setup we can verify
the full IO stack by injecting faults at the lowest layer. Both methods
of fault injection are important to verifying the IO stack.
The zfs code itself also provides a mechanism for error injection
via the zinject command line tool. While we should also take advantage
of this appraoch to validate the code it does not address any of the
Linux integration issues which are the most concerning. For the
moment we're trusting that the upstream Solaris guys are running
zinject and would have caught internal zfs logic errors.
Currently, there are 6 r/w test cases layered on top of the 'faulty'
md devices. They include 3 writes tests for soft/transient errors,
hard/permenant errors, and all writes error to the device. There
are 3 matching read tests for soft/transient errors, hard/permenant
errors, and fixable read error with a write. Although for this last
case zfs doesn't do anything special.
The seventh test case verifies zfs detects and corrects checksum
errors. In this case one of the drives is extensively damaged and
by dd'ing over large sections of it. We then ensure zfs logs the
issue and correctly rebuilds the damage.
The next test cases use the scsi_debug configuration to injects error
at the bottom of the scsi stack. This ensures we find any flaws in the
scsi midlayer or our usage of it. Plus it stresses the device specific
retry, timeout, and error handling outside of zfs's control.
The eighth test case is to verify that the system correctly handles an
intermittent device timeout. Here the scsi_debug device drops 1 in N
requests resulting in a retry either at the block level. The ZFS code
does specify the FAILFAST option but it turns out that for this case
the Linux IO stack with still retry the command. The FAILFAST logic
located in scsi_noretry_cmd() does no seem to apply to the simply
timeout case. It appears to be more targeted to specific device or
transport errors from the lower layers.
The ninth test case handles a persistent failure in which the device
is removed from the system by Linux. The test verifies that the failure
is detected, the device is made unavailable, and then can be successfully
re-add when brought back online. Additionally, it ensures that errors
and events are logged to the correct places and the no data corruption
has occured due to the failure.
Occasional failures were observed in zconfig.sh because udev
could be delayed for a few seconds. To handle this the wait_udev
function has been added to wait for timeout seconds for an
expected device before returning an error. By default callers
currently use a 30 seconds timeout which should be much longer
than udev ever needs but not so long to worry the test suite
is hung.
This branch contains the majority of the changes required to cleanly
intergrate with Linux style special devices (/dev/zfs). Mainly this
means dropping all the Solaris style callbacks and replacing them
with the Linux equivilants.
This patch also adds the onexit infrastructure needed to track
some minimal state between ioctls. Under Linux it would be easy
to do this simply using the file->private_data. But under Solaris
they apparent need to pass the file descriptor as part of the ioctl
data and then perform a lookup in the kernel. Once again to keep
code change to a minimum I've implemented the Solaris solution.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Add autoconf style build infrastructure to the ZFS tree. This
includes autogen.sh, configure.ac, m4 macros, some scripts/*,
and makefiles for all the core ZFS components.