Ensure an error message is logged when the 'zfs.sh' script fails
to either load a module or if udev fails to create the /dev/zfs
device. Error messages for missing KERNEL_MODULES are suppressed
because that functionality may just be built-in to the kernel.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Several of the in-tree regression tests depend on the availability
of loop devices. If for some reason no loop devices are available
the tests will fail.
Normally this isn't an issue because most Linux distributions create
8 loop devices by default. This is enough for our purposes. However,
recent Fedora releases have only been creating a single loop device
and this leads to failures. Alternately, if something else of the
system is using the loop devices we may see failures.
The fix for this is to update the support scripts to dynamically
create loop devices as needed. The scripts need only create a node
under /dev/ and the loop driver with create the minor. This behavior
has been supported by the loop driver for ages.
Additionally this patch updates cleanup_loop_devices() to cleanup
loop devices which have already had their file store deleted. This
helps prevent stale loop devices from accumulating on the system due
to test failures.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Closes#2249
zed monitors ZFS events. When a zevent is posted, zed will run any
scripts that have been enabled for the corresponding zevent class.
Multiple scripts may be invoked for a given zevent. The zevent
nvpairs are passed to the scripts as environment variables.
Events are processed synchronously by the single thread, and there is
no maximum timeout for script execution. Consequently, a misbehaving
script can delay (or forever block) the processing of subsequent
zevents. Plans are to address this in future commits.
Initial scripts have been developed to log events to syslog
and send email in response to checksum/data/io errors and
resilver.finish/scrub.finish events. By default, email will only
be sent if the ZED_EMAIL variable is configured in zed.rc (which is
serving as a config file of sorts until a proper configuration file
is implemented).
Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2
Early versions of ZFS coordinated the creation and destruction
of device minors from userspace. This was inherently racy and
in late 2009 these ioctl()s were removed leaving everything up
to the kernel. This significantly simplified the code.
However, we never picked up these changes in ZoL since we'd
already significantly adjusted this code for Linux. This patch
aims to rectify that by finally removing ZFC_IOC_*_MINOR ioctl()s
and moving all the functionality down in to the kernel. Since
this cleanup will change the kernel/user ABI it's being done
in the same tag as the previous libzfs_core ABI changes. This
will minimize, but not eliminate, the disruption to end users.
Once merged ZoL, Illumos, and FreeBSD will basically be back
in sync in regards to handling ZVOLs in the common code. While
each platform must have its own custom zvol.c implemenation the
interfaces provided are consistent.
NOTES:
1) This patch introduces one subtle change in behavior which
could not be easily avoided. Prior to this change callers
of 'zfs create -V ...' were guaranteed that upon exit the
/dev/zvol/ block device link would be created or an error
returned. That's no longer the case. The utilities will no
longer block waiting for the symlink to be created. Callers
are now responsible for blocking, this is why a 'udev_wait'
call was added to the 'label' function in scripts/common.sh.
2) The read-only behavior of a ZVOL now solely depends on if
the ZVOL_RDONLY bit is set in zv->zv_flags. The redundant
policy setting in the gendisk structure was removed. This
both simplifies the code and allows us to safely leverage
set_disk_ro() to issue a KOBJ_CHANGE uevent. See the
comment in the code for futher details on this.
3) Because __zvol_create_minor() and zvol_alloc() may now be
called in a sync task they must use KM_PUSHPAGE.
References:
illumos/illumos-gate@681d9761e8
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes#1969
According to the FHS. Testing scripts and examples which are all
architecture independent should be installed in a subdirectory
under /usr/share.
http://www.pathname.com/fhs/2.2/fhs-4.11.html
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
In the interest of maintaining only one udev helper to give vdevs
user friendly names, the zpool_id and zpool_layout infrastructure
is being retired. They are superseded by vdev_id which incorporates
all the previous functionality.
Documentation for the new vdev_id(8) helper and its configuration
file, vdev_id.conf(5), can be found in their respective man pages.
Several useful example files are installed under /etc/zfs/.
/etc/zfs/vdev_id.conf.alias.example
/etc/zfs/vdev_id.conf.multipath.example
/etc/zfs/vdev_id.conf.sas_direct.example
/etc/zfs/vdev_id.conf.sas_switch.example
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#981
The -q option should quiet the mkfs.ext2 output but certain
versions of e2fsprogs appear to ignore it. This can result in
an extra 'done' message in the test output. To keep this noise
from distracting just direct stdout to /dev/null.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Test 5, 6, 7, and 7 in zconfig.sh use /bin/ as a source of random
directories and files for their test. This has lead to unexpected
tests failures because the total size of /bin/ on the test system
isn't checked and it is entirely possible for it to be larger than
the target filesystem.
To resolve this issue we create a somewhat random collection of
files and directories in /var/tmp to use. On average we expect
about 5MB of data with the worst case being 20MB. This is large
enough to be interesting and small enough to always fit in the
default test datasets.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1113
The 'exit $?' command in the INT TERM EXIT trap was overwritting
the expected error code with the error code from mv. Fix the
issue by removing the 'exit $?'. It's important the we preserve
the original error code so failures are easily noticed.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The function unused_loop_device in /usr/libexec/zfs/common.sh
returns /dev/loop-control on the first call. This device is NOT
a loop device (https://github.com/torvalds/linux/commit/770fe30)
it is a control device. This in turn causes the script zconfig.sh
to fail with:
zpool-create.sh: Error 1 creating /tmp/zpool-vdev0 ->
/dev/loop-control loopback
The patch makes the function return /dev/loop[0-9]* which are
loop devices.
Signed-off-by: Andrew Reid <ColdCanuck@nailedtotheperch.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#797
Currently, mkfs.ext2 on zconfig.sh zvols tries to use a 8K blocksize,
probably because by default zvol exposes an optimal I/O size of 8K.
Unfortunately, a ext2 blocksize of 8K is not supported by the kernel,
so the resulting filesystem is unmountable.
This patch fixes the issue by making sure the blocksize is 4K. We have
to use -F to force it else mkfs.ext2 won't allow us to use a blocksize
smaller than the optimal I/O size.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#979
The recent zvol improvements have changed default suggested alignment
for zvols from 512b (default) to 8k (zvol blocksize). Because of this
the zconfig.sh tests which create paritions are now generating a
warning about non-optimal alignments.
This change updates the need zconfig.sh tests such that a partition
will be properly aligned. In the process, it shifts from using the
sfdisk utility to the parted utility to create partitions. It also
moves the creation of labels, partitions, and filesystems in to
generic functions in common.sh.in.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
When running the zconfig.sh, zpios-sanity.sh, and zfault.sh
from the installed packages the 90-zfs.rules can cause failures.
These will occur because the test suite assumes it has full
control over loading/unloading the module stack. If the stack
gets asynchronously loaded by the udev rule the test suite
will treat it as a failure. Resolve the issue by disabling
the offending rule during the tests and enabling it on exit.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
When your kernel is built with kernel stack tracing enabled and you
have the debugfs filesystem mounted. Then the zfs.sh script will clear
the worst observed kernel stack depth on module load and check the worst
case usage on module removal. If the stack depth ever exceeds 7000
bytes the full stack will be printed for debugging. This is dangerously
close to overrunning the default 8k stack.
This additional advisory debugging is particularly valuable when running
the regression tests on a kernel built with 16k stacks. In this case,
almost no matter how bad the stack overrun is you will see be able to
get a clean stack trace for debugging. Since the worst case stack usage
can be highly variable it's helpful to always check the worst case usage.
Some udev hooks are not designed to be idempotent, so calling udevadm
trigger outside of the distribution's initialization scripts can have
unexpected (and potentially dangerous) side effects. For example, the
system time may change or devices may appear multiple times. See Ubuntu
launchpad bug 320200 and this mailing list post for more details:
https://lists.ubuntu.com/archives/ubuntu-devel/2009-January/027260.html
To avoid these problems we call udevadm trigger with --action=change
--subsystem-match=block. The first argument tells udev just to refresh
devices, and make sure everything's as it should be. The second
argument limits the scope to block devices, so devices belonging to
other subsystems cannot be affected.
This doesn't fix the problem on older udev implementations that don't
provide udevadm but instead have udevtrigger as a standalone program.
In this case the above options aren't available so there's no way to
call call udevtrigger safely. But we can live with that since this
issue only exists in optional test and helper scripts, and most
zfs-on-linux users are running newer systems anyways.
Loading and unloading the zlib modules as part of the zfs.sh
script has proven a little problematic for a few reasons.
* First, your kernel may not need to load either zlib_inflate
or zlib_deflate. This functionality may be built directly in
to your kernel. It depends entirely on what your distribution
decided was the right thing to do.
* Second, even if you do manage to load the correct modules you
may not be able to unload them. There may other consumers
of the modules with a reference preventing the unload.
To avoid both of these issues the test scripts have been updated to
attempt to unconditionally load all modules listed in KERNEL_MODULES.
If the module is successfully loaded you must have needed it. If
the module can't be loaded that almost certainly means either it is
built in to your kernel or is already being used by another consumer.
In both cases this is not an issue and we can move on to the spl/zfs
modules.
Finally, by removing these kernel modules from the MODULES list
we ensure they are never unloaded during 'zfs.sh -u'. This avoids
the issue of the script failing because there is another consumer
using the module we were not aware of. In other words the script
restricts unloading modules to only the spl/zfs modules.
Closes#78
Eleven new zpool configurations were added to allow testing of various
failure cases. The first 5 zpool configurations leverage the 'faulty'
md device type which allow us to simuluate IO errors at the block layer.
The last 6 zpool configurations leverage the scsi_debug module provided
by modern kernels. This device allows you to create virtual scsi
devices which are backed by a ram disk. With this setup we can verify
the full IO stack by injecting faults at the lowest layer. Both methods
of fault injection are important to verifying the IO stack.
The zfs code itself also provides a mechanism for error injection
via the zinject command line tool. While we should also take advantage
of this appraoch to validate the code it does not address any of the
Linux integration issues which are the most concerning. For the
moment we're trusting that the upstream Solaris guys are running
zinject and would have caught internal zfs logic errors.
Currently, there are 6 r/w test cases layered on top of the 'faulty'
md devices. They include 3 writes tests for soft/transient errors,
hard/permenant errors, and all writes error to the device. There
are 3 matching read tests for soft/transient errors, hard/permenant
errors, and fixable read error with a write. Although for this last
case zfs doesn't do anything special.
The seventh test case verifies zfs detects and corrects checksum
errors. In this case one of the drives is extensively damaged and
by dd'ing over large sections of it. We then ensure zfs logs the
issue and correctly rebuilds the damage.
The next test cases use the scsi_debug configuration to injects error
at the bottom of the scsi stack. This ensures we find any flaws in the
scsi midlayer or our usage of it. Plus it stresses the device specific
retry, timeout, and error handling outside of zfs's control.
The eighth test case is to verify that the system correctly handles an
intermittent device timeout. Here the scsi_debug device drops 1 in N
requests resulting in a retry either at the block level. The ZFS code
does specify the FAILFAST option but it turns out that for this case
the Linux IO stack with still retry the command. The FAILFAST logic
located in scsi_noretry_cmd() does no seem to apply to the simply
timeout case. It appears to be more targeted to specific device or
transport errors from the lower layers.
The ninth test case handles a persistent failure in which the device
is removed from the system by Linux. The test verifies that the failure
is detected, the device is made unavailable, and then can be successfully
re-add when brought back online. Additionally, it ensures that errors
and events are logged to the correct places and the no data corruption
has occured due to the failure.
Occasional failures were observed in zconfig.sh because udev
could be delayed for a few seconds. To handle this the wait_udev
function has been added to wait for timeout seconds for an
expected device before returning an error. By default callers
currently use a 30 seconds timeout which should be much longer
than udev ever needs but not so long to worry the test suite
is hung.
One of the neat tricks an autoconf style project is capable of
is allow configurion/building in a directory other than the
source directory. The major advantage to this is that you can
build the project various different ways while making changes
in a single source tree.
For example, this project is designed to work on various different
Linux distributions each of which work slightly differently. This
means that changes need to verified on each of those supported
distributions perferably before the change is committed to the
public git repo.
Using nfs and custom build directories makes this much easier.
I now have a single source tree in nfs mounted on several different
systems each running a supported distribution. When I make a
change to the source base I suspect may break things I can
concurrently build from the same source on all the systems each
in their own subdirectory.
wget -c http://github.com/downloads/behlendorf/zfs/zfs-x.y.z.tar.gz
tar -xzf zfs-x.y.z.tar.gz
cd zfs-x-y-z
------------------------- run concurrently ----------------------
<ubuntu system> <fedora system> <debian system> <rhel6 system>
mkdir ubuntu mkdir fedora mkdir debian mkdir rhel6
cd ubuntu cd fedora cd debian cd rhel6
../configure ../configure ../configure ../configure
make make make make
make check make check make check make check
This change also moves many of the include headers from individual
incude/sys directories under the modules directory in to a single
top level include directory. This has the advantage of making
the build rules cleaner and logically it makes a bit more sense.
This branch contains the majority of the changes required to cleanly
intergrate with Linux style special devices (/dev/zfs). Mainly this
means dropping all the Solaris style callbacks and replacing them
with the Linux equivilants.
This patch also adds the onexit infrastructure needed to track
some minimal state between ioctls. Under Linux it would be easy
to do this simply using the file->private_data. But under Solaris
they apparent need to pass the file descriptor as part of the ioctl
data and then perform a lookup in the kernel. Once again to keep
code change to a minimum I've implemented the Solaris solution.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Add autoconf style build infrastructure to the ZFS tree. This
includes autogen.sh, configure.ac, m4 macros, some scripts/*,
and makefiles for all the core ZFS components.