This patch leverages Linux tracepoints from within the ZFS on Linux
code base. It also refactors the debug code to bring it back in sync
with Illumos.
The information exported via tracepoints can be used for a variety of
reasons (e.g. debugging, tuning, general exploration/understanding,
etc). It is advantageous to use Linux tracepoints as the mechanism to
export this kind of information (as opposed to something else) for a
number of reasons:
* A number of external tools can make use of our tracepoints
"automatically" (e.g. perf, systemtap)
* Tracepoints are designed to be extremely cheap when disabled
* It's one of the "accepted" ways to export this kind of
information; many other kernel subsystems use tracepoints too.
Unfortunately, though, there are a few caveats as well:
* Linux tracepoints appear to only be available to GPL licensed
modules due to the way certain kernel functions are exported.
Thus, to actually make use of the tracepoints introduced by this
patch, one might have to patch and re-compile the kernel;
exporting the necessary functions to non-GPL modules.
* Prior to upstream kernel version v3.14-rc6-30-g66cc69e, Linux
tracepoints are not available for unsigned kernel modules
(tracepoints will get disabled due to the module's 'F' taint).
Thus, one either has to sign the zfs kernel module prior to
loading it, or use a kernel versioned v3.14-rc6-30-g66cc69e or
newer.
Assuming the above two requirements are satisfied, lets look at an
example of how this patch can be used and what information it exposes
(all commands run as 'root'):
# list all zfs tracepoints available
$ ls /sys/kernel/debug/tracing/events/zfs
enable filter zfs_arc__delete
zfs_arc__evict zfs_arc__hit zfs_arc__miss
zfs_l2arc__evict zfs_l2arc__hit zfs_l2arc__iodone
zfs_l2arc__miss zfs_l2arc__read zfs_l2arc__write
zfs_new_state__mfu zfs_new_state__mru
# enable all zfs tracepoints, clear the tracepoint ring buffer
$ echo 1 > /sys/kernel/debug/tracing/events/zfs/enable
$ echo 0 > /sys/kernel/debug/tracing/trace
# import zpool called 'tank', inspect tracepoint data (each line was
# truncated, they're too long for a commit message otherwise)
$ zpool import tank
$ cat /sys/kernel/debug/tracing/trace | head -n35
# tracer: nop
#
# entries-in-buffer/entries-written: 1219/1219 #P:8
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
lt-zpool-30132 [003] .... 91344.200050: zfs_arc__miss: hdr...
z_rd_int/0-30156 [003] .... 91344.200611: zfs_new_state__mru...
lt-zpool-30132 [003] .... 91344.201173: zfs_arc__miss: hdr...
z_rd_int/1-30157 [003] .... 91344.201756: zfs_new_state__mru...
lt-zpool-30132 [003] .... 91344.201795: zfs_arc__miss: hdr...
z_rd_int/2-30158 [003] .... 91344.202099: zfs_new_state__mru...
lt-zpool-30132 [003] .... 91344.202126: zfs_arc__hit: hdr ...
lt-zpool-30132 [003] .... 91344.202130: zfs_arc__hit: hdr ...
lt-zpool-30132 [003] .... 91344.202134: zfs_arc__hit: hdr ...
lt-zpool-30132 [003] .... 91344.202146: zfs_arc__miss: hdr...
z_rd_int/3-30159 [003] .... 91344.202457: zfs_new_state__mru...
lt-zpool-30132 [003] .... 91344.202484: zfs_arc__miss: hdr...
z_rd_int/4-30160 [003] .... 91344.202866: zfs_new_state__mru...
lt-zpool-30132 [003] .... 91344.202891: zfs_arc__hit: hdr ...
lt-zpool-30132 [001] .... 91344.203034: zfs_arc__miss: hdr...
z_rd_iss/1-30149 [001] .... 91344.203749: zfs_new_state__mru...
lt-zpool-30132 [001] .... 91344.203789: zfs_arc__hit: hdr ...
lt-zpool-30132 [001] .... 91344.203878: zfs_arc__miss: hdr...
z_rd_iss/3-30151 [001] .... 91344.204315: zfs_new_state__mru...
lt-zpool-30132 [001] .... 91344.204332: zfs_arc__hit: hdr ...
lt-zpool-30132 [001] .... 91344.204337: zfs_arc__hit: hdr ...
lt-zpool-30132 [001] .... 91344.204352: zfs_arc__hit: hdr ...
lt-zpool-30132 [001] .... 91344.204356: zfs_arc__hit: hdr ...
lt-zpool-30132 [001] .... 91344.204360: zfs_arc__hit: hdr ...
To highlight the kind of detailed information that is being exported
using this infrastructure, I've taken the first tracepoint line from the
output above and reformatted it such that it fits in 80 columns:
lt-zpool-30132 [003] .... 91344.200050: zfs_arc__miss:
hdr {
dva 0x1:0x40082
birth 15491
cksum0 0x163edbff3a
flags 0x640
datacnt 1
type 1
size 2048
spa 3133524293419867460
state_type 0
access 0
mru_hits 0
mru_ghost_hits 0
mfu_hits 0
mfu_ghost_hits 0
l2_hits 0
refcount 1
} bp {
dva0 0x1:0x40082
dva1 0x1:0x3000e5
dva2 0x1:0x5a006e
cksum 0x163edbff3a:0x75af30b3dd6:0x1499263ff5f2b:0x288bd118815e00
lsize 2048
} zb {
objset 0
object 0
level -1
blkid 0
}
For the specific tracepoint shown here, 'zfs_arc__miss', data is
exported detailing the arc_buf_hdr_t (hdr), blkptr_t (bp), and
zbookmark_t (zb) that caused the ARC miss (down to the exact DVA!).
This kind of precise and detailed information can be extremely valuable
when trying to answer certain kinds of questions.
For anybody unfamiliar but looking to build on this, I found the XFS
source code along with the following three web links to be extremely
helpful:
* http://lwn.net/Articles/379903/
* http://lwn.net/Articles/381064/
* http://lwn.net/Articles/383362/
I should also node the more "boring" aspects of this patch:
* The ZFS_LINUX_COMPILE_IFELSE autoconf macro was modified to
support a sixth paramter. This parameter is used to populate the
contents of the new conftest.h file. If no sixth parameter is
provided, conftest.h will be empty.
* The ZFS_LINUX_TRY_COMPILE_HEADER autoconf macro was introduced.
This macro is nearly identical to the ZFS_LINUX_TRY_COMPILE macro,
except it has support for a fifth option that is then passed as
the sixth parameter to ZFS_LINUX_COMPILE_IFELSE.
These autoconf changes were needed to test the availability of the Linux
tracepoint macros. Due to the odd nature of the Linux tracepoint macro
API, a separate ".h" must be created (the path and filename is used
internally by the kernel's define_trace.h file).
* The HAVE_DECLARE_EVENT_CLASS autoconf macro was introduced. This
is to determine if we can safely enable the Linux tracepoint
functionality. We need to selectively disable the tracepoint code
due to the kernel exporting certain functions as GPL only. Without
this check, the build process will fail at link time.
In addition, the SET_ERROR macro was modified into a tracepoint as well.
To do this, the 'sdt.h' file was moved into the 'include/sys' directory
and now contains a userspace portion and a kernel space portion. The
dprintf and zfs_dbgmsg* interfaces are now implemented as tracepoint as
well.
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Make the style checker script accept right parentheses on their own
lines. This is motivated by the Linux tracepoints macro
DECLARE_EVENT_CLASS.
The code within TP_fast_assign() (a parameter of DECLARE_EVENT_CLASS)
is normal C assignments terminated by semicolons. But the style
checker forbids us from following a semicolon with a non-blank and
from preceding a right parenthesis with white space. Therefore the
closing parenthesis must go on the next line, yet the style checker
foribs us from indenting it for readability. Relaxing the
no-non-blank-after-semicolon rule would open the door to too many bad
style practices. So instead we relax the
no-white-space-before-right-paren rule if the parenthesis is on its
own line. The relaxation is overriden with the -p option so we still
have a way to catch misuse of this style.
Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The source_tree variable in the previous commit had an extra $.
Remove it so that source_tree is expanded properly. An identical
fix has been applied in the original patch to the stable branch.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#2776
Signed-off-by: Tom Prince <tom.prince@clusterhq.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#2776
New versions of dkms clean up the build directory after installing.
It appears that this was always intended, but had rm -rf "/path/to/build/*"
(note the quotes), which prevented it from working.
Also, the build step is already installing stuff into the directory where
these files go, so installing our stuff there as part of build rather than
install makes sense.
Signed-off-by: Tom Prince <tom.prince@clusterhq.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#2776
The zimport.sh script makes use of the zpool-create.sh script
to construct test pools for importing with older versions of
ZoL. It is desirable to have a way to disable all the features
so new pools can be imported with older code.
The simplest and most flexible way to achieve this was to merge
the VERBOSE_FLAG and FORCE_FLAG in to a single ZPOOL_FLAGS
variable. The contents of this variable will be used in the
'zpool create' allowing us to easily pass arbitrary flags.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Closes#2524
Set LANG=C before calling 'rpmbuild' to avoid rpmbuild failing on
the translated date string in the changelog.
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes: zfsonlinux/spl#306
1) $SPLSRC and $SRCDIR should be changed to $SRC_DIR. These are
vestiges of an earlier version of the script and were missed when
it was updated. Additionally ensure the directory is created.
2) The 'fail' function should take an integer argument for the
error code to return. Otherwise 0 (success) will be mistakenly
returned and errors will we incorrectly suppressed. The error
code should be meaningful enough to determine where the script
failed.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Unfortunately, the zimport.sh test script really does depend on
bash. Moving to /bin/sh should be possible once the shared
infrastructure scripts it depends on is made portable.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ensure an error message is logged when the 'zfs.sh' script fails
to either load a module or if udev fails to create the /dev/zfs
device. Error messages for missing KERNEL_MODULES are suppressed
because that functionality may just be built-in to the kernel.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Several of the in-tree regression tests depend on the availability
of loop devices. If for some reason no loop devices are available
the tests will fail.
Normally this isn't an issue because most Linux distributions create
8 loop devices by default. This is enough for our purposes. However,
recent Fedora releases have only been creating a single loop device
and this leads to failures. Alternately, if something else of the
system is using the loop devices we may see failures.
The fix for this is to update the support scripts to dynamically
create loop devices as needed. The scripts need only create a node
under /dev/ and the loop driver with create the minor. This behavior
has been supported by the loop driver for ages.
Additionally this patch updates cleanup_loop_devices() to cleanup
loop devices which have already had their file store deleted. This
helps prevent stale loop devices from accumulating on the system due
to test failures.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Closes#2249
zed monitors ZFS events. When a zevent is posted, zed will run any
scripts that have been enabled for the corresponding zevent class.
Multiple scripts may be invoked for a given zevent. The zevent
nvpairs are passed to the scripts as environment variables.
Events are processed synchronously by the single thread, and there is
no maximum timeout for script execution. Consequently, a misbehaving
script can delay (or forever block) the processing of subsequent
zevents. Plans are to address this in future commits.
Initial scripts have been developed to log events to syslog
and send email in response to checksum/data/io errors and
resilver.finish/scrub.finish events. By default, email will only
be sent if the ZED_EMAIL variable is configured in zed.rc (which is
serving as a config file of sorts until a proper configuration file
is implemented).
Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2
Verify that an assortment of known good reference pools can be imported
using different versions of the ZoL code.
By default references pools for the major ZFS implementation will be
checked against the most recent ZoL tags and the master development branch.
Alternate tags or branches may be verified with the '-s <src-tag> option.
Passing the keyword "installed" will instruct the script to test whatever
version is installed.
Preferentially a reference pool is used for all tests. However, if one
does not exist and the pool-tag matches one of the src-tags then a new
reference pool will be created using binaries from that source build.
This is particularly useful when you need to test your changes before
opening a pull request.
New reference pools may be added by placing a bzip2 compressed tarball
of the pool in the scripts/zpool-example directory and then passing
the -p <pool-tag> option. To increase the test coverage reference pools
should be collected for all the major ZFS implementations. Having these
pools easily available is also helpful to the developers.
Care should be taken to run these tests with a kernel supported by all
the listed tags. Otherwise build failure will cause false positives.
EXAMPLES:
The following example will verify the zfs-0.6.2 tag, the master branch,
and the installed zfs version can correctly import the listed pools.
Note there is no reference pool available for master and installed but
because binaries are available one is automatically constructed. The
working directory is also preserved between runs (-k) preventing the
need to rebuild from source for multiple runs.
zimport.sh -k -f /var/tmp/zimport \
-s "zfs-0.6.1 zfs-0.6.2 master installed" \
-p "all master installed"
--------------------- ZFS on Linux Source Versions --------------
zfs-0.6.1 zfs-0.6.2 master 0.6.2-180
-----------------------------------------------------------------
Clone SPL Skip Skip Skip Skip
Clone ZFS Skip Skip Skip Skip
Build SPL Skip Skip Skip Skip
Build ZFS Skip Skip Skip Skip
-----------------------------------------------------------------
zevo-1.1.1 Pass Pass Pass Pass
zol-0.6.1 Pass Pass Pass Pass
zol-0.6.2-173 Fail Fail Pass Pass
zol-0.6.2 Pass Pass Pass Pass
master Fail Fail Pass Pass
installed Pass Pass Pass Pass
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Issue #2094
Allow the caller of the zpool-create.sh script to override
the default path where file vdevs are created. This allows
for greated flexibilty when scripting.
Additionally, update the default path from /tmp/ to /var/tmp/
because these days /tmp/ is likely a ramdisk. Even though
these files are sparse they may grow large in which case they
should be backed by a physical device.
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#2120
Commit ba6a240 adjusted the behavior of 'zfs create -V'. The
caller is no longer guaranteed that udev will have finished
creating the /dev/ entries by the time to command exits. It
is therefore required that we explicitly block waiting for
udev to settle for this test to run reliably.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Update the cstyle.pl script to allow pictures in all comments not
just header comments. Recent changes from Illumos such as d3cc8b1
have relocated various pictures in the standard block comments to
make the code more readable.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1821
When running zconfig.sh test 7 and 8 cause the following warning to
be printed to the console. It's caused because we're snapshoting a
mounted ext2 filesystem which is not in a 'clean' state. This is
to be expected since we have no guarentees about the on-disk
consistency of the filesystem.
EXT2-fs warning: mounting unchecked fs, running e2fsck is recommended
To silence the warning and preserve the intent of these test cases
they have been updated to unmount the filesystem prior to snapshoting
them. This ensures the ext2 filesystem is in a consistent state
when the snapshot is taken.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Closes#1972
Early versions of ZFS coordinated the creation and destruction
of device minors from userspace. This was inherently racy and
in late 2009 these ioctl()s were removed leaving everything up
to the kernel. This significantly simplified the code.
However, we never picked up these changes in ZoL since we'd
already significantly adjusted this code for Linux. This patch
aims to rectify that by finally removing ZFC_IOC_*_MINOR ioctl()s
and moving all the functionality down in to the kernel. Since
this cleanup will change the kernel/user ABI it's being done
in the same tag as the previous libzfs_core ABI changes. This
will minimize, but not eliminate, the disruption to end users.
Once merged ZoL, Illumos, and FreeBSD will basically be back
in sync in regards to handling ZVOLs in the common code. While
each platform must have its own custom zvol.c implemenation the
interfaces provided are consistent.
NOTES:
1) This patch introduces one subtle change in behavior which
could not be easily avoided. Prior to this change callers
of 'zfs create -V ...' were guaranteed that upon exit the
/dev/zvol/ block device link would be created or an error
returned. That's no longer the case. The utilities will no
longer block waiting for the symlink to be created. Callers
are now responsible for blocking, this is why a 'udev_wait'
call was added to the 'label' function in scripts/common.sh.
2) The read-only behavior of a ZVOL now solely depends on if
the ZVOL_RDONLY bit is set in zv->zv_flags. The redundant
policy setting in the gendisk structure was removed. This
both simplifies the code and allows us to safely leverage
set_disk_ro() to issue a KOBJ_CHANGE uevent. See the
comment in the code for futher details on this.
3) Because __zvol_create_minor() and zvol_alloc() may now be
called in a sync task they must use KM_PUSHPAGE.
References:
illumos/illumos-gate@681d9761e8
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes#1969
Cstyle is the C source style checker used by Illumos. Since the
original ZFS source was written using these style guidelines they
must also be followed by ZoL for consistency.
The checker has been added to the scripts directory and may be
run on a per file basis. New patches should be careful to avoid
introducing new style warnings.
Additionally, the 'checkstyle' target has been added to the top
level Makefile and can be used to check the entire source tree.
While Zol has historically attempted to follow the SunOS style
guide the lack of a rigorous style checker has allowed various
warning to be introduced. Currently there are 2211 reported
style violations and we want to gradually eliminate these from
the tree.
Note the cstyle.1 man page is provided under man/man1/cstyle.1
but since it is a developer utility it is not installed along
with the other man pages.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Added a simple sed script to do a search and replace on the Illumos
ZFS file names and replace them with the ZFS on Linux equivalent.
Example usage:
# Replace Illumos paths with Linux paths
$ ./scripts/zfs2zol-patch.sed arc.c.patch > arc.c.patch.linux
# Ensure the script worked as expected
$ diff arc.c.patch arc.c.patch.linux
# Apply the patch using Linux paths
$ patch -p1 < arc.c.patch.linux
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1679
When the kmod packaging infrastructure was originally added the
dependency on the rpmfusion yum repositories was disabled. This
was done at the time in favour of getting local builds working.
Now the time has come to conditionally re-enable that functionality
so we can properly provide binary kmod packages.
./configure --with-config=srpm
make SRPM_DEFINE_KMOD='--define="repo rpmfusion"' srpm-kmod
mock rebuild zfs-kmod-x.y.z-r.el6.src.rpm
One nice benefit of finishing this work is that the generic and
fedora spl-kmod spec files can be merged again.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
One of the side effects of calling zvol_create_minors() in
zvol_init() is that all pools listed in the cache file will
be opened. Depending on the state and contents of your pool
this operation can take a considerable length of time.
Doing this at load time is undesirable because the kernel
is holding a global module lock. This prevents other modules
from loading and can serialize an otherwise parallel boot
process. Doing this after module inititialization also
reduces the chances of accidentally introducing a race
during module init.
To ensure that /dev/zvol/<pool>/<dataset> devices are
still automatically created after the module load completes
a udev rules has been added. When udev notices that the
/dev/zfs device has been create the 'zpool list' command
will be run. This then will cause all the pools listed
in the zpool.cache file to be opened.
Because this process in now driven asynchronously by udev
there is the risk of problems in downstream distributions.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #756
Issue #1020
Issue #1234
This adds ability to set the location of spl via defines when
building from the spec files. This is useful for build systems
that build spl and zfs together without installing the actual rpms.
Signed-off-by: Nathaniel Clark <Nathaniel.Clark@misrule.us>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1486
In order to ensure that yum-builddep pulls in all the build
requirements a generic ${kmodname}-devel-kmod provides line is
added. This allows a version of the development headers to be
included without requiring knowledge of the kernel version.
This is important because unlike rpmbuild which does correctly
expand the source rpm spec file, yum-builddep does not. Without
this generic provides line mock which relies on yum-builddep is
unable to automatically satisfy the dependency.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The --with-linux and --with-linux-obj options must be specified
as part of the dkms build otherwise the package will be built
against the running kernel.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Support was added to dkms so build dependencies can be specified.
This allows us to ensure that the spl package will always be built
before the zfs package. Those patches have not yet been merged
upstream but they are available in the zfsonlinux/dkms repository.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Refresh the existing RPM packaging to conform to the 'Fedora
Packaging Guidelines'. This includes adopting the kmods2
packaging standard which is used fod kmods distributed by
rpmfusion for Fedora/RHEL.
http://fedoraproject.org/wiki/Packaging:Guidelineshttp://rpmfusion.org/Packaging/KernelModules/Kmods2
While the spec files have been entirely rewritten from a
user perspective the only major changes are:
* The Fedora packages now have a build dependency on the
rpmfusion repositories. The generic kmod packages also
have a new dependency on kmodtool-1.22 but it is bundled
with the source rpm so no additional packages are needed.
* The kernel binary module packages have been renamed from
zfs-modules-* to kmod-zfs-* as specificed by kmods2.
* The is now a common kmod-zfs-devel-* package in addition
to the per-kernel devel packages. The common package
contains the development headers while the per-kernel
package contains kernel specific build products.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1341
According to the FHS. Testing scripts and examples which are all
architecture independent should be installed in a subdirectory
under /usr/share.
http://www.pathname.com/fhs/2.2/fhs-4.11.html
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The new snapdev dataset property may be set to control the
visibility of zvol snapshot devices. By default this value
is set to 'hidden' which will prevent zvol snapshots from
appearing under /dev/zvol/ and /dev/<dataset>/. When set to
'visible' all zvol snapshots for the dataset will be visible.
This functionality was largely added because when automatic
snapshoting is enabled large numbers of read-only zvol snapshots
will be created. When creating these devices the kernel will
attempt to read their partition tables, and blkid will attempt
to identify any filesystems on those partitions. This leads
to a variety of issues:
1) The zvol partition tables will be read in the context of
the `modprobe zfs` for automatically imported pools. This
is undesirable and should be done asynchronously, but for
now reducing the number of visible devices helps.
2) Udev expects to be able to complete its work for a new
block devices fairly quickly. When many zvol devices are
added at the same time this is no longer be true. It can
lead to udev timeouts and missing /dev/zvol links.
3) Simply having lots of devices in /dev/ can be aukward from
a management standpoint. Hidding the devices your unlikely
to ever use helps with this. Any snapshot device which is
needed can be made visible by changing the snapdev property.
NOTE: This patch changes the default behavior for zvols which
was effectively 'snapdev=visible'.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1235Closes#945
Issue #956
Issue #756
In the interest of maintaining only one udev helper to give vdevs
user friendly names, the zpool_id and zpool_layout infrastructure
is being retired. They are superseded by vdev_id which incorporates
all the previous functionality.
Documentation for the new vdev_id(8) helper and its configuration
file, vdev_id.conf(5), can be found in their respective man pages.
Several useful example files are installed under /etc/zfs/.
/etc/zfs/vdev_id.conf.alias.example
/etc/zfs/vdev_id.conf.multipath.example
/etc/zfs/vdev_id.conf.sas_direct.example
/etc/zfs/vdev_id.conf.sas_switch.example
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#981
The -q option should quiet the mkfs.ext2 output but certain
versions of e2fsprogs appear to ignore it. This can result in
an extra 'done' message in the test output. To keep this noise
from distracting just direct stdout to /dev/null.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Test 5, 6, 7, and 7 in zconfig.sh use /bin/ as a source of random
directories and files for their test. This has lead to unexpected
tests failures because the total size of /bin/ on the test system
isn't checked and it is entirely possible for it to be larger than
the target filesystem.
To resolve this issue we create a somewhat random collection of
files and directories in /var/tmp to use. On average we expect
about 5MB of data with the worst case being 20MB. This is large
enough to be interesting and small enough to always fit in the
default test datasets.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1113
Parted version 3.0 doesn't allow us to specify the start and end
percentages as 50% and 100% respectively. This results in:
Error: The location 100% is outside the device /dev/zd0
Therefore we change the syntax to 51% and -1 for end of device.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The 'exit $?' command in the INT TERM EXIT trap was overwritting
the expected error code with the error code from mv. Fix the
issue by removing the 'exit $?'. It's important the we preserve
the original error code so failures are easily noticed.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Add the initial support for the 'smbshare' option using the
existing libshare infrastructure. Because this implementation
relies on usershares samba version 3.0.23 is required.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#493
The function unused_loop_device in /usr/libexec/zfs/common.sh
returns /dev/loop-control on the first call. This device is NOT
a loop device (https://github.com/torvalds/linux/commit/770fe30)
it is a control device. This in turn causes the script zconfig.sh
to fail with:
zpool-create.sh: Error 1 creating /tmp/zpool-vdev0 ->
/dev/loop-control loopback
The patch makes the function return /dev/loop[0-9]* which are
loop devices.
Signed-off-by: Andrew Reid <ColdCanuck@nailedtotheperch.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#797
Currently, mkfs.ext2 on zconfig.sh zvols tries to use a 8K blocksize,
probably because by default zvol exposes an optimal I/O size of 8K.
Unfortunately, a ext2 blocksize of 8K is not supported by the kernel,
so the resulting filesystem is unmountable.
This patch fixes the issue by making sure the blocksize is 4K. We have
to use -F to force it else mkfs.ext2 won't allow us to use a blocksize
smaller than the optimal I/O size.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#979
Remove all of the generated autotools products from the repository
and update the .gitignore files accordingly.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#718
Currently, zvols have a discard granularity set to 0, which suggests to
the upper layer that discard requests of arbirarily small size and
alignment can be made efficiently.
In practice however, ZFS does not handle unaligned discard requests
efficiently: indeed, it is unable to free a part of a block. It will
write zeros to the specified range instead, which is both useless and
inefficient (see dnode_free_range).
With this patch, zvol block devices expose volblocksize as their discard
granularity, so the upper layer is aware that it's not supposed to send
discard requests smaller than volblocksize.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#862
The end_writeback() function was changed by moving the call to
inode_sync_wait() earlier in to evict(). This effecitvely changes
the ordering of the sync but it does not impact the details of
the zfs implementation.
However, as part of this change end_writeback() was renamed to
clear_inode() to reflect the new semantics. This change does
impact us and clear_inode() now maps to end_writeback() for
kernels prior to 3.5.
Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#784
The vmtruncate_range() support has been removed from the kernel in
favor of using the fallocate method in the file_operations table.
Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #784
The export_operations member ->encode_fh() has been updated to
take both the child and parent inodes. This interface used to
take the child dentry and a bool describing if the parent is needed.
NOTE: While updating this code I noticed that we do not currently
cleanly handle the case where we're passed a connectable parent.
This code should be audited to make sure we're doing the right thing.
Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #784
Currently, zpool online -e (dynamic vdev expansion) doesn't work on
whole disks because we're invoking ioctl(BLKRRPART) from userspace
while ZFS still has a partition open on the disk, which results in
EBUSY.
This patch moves the BLKRRPART invocation from the zpool utility to the
module. Specifically, this is done just before opening the device in
vdev_disk_open() which is called inside vdev_reopen(). This requires
jumping through some hoops to get to the disk device from the partition
device, and to make sure we can still open the partition after the
BLKRRPART call.
Note that this new code path is triggered on dynamic vdev expansion
only; other actions, like creating a new pool, are unchanged and still
call BLKRRPART from userspace.
This change also depends on API changes which are available in 2.6.37
and latter kernels. The build system has been updated to detect this,
but there is no compatibility mode for older kernels. This means that
online expansion will NOT be available in older kernels. However, it
will still be possible to expand the vdev offline.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#808
torvalds/linux@adc0e91ab1 introduced
introduced d_make_root() as a replacement for d_alloc_root(). Further
commits appear to have removed d_alloc_root() from the Linux source
tree. This causes the following failure:
error: implicit declaration of function 'd_alloc_root'
[-Werror=implicit-function-declaration]
To correct this we update the code to use the current d_make_root()
interface for readability. Then we introduce an autotools check
to determine if d_make_root() is available. If it isn't then we
define some compatibility logic which used the older d_alloc_root()
interface.
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#776
The mode argument of iops->create()/mkdir()/mknod() was changed from
an 'int' to a 'umode_t'. To prevent a compiler warning an autoconf
check was added to detect the API change and then correctly set a
zpl_umode_t typedef. There is no functional change.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#701
Allow rigorous (and expensive) tx validation to be enabled/disabled
indepentantly from the standard zfs debugging. When enabled these
checks ensure that all txs are constructed properly and that a dbuf
is never dirtied without taking the correct tx hold.
This checking is particularly helpful when adding new dmu consumers
like Lustre. However, for established consumers such as the zpl
with no known outstanding tx construction problems this is just
overhead.
--enable-debug-dmu-tx - Enable/disable validation of each tx as
--disable-debug-dmu-tx it is constructed. By default validation
is disabled due to performance concerns.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Add support for the .zfs control directory. This was accomplished
by leveraging as much of the existing ZFS infrastructure as posible
and updating it for Linux as required. The bulk of the core
functionality is now all there with the following limitations.
*) The .zfs/snapshot directory automount support requires a 2.6.37
or newer kernel. The exception is RHEL6.2 which has backported
the d_automount patches.
*) Creating/destroying/renaming snapshots with mkdir/rmdir/mv
in the .zfs/snapshot directory works as expected. However,
this functionality is only available to root until zfs
delegations are finished.
* mkdir - create a snapshot
* rmdir - destroy a snapshot
* mv - rename a snapshot
The following issues are known defeciences, but we expect them to
be addressed by future commits.
*) Add automount support for kernels older the 2.6.37. This should
be possible using follow_link() which is what Linux did before.
*) Accessing the .zfs/snapshot directory via NFS is not yet possible.
The majority of the ground work for this is complete. However,
finishing this work will require resolving some lingering
integration issues with the Linux NFS kernel server.
*) The .zfs/shares directory exists but no futher smb functionality
has yet been implemented.
Contributions-by: Rohan Puri <rohan.puri15@gmail.com>
Contributiobs-by: Andrew Barnes <barnes333@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#173
Allow a source rpm to be rebuilt with debugging enabled. This
avoids the need to have to manually modify the spec file. By
default debugging is still largely disabled. To enable specific
debugging features use the following options with rpmbuild.
'--with debug' - Enables ASSERTs
# For example:
$ rpmbuild --rebuild --with debug zfs-modules-0.6.0-rc6.src.rpm
Additionally, ZFS_CONFIG has been added to zfs_config.h for
packages which build against these headers. This is critical
to ensure both zfs and the dependant package are using the same
prototype and structure definitions.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
DISCARD (REQ_DISCARD, BLKDISCARD) is useful for thin provisioning.
It allows ZVOL clients to discard (unmap, trim) block ranges from
a ZVOL, thus optimizing disk space usage by allowing a ZVOL to
shrink instead of just grow.
We can't use zfs_space() or zfs_freesp() here, since these functions
only work on regular files, not volumes. Fortunately we can use the
low-level function dmu_free_long_range() which does exactly what we
want.
Currently the discard operation is not added to the log. That's not
a big deal since losing discard requests cannot result in data
corruption. It would however result in disk space usage higher than
it should be. Thus adding log support to zvol_discard() is probably
a good idea for a future improvement.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>