Commit Graph

172 Commits

Author SHA1 Message Date
slashdd
792517389f Change /etc/mtab to /proc/self/mounts
Fix misleading error message:

 "The /dev/zfs device is missing and must be created.", if /etc/mtab is missing.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Eric Desrochers <eric.desrochers@canonical.com>
Closes #4680 
Closes #5029
2016-09-20 10:07:58 -07:00
Chunwei Chen
5b1bc1a1d8 Set proper dependency for string replacement targets
A lot of string replacement target don't have dependency or incorrect
dependency. We setup proper dependency by pattern rules.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4908
2016-08-02 10:28:29 -07:00
Brian Behlendorf
92547bc45c Systemd configuration fixes
* Disable zfs-import-scan.service by default.  This ensures that
pools will not be automatically imported unless they appear in
the cache file.  When this service is explicitly enabled pools
will be imported with the "cachefile=none" property set.  This
prevents the creation of, or update to, an existing cache file.

    $ systemctl list-unit-files | grep zfs
    zfs-import-cache.service                  enabled
    zfs-import-scan.service                   disabled
    zfs-mount.service                         enabled
    zfs-share.service                         enabled
    zfs-zed.service                           enabled
    zfs.target                                enabled

* Change services to dynamic from static by adding an [Install]
section and adding 'WantedBy' tags in favor of 'Requires' tags.
This allows for easier customization of the boot behavior.

* Start the zfs-import-cache.service after the root pivot so
the cache file is available in the standard location.

* Start the zfs-mount.service after the systemd-remount-fs.service
to ensure the root fs is writeable and the ZFS filesystems can
create their mount points.

* Change the default behavior to only load the ZFS kernel modules
in zfs-import-*.service or when blkid(8) detects a pool.  Users
who wish to unconditionally load the kernel modules must uncomment
the list of modules in /lib/modules-load.d/zfs.conf.

Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4325
Closes #4496
Closes #4658
Closes #4699
2016-05-27 11:54:29 -07:00
Manuel Amador (Rudd-O)
d402c18dd6 A collection of dracut fixes
- In older systems without sysroot.mount, import before dracut-mount,
  and re-enable old dracut mount hook
- rootflags MUST be present even if the administrator neglected to
  specify it explicitly
- Check that mount.zfs exists in sbindir
- Remove awk and head as (now unused) requirements, add grep, and
  install the right mount.zfs
- Eliminate one use of grep in Dracut
- Use a more accurate grepping statement to identify zfsutil in rootflags
- Ensure that pooldev is nonempty
- Properly handle /dev/sd* devices and more
- Use new -P to get list of zpool devices
- Bail out of the generator when zfs:AUTO is on the root command line
- Ignore errors from systemctl trying to load sysroot.mount, we only
  care about the output
- Determine which one is the correct initqueuedir at run time.
- Add a compatibility getargbool for our detection / setup script.
- Update dracut .gitignore files

Signed-off-by: <Matthew Thode mthode@mthode.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4558
Closes #4562
2016-05-12 14:31:15 -07:00
Carlo Landmeter
c53fb0113c Add support for alpine linux
Both Alpine Linux and Gentoo use OpenRC so we share its logic

Signed-off-by: Carlo Landmeter <clandmeter@gmail.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4386
2016-03-08 13:19:53 -08:00
Grischa Zengel
e79a6bacc6 Add nfs-kernel-server for Debian
Debian based systems use nfs-kernel-server as the service name.
List both nfs-server.service and nfs-kernel-server.service so
this service will work on multiple distributions.

Signed-off-by: Grischa Zengel <github.zfsonlinux@zengel.info>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4350
2016-02-25 10:19:09 -08:00
James Lee
33df62d052 zfs-import: Perform verbatim import using cache file
This change modifies the import service to use the default cache file
to perform a verbatim import of pools at boot.  This fixes code that
searches all devices and imported all visible pools.

Using the cache file is in keeping with the way ZFS has always worked,
how Solaris, Illumos, FreeBSD, and systemd performs imports, and is how
it is written in the man page (zpool(1M,8)):

    All pools  in  this  cache  are  automatically imported when the
    system boots.

Importantly, the cache contains important information for importing
multipath devices, and helps control which pools get imported in more
dynamic environments like SANs, which may have thousands of visible
and constantly changing pools, which the ZFS_POOL_EXCEPTIONS variable
is not equipped to handle.  Verbatim imports prevent rogue pools from
being automatically imported and mounted where they shouldn't be.

The change also stops the service from exporting pools at shutdown.
Exporting pools is only meant to be performed explicitly by the
administrator of the system.

The old behavior of searching and importing all visible pools is
preserved and can be switched on by heeding the warning and toggling
the ZPOOL_IMPORT_ALL_VISIBLE variable in /etc/default/zfs.

Signed-off-by: James Lee <jlee@thestaticvoid.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3777
Closes #3526
2015-10-13 10:37:05 -07:00
Turbo Fredriksson
8f90f7372a Rename 'zed.service' to 'zfs-zed.service'
For consistency all systemd unit files and init scripts now share
the same names.  This prevents an issue where the zed is started
twice on systems where both the systemd and sysv infrastructure is
installed concurrently.

For backward compatibility a 'zed' alias has been added.  This
allows the user to interact with the service using either the
name 'zed' or 'zfs-zed'.

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #3837
2015-10-02 17:33:32 -04:00
Turbo Fredriksson
57732964d3 Init script fixes
* Fix regression - "OVERLAY_MOUNTS" should have been "DO_OVERLAY_MOUNTS".
* Fix update-rc.d commands in postinst.  Thanx to subzero79@GitHub.
* Fix make sure a filesystem exists before trying to mount in mount_fs()
* Fix local variable usage.
* Fix to read_mtab():
  * Strip control characters (space - \040) from /proc/mounts GLOBALY,
    not just first occurrence.
  * Don't replace unprintable characters ([/-. ]) for use in the variable
    name with underscore. No need, just remove them all together.
* Add check_boolean() to check if a user configure option is
  set ('yes', 'Yes', 'YES' or any combination there of) OR '1'.
  Anything else is considered 'unset'.
* Add a ZFS_POOL_IMPORT to the default config.
  * This is a semi colon separated list of pools to import ONLY.
  * This is intended for systems which have _a lot_ of pools (from
    a SAN for example) and it would be to many to put in the
    ZFS_POOL_EXCEPTIONS variable..
* Add a config option "ZPOOL_IMPORT_OPTS" for adding additional options
  to "zpool import".
* Add documentation and the chance of overriding the ZPOOL_CACHE
  variable in the config file.
* Remove "sort" from find_pools() and setup_snapshot_booting().
  Sometimes not available, and not really necessary.

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Issue #3816
2015-09-29 11:42:24 -07:00
yuina822
4a4809faab Fixed --signal typo
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #3773
2015-09-22 16:04:44 -07:00
yuina822
e2ede4721b Add extra_started_commands because reload function is not default
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #3773
2015-09-22 16:03:50 -07:00
SenH
1e17e910ea Force create /run/sendsigs.omit.d link when starting zed
Resolve the following error when restarting the zed by force creating
the /run/sendsigs.omit.d/zed link.

sudo /etc/init.d/zfs-zed restart
 * Stopping ZFS Event Daemon            [ OK ]
 * Starting ZFS Event Daemon
 ln: failed to create symbolic link `/run/sendsigs.omit.d/zed': File exists

Signed-off-by: SenH <sen@senhaerens.be>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3747
2015-09-08 09:45:34 -07:00
James Lee
3f1cc17c90 Reorder zfs-* services to allow /var on separate dataset
ZED depends on /var.  When /var is a separate dataset, it must be
mounted before starting ZED.  This change moves the zfs-zed service
from starting first, to starting after zfs-mount, but before zfs-share.

As discussed in issue #3513, ZED does not need to start first in order
to consume events made during the zfs-import and zfs-mount services.
The events will be queued and can be handled later in the boot process.

ZED may, however, handle sharing in the future, so it should be started
before the zfs-share service.

This commit also stops the zfs-import service from writing temp files
to /var/tmp on shutdown and it corrects the return code for the OpenRC
service.

Other OpenRC-specific changes noted in issue #3513 were reitereated in
issue #3715 and committed in da619f3.

Signed-off-by: James Lee <jlee@thestaticvoid.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3513
2015-09-02 09:16:39 -07:00
Richard Yao
da619f3a19 Some OpenRC dependency logic belongs in mount
The dependencies for handling / on ZFS belong in the mount script, not
the zed script.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3715
2015-08-30 10:06:59 -07:00
Turbo Fredriksson
48511ea645 Fix some minor issues with the SYSV init and initramfs scripts.
This is some minor fixes to commits 2cac7f5f11
and 2a34db1bdb.

* Make sure to alien'ate the new initramfs rpm package as well!
  The rpm package is build correctly, but alien isn't run on it to
  create the deb.
* Before copying file from COPY_FILE_LIST, make sure the DESTDIR/dir exists.
* Include /lib/udev/vdev_id file in the initrd.
* Because the initrd needs to use '/sbin/modprobe' instead of 'modprobe',
  we need to use this in load_module() as well.
  * Make sure that load_module() can be used more globaly, instead of
    calling '/sbin/modprobe' all over the place.
  * Make sure that check_module_loaded() have a parameter - module to
    check.

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3626
2015-07-24 15:05:33 -07:00
Turbo Fredriksson
47a4a6fd5f Support parallel build trees (VPATH builds)
Build products from an out of tree build should be written
relative to the build directory.  Sources should be referred
to by their locations in the source directory.

This is accomplished by adding the 'src' and 'obj' variables
for the module Makefile.am, using relative paths to reference
source files, and by setting VPATH when source files are not
co-located with the Makefile.  This enables the following:

  $ mkdir build
  $ cd build
  $ ../configure \
    --with-spl=$HOME/src/git/spl/ \
    --with-spl-obj=$HOME/src/git/spl/build
  $ make -s

This change also has the advantage of resolving the following
warning which is generated by modern versions of automake.

  Makefile.am:00: warning: source file 'xxx' is in a subdirectory,
  Makefile.am:00: but option 'subdir-objects' is disabled

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1082
2015-07-17 13:42:51 -07:00
Turbo Fredriksson
d6c9ff0a6b Add /dev/mapper to the list of possible sources for pool devices.
This is especially needed when using LUKS backed pools.

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3536
2015-06-29 12:32:05 -07:00
Turbo Fredriksson
16421a1dc8 Additional SYSV init script fixes (3).
* In read_mtab(), fix problems (!?) in the mounts file. It will record
  'rpool 1' as 'rpool\0401' instead of 'rpool\00401' which seems to be the
  correct (at least as far as 'printf' is concerned). Use this using the
  external 'echo' command (and not the one built in to the shell) because
  the internal one would interpret the backslash code (incorrectly), giving
  us a  instead.
* Remove reregister_mounts() - no longer needed.
* For Gentoo, the zfs_log_failure_msg() should use eend(), not eerror()
  (which requires an error message, which we don't have).

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3488
Closes #3509
Closes #3514
2015-06-25 11:56:47 -07:00
Turbo Fredriksson
216f9d04a6 Revert "Additional SYSV init script fixes."
This reverts commit 036391c980.

Because #3509 came just after this commit was accepted and is related
to the original problem the commit was supposed to fix, we need to
solve the problem in another way.

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2015-06-25 11:56:09 -07:00
Turbo Fredriksson
036391c980 Additional SYSV init script fixes.
Use the 'mount' command instead of /proc/mounts to get a list of matching
filesystems.

This because /proc/mounts reports a pool with a space 'rpool 1' as
'rpool\0401'. The space is encoded as 3-digit octal which is legal.
However 'printf "%b"', which we use to filter out other illegal
characters (such as slash, space etc) can't properly interpret this
because it expects 4-digit octal. We get a  instead of the space
we expected. The correct value should have been 'rpool\00401' (note
the additional leading zero).

So use 'mount', which interprets all backslash-escapes correctly,
instead.

Signed-off-by: Turbo Fredriksson turbo@bayour.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3488
2015-06-17 13:30:03 -07:00
Turbo Fredriksson
4f38c25910 SYSV init script fixes.
* Change the order of the function library check/load.
  Redhat based system _can_ have a /lib/lsb/init-functions file (from
  the redhat-lsb-core package), but it's only partially what we can use.
  Instead, look for that file last, giving the script a chance to catch
  the 'real' distribution file.
* Filter out dashes and dots in dataset name in read_mtab().
* Get rid of 'awk' entirely. This is usually in /usr, which might not
  be availible.
* Get rid of the 'find /dev/disk/by-*' (find is on /usr, which might not
  be availible). Instead use echo in a for loop.
* Rebuild scripts if any of the *.in files changed.
* Move the sed part that filters out duplicates inside the check fo
  valid variable.

Signed-off-by: Turbo Fredriksson turbo@bayour.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3463
Closes #3457
2015-06-05 12:35:39 -07:00
Turbo Fredriksson
2a34db1bdb Base init scripts for SYSV systems
* Based on the init scripts included with Debian GNU/Linux, then take code
  from the already existing ones, trying to merge them into one set of
  scripts that will work for 'everyone' for better maintainability.
  * Add configurable variables to control the workings of the init scripts:
    * ZFS_INITRD_PRE_MOUNTROOT_SLEEP
      Set a sleep time before we load the module (used primarily by initrd
      scripts to allow for slower media (such as  USB devices etc) to be
      availible before we load the zfs module).
    * ZFS_INITRD_POST_MODPROBE_SLEEP
      Set a timed sleep in the initrd to after the load of the zfs module.
    * ZFS_INITRD_ADDITIONAL_DATASETS
      To allow for mounting additional datasets in the initrd. Primarily used
      in initrd scripts to allow for when filesystem needed to boot (such as
      /usr, /opt, /var etc) isn't directly under the root dataset.
    * ZFS_POOL_EXCEPTIONS
      Exclude pools from being imported (in the initrd and/or init scripts).
    * ZFS_DKMS_ENABLE_DEBUG, ZFS_DKMS_ENABLE_DEBUG_DMU_TX, ZFS_DKMS_DISABLE_STRIP
      Set to control how dkms should build the dkms packages.
    * ZPOOL_IMPORT_PATH
      Set path(s) where "zpool import" should import pools from.
      This was previously the job of "USE_DISK_BY_ID" (which is still used
      for backwards compatibility) but was renamed to allow for better
      control of import path(s).
      * If old USE_DISK_BY_ID is set, but not new ZPOOL_IMPORT_PATH, then we
        set ZPOOL_IMPORT_PATH to sane defaults just to be on the safe side.
    * ZED_ARGS
      To allow for local options to zed without having to change the init script.
  * The import function, do_import(), imports pools by name instead of '-a'
    for better control of pools to import and from where.
    * If USE_DISK_BY_ID is set (for backwards compatibility), but isn't 'yes'
      then ignore it.
    * If pool(s) isn't found with a simple "zpool import" (seen it happen),
      try looking for them in /dev/disk/by-id (if it exists). Any duplicates
      (pools found with both commands) is filtered out.
      * IF we have found extra pool(s) this way, we must force USE_DISK_BY_ID
        so that the first, simple "zpool import $pool" is able to find it.
    * Fallback on importing the pool using the cache file (if it exists) only
      if 'simple' import (either with ZPOOL_IMPORT_PATH or the 'built in'
      defaults) didn't work.
  * The export function, do_export(), will export all pools imported, EXCEPT
    the root pool (if there is one).
  * ZED script from the Debian GNU/Linux packages added.
    * Refreshed ZED init script from behlendorf@5e7a660 to be portable so it
      may be used on both LSB and Redhat style systems.
    * If there is no pool(s) imported and zed successfully shut down, we will
      unload the zfs modules.
  * The function library file for the ZoL init script is installed as
    /etc/init.d/zfs-functions.
  * The four init scripts, the /etc/{defaults,sysconfig,conf.d}/zfs config file
    as well as the common function library is tagged as '%config(noreplace)' in
    the rpm rules file to make sure they are not replaced automatically if locally
    modifed.
  * Pitfals and workarounds:
    * If we're running from init, remove stale /etc/dfs/sharetab before importing
      pools in the zfs-import init script.
    * On Debian GNU/Linux, there's a 'sendsigs' script that will kill basically
      everything quite early in the shutdown phase and zed is/should be stopped
      much later than that. We don't want zed to be among the ones killed, so add
      the zed pid to list of pids for 'sendsigs' to ignore.
    * CentOS uses echo_success() and echo_failure() to print out status of
      command. These in turn uses "echo -n \0xx[etc]" to move cursor and choose
      colour etc. This doesn't work with the modified IFS variable we need to
      use in zfs-import for some reason, so work around that when we define
      zfs_log_{end,failure}_msg() for RedHat and derivative distributions.
  * All scripts passes ShellCheck (with one false positive in do_mount()).

Signed-off-by: Turbo Fredriksson turbo@bayour.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Richard Yao <ryao@gentoo.org>
Reviewed by: Chris Dunlap <cdunlap@llnl.gov>
Closes #2974
Closes #2107
2015-05-28 14:14:53 -07:00
Brian Behlendorf
544f7184f8 Use ExecStartPre to load zfs modules
Commit 87abfcb broke the systemd import service by treating the
ExecStart line as if it were a shell command that could be executed.
This isn't the way systemd works and the correct way to handle this
case is with ExecStartPre.  This patch updates the zfs import service
files accordingly,

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Steven Noonan <steven@uplinklabs.net>
Signed-off-by: Chris Siebenmann <cks.git01@cs.toronto.edu>
Closes #3440
2015-05-26 16:18:50 -07:00
Brian Behlendorf
87abfcba22 Wait in libzfs_init() for the /dev/zfs device
While module loading itself is synchronous the creation of the /dev/zfs
device is not.  This is because /dev/zfs is typically created by a udev
rule after the module is registered and presented to user space through
sysfs.  This small window between module loading and device creation
can result in spurious failures of libzfs_init().

This patch closes that race by extending libzfs_init() so it can detect
that the modules are loaded and only if required wait for the /dev/zfs
device to be created.  This allows scripts to reliably use the following
shell construct without the need for additional error handling.

$ /sbin/modprobe zfs && /sbin/zpool import -a

To minimize the potential time waiting in libzfs_init() a strategy
similar to adaptive mutexes is employed.  The function will busy-wait
for up to 10ms based on the expectation that the modules were just
loaded and therefore the /dev/zfs will be created imminently.  If it
takes longer than this it will fall back to polling for up to 10 seconds.

This behavior can be customized to some degree by setting the following
new environment variables.  This functionality is provided for backwards
compatibility with existing scripts which depend on the module auto-load
behavior.  By default module auto-loading is now disabled.

* ZFS_MODULE_LOADING="YES|yes|ON|on" - Attempt to load modules.
* ZFS_MODULE_TIMEOUT="<seconds>"     - Seconds to wait for /dev/zfs

The zfs-import-* systemd service files have been updated to call
'/sbin/modprobe zfs' so they no longer rely on the legacy auto-loading
behavior.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #2556
2015-05-22 13:31:58 -07:00
DHE
9012354bf0 Rebuild init scripts on source file updates
The resulting script is not removed by 'make clean' or rebuilt
when the source files are changed. Users with long standing git
trees may find their init script is out of date.

Signed-off-by: DHE <git@dehacked.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3273
2015-04-14 13:26:49 -07:00
Hajo Möller
6184b3a6a0 Actually source /etc/sysconfig/zfs instead of /etc/default/zfs
Signed-off-by: Hajo M<C3><B6>ller <dasjoe@users.noreply.github.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3162
2015-03-09 17:13:04 -07:00
Chris Dunlap
0e86d309cc Add ZED to zfs.redhat.in script
This commit updates the zfs.redhat.in script to start/stop ZED.

Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #3153
2015-03-05 14:07:04 -08:00
Brian Behlendorf
a7b9d0c3a0 Replace zfs.redhat.in with zfs.lsb.in init script
This commit replaces the zfs.redhat.in init script with a slightly
modified version of the existing zfs.lsb.in init script.  This was
done to minimize the functional differences between platforms.
The lsb version of the script was choosen because it's heavily
tested and provides the most functionality.

Changes made for RHEL systems:
* Configuration: /etc/default/zfs -> /etc/sysconfig/zfs
* LSB functions: /lib/lsb/init-functions -> /etc/rc.d/init.d/functions
* Logging: log_begin_msg/log_end_msg -> action

Features in LSB which are now in RHEL:
* USE_DISK_BY_ID=0      - Use the by-id names
* VERBOSE_MOUNT=0       - Verbose mounts by default
* DO_OVERLAY_MOUNTS=0   - Overlay mounts by default
* MOUNT_EXTRA_OPTIONS=0 - Generic extra options

Existing RHEL features which were removed:
* Automatically mounting FSs on ZVOLs listed in /etc/fstab

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #3153
2015-03-04 11:33:07 -08:00
Dan Swartzendruber
1b95fd5d70 Improve systemd script to not leave stale sharetab
The systemd script zfs-share.service does 'zfs share -a' to share
any required datasets.  Unfortunately, /etc/dfs/sharetab is stale
from the previous boot.  Delete it before we share.

Signed-off-by: Dan Swartzendruber <dswartz@druber.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2883
2014-12-18 09:54:56 -08:00
Dan Swartzendruber
80c50365c2 Fix systemd config for zfs-share.service
The zfs-share.service rule needs to be modified to ensure that it
does not execute before zfs-mount.service.

Signed-off-by: Dan Swartzendruber <dswartz@druber.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ralf Ertzinger <ralf@skytale.net>
Closes #2893
2014-11-19 10:33:07 -08:00
alteriks
4f6a14798d Import zfs pools after cryptsetup
The zfs-import-cache.service and zfs-import-scan.service should
should be started after cryptsetup to ensure all LUKS devices have
been opened.

Signed-off-by: alteriks <alteriks@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1474
2014-09-04 09:50:45 -07:00
Ralf Ertzinger
76c3a61642 Change startup mode of ZED
Change the startup mode of ZED to non-forking. While systemd can
track processes that detach from the terminal just fine, running
processes in non-forking mode is the preferred mode of operation.

Also remove user/group definitions as root/root is the default.

Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2252
2014-09-02 14:18:53 -07:00
Derek Dai
7a870db1b9 Do not export pool to prevent cache from been removed
Signed-off-by: Derek Dai <daiderek@gmail.com>
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2353
2014-06-05 13:49:15 -07:00
Brian Behlendorf
51268f31a8 Remove SELinux enforcing check from init scripts
The default SELinux policy for RHEL and Fedora has been updated
to include ZFS in the list of filesystems which support xattrs.
Therefore, there's no longer a need to detect this in the init
scripts.

References:
  https://bugzilla.redhat.com/show_bug.cgi?id=811532
  https://bugzilla.redhat.com/show_bug.cgi?id=816543

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2166
2014-05-02 11:37:46 -07:00
Turbo Fredriksson
b79e1f1f27 Allow specifying '-o <opts>' in defaults/init script.
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2103
2014-04-04 09:49:09 -07:00
Turbo Fredriksson
e37212f9a2 Support using overlay mounts in defaults/init script.
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2103
2014-04-04 09:48:25 -07:00
Chris Dunlap
11a7043324 Add systemd unit file for zed
This commit adds a systemd unit file for zed.service and integrates
it into the zfs.target from commit 881f45c.

Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2108
Issue #2
2014-04-02 13:10:08 -07:00
Richard Yao
b42b812efb Inform OpenRC that ZFS uses mtab
p_l in #zfsonlinux reported that he had issues mounting filesystems that
were resolved by adding rc_need="mtab" to /etc/init.d/zfs. Closer
inspection revealed that we do have a race, but it is not clear how this
race caused mounting to fail. What is clear is that this race should be
fixed, so lets add the proper `use mtab` line to handle it.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2148
2014-03-04 11:54:44 -08:00
Ralf Ertzinger
881f45c6a8 Add systemd unit files for ZFS startup
This adds systemd unit files replacing the functionality offered by
the SysV init script found in etc/init.d.

It has been developed and tested on Fedora 19, Fedora 20
and openSuSE 13.1.

Four unit files and one target are offered.

zfs-import-cache.service:
    Import pools from /etc/zfs/zpool.cache. This unit will wait for
    udev to settle.
zfs-import-scan.service:
    Import pools by scanning /dev/disk/by-id for zvols. This unit will
    only run if /etc/zfs/zpool.cache is not present. This unit will wait
    for udev to settle
zfs-mount.service:
    Mount ZFS native filesystems. It contains a dependency to be loaded
    before local-fs.target.
zfs-share.service:
    Share NFS/SMB filesystems. This unit contains a dependency that
    will cause it to be restarted whenever the smb or nfs-server unit
    is restarted, restoring the shares added.
zfs.target:
    This target pulls in the other units in order to start ZFS. It's
    the only unit that can be enabled/disabled, all other services
    are static and pulled in by dependencies. It will honour zfs=off
    and zfs=no options on the kernel command line.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2108
2014-02-05 12:25:30 -08:00
Ned Bass
09d0b30fd1 vdev_id: support per-channel slot mappings
The vdev_id udev helper currently applies slot renumbering rules to
every channel (JBOD) in the system.  This is too inflexible for systems
with non-homogeneous storage topologies.  The "slot" keyword now takes
an optional third parameter which names a channel to which the mapping
will apply.  If the third parameter is omitted then the rule applies to
all channels.  The first-specified rule that can match a slot takes
precedence.  Therefore a channel-specific rule for a given slot should
generally appear before a generic rule for the same slot number.  In
this way a custom slot mapping can be applied to a particular channel
and a default mapping applied to the rest.

Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2056
2014-01-17 11:17:54 -08:00
Turbo Fredriksson
8c091798f2 Add UNSHARING of filesystems and EXPORTING pools
As a 'stop' action ensure the filesystem is unshared before
it is unmounted, just in case.  Additionally, export the pool
so it may be cleanly imported by a different host.

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2003
2014-01-07 09:48:04 -08:00
Turbo Fredriksson
c1ab64d393 Update init script to allow verbose mounts
Allow verbose mounts to make is easier to monitor progress when
mounting a large number of filesystems.

This functionality is disabled by default.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1929
2013-12-06 10:59:35 -08:00
Turbo Fredriksson
fc220e9ea5 Update init script to allow /dev/disk/by-id import
Many people prefer to use by-id at import time instead of using
the cache file.  This can be a much better solution than the cache
file in some environments so we're adding some infrastructure to
allow it.

This functionality is disabled by default.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1929
2013-12-06 10:59:09 -08:00
Matthew Thode
760ec997df Updating init scripts to have more robust grepping
The previous pattern could accidentally match on things like
'real_root=ZFS=node02-zp00/ROOT/rootfs' due to the 'ZFS=no'
substring.

Signed-off-by: Matthew Thode <mthode@mthode.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1837
2013-11-08 10:55:20 -08:00
Richard Yao
9eaf0832ad Improve OpenRC init script
The current zfs OpenRC script's dependencies cause OpenRC to attempt to
unmount ZFS filesystems at shutdown while things were still using them,
which would fail. This is a cosmetic issue, but it should still be
addressed. It probably does not affect systems where the rootfs is a
legacy filesystem, but any system with the rootfs on ZFS needs to run
the ZFS init script after the system is ready to shutdown filesystems.

OpenRC's shutdown process occurs in the reverse order of the startup
process. Therefore running the ZFS shutdown procedure after filesystems
are ready to be unmounted requires running the startup procedure before
fstab. This patch changes the dependencies of the script to expliclty
run before fstab at boot when the rootfs is ZFS and to run after fstab
at boot whenever the rootfs is not ZFS. This should cover most use
cases.

The only cases not covered well by this are systems with legacy
root filesystems where people want to configure fstab to mount a non-ZFS
filesystem off a zvol and possibly also systems whose pools are stored
on network block devices. The former requires that the ZFS script run
before fstab, which could cause ZFS datasets to mount too early and
appear under the fstab mount points. The latter requires that the ZFS
script run after networking starts, which precludes the ability to store
any system information on ZFS. An additional OpenRC script could be
written to handle non-root pools on network block devices, but that will
depend on user demand and developer time.

Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1479
2013-06-18 17:03:25 -07:00
Turbo Fredriksson
382c4e5184 Possibility to disable (not start) zfs at bootup.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1402
2013-04-24 16:18:44 -07:00
Brian Behlendorf
0da31cd6ca Remove ARCH packaging
The kernel modules are now available in the Arch User Repository
(AUR) via zfs.  Since their packaging is maintained and superior
to ours it is being removed from the tree.

  https://wiki.archlinux.org/index.php/ZFS

Now that various distributions are picking up the packages we
should eventually be able to remove most of this infrastructure.
Packaging belongs with the distributions not upstream.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2013-03-06 15:46:41 -08:00
Brian Behlendorf
dbf763b39b Retire zpool_id infrastructure
In the interest of maintaining only one udev helper to give vdevs
user friendly names, the zpool_id and zpool_layout infrastructure
is being retired.  They are superseded by vdev_id which incorporates
all the previous functionality.

Documentation for the new vdev_id(8) helper and its configuration
file, vdev_id.conf(5), can be found in their respective man pages.
Several useful example files are installed under /etc/zfs/.

  /etc/zfs/vdev_id.conf.alias.example
  /etc/zfs/vdev_id.conf.multipath.example
  /etc/zfs/vdev_id.conf.sas_direct.example
  /etc/zfs/vdev_id.conf.sas_switch.example

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #981
2013-01-29 12:23:17 -08:00
Ned Bass
2957f38d78 vdev_id support for device link aliases
Add a vdev_id feature to map device names based on already defined
udev device links.  To increase the odds that vdev_id will run after
the rules it depends on, increase the vdev.rules rule number from 60
to 69.  With this change, vdev_id now provides functionality analogous
to zpool_id and zpool_layout, paving the way to retire those tools.

A defined alias takes precedence over a topology-derived name, but the
two naming methods can otherwise coexist. For example, one might name
drives in a JBOD with the sas_direct topology while naming an internal
L2ARC device with an alias.

For example, the following lines in vdev_id.conf will result in the
creation of links /dev/disk/by-vdev/{d1,d2}, each pointing to the same
target as the device link specified in the third field.

  #     by-vdev
  #     name     fully qualified or base name of device link
  alias d1       /dev/disk/by-id/wwn-0x5000c5002de3b9ca
  alias d2       wwn-0x5000c5002def789e

Also perform some minor vdev_id cleanup, such as removal of the unused
-s command line option.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #981
2012-12-03 14:04:47 -08:00
Brian Behlendorf
ca8b5af89d Remove autotools products
Remove all of the generated autotools products from the repository
and update the .gitignore files accordingly.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #718
2012-08-27 11:47:44 -07:00
Etienne Dechamps
ee5fd0bb80 Set zvol discard_granularity to the volblocksize.
Currently, zvols have a discard granularity set to 0, which suggests to
the upper layer that discard requests of arbirarily small size and
alignment can be made efficiently.

In practice however, ZFS does not handle unaligned discard requests
efficiently: indeed, it is unable to free a part of a block. It will
write zeros to the specified range instead, which is both useless and
inefficient (see dnode_free_range).

With this patch, zvol block devices expose volblocksize as their discard
granularity, so the upper layer is aware that it's not supposed to send
discard requests smaller than volblocksize.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #862
2012-08-07 14:55:31 -07:00
Richard Yao
739a1a82e0 Linux 3.5 compat, end_writeback() changed to clear_inode()
The end_writeback() function was changed by moving the call to
inode_sync_wait() earlier in to evict().   This effecitvely changes
the ordering of the sync but it does not impact the details of
the zfs implementation.

However, as part of this change end_writeback() was renamed to
clear_inode() to reflect the new semantics.  This change does
impact us and clear_inode() now maps to end_writeback() for
kernels prior to 3.5.

Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #784
2012-07-23 12:29:36 -07:00
Richard Yao
ea1fdf46e2 Linux 3.5 compat, iops->truncate_range() removed
The vmtruncate_range() support has been removed from the kernel in
favor of using the fallocate method in the file_operations table.

Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #784
2012-07-23 12:29:32 -07:00
Richard Yao
756c3e5a9c Linux 3.5 compat, eops->encode_fh() takes inodes
The export_operations member ->encode_fh() has been updated to
take both the child and parent inodes.  This interface used to
take the child dentry and a bool describing if the parent is needed.

NOTE: While updating this code I noticed that we do not currently
cleanly handle the case where we're passed a connectable parent.
This code should be audited to make sure we're doing the right thing.

Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #784
2012-07-23 12:29:23 -07:00
Etienne Dechamps
b5a28807cd Move partition scanning from userspace to module.
Currently, zpool online -e (dynamic vdev expansion) doesn't work on
whole disks because we're invoking ioctl(BLKRRPART) from userspace
while ZFS still has a partition open on the disk, which results in
EBUSY.

This patch moves the BLKRRPART invocation from the zpool utility to the
module. Specifically, this is done just before opening the device in
vdev_disk_open() which is called inside vdev_reopen(). This requires
jumping through some hoops to get to the disk device from the partition
device, and to make sure we can still open the partition after the
BLKRRPART call.

Note that this new code path is triggered on dynamic vdev expansion
only; other actions, like creating a new pool, are unchanged and still
call BLKRRPART from userspace.

This change also depends on API changes which are available in 2.6.37
and latter kernels.  The build system has been updated to detect this,
but there is no compatibility mode for older kernels.  This means that
online expansion will NOT be available in older kernels.  However, it
will still be possible to expand the vdev offline.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #808
2012-07-17 09:17:31 -07:00
Richard Yao
ba9b5428fd Relicense zfs.gentoo.in from GPLv2 to 2-clause BSD
As the Gentoo sys-fs/zfs maintainer, I receive license compatibility
questions and at times, those questions can be harassing. I feel that
the presence of the GPL in Gentoo's package metadata promotes such
questions.  zfs.gentoo.in is the only GPLv2 licensed file in ZFS, so I
have taken the liberty of contacting all contributors to this file to
request permission to relicense it.

All of the contributors to this file have agreed to relicense it under
the 2-clause BSD license. I have added their Signed-offs to this commit,
in order of first contribution. Thank you everyone for being so
understanding.

Signed-off-by: devsk <devsku@gmail.com>
Signed-off-by: Alexey Shvetsov <alexxy@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrew Tselischev <andrewtselischev@gmail.com>
Signed-off-by: Zachary Bedell <zac@thebedells.org>
Signed-off-by: Gunnar Beutner <gunnar@beutner.name>
Signed-off-by: Kyle Fuller <inbox@kylefuller.co.uk>
Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Closes #819
2012-07-10 15:00:16 -07:00
Richard Yao
6a0936babc Linux 3.4 compat, d_make_root() replaces d_alloc_root()
torvalds/linux@adc0e91ab1 introduced
introduced d_make_root() as a replacement for d_alloc_root(). Further
commits appear to have removed d_alloc_root() from the Linux source
tree. This causes the following failure:

  error: implicit declaration of function 'd_alloc_root'
  [-Werror=implicit-function-declaration]

To correct this we update the code to use the current d_make_root()
interface for readability.  Then we introduce an autotools check
to determine if d_make_root() is available.  If it isn't then we
define some compatibility logic which used the older d_alloc_root()
interface.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #776
2012-06-11 10:04:49 -07:00
Ned A. Bass
821b683436 Add vdev_id for JBOD-friendly udev aliases
vdev_id parses the file /etc/zfs/vdev_id.conf to map a physical path
in a storage topology to a channel name.  The channel name is combined
with a disk enclosure slot number to create an alias that reflects the
physical location of the drive.  This is particularly helpful when it
comes to tasks like replacing failed drives.  Slot numbers may also be
re-mapped in case the default numbering is unsatisfactory.  The drive
aliases will be created as symbolic links in /dev/disk/by-vdev.

The only currently supported topologies are sas_direct and sas_switch:

o  sas_direct - a channel is uniquely identified by a PCI slot and a
   HBA port

o  sas_switch - a channel is uniquely identified by a SAS switch port

A multipath mode is supported in which dm-mpath devices are handled by
examining the first running component disk, as reported by 'multipath
-l'.  In multipath mode the configuration file should contain a
channel definition with the same name for each path to a given
enclosure.

vdev_id can replace the existing zpool_id script on systems where the
storage topology conforms to sas_direct or sas_switch.  The script
could be extended to support other topologies as well.  The advantage
of vdev_id is that it is driven by a single static input file that can
be shared across multiple nodes having a common storage toplogy.
zpool_id, on the other hand, requires a unique /etc/zfs/zdev.conf per
node and a separate slot-mapping file.  However, zpool_id provides the
flexibility of using any device names that show up in
/dev/disk/by-path, so it may still be needed on some systems.

vdev_id's functionality subsumes that of the sas_switch_id script, and
it is unlikely that anyone is using it, so sas_switch_id is removed.

Finally, /dev/disk/by-vdev is added to the list of directories that
'zpool import' will scan.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #713
2012-06-01 08:55:14 -07:00
Brian Behlendorf
b39d3b9f7b Linux 3.3 compat, iops->create()/mkdir()/mknod()
The mode argument of iops->create()/mkdir()/mknod() was changed from
an 'int' to a 'umode_t'.  To prevent a compiler warning an autoconf
check was added to detect the API change and then correctly set a
zpl_umode_t typedef.  There is no functional change.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #701
2012-04-30 12:52:38 -07:00
Richard Yao
2ce9d0ec61 Make Gentoo initscript use modinfo
The -l parameter to modprobe has been removed from the latest upstream
code and this change has entered Gentoo. Using modinfo as a substitute
addresses this.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #636
2012-04-03 10:37:18 -07:00
Brian Behlendorf
1c5de20ae2 Add --enable-debug-dmu-tx configure option
Allow rigorous (and expensive) tx validation to be enabled/disabled
indepentantly from the standard zfs debugging.  When enabled these
checks ensure that all txs are constructed properly and that a dbuf
is never dirtied without taking the correct tx hold.

This checking is particularly helpful when adding new dmu consumers
like Lustre.  However, for established consumers such as the zpl
with no known outstanding tx construction problems this is just
overhead.

--enable-debug-dmu-tx  - Enable/disable validation of each tx as
--disable-debug-dmu-tx   it is constructed.  By default validation
                         is disabled due to performance concerns.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-03-23 12:25:17 -07:00
Brian Behlendorf
ebe7e575ea Add .zfs control directory
Add support for the .zfs control directory.  This was accomplished
by leveraging as much of the existing ZFS infrastructure as posible
and updating it for Linux as required.  The bulk of the core
functionality is now all there with the following limitations.

*) The .zfs/snapshot directory automount support requires a 2.6.37
   or newer kernel.  The exception is RHEL6.2 which has backported
   the d_automount patches.

*) Creating/destroying/renaming snapshots with mkdir/rmdir/mv
   in the .zfs/snapshot directory works as expected.  However,
   this functionality is only available to root until zfs
   delegations are finished.

      * mkdir - create a snapshot
      * rmdir - destroy a snapshot
      * mv    - rename a snapshot

The following issues are known defeciences, but we expect them to
be addressed by future commits.

*) Add automount support for kernels older the 2.6.37.  This should
   be possible using follow_link() which is what Linux did before.

*) Accessing the .zfs/snapshot directory via NFS is not yet possible.
   The majority of the ground work for this is complete.  However,
   finishing this work will require resolving some lingering
   integration issues with the Linux NFS kernel server.

*) The .zfs/shares directory exists but no futher smb functionality
   has yet been implemented.

Contributions-by: Rohan Puri <rohan.puri15@gmail.com>
Contributiobs-by: Andrew Barnes <barnes333@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #173
2012-03-22 13:03:47 -07:00
Brian Behlendorf
4b787d75c8 Cleanly support debug packages
Allow a source rpm to be rebuilt with debugging enabled.  This
avoids the need to have to manually modify the spec file.  By
default debugging is still largely disabled.  To enable specific
debugging features use the following options with rpmbuild.

  '--with debug'               - Enables ASSERTs

  # For example:
  $ rpmbuild --rebuild --with debug zfs-modules-0.6.0-rc6.src.rpm

Additionally, ZFS_CONFIG has been added to zfs_config.h for
packages which build against these headers.  This is critical
to ensure both zfs and the dependant package are using the same
prototype and structure definitions.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-02-27 14:08:17 -08:00
Etienne Dechamps
30930fba21 Add support for DISCARD to ZVOLs.
DISCARD (REQ_DISCARD, BLKDISCARD) is useful for thin provisioning.
It allows ZVOL clients to discard (unmap, trim) block ranges from
a ZVOL, thus optimizing disk space usage by allowing a ZVOL to
shrink instead of just grow.

We can't use zfs_space() or zfs_freesp() here, since these functions
only work on regular files, not volumes. Fortunately we can use the
low-level function dmu_free_long_range() which does exactly what we
want.

Currently the discard operation is not added to the log. That's not
a big deal since losing discard requests cannot result in data
corruption. It would however result in disk space usage higher than
it should be. Thus adding log support to zvol_discard() is probably
a good idea for a future improvement.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-02-09 16:19:38 -08:00
Etienne Dechamps
cb2d19010d Support the fallocate() file operation.
Currently only the (FALLOC_FL_PUNCH_HOLE) flag combination is
supported, since it's the only one that matches the behavior of
zfs_space(). This makes it pretty much useless in its current
form, but it's a start.

To support other flag combinations we would need to modify
zfs_space() to make it more flexible, or emulate the desired
functionality in zpl_fallocate().

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #334
2012-02-09 16:19:32 -08:00
Etienne Dechamps
34037afe24 Improve ZVOL queue behavior.
The Linux block device queue subsystem exposes a number of configurable
settings described in Linux block/blk-settings.c. The defaults for these
settings are tuned for hard drives, and are not optimized for ZVOLs. Proper
configuration of these options would allow upper layers (I/O scheduler) to
take better decisions about write merging and ordering.

Detailed rationale:

 - max_hw_sectors is set to unlimited (UINT_MAX). zvol_write() is able to
   handle writes of any size, so there's no reason to impose a limit. Let the
   upper layer decide.

 - max_segments and max_segment_size are set to unlimited. zvol_write() will
   copy the requests' contents into a dbuf anyway, so the number and size of
   the segments are irrelevant. Let the upper layer decide.

 - physical_block_size and io_opt are set to the ZVOL's block size. This
   has the potential to somewhat alleviate issue #361 for ZVOLs, by warning
   the upper layers that writes smaller than the volume's block size will be
   slow.

 - The NONROT flag is set to indicate this isn't a rotational device.
   Although the backing zpool might be composed of rotational devices, the
   resulting ZVOL often doesn't exhibit the same behavior due to the COW
   mechanisms used by ZFS. Setting this flag will prevent upper layers from
   making useless decisions (such as reordering writes) based on incorrect
   assumptions about the behavior of the ZVOL.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-02-07 16:23:06 -08:00
Etienne Dechamps
b18019d2d8 Fix synchronicity for ZVOLs.
zvol_write() assumes that the write request must be written to stable storage
if rq_is_sync() is true. Unfortunately, this assumption is incorrect. Indeed,
"sync" does *not* mean what we think it means in the context of the Linux
block layer. This is well explained in linux/fs.h:

    WRITE:       A normal async write. Device will be plugged.
    WRITE_SYNC:  Synchronous write. Identical to WRITE, but passes down
                 the hint that someone will be waiting on this IO
                 shortly.
    WRITE_FLUSH: Like WRITE_SYNC but with preceding cache flush.
    WRITE_FUA:   Like WRITE_SYNC but data is guaranteed to be on
                 non-volatile media on completion.

In other words, SYNC does not *mean* that the write must be on stable storage
on completion. It just means that someone is waiting on us to complete the
write request. Thus triggering a ZIL commit for each SYNC write request on a
ZVOL is unnecessary and harmful for performance. To make matters worse, ZVOL
users have no way to express that they actually want data to be written to
stable storage, which means the ZIL is broken for ZVOLs.

The request for stable storage is expressed by the FUA flag, so we must
commit the ZIL after the write if the FUA flag is set. In addition, we must
commit the ZIL before the write if the FLUSH flag is set.

Also, we must inform the block layer that we actually support FLUSH and FUA.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-02-07 16:23:06 -08:00
Brian Behlendorf
47621f3d76 Linux 3.3 compat, sops->show_options()
The second argument of sops->show_options() was changed from a
'struct vfsmount *' to a 'struct dentry *'.  Add an autoconf check
to detect the API change and then conditionally define the expected
interface.  In either case we are only interested in the zfs_sb_t.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #549
2012-02-03 10:02:01 -08:00
Brian Behlendorf
ab26409db7 Linux 3.1 compat, super_block->s_shrink
The Linux 3.1 kernel has introduced the concept of per-filesystem
shrinkers which are directly assoicated with a super block.  Prior
to this change there was one shared global shrinker.

The zfs code relied on being able to call the global shrinker when
the arc_meta_limit was exceeded.  This would cause the VFS to drop
references on a fraction of the dentries in the dcache.  The ARC
could then safely reclaim the memory used by these entries and
honor the arc_meta_limit.  Unfortunately, when per-filesystem
shrinkers were added the old interfaces were made unavailable.

This change adds support to use the new per-filesystem shrinker
interface so we can continue to honor the arc_meta_limit.  The
major benefit of the new interface is that we can now target
only the zfs filesystem for dentry and inode pruning.  Thus we
can minimize any impact on the caching of other filesystems.

In the context of making this change several other important
issues related to managing the ARC were addressed, they include:

* The dnlc_reduce_cache() function which was called by the ARC
to drop dentries for the Posix layer was replaced with a generic
zfs_prune_t callback.  The ZPL layer now registers a callback to
drop these dentries removing a layering violation which dates
back to the Solaris code.  This callback can also be used by
other ARC consumers such as Lustre.

  arc_add_prune_callback()
  arc_remove_prune_callback()

* The arc_reduce_dnlc_percent module option has been changed to
arc_meta_prune for clarity.  The dnlc functions are specific to
Solaris's VFS and have already been largely eliminated already.
The replacement tunable now represents the number of bytes the
prune callback will request when invoked.

* Less aggressively invoke the prune callback.  We used to call
this whenever we exceeded the arc_meta_limit however that's not
strictly correct since it results in over zeleous reclaim of
dentries and inodes.  It is now only called once the arc_meta_limit
is exceeded and every effort has been made to evict other data from
the ARC cache.

* More promptly manage exceeding the arc_meta_limit.  When reading
meta data in to the cache if a buffer was unable to be recycled
notify the arc_reclaim thread to invoke the required prune.

* Added arcstat_prune kstat which is incremented when the ARC
is forced to request that a consumer prune its cache.  Remember
this will only occur when the ARC has no other choice.  If it
can evict buffers safely without invoking the prune callback
it will.

* This change is also expected to resolve the unexpect collapses
of the ARC cache.  This would occur because when exceeded just the
arc_meta_limit reclaim presure would be excerted on the arc_c
value via arc_shrink().  This effectively shrunk the entire cache
when really we just needed to reclaim meta data.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #466
Closes #292
2012-01-11 11:46:02 -08:00
Darik Horn
28eb9213d8 Linux 3.2 compat: set_nlink()
Directly changing inode->i_nlink is deprecated in Linux 3.2 by commit

  SHA: bfe8684869601dacfcb2cd69ef8cfd9045f62170

Use the new set_nlink() kernel function instead.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes: #462
2011-12-16 20:02:52 -08:00
Prakash Surya
6ba3b44614 Add make rule for building Arch Linux packages
Added the necessary build infrastructure for building packages
compatible with the Arch Linux distribution. As such, one can now run:

    $ ./configure
    $ make pkg     # Alternatively, one can run 'make arch' as well

on the Arch Linux machine to create two binary packages compatible with
the pacman package manager, one for the zfs userland utilities and
another for the zfs kernel modules. The new packages can then be
installed by running:

    # pacman -U $package.pkg.tar.xz

In addition, source-only packages suitable for an Arch Linux chroot
environment or remote builder can also be build using the 'sarch' make
rule.

NOTE: Since the source dist tarball is created on the fly from the head
of the build tree, it's MD5 hash signature will be continually influx.
As a result, the md5sum variable was intentionally omitted from the
PKGBUILD files, and the '--skipinteg' makepkg option is used. This may
or may not have any serious security implications, as the source tarball
is not being downloaded from an outside source.

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #491
2011-12-14 19:14:23 -08:00
Darik Horn
660cbada0f Quote variables in the zfs.lsb script.
For consistency and safety, quote all variables in the zfs.lsb script.
This protects in the unlikely case that any of the file names contain
whitespace.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #439
2011-12-05 09:51:55 -08:00
Darik Horn
c2d9c41d50 Source /etc/default/zfs after setting defaults.
Let the administrator override all script variables by sourcing the
/etc/default/zfs file after the default values are set.

The spelling mistake in the old path name makes it unlikely that this
bug affected any users.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes: #371
2011-12-05 09:51:20 -08:00
Brian Behlendorf
5547c2f1bf Simplify BDI integration
Update the code to use the bdi_setup_and_register() helper to
simplify the bdi integration code.  The updated code now just
registers the bdi during mount and destroys it during unmount.

The only complication is that for 2.6.32 - 2.6.33 kernels the
helper wasn't available so in these cases the zfs code must
provide it.  Luckily the bdi_setup_and_register() function
is trivial.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #367
2011-11-08 10:19:03 -08:00
Ned Bass
f021fe194f Use automatic variable in Makefile
As written, the $(init_SCRIPTS) rule in etc/init.d/Makefule.am
would not work as expected if the init_SCRIPTS variable were
to contain any elements other than zfs.  Fix this by replacing
the hard-coded 'zfs' reference with $@.

Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #410
2011-09-26 09:22:30 -07:00
Brian Behlendorf
1a2e6a635f Fix incorrect zpool_cache substitution
This regression was accidentally introduced by commit aa2b489.
I was attempting to simplify the init scripts and accidentally
confused the /etc/init.d and /etc/zfs paths.  This change reverts
the init script modifications.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #370
2011-08-22 16:01:59 -07:00
Brian Behlendorf
aa2b4896c9 Fix autoconf variable substitution in init scripts.
Change the variable substitution in the init script templates
according to the method described in the Autoconf manual;
Chapter 4.7.2: Installation Directory Variables.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2011-08-19 16:26:14 -07:00
Brian Behlendorf
de0a1c099b Autogen refresh for udev changes
Run autogen.sh using the same autotools versions as upstream:

 * autoconf-2.63
 * automake-1.11.1
 * libtool-2.2.6b
2011-08-08 16:30:27 -07:00
Kyle Fuller
12d06bac9b Move udev rules from /etc/udev to /lib/udev
This change moves the default install location for the zfs udev
rules from /etc/udev/ to /lib/udev/.  The correct convention is
for rules provided by a package to be installed in /lib/udev/.
The /etc/udev/ directory is reserved for custom rules or local
overrides.

Additionally, this patch cleans up some abuse of the bindir install
location by adding a udevdir and udevruledir install directories.
This allows us to revert to the default bin install location.  The
udev install directories can be set with the following new options.

  --with-udevdir=DIR      install udev helpers [EPREFIX/lib/udev]
  --with-udevruledir=DIR  install udev rules [UDEVDIR/rules.d]

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #356
2011-08-08 16:21:10 -07:00
Brian Behlendorf
76659dc110 Add backing_device_info per-filesystem
For a long time now the kernel has been moving away from using the
pdflush daemon to write 'old' dirty pages to disk.  The primary reason
for this is because the pdflush daemon is single threaded and can be
a limiting factor for performance.  Since pdflush sequentially walks
the dirty inode list for each super block any delay in processing can
slow down dirty page writeback for all filesystems.

The replacement for pdflush is called bdi (backing device info).  The
bdi system involves creating a per-filesystem control structure each
with its own private sets of queues to manage writeback.  The advantage
is greater parallelism which improves performance and prevents a single
filesystem from slowing writeback to the others.

For a long time both systems co-existed in the kernel so it wasn't
strictly required to implement the bdi scheme.  However, as of
Linux 2.6.36 kernels the pdflush functionality has been retired.

Since ZFS already bypasses the page cache for most I/O this is only
an issue for mmap(2) writes which must go through the page cache.
Even then adding this missing support for newer kernels was overlooked
because there are other mechanisms which can trigger writeback.

However, there is one critical case where not implementing the bdi
functionality can cause problems.  If an application handles a page
fault it can enter the balance_dirty_pages() callpath.  This will
result in the application hanging until the number of dirty pages in
the system drops below the dirty ratio.

Without a registered backing_device_info for the filesystem the
dirty pages will not get written out.  Thus the application will hang.
As mentioned above this was less of an issue with older kernels because
pdflush would eventually write out the dirty pages.

This change adds a backing_device_info structure to the zfs_sb_t
which is already allocated per-super block.  It is then registered
when the filesystem mounted and unregistered on unmount.  It will
not be registered for mounted snapshots which are read-only.  This
change will result in flush-<pool> thread being dynamically created
and destroyed per-mounted filesystem for writeback.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #174
2011-08-04 13:37:38 -07:00
Brian Behlendorf
bfb73f9277 Add .gitignore for zfs.<distro> init scripts
Treat the automatically generated zfs.<distro> init scripts
as build products by adding them to a directory specific
.gitignore file.
2011-08-01 10:27:54 -07:00
Kyle Fuller
5faa9c0367 Turn the init.d scripts into autoconf config files
This change ensures the paths used by the provided init scripts
always reference the prefixes provided at configure time.  The
@sbindir@ and @sysconfdir@ prefixes will be correctly replaced
at build time.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #336
2011-08-01 09:54:44 -07:00
Kyle Fuller
615ab66d18 Provide a rc.d script for archlinux
Unlike most other Linux distributions archlinux installs its
init scripts in /etc/rc.d insead of /etc/init.d.  This commit
provides an archlinux rc.d script for zfs and extends the
build infrastructure to ensure it get's installed in the
correct place.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #322
2011-07-11 14:12:23 -07:00
Fajar A. Nugraha
3af2ce4d68 Check for "udevadm settle" vs "udevsettle"
RHEL5 does not have udevadm, so fix initscript accordingly

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #315
2011-07-08 11:43:16 -07:00
Gunnar Beutner
8b0cf399ff Updated init scripts to enable automatic sharing of ZFS datasets.
The relevant init scripts were updated so as to automatically share
ZFS datasets using "zfs share -a" at boot time.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2011-07-06 09:20:28 -07:00
Zachary Bedell
e93ced4847 Update zfs.gentoo/zfs.lsb init script
* Update paths to zpool/zfs tools,
* Log less for non-error conditions,
* Don't be fatal if umount fails at shutdown -- final init remount
  will take care of it if /usr or / are in use

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2011-07-06 09:20:14 -07:00
Gunnar Beutner
c8082367cf Removed erroneous backticks in the zfs.lunar init script.
The backticks would cause the output of the zfs commands
to be evaluated as input for the if construct rather than
their exit status.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2011-07-05 11:25:48 -07:00
Gunnar Beutner
0f4524cca4 Fixed indentation in the zfs.lunar init script.
One of the blocks in the init script wasn't indented properly.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2011-07-05 11:25:48 -07:00
Andrew Tselischev
b59322a0d8 Fix 'rc_parallel="YES"' error
If rc_parallel="YES" zfs starts before localmount, which leads
to "No such file or directory" error on systems with /usr on a
separate partition.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2011-07-04 13:54:59 -07:00
Brian Behlendorf
2cf7f52bc4 Linux compat 2.6.39: mount_nodev()
The .get_sb callback has been replaced by a .mount callback
in the file_system_type structure.  When using the new
interface the caller must now use the mount_nodev() helper.

Unfortunately, the new interface no longer passes the vfsmount
down to the zfs layers.  This poses a problem for the existing
implementation because we currently save this pointer in the
super block for latter use.  It provides our only entry point
in to the namespace layer for manipulating certain mount options.

This needed to be done originally to allow commands like
'zfs set atime=off tank' to work properly.  It also allowed me
to keep more of the original Solaris code unmodified.  Under
Solaris there is a 1-to-1 mapping between a mount point and a
file system so this is a fairly natural thing to do.  However,
under Linux they many be multiple entries in the namespace
which reference the same filesystem.  Thus keeping a back
reference from the filesystem to the namespace is complicated.

Rather than introduce some ugly hack to get the vfsmount and
continue as before.  I'm leveraging this API change to update
the ZFS code to do things in a more natural way for Linux.
This has the upside that is resolves the compatibility issue
for the long term and fixes several other minor bugs which
have been reported.

This commit updates the code to remove this vfsmount back
reference entirely.  All modifications to filesystem mount
options are now passed in to the kernel via a '-o remount'.
This is the expected Linux mechanism and allows the namespace
to properly handle any options which apply to it before passing
them on to the file system itself.

Aside from fixing the compatibility issue, removing the
vfsmount has had the benefit of simplifying the code.  This
change which fairly involved has turned out nicely.

Closes #246
Closes #217
Closes #187
Closes #248
Closes #231
2011-07-01 13:36:39 -07:00
Brian Behlendorf
5c03efc379 Linux compat 2.6.39: security_inode_init_security()
The security_inode_init_security() function now takes an additional
qstr argument which must be passed in from the dentry if available.
Passing a NULL is safe when no qstr is available the relevant
security checks will just be skipped.

Closes #246
Closes #217
Closes #187
2011-07-01 12:40:08 -07:00
Brian Behlendorf
2a005961a4 Ensure all block devices are available
These days most disk drivers will probe for devices asynchronously.
This means it's possible that when you zfs init script runs all the
required block devices may not yet have been discovered.  The result
is the pool may fail to cleanly import at boot time.  This is
particularly common when you have a large number of devices.

The fix is for the init script to block until udev settles and we
are no longer detecting new devices.  Once the system has settled
the zfs modules can be loaded and the pool with be automatically
imported.
2011-06-30 14:45:33 -07:00
Prasad Joshi
b312979252 Tear down and flush the mmap region
The inode eviction should unmap the pages associated with the inode.
These pages should also be flushed to disk to avoid the data loss.
Therefore, use truncate_setsize() in evict_inode() to release the
pagecache.

The API truncate_setsize() was added in 2.6.35 kernel. To ensure
compatibility with the old kernel, the patch defines its own
truncate_setsize function.

Signed-off-by: Prasad Joshi <pjoshi@stec-inc.com>
Closes #255
2011-06-27 09:59:19 -07:00
Ned A. Bass
560bcf9d14 Multipath device manageability improvements
Update udev helper scripts to deal with device-mapper devices created
by multipathd.  These enhancements are targeted at a particular
storage network topology under evaluation at LLNL consisting of two
SAS switches providing redundant connectivity between multiple server
nodes and disk enclosures.

The key to making these systems manageable is to create shortnames for
each disk that conveys its physical location in a drawer.  In a
direct-attached topology we infer a disk's enclosure from the PCI bus
number and HBA port number in the by-path name provided by udev.  In a
switched topology, however, multiple drawers are accessed via a single
HBA port.  We therefore resort to assigning drawer identifiers based
on which switch port a drive's enclosure is connected to.  This
information is available from sysfs.

Add options to zpool_layout to generate an /etc/zfs/zdev.conf using
symbolic links in /dev/disk/by-id of the form
<label>-<UUID>-switch-port:<X>-slot:<Y>.  <label> is a string that
depends on the subsystem that created the link and defaults to
"dm-uuid-mpath" (this prefix is used by multipathd).  <UUID> is a
unique identifier for the disk typically obtained from the scsi_id
program, and <X> and <Y> denote the switch port and disk slot numbers,
respectively.

Add a callout script sas_switch_id for use by multipathd to help
create symlinks of the form described above.  Update zpool_id and the
udev zpool rules file to handle both multipath devices and
conventional drives.
2011-06-23 10:46:06 -07:00
Darik Horn
b772aedeec Autogen refresh.
Run autogen.sh using the same autotools versions as upstream:

 * autoconf-2.63
 * automake-1.11.1
 * libtool-2.2.6b
2011-06-17 13:24:44 -07:00
Darik Horn
b9f27ee765 Fix autoconf variable substitution in udev rules.
Change the variable substitution in the udev rule templates
according to the method described in the Autoconf manual;
Chapter 4.7.2: Installation Directory Variables.

The udev rules are improperly generated if the bindir parameter
overrides the prefix parameter during configure. For example:

  # ./configure --prefix=/usr/local --bindir=/opt/zfs/bin

The udev helper is installed as /opt/zfs/bin/zpool_id, but the
corresponding udev rule has a different path:

  # /usr/local/etc/udev/rules.d/60-zpool.rules
  ENV{DEVTYPE}=="disk", IMPORT{program}="/usr/local/bin/zpool_id -d %p"

The @bindir@ variable expands to "${exec_prefix}/bin", so it cannot
be used instead of @prefix@ directly.

This also applies to the zvol_id helper.

Closes #283.
2011-06-17 10:11:29 -07:00
Brian Behlendorf
2e08aedba4 Always check -Wno-unused-but-set-variable gcc support
The previous commit 8a7e1ceefa wasn't
quite right.  This check applies to both the user and kernel space
build and as such we must make sure it runs regardless of what
the --with-config option is set too.

For example, if --with-config=kernel then the autoconf test does
not run and we generate build warnings when compiling the kernel
packages.
2011-06-14 16:40:35 -07:00
Brian Behlendorf
8a7e1ceefa Check for -Wno-unused-but-set-variable gcc support
Gcc versions 4.3.2 and earlier do not support the compiler flag
-Wno-unused-but-set-variable.  This can lead to build failures
on older Linux platforms such as Debian Lenny.  Since this is
an optional build argument this changes add a new autoconf check
for the option.  If it is supported by the installed version of
gcc then it is used otherwise it is omited.

See commit's 12c1acde76 and
79713039a2 for the reason the
-Wno-unused-but-set-variable options was originally added.
2011-06-14 14:43:22 -07:00
Alexey Shvetsov
6f582dc708 Remove root 'ls' after mount workaround
This workaround was introduced to workaround issue #164.  This
issue was fixed by commit 5f35b19 so the workaround can be safely
dropped from both the zfs.fedora and zfs.gentoo init scripts.
2011-05-12 15:01:35 -07:00
Alexey Shvetsov
06abcdd3f4 Fix zfs.gentoo init script logic
* Fix zfs.ko module check
* Check 'zfs umount -a' return value
2011-05-12 14:45:57 -07:00