Commit Graph

2033 Commits

Author SHA1 Message Date
Chris Dunlap
69520d6855 Rework zed_notify_email for configurable PROG/OPTS
This commit reworks the zed_notify_email() function to allow
configuration of the mail executable and command-line arguments.

ZED_EMAIL_PROG specifies the name or path of the executable responsible
for sending notifications via email.  This variable defaults to "mail".

ZED_EMAIL_OPTS specifies command-line options passed to ZED_EMAIL_PROG.
The following keyword substitutions are performed:
- @ADDRESS@ is replaced with the recipient email address(es)
- @SUBJECT@ is replaced with the notification subject
This variable defaults to "-s '@SUBJECT@' @ADDRESS@".

ZED_EMAIL_ADDR replaces ZED_EMAIL (although the latter is retained
for backward compatibility).  This variable can contain multiple
addresses as long as they are delimited by whitespace.
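
The keyword substitution can be modeled with a short Python sketch. This is an illustration only, not the zed shell code: the function name `build_email_command` is hypothetical, and only the variable names, keywords, and default values are taken from the description above.

```python
import shlex

# Defaults as stated in the commit message.
DEFAULT_PROG = "mail"
DEFAULT_OPTS = "-s '@SUBJECT@' @ADDRESS@"


def build_email_command(subject, addresses, prog=DEFAULT_PROG, opts=DEFAULT_OPTS):
    """Return an argv list with @SUBJECT@ and @ADDRESS@ substituted."""
    # The address variable may hold several whitespace-delimited addresses.
    recipients = addresses.split()
    argv = [prog]
    for token in shlex.split(opts):
        if token == "@ADDRESS@":
            # A bare @ADDRESS@ token expands to one argument per recipient.
            argv.extend(recipients)
        else:
            token = token.replace("@SUBJECT@", subject)
            argv.append(token.replace("@ADDRESS@", " ".join(recipients)))
    return argv


print(build_email_command("ZFS scrub finished", "root@localhost admin@example.com"))
```

Building an argument list rather than a single string keeps a subject containing spaces as one argument, which is the role the quotes around '@SUBJECT@' play in the default options.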

Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3634
Closes #3631
2015-07-30 11:52:56 -07:00
Chris Dunlap
6f1eccff2c Fix whitespace in zed_log_err
This commit fixes the two adjacent spaces that appear in zed_log_err()
messages when ZEVENT_EID is undefined.

Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2015-07-30 11:52:35 -07:00
Brian Behlendorf
7e8bddd019 Update arc_memory_throttle() to check pageout
This brings the behavior of arc_memory_throttle() back in sync with
illumos.  The updated memory throttling policy roughly goes like this:

* Never throttle if more than 10% of memory is free.  This threshold
  is configurable with the zfs_arc_lotsfree_percent module option.

* Minimize any throttling of kswapd even when free memory is below
  the set threshold.  Allow it to write out pages as quickly as
  possible to help alleviate the memory pressure.

* Delay all other threads when free memory is below the set threshold
  in order to avoid compounding the memory pressure.  Buffers will be
  evicted from the ARC to reduce the issue.

The Linux specific zfs_arc_memory_throttle_disable module option has
been removed in favor of the existing zfs_arc_lotsfree_percent tuning.
Setting zfs_arc_lotsfree_percent=0 will have the same effect as
zfs_arc_memory_throttle_disable and it was therefore redundant.
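
The policy above can be summarized as a small decision function. The following Python sketch is a simplified model under stated assumptions, not the kernel code: the name `should_throttle` and its parameters are hypothetical, and "minimize throttling of kswapd" is approximated as "never delay kswapd".

```python
def should_throttle(free_bytes, total_bytes, is_kswapd, lotsfree_percent=10):
    """Return True when the calling thread should be delayed."""
    # A setting of 0 disables throttling entirely, matching the removed
    # zfs_arc_memory_throttle_disable option.
    if lotsfree_percent == 0:
        return False
    threshold = total_bytes * lotsfree_percent // 100
    # Never throttle while more than the threshold is free.
    if free_bytes > threshold:
        return False
    # kswapd writes out pages to relieve pressure, so let it run.
    if is_kswapd:
        return False
    # Every other thread is delayed to avoid compounding the pressure.
    return True


print(should_throttle(free_bytes=5, total_bytes=100, is_kswapd=False))
```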

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3637
2015-07-30 11:52:12 -07:00
Brian Behlendorf
11f552fa90 Update arc_available_memory() to check freemem
While Linux doesn't provide detailed information about the state of
the VM it does provide us total free pages.  This information should
be incorporated into the arc_available_memory() calculation rather
than solely relying on a signal from direct reclaim.  Conceptually
this brings arc_available_memory() back in sync with illumos.

It is also desirable that the target amount of free memory be tunable
on a system.  While the default values are expected to work well
for most workloads there may be cases where custom values are needed.
The zfs_arc_sys_free module option was added for this purpose.

zfs_arc_sys_free - The target number of bytes the ARC should leave
                   as free memory on the system.  This value can
                   be checked in /proc/spl/kstat/zfs/arcstats and
                   setting this module option will override the
                   default value.
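
As a rough illustration of how a free-memory target enters the calculation, the Python sketch below subtracts the target from the free byte count. The function names are hypothetical and the real arc_available_memory() considers more than this single difference; treat it as a model of the idea only.

```python
def available_memory(free_bytes, sys_free_bytes):
    """Bytes the ARC may still consume; a negative value is a shortfall."""
    return free_bytes - sys_free_bytes


def needs_reclaim(free_bytes, sys_free_bytes):
    """True when free memory has dropped below the configured target."""
    return available_memory(free_bytes, sys_free_bytes) < 0


print(available_memory(512 * 1024 * 1024, 128 * 1024 * 1024))
print(needs_reclaim(64 * 1024 * 1024, 128 * 1024 * 1024))
```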

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3637
2015-07-30 11:50:22 -07:00
Brian Behlendorf
6339c1b9dc Bound zvol_threads module option
The zvol_threads module option should be bounded to a reasonable
range.  The taskq must have at least 1 thread and shouldn't have
more than 1,024.  The default value of 32 is reasonable.
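
Bounding the option amounts to a clamp. A minimal Python sketch, using the limits and default quoted above; the constant and function names are hypothetical:

```python
ZVOL_THREADS_MIN = 1
ZVOL_THREADS_MAX = 1024
ZVOL_THREADS_DEFAULT = 32


def bound_zvol_threads(requested=ZVOL_THREADS_DEFAULT):
    """Clamp a requested thread count into the supported range."""
    return max(ZVOL_THREADS_MIN, min(requested, ZVOL_THREADS_MAX))


print(bound_zvol_threads(0), bound_zvol_threads(), bound_zvol_threads(5000))
```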

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3614
2015-07-29 07:42:11 -07:00
Chunwei Chen
21a96fb635 Fix "BUG: Bad page state" caused by writeback flag
Commit d958324 fixed the deadlock between page lock and range lock by
unlocking the page lock before acquiring the range lock. However,
this created a new issue #3075.

The problem is that if we can't set the write back bit before releasing
the page lock, then other processes will be unaware that the page is
under active write back.  They may therefore truncate the page,
invalidate the page, or not honor the sync semantics.

To work around this problem we re-dirty the page before dropping the
page lock.  While this doesn't prevent the page from being truncated
it does ensure it won't be invalidated.  Then the range lock and the
page lock are reacquired in the correct deadlock-free order.

Once both locks are safely held the page state can be rechecked.  If
all is well and the page is in the expected state the dirty bit can be
removed, the write back bit set, and the page removed from the skip
count.  If not the page will be handled as appropriate.

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3075
2015-07-29 07:38:15 -07:00
Frédéric VANNIÈRE
c1718e9580 Fix build failure with Linux 4.1 and FTRACE
Signed-off-by: Frédéric VANNIÈRE <f.vanniere@planet-work.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3546
2015-07-29 07:35:06 -07:00
Brian Behlendorf
1229323d5f Align thread priority with Linux defaults
Under Linux filesystem threads responsible for handling I/O are
normally created with the maximum priority.  Non-I/O filesystem
processes run with the default priority.  ZFS should adopt the
same priority scheme under Linux to maintain good performance
and so that it will compete fairly when other Linux filesystems
are active.  The priorities have been updated to the following:

$ ps -eLo rtprio,cls,pid,pri,nice,cmd | egrep 'z_|spl_|zvol|arc|dbu|meta'
     -  TS 10743  19 -20 [spl_kmem_cache]
     -  TS 10744  19 -20 [spl_system_task]
     -  TS 10745  19 -20 [spl_dynamic_tas]
     -  TS 10764  19   0 [dbu_evict]
     -  TS 10765  19   0 [arc_prune]
     -  TS 10766  19   0 [arc_reclaim]
     -  TS 10767  19   0 [arc_user_evicts]
     -  TS 10768  19   0 [l2arc_feed]
     -  TS 10769  39   0 [z_unmount]
     -  TS 10770  39 -20 [zvol]
     -  TS 11011  39 -20 [z_null_iss]
     -  TS 11012  39 -20 [z_null_int]
     -  TS 11013  39 -20 [z_rd_iss]
     -  TS 11014  39 -20 [z_rd_int_0]
     -  TS 11022  38 -19 [z_wr_iss]
     -  TS 11023  39 -20 [z_wr_iss_h]
     -  TS 11024  39 -20 [z_wr_int_0]
     -  TS 11032  39 -20 [z_wr_int_h]
     -  TS 11033  39 -20 [z_fr_iss_0]
     -  TS 11041  39 -20 [z_fr_int]
     -  TS 11042  39 -20 [z_cl_iss]
     -  TS 11043  39 -20 [z_cl_int]
     -  TS 11044  39 -20 [z_ioctl_iss]
     -  TS 11045  39 -20 [z_ioctl_int]
     -  TS 11046  39 -20 [metaslab_group_]
     -  TS 11050  19   0 [z_iput]
     -  TS 11121  38 -19 [z_wr_iss]

Note that under Linux the meaning of a process's priority is inverted
with respect to illumos.  High values on Linux indicate a _low_ priority
while high values on illumos indicate a _high_ priority.

In order to preserve the logical meaning of the minclsyspri and
maxclsyspri macros when they are used by the illumos wrapper functions
their values have been inverted.  This way when changes are merged
from upstream illumos we won't need to remember to invert the macro,
which could otherwise lead to confusion.
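
The inversion can be illustrated with a toy mapping in Python. This is not the SPL implementation: the illumos-style range of 60 to 99 is an assumption made for the example, the function name is hypothetical, and only the Linux nice range of -20 (most favorable) to 19 (least favorable) is standard.

```python
ILLUMOS_MIN_PRI = 60     # assumed for illustration: lowest priority
ILLUMOS_MAX_PRI = 99     # assumed for illustration: highest priority
LINUX_NICE_HIGHEST = -20  # most favorable nice value
LINUX_NICE_LOWEST = 19    # least favorable nice value


def illumos_pri_to_nice(pri):
    """Map a higher-is-better priority onto a lower-is-better nice value."""
    if not ILLUMOS_MIN_PRI <= pri <= ILLUMOS_MAX_PRI:
        raise ValueError("priority outside the assumed range")
    return LINUX_NICE_LOWEST - (pri - ILLUMOS_MIN_PRI)


print(illumos_pri_to_nice(ILLUMOS_MAX_PRI), illumos_pri_to_nice(ILLUMOS_MIN_PRI))
```

Keeping the conversion in one place is the point of the design: callers continue to reason in the upstream convention and never invert values by hand.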

This patch depends on https://github.com/zfsonlinux/spl/pull/466.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Closes #3607
2015-07-28 13:36:47 -07:00
Brian Behlendorf
c97d30691c Check for NULL in dmu_free_long_range_impl()
A NULL should never be passed as the dnode_t pointer to the function
dmu_free_long_range_impl().  Regardless, because we have a reported
occurrence of this let's add some error handling to catch this.
Better to report a reasonable error to caller than panic the system.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #3445
2015-07-28 13:30:53 -07:00
Turbo Fredriksson
21d41d6806 Make sure that POOL_IMPORTED is set, unset and checked where appropriate.
* If it's unset in find_rootfs(), no pool is imported so no point in
  looking for a rootfs.
* If find_rootfs() couldn't find a rootfs, the pool is exported. Remember
  to unset POOL_IMPORTED after doing so.
* Set POOL_IMPORTED if/when a pool has been imported in import_pool().
* Improve backup import (the one using cache file).

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3636
2015-07-28 13:29:28 -07:00
Turbo Fredriksson
48511ea645 Fix some minor issues with the SYSV init and initramfs scripts.
These are some minor fixes to commits 2cac7f5f11
and 2a34db1bdb.

* Make sure to alien'ate the new initramfs rpm package as well!
  The rpm package is built correctly, but alien isn't run on it to
  create the deb.
* Before copying files from COPY_FILE_LIST, make sure the DESTDIR/dir exists.
* Include /lib/udev/vdev_id file in the initrd.
* Because the initrd needs to use '/sbin/modprobe' instead of 'modprobe',
  we need to use this in load_module() as well.
  * Make sure that load_module() can be used more globally, instead of
    calling '/sbin/modprobe' all over the place.
  * Make sure that check_module_loaded() has a parameter - module to
    check.

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3626
2015-07-24 15:05:33 -07:00
Brian Behlendorf
96c080cb9c Minor style cleanup
Address minor differences in style between upstream and ZoL.  This
patch contains no functional differences and is solely designed to
minimize the delta from upstream.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #3533
2015-07-23 09:42:54 -07:00
Brian Behlendorf
3056818343 Remove double counting HDR_L2ONLY_SIZE
Commit d962d5d didn't quite properly resolve the HDR_L2ONLY_SIZE
accounting.  Accounting is now performed only in the constructor
and destructor which is a nice simplification.  It should have
been removed from the create and destroy functions.  This brings
us back in sync with upstream.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #3533
2015-07-23 09:42:44 -07:00
Brian Behlendorf
8c8af9d807 Add hdr_recl() reclaim callback
Originally removed because it wasn't required under Linux.  However,
there may still be some utility in signaling the arc reclaim thread
under Linux via reclaim.  This should already have happened by other
means but it's harmless and reduces another point of divergence
with upstream.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #3533
2015-07-23 09:42:40 -07:00
Brian Behlendorf
728d6ae91e Reinstate zfs_arc_p_min_shift
Commit f521ce1 removed the minimum value for "arc_p" allowing it to
drop to zero or grow to "arc_c".  This was done to improve a specific
workload which constantly dirties new "metadata" but also frequently
touches a "small" amount of mfu data (e.g. mkdir's).

This change may still be desirable but it needs to be re-investigated
in the context of the recent ARC changes from upstream.  Therefore
this code is being restored to facilitate benchmarking.  By setting
"zfs_arc_p_min_shift=64" we can easily compare the performance.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #3533
2015-07-23 09:42:32 -07:00
Prakash Surya
36da08ef9b Illumos 5817 - change type of arcs_size from uint64_t to refcount_t
5817 change type of arcs_size from uint64_t to refcount_t
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Approved by: Garrett D'Amore <garrett@damore.org>

References:
  https://www.illumos.org/issues/5817
  https://github.com/illumos/illumos-gate/commit/2fd872a

Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #3533
2015-07-23 09:42:28 -07:00
Prakash Surya
500445c046 Illumos 5445 - Add more visibility via arcstats
5445 Add more visibility via arcstats; specifically arc_state_t
stats and differentiate between "data" and "metadata"
Reviewed by: Basil Crow <basil.crow@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Bayard Bell <bayard.bell@nexenta.com>
Approved by: Robert Mustacchi <rm@joyent.com>

References:
  https://www.illumos.org/issues/5445
  https://github.com/illumos/illumos-gate/commit/4076b1b

Porting Notes:

This patch is an improved version of cc7f677 which was previously
merged in ZoL.  This patch incorporates the additional improvements
which were made upstream.

Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #3533
2015-07-23 09:42:06 -07:00
Matthew Ahrens
ca67b33aba Illumos 5376 - arc_kmem_reap_now() should not result in clearing arc_no_grow
5376 arc_kmem_reap_now() should not result in clearing arc_no_grow
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Approved by: Dan McDonald <danmcd@omniti.com>

References:
  https://www.illumos.org/issues/5376
  https://github.com/illumos/illumos-gate/commit/2ec99e3

Porting Notes:

The good news is that many of the recent changes made upstream to the
ARC tackled issues previously observed by ZoL with similar solutions.
The bad news is those solutions weren't identical to the ones we applied.
This patch is designed to split the difference and apply as much of the
upstream work as possible.

* The arc_available_memory() function was removed previously in ZoL but
due to the upstream changes it makes sense to add it back.  This function
has been customized for Linux so that it can be used to determine a low
memory state.  This provides the same basic functionality as the illumos
version allowing us to minimize changes through the rest of the code base.
The exact mechanism used to detect a low memory state remains unchanged so
this change isn't as significant as it might first appear.

* This patch includes the long standing fix for arc_shrink() which was
originally proposed in #2167.  Since there were related changes to this
function it made sense to include that work.

* The arc_init() function has been re-factored.  As before it sets sane
default values for the ARC but then calls arc_tuning_update() to apply
user specific tuning made via module options.  The arc_tuning_update()
function is then called periodically by the arc_reclaim_thread() to
apply changes to the tunings made during normal operation.

Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3616
Closes #2167
2015-07-23 09:41:28 -07:00
Brian Behlendorf
3b79cef212 Set default _initconfdir directory
The _initconfdir macro is normally provided by global rpm macros
file for use in the spec file.  However, older distributions such
as CentOS 6 do not define it.  To prevent a build failure in this
case the spec file has been updated to use a reasonable default
when the value is undefined.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3617
2015-07-21 13:16:50 -07:00
Brian Behlendorf
53b1d9794e Add logic to try and recover an inode with an invalid mode
When an inode is detected with invalid mode bits the safe thing to
do is panic the system.  This indicates a problem with the contents
of a dnode and it should never be possible.  This is the default
behavior.

Unfortunately, due to flaws in the system attribute (SA) implementation
(on all platforms) it was possible that ZFS could create a damaged dnode.
This was a rare issue which only impacted dnodes which used a spill
block.  Normally only symlinks and files with ACLs would require a
spill block.  However, if the dataset had the xattr=sa property set
and extended attributes were used this problem could occur.

As of the 0.6.4 tag the root cause of this issue has been fixed.  For
pools which are exhibiting this damage the 'zfs_recover=1' module option
may be set.  This will cause ZFS to interpret the dnode with invalid
mode bits as a normal file.  This may allow the files to be accessed
for recovery purposes.
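
The recovery behavior can be sketched in Python with the standard `stat` module. This is a model under assumptions, not the ZFS code: `resolve_mode` is a hypothetical name, a raised exception stands in for the panic, and the set of valid file types is the usual POSIX set.

```python
import stat

# File types a well-formed inode is expected to carry.
VALID_TYPES = {
    stat.S_IFREG, stat.S_IFDIR, stat.S_IFLNK, stat.S_IFCHR,
    stat.S_IFBLK, stat.S_IFIFO, stat.S_IFSOCK,
}


def resolve_mode(mode, zfs_recover=False):
    """Return a usable mode, or fail when the type bits are invalid."""
    if stat.S_IFMT(mode) in VALID_TYPES:
        return mode
    if not zfs_recover:
        # Default behavior: refuse to continue (a panic in the kernel).
        raise RuntimeError("inode has invalid mode bits")
    # Recovery: keep the permission bits and treat it as a regular file.
    return stat.S_IFREG | stat.S_IMODE(mode)


print(oct(resolve_mode(0o170644, zfs_recover=True)))
```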

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3548
2015-07-17 15:33:35 -07:00
Turbo Fredriksson
47a4a6fd5f Support parallel build trees (VPATH builds)
Build products from an out of tree build should be written
relative to the build directory.  Sources should be referred
to by their locations in the source directory.

This is accomplished by adding the 'src' and 'obj' variables
for the module Makefile.am, using relative paths to reference
source files, and by setting VPATH when source files are not
co-located with the Makefile.  This enables the following:

  $ mkdir build
  $ cd build
  $ ../configure \
    --with-spl=$HOME/src/git/spl/ \
    --with-spl-obj=$HOME/src/git/spl/build
  $ make -s

This change also has the advantage of resolving the following
warning which is generated by modern versions of automake.

  Makefile.am:00: warning: source file 'xxx' is in a subdirectory,
  Makefile.am:00: but option 'subdir-objects' is disabled

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1082
2015-07-17 13:42:51 -07:00
Brian Behlendorf
2a53e2dacc Update inode under range lock
After a successful write the inode must be updated under the range
lock.  If it is updated after dropping the lock there exists a race
where the znode and inode will disagree about the file size.  This
could result in a narrow window of time where read(2) is able to access
data beyond what fstat(2) reports as the file size.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Closes #3601
2015-07-17 09:18:22 -07:00
Brian Behlendorf
bd29109f1a Linux 4.2 compat: follow_link() / put_link()
As of Linux 4.2 the kernel has completely retired the nameidata
structure.  Among the few remaining consumers of this interface
were the follow_link() and put_link() callbacks.

This patch adds the required checks to configure to detect the
interface change and updates the functions accordingly.  Migrating
to the simple_follow_link() interface was considered but was decided
against ironically due to the increased complexity.

It also should be noted that the kernel follow_link() and put_link()
interfaces changed several times after 4.1 but before 4.2.  This
means there is a narrow range of kernel commits, which never appear
in an official tag of the Linux kernel, against which ZoL will not build.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Issue #3596
2015-07-17 09:18:16 -07:00
Brian Behlendorf
7eb333fbdd Linux 4.2 compat: remove bio->bi_cnt access
Linux 4.2 commit torvalds/linux@dac5621 renamed bio->bi_cnt to
bio->__bi_cnt.  Because this value is only used once in a block of
debug code it is simplest just to remove the PANIC.  To my knowledge
this debugging has never been hit or proved useful so this is no
great loss.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #3596
2015-07-17 09:16:08 -07:00
Brian Behlendorf
e80da86447 Linux 4.2 compat: bdi_setup_and_register()
The vfs_compat.h header should include the linux/backing-dev.h header
because it depends on the bdi_* functions defined there.  In previous
kernels this header was being indirectly included which prevented a
build failure.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #3596
2015-07-17 09:15:43 -07:00
Matthew Ahrens
905edb405d Illumos 5347 - idle pool may run itself out of space
5347 idle pool may run itself out of space
Reviewed by: Alex Reece <alex.reece@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Approved by: Dan McDonald <danmcd@omniti.com>

References:
  https://github.com/illumos/illumos-gate/commit/231aab8
  https://github.com/illumos/illumos-gate/commit/4a92375 3642
  https://www.illumos.org/issues/5347
  https://github.com/zfsonlinux/zfs/commit/89b1cd6 (partial commit & fix)
  https://github.com/zfsonlinux/zfs/commit/fbeddd6 Illumos 4390
  https://github.com/zfsonlinux/zfs/commit/2696dfa Illumos 3642, 3643

Porting notes:
This is completing the partial fix from FreeBSD

Ported-by: kernelOfTruth kerneloftruth@gmail.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3586
2015-07-14 10:35:21 -07:00
Manoj Joseph
93f6d7e2e5 Illumos 5764 - "zfs send -nv" directs output to stderr
5764 "zfs send -nv" directs output to stderr
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
Reviewed by: Basil Crow <basil.crow@delphix.com>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>

References:
  https://github.com/illumos/illumos-gate/commit/dc5f28a
  https://www.illumos.org/issues/5764

Ported-by: kernelOfTruth kerneloftruth@gmail.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3585
2015-07-14 10:28:32 -07:00
Alexander Eremin
1cddb8c9ff Illumos 5610 - zfs clone from different source and target pools produces coredump
5610 zfs clone from different source and target pools produces coredump
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>

References:
  https://github.com/illumos/illumos-gate/commit/03b1c29
  https://www.illumos.org/issues/5610
  https://www.illumos.org/issues/5824
  https://github.com/zfsonlinux/zfs/issues/2911
  https://github.com/zfsonlinux/zfs/commit/9063f65

Ported-by: kernelOfTruth kerneloftruth@gmail.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3584
2015-07-14 10:27:46 -07:00
Prasad Joshi
eaa52d32b0 Illumos 1765 - assert triggered in libzfs_import.c
1765 assert triggered in libzfs_import.c trying to import pool
name beginning with a number
Reviewed-by: Garrett D'Amore <garrett@damore.org>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>

References:
  https://github.com/illumos/illumos-gate/commit/9edf9eb
  https://www.illumos.org/issues/1765

Ported-by: kernelOfTruth kerneloftruth@gmail.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3562
2015-07-14 10:23:29 -07:00
Richard Yao
0de7c552b6 Failure of userland copy should return EFAULT
Many key internal functions pass system return codes that are safe to
return to userland. In the case of ddi_copyin(9F), an error passes -1
and the documentation states very clearly that drivers should pass
EFAULT to userland when this happens.

http://illumos.org/man/9F/ddi_copyin

This does not happen in the ZFS source code. I believe it should be
changed to pass EFAULT. I caught this when writing man pages for the
libzfs_core API.
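
The proposed translation is small enough to state as a Python sketch. The function name is hypothetical; the -1 failure value and the EFAULT result come from the description above.

```python
import errno


def copyin_status_to_errno(rc):
    """Translate a ddi_copyin-style status into an errno for userland."""
    # 0 means success; any failure (reported as -1) becomes EFAULT.
    return 0 if rc == 0 else errno.EFAULT


print(copyin_status_to_errno(-1) == errno.EFAULT)
```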

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3575
2015-07-14 10:20:35 -07:00
Boris Protopopov
b39c22b73c Translate sync zio to sync bio
Translate zio requests with ZIO_PRIORITY_SYNC_READ and
ZIO_PRIORITY_SYNC_WRITE into synchronous bio requests by setting
READ_SYNC and WRITE_SYNC flags. Specifically, WRITE_SYNC flag turns
out to have a pronounced effect when writing to an SSD-based SLOG.

When WRITE_SYNC is not set (WRITE is set instead), the block trace
for a SLOG device looks as follows:
...
130,96   0        3     0.008968390     0  C   W 830464 + 136 [0]
130,96   0        4     0.011999161     0  C   W 830720 + 136 [0]
130,96   0        5     0.023955549     0  C   W 831744 + 136 [0]
130,96   0        6     0.024337663 19775  A   W 832000 + 136 <- (130,97) 829952
130,96   0        7     0.024338823 19775  Q   W 832000 + 136 [z_wr_iss/6]
130,96   0        8     0.024340523 19775  G   W 832000 + 136 [z_wr_iss/6]
130,96   0        9     0.024343187 19775  P   N [z_wr_iss/6]
130,96   0       10     0.024344120 19775  I   W 832000 + 136 [z_wr_iss/6]
130,96   0       11     0.026784405     0 UT   N [swapper] 1
130,96   0       12     0.026805339   202  U   N [kblockd/0] 1
130,96   0       13     0.026807199   202  D   W 832000 + 136 [kblockd/0]
130,96   0       14     0.026966948     0  C   W 832000 + 136 [0]
130,96   3        1     0.000449358 19788  A   W 829952 + 136 <- (130,97) 827904
130,96   3        2     0.000450951 19788  Q   W 829952 + 136 [z_wr_iss/19]
130,96   3        3     0.000453212 19788  G   W 829952 + 136 [z_wr_iss/19]
130,96   3        4     0.000455956 19788  P   N [z_wr_iss/19]
130,96   3        5     0.000457076 19788  I   W 829952 + 136 [z_wr_iss/19]
130,96   3        6     0.002786349     0 UT   N [swapper] 1
...

Here the 130,97 is the partition created on the log device when adding it
to the pool, whereas the base device is 130,96. As one can see, the writes
to the SLOG are not marked synchronous (the S is missing next to W), and
the queue unplugs occur based on the timer (UT event) resulting in slightly
over 2 msec latency of writes. This results in a sub-par performance of
single stream synchronous writes (limited by latency of the SLOG).

When the WRITE_SYNC is set, a similar trace looks as follows:
...
130,96   4        1     0.000000000 70714  A  WS 4280576 + 136 <- (130,97) 4278528
130,96   4        2     0.000000832 70714  Q  WS 4280576 + 136 [(null)]
130,96   4        3     0.000002109 70714  G  WS 4280576 + 136 [(null)]
130,96   4        4     0.000003394 70714  P   N [(null)]
130,96   4        5     0.000003846 70714  I  WS 4280576 + 136 [(null)]
130,96   4        6     0.000004854 70714  D  WS 4280576 + 136 [(null)]
130,96   5        1     0.000354487 70713  A  WS 4280832 + 136 <- (130,97) 4278784
130,96   5        2     0.000355072 70713  Q  WS 4280832 + 136 [(null)]
130,96   5        3     0.000356383 70713  G  WS 4280832 + 136 [(null)]
130,96   5        4     0.000357635 70713  P   N [(null)]
130,96   5        5     0.000358088 70713  I  WS 4280832 + 136 [(null)]
130,96   5        6     0.000359191 70713  D  WS 4280832 + 136 [(null)]
130,96   0       76     0.000159539     0  C  WS 4280576 + 136 [0]
130,96  16       85     0.000742108 70718  A  WS 4281088 + 136 <- (130,97) 4279040
130,96  16       86     0.000743197 70718  Q  WS 4281088 + 136 [z_wr_iss/15]
130,96  16       87     0.000744450 70718  G  WS 4281088 + 136 [z_wr_iss/15]
130,96  16       88     0.000745817 70718  P   N [z_wr_iss/15]
130,96  16       89     0.000746705 70718  I  WS 4281088 + 136 [z_wr_iss/15]
130,96  16       90     0.000747848 70718  D  WS 4281088 + 136 [z_wr_iss/15]
130,96   0       77     0.000604063     0  C  WS 4280832 + 136 [0]
130,96   0       78     0.000899858     0  C  WS 4281088 + 136 [0]

As one can see, all the writes are synchronous (WS), and I/O completions
(e.g. from issue I to completion C) take 160-250 usec, or about 10x faster.
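
The latency figures can be reproduced from the traces with a few lines of Python. This is a sketch that assumes the column layout shown above (device, CPU, sequence, timestamp, PID, action, flags, sector); the function name is hypothetical and the sample lines are copied from the WRITE_SYNC trace.

```python
def issue_to_completion_usec(trace_text):
    """Map each sector to its I (insert) to C (complete) latency in usec."""
    inserted, completed = {}, {}
    for line in trace_text.splitlines():
        fields = line.split()
        # Only I and C events carrying a sector number are of interest.
        if len(fields) < 8 or fields[5] not in ("I", "C"):
            continue
        target = inserted if fields[5] == "I" else completed
        target[int(fields[7])] = float(fields[3])
    # Events are grouped per CPU, not sorted by time, so pair them afterwards.
    return {
        sector: (completed[sector] - start) * 1e6
        for sector, start in inserted.items()
        if sector in completed
    }


SAMPLE = """\
130,96   4        5     0.000003846 70714  I  WS 4280576 + 136 [(null)]
130,96   5        5     0.000358088 70713  I  WS 4280832 + 136 [(null)]
130,96   0       76     0.000159539     0  C  WS 4280576 + 136 [0]
130,96   0       77     0.000604063     0  C  WS 4280832 + 136 [0]
"""

for sector, usec in sorted(issue_to_completion_usec(SAMPLE).items()):
    print(sector, f"{usec:.1f} usec")
```

For these two requests the result is roughly 156 and 246 usec. The same subtraction on sector 832000 in the first trace (I at 0.024344120, C at 0.026966948) gives about 2.6 msec.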

Since WRITE_SYNC or READ_SYNC flags are among several factors that are
considered when processing bio requests, it seems prudent to mark all the
zio requests of synchronous priority with the READ/WRITE_SYNC flags to make
them eligible for consideration as such by the Linux block I/O layer.

Signed-off-by: Boris Protopopov <boris.protopopov@actifio.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3529
2015-07-13 14:28:50 -07:00
Brian Behlendorf
2b7b78fa5d Fix switch-bool warning
As of gcc version 5.1.1 a new warning has been added to detect the
use of a boolean in a switch statement (-Wswitch-bool).  Resolve the
warning by explicitly casting the value to an integer type.

  zfs-0.6.4/module/zfs/zvol.c: In function 'zvol_request':
  error: switch condition has boolean value [-Werror=switch-bool]

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2015-07-13 13:03:01 -07:00
Brian Behlendorf
c2d17fd891 Disable gcc bool-compare warning
As of gcc version 5.1.1 a new boolean comparison warning has been
introduced.  This warning is harmless but is triggered in several places
in the ZFS code base.  Because warnings are promoted to errors when
building with debugging enabled it is necessary to disable the warning
when using versions of gcc which automatically enable this check.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2015-07-13 12:55:26 -07:00
Brian Behlendorf
5970eb3d60 Use truncate instead of fallocate in ziltest.sh
For the purposes of creating sparse files the truncate command is
preferable to fallocate because generic sparse files are more widely
supported by older platforms.  Specifically Debian Wheezy which is
based on a 2.6.32 kernel used ext3 by default which at the time did
not support fallocate.
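
For illustration, the truncate approach looks like this in Python. It is a sketch rather than the ziltest.sh change itself: the helper name and file name are hypothetical, and whether the file is actually stored sparsely depends on the underlying filesystem.

```python
import os
import tempfile


def create_sparse_file(path, size_bytes):
    """Create a file of the given length without writing any data."""
    with open(path, "wb") as f:
        # Extending a file with truncate leaves a hole instead of
        # allocating blocks on filesystems that support sparse files.
        f.truncate(size_bytes)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vdev.img")
    create_sparse_file(path, 1 << 30)
    print(os.stat(path).st_size)
```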

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2015-07-13 11:02:59 -07:00
Richard Yao
541da9935d Fix Xen Virtual Block Device detection
We fail to make partitions on xvd (Xen Virtual Block) devices. This also
causes debug builds of zpool create to return an error when given Xen
virtual block devices. These devices should be given the same treatment
as vd (KVM Virtual Block) devices, so we adjust the relevant code paths.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3576
2015-07-10 12:21:14 -07:00
Will Andrews
98cb3a7655 Illumos 5813 - zfs_setprop_error(): Handle errno value E2BIG.
5813 zfs_setprop_error(): Handle errno value E2BIG.
Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Approved by: Garrett D'Amore <garrett@damore.org>

References:
  https://github.com/illumos/illumos-gate/commit/6fdcb3d
  https://www.illumos.org/issues/5813

Ported-by: kernelOfTruth kerneloftruth@gmail.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3572
2015-07-10 12:13:19 -07:00
Justin T. Gibbs
99197f034e Illumos 5661 - ZFS: "compression = on" should use lz4 if feature is enabled
5661 ZFS: "compression = on" should use lz4 if feature is enabled
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Reviewed by: Xin LI <delphij@freebsd.org>
Approved by: Robert Mustacchi <rm@joyent.com>

References:
  https://github.com/illumos/illumos-gate/commit/db1741f
  https://www.illumos.org/issues/5661

Ported-by: kernelOfTruth kerneloftruth@gmail.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3571
2015-07-10 12:11:45 -07:00
Jan Kryl
15cfbb38fd Illumos 5427 - memory leak in libzfs when doing rollback
5427 memory leak in libzfs when doing rollback
Reviewed by: Michael Tsymbalyuk <mtzaurus@gmail.com>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Approved by: Dan McDonald <danmcd@omniti.com>

References:
  https://github.com/illumos/illumos-gate/commit/b7070b7
  https://www.illumos.org/issues/5427

Ported-by: kernelOfTruth kerneloftruth@gmail.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3569
2015-07-10 12:09:32 -07:00
Basil Crow
de0a9d7630 Illumos 5118 - When verifying or creating a storage pool, error messages only show one device
5118 When verifying or creating a storage pool, error messages
only show one device
Reviewed by: Adam Leventhal <adam.leventhal@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Boris Protopopov <boris.protopopov@me.com>
Approved by: Dan McDonald <danmcd@omniti.com>

References:
  https://github.com/illumos/illumos-gate/commit/75fbdf9
  https://www.illumos.org/issues/5118

Ported-by: kernelOfTruth kerneloftruth@gmail.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3567
2015-07-10 12:07:13 -07:00
George Wilson
3e43edd2c5 Illumos 4966 - zpool list iterator does not update output
4966 zpool list iterator does not update output
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Garrett D'Amore <garrett@damore.org>

References:
  https://github.com/illumos/illumos-gate/commit/cd67d23
  https://www.illumos.org/issues/4966

Ported-by: kernelOfTruth kerneloftruth@gmail.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3566
2015-07-10 12:00:35 -07:00
Josef 'Jeff' Sipek
411bf201f5 Illumos 4745 - fix AVL code misspellings
4745 fix AVL code misspellings
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Approved by: Robert Mustacchi <rm@joyent.com>

References:
  https://github.com/illumos/illumos-gate/commit/6907ca4
  https://www.illumos.org/issues/4745

Ported-by: kernelOfTruth kerneloftruth@gmail.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3565
2015-07-10 11:58:37 -07:00
Josef 'Jeff' Sipek
02f8fe4260 Illumos 4626 - libzfs memleak in zpool_in_use()
4626 libzfs memleak in zpool_in_use()
Reviewed by: Tony Nguyen <tony.nguyen@nexenta.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>

References:
  https://github.com/illumos/illumos-gate/commit/fb13f48
  https://www.illumos.org/issues/4626

Ported-by: kernelOfTruth kerneloftruth@gmail.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3563
2015-07-10 11:57:38 -07:00
Brian Behlendorf
cc49250563 Move dracut directory to contrib
The dracut code is analogous to the initramfs code and as such
it should be located in the contrib directory with initramfs for consistency.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2015-07-09 13:59:37 -07:00
Turbo Fredriksson
2cac7f5f11 Initramfs scripts for ZoL.
* Supports booting of a ZFS snapshot.
  Do this by cloning the snapshot into a dataset. If the resulting
  dataset already exists, destroy it. Then mount it on root.
  * If snapshot does not exist, use base dataset (the part before '@')
    as boot filesystem instead.
  * If no snapshot is specified on the 'root=' kernel command line, but there
    is an '@', then get a list of snapshots below that filesystem and ask the
    user which to use.
  * Clone with 'mountpoint=none' and 'canmount=noauto' - we mount manually
    and explicitly.
    * For sub-filesystems that don't have a mountpoint property set, we use
      the 'org.zol:mountpoint' property to keep track of their mountpoint.
  * Allow rolling back a snapshot instead of cloning it and booting from
    the clone.
* Allow mounting a root and sub-filesystem with mountpoint=legacy set
* Allow mounting a filesystem which is using native encryption.
* Support all currently used kernel command line arguments
  All the different distributions have their own standard on what to specify
  on the kernel command line to boot off a ZFS filesystem.
  * Extra options:
    * zfsdebug=(on,yes,1)	Show extra debugging information
    * zfsforce=(on,yes,1)	Force import the pool
    * rollback=(on,yes,1)	Rollback (instead of clone) the snapshot
* Only try to import the pool if it hasn't already been imported
  * This will negate the need to force import a pool that has not been exported cleanly.
  * Support exclusion of pools to import by setting ZFS_POOL_EXCEPTIONS in /etc/default/zfs.
* Support additional configuration variable ZFS_INITRD_ADDITIONAL_DATASETS
  to mount additional filesystems not located under your root dataset.
* Include /etc/modprobe.d/{zfs,spl}.conf in the initrd if it/they exist.
* Include the udev rule to use by-vdev for pool imports.
* Include the /etc/default/zfs file in the initrd.
* Only try /dev/disk/by-* in the initrd if USE_DISK_BY_ID is set.
  * Use /dev/disk/by-vdev before anything.
  * Add /dev as a last ditch attempt.
  * Fall back to using the cache file, if it exists, if nothing else worked.
* Use /sbin/modprobe instead of built-in (BusyBox) modprobe.
  This gets rid of the message "modprobe: can't load module zcommon".
  Thanks to pcoultha for finding this.
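
The command line handling described above can be sketched in POSIX sh as
follows. This is only an illustration, not the code from this commit: the
function names (cmdline_flag_set, root_value, dataset_part, snapshot_part)
are hypothetical, and the sketch assumes a plain 'root=dataset@snapshot'
argument without any distribution-specific prefix.

```shell
#!/bin/sh
# Illustrative sketch only; all function names here are hypothetical.

# Return 0 if option "$1" is given as on, yes or 1 in the
# kernel command line string "$2" (e.g. zfsdebug=on, rollback=1).
cmdline_flag_set() {
    opt="$1"
    for arg in $2; do
        case "$arg" in
            "$opt"=on|"$opt"=yes|"$opt"=1) return 0 ;;
        esac
    done
    return 1
}

# Print the value of the root= argument found in command line "$1".
root_value() {
    for arg in $1; do
        case "$arg" in
            root=*) printf '%s\n' "${arg#root=}"; return 0 ;;
        esac
    done
    return 1
}

# Print the base dataset: everything before the first '@'.
dataset_part() {
    printf '%s\n' "${1%%@*}"
}

# Print the snapshot name: everything after the first '@'.
# Prints an empty line if there is no '@' or nothing follows it.
snapshot_part() {
    case "$1" in
        *@*) printf '%s\n' "${1#*@}" ;;
        *)   printf '\n' ;;
    esac
}

# Example: decide between rollback and clone for a snapshot boot.
cmdline="root=rpool/ROOT/debian@backup rollback=yes quiet"
root=$(root_value "$cmdline")
if cmdline_flag_set rollback "$cmdline"; then
    action=rollback
else
    action=clone
fi
echo "$action $(dataset_part "$root") $(snapshot_part "$root")"
```

A real script would read the string from /proc/cmdline instead of a
variable. An empty result from snapshot_part for a value ending in '@'
corresponds to the case above where the user is asked which snapshot to use.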

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2116
Closes #2114
2015-07-08 18:14:34 -07:00
Tim Chase
1cd777340b Prevent reclaim in metaslab preload threads
Reclaim during metaslab preloading can cause deadlocks involving znode
z_lock and ARC buffer header ht_lock.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3532
2015-07-06 09:36:13 -07:00
Alexander Motin
e16b3fcc61 Illumos 5008 - lock contention (rrw_exit) while running a read only load
5008 lock contention (rrw_exit) while running a read only load
Reviewed by: Matthew Ahrens <matthew.ahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Alex Reece <alex.reece@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Richard Yao <ryao@gentoo.org>
Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com>
Approved by: Garrett D'Amore <garrett@damore.org>

Porting notes:

This patch ported cleanly to ZoL.  While testing 100% cached
small-block reads, extreme contention was noticed on rrl->rr_lock from
rrw_exit() due to the frequent entering and leaving of the ZPL.  Illumos
picked up this patch from FreeBSD and it also helps under Linux.

On a 1-minute 4K cached read test with 10 fio processes pinned to a single
socket on a 4-socket (10 threads per socket) NUMA system, contentions on
rrl->rr_lock were reduced from 508799 to 43085.

Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3555
2015-07-06 09:34:13 -07:00
Matthew Ahrens
4bda3bd0e7 Illumos 5911 - ZFS "hangs" while deleting file
5911 ZFS "hangs" while deleting file
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Reviewed by: Alek Pinchuk <alek@nexenta.com>
Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Richard Lowe <richlowe@richlowe.net>

References:
  https://www.illumos.org/issues/5911
  https://github.com/illumos/illumos-gate/commit/46e1baa

Porting notes:

Resolved the "ISO C90 forbids mixed declarations and code" warning in
the dnode_free_range() function.

Ported-by: kernelOfTruth kerneloftruth@gmail.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3554
2015-07-06 09:31:42 -07:00
Arne Jansen
5e8cd5d17f Illumos 5981 - Deadlock in dmu_objset_find_dp
5981 Deadlock in dmu_objset_find_dp
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Robert Mustacchi <rm@joyent.com>

References:
  https://www.illumos.org/issues/5981
  https://github.com/illumos/illumos-gate/commit/1d3f896

Ported-by: kernelOfTruth kerneloftruth@gmail.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3553
2015-07-06 09:31:35 -07:00
Andriy Gapon
71e2fe41be Illumos 5946, 5945
5946 zfs_ioc_space_snaps must check that firstsnap and lastsnap refer to snapshots
5945 zfs_ioc_send_space must ensure that fromsnap refers to a snapshot
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Gordon Ross <gordon.ross@nexenta.com>

References:
  https://www.illumos.org/issues/5946
  https://www.illumos.org/issues/5945
  https://github.com/illumos/illumos-gate/commit/24218be

Ported-by: Andriy Gapon <avg@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3552
2015-07-06 09:31:30 -07:00
Andriy Gapon
b6640117f0 Illumos 5870 - dmu_recv_end_check() leaks origin_head hold if error happens in drc_force branch
5870 dmu_recv_end_check() leaks origin_head hold if error happens in drc_force branch
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>

References:
  https://www.illumos.org/issues/5870
  https://github.com/illumos/illumos-gate/commit/beddaa9

Ported-by: Andriy Gapon <avg@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3551
2015-07-06 09:22:18 -07:00