Commit Graph

2268 Commits

Author SHA1 Message Date
Chris Williamson b9541d6b7d Illumos 5408 - managing ZFS cache devices requires lots of RAM
5408 managing ZFS cache devices requires lots of RAM
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Don Brady <dev.fs.zfs@gmail.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Approved by: Garrett D'Amore <garrett@damore.org>

Porting notes:

Due to the restructuring of the ARC-related structures, this
patch conflicts with at least the following existing ZoL commits:

    6e1d7276c9
    Fix inaccurate arcstat_l2_hdr_size calculations

        The ARC_SPACE_HDRS constant no longer exists and has been
        somewhat equivalently replaced by HDR_L2ONLY_SIZE.

    e0b0ca983d
    Add visibility in to cached dbufs

        The new layering of l{1,2}arc_buf_hdr_t within the arc_buf_hdr
        struct requires additional structure member names to be used
        when referencing the inner items.  Also, the presence of L1 or L2
        inner member is indicated by flags using the new HDR_HAS_L{1,2}HDR
        macros.

Ported by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2015-06-11 10:27:25 -07:00
George Wilson 2a4324141f Illumos 5369 - arc flags should be an enum
5369 arc flags should be an enum
5370 consistent arc_buf_hdr_t naming scheme
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Alex Reece <alex.reece@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Approved by: Richard Lowe <richlowe@richlowe.net>

Porting notes:

ZoL has moved some ARC definitions into arc_impl.h.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported by: Tim Chase <tim@chase2k.com>
2015-06-11 10:27:25 -07:00
Brian Behlendorf 2345368646 Rename cv_wait_interruptible() to cv_wait_sig()
Commit f752b46e added the cv_wait_interruptible() function to allow
condition variables to be woken by signals.  This function and its
timed wait counterpart should have been named cv_wait_sig() to match
the illumos interface which provides the same functionality.

This patch renames the symbol but leaves a #define compatibility
wrapper in place until the ZFS code can be moved to the correct
name.

This patch also makes a small number of cosmetic changes to make
the condvar source and header cstyle clean.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #456
2015-06-10 16:36:12 -07:00
Brian Behlendorf 86c16c59fe Retire rwsem_is_locked() compat
Stock Linux 2.6.32 and earlier kernels contained a broken version of
rwsem_is_locked() which could return an incorrect value.  Because of
this compatibility code was added to detect the broken implementation
and replace it with our own if needed.

The fix for this issue was merged in to the mainline Linux kernel as
of 2.6.33 and the major enterprise distributions based on 2.6.32 have
all backported the fix.  Therefore there is no longer a need to carry
this code and it can be removed.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #454
2015-06-10 16:35:48 -07:00
Matthew Ahrens c3520e7f1f Illumos 5818 - zfs {ref}compressratio is incorrect with 4k sector size
5818 zfs {ref}compressratio is incorrect with 4k sector size
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Approved by: Albert Lee <trisk@omniti.com>

References:
  https://www.illumos.org/issues/5818
  https://github.com/illumos/illumos-gate/commit/81cd5c5

Ported-by: Don Brady <don.brady@intel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3432
2015-06-10 16:24:01 -07:00
Arne Jansen 9c43027b3f Illumos 5269 - zpool import slow
5269 zpool import slow
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Dan McDonald <danmcd@omniti.com>

References:
  https://www.illumos.org/issues/5269
  https://github.com/illumos/illumos-gate/commit/12380e1e

Ported-by: DHE <git@dehacked.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3396
2015-06-09 13:48:02 -07:00
Chris Dunlop a876b0305e Make taskq_wait() block until the queue is empty
Under Illumos taskq_wait() returns when there are no more tasks
in the queue.  This behavior differs from ZoL and FreeBSD where
taskq_wait() returns when all the tasks in the queue at the
beginning of the taskq_wait() call are complete.  New tasks
added whilst taskq_wait() is running will be ignored.

This difference in semantics makes it possible that new subtle
issues could be introduced when porting changes from Illumos.
To avoid that possibility the taskq_wait() function is being
updated such that it blocks until the queue in empty.

The previous behavior remains available through the
taskq_wait_outstanding() interface.  Note that this function
was previously called taskq_wait_all() but has been renamed
to avoid confusion.

Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #455
2015-06-09 12:20:12 -07:00
Turbo Fredriksson d050c627b5 Improve on the ZFS events documentation
* Add information about the 'zpool events' command in zpool(8).
* More events and payloads defined in zfs-events(5).
* I/O Stages and I/O Flags sections added.
* Remove unused legacy "zio_deadline" payload define.

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3467
2015-06-09 11:19:19 -07:00
Brian Behlendorf 65037d9b25 Add libzfs_error_init() function
All fprintf() error messages are moved out of the libzfs_init()
library function where they never belonged in the first place.  A
libzfs_error_init() function is added to provide useful error
messages for the most common causes of failure.

Additionally, in libzfs_run_process() the 'rc' variable was renamed
to 'error' for consistency with the rest of the code base.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
2015-05-22 13:34:58 -07:00
Brian Behlendorf dc5e8b7041 Add boot_ncpus macro
For compatibility define boot_ncpus as num_online_cpus().

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2015-05-21 09:58:01 -07:00
Tim Chase f467b05a26 Initialize dbu_tqent in dmu_buf_init_user()
The dbu_evict_taskq added in 0c66c32d is only invoked via
taskq_dispatch_ent(). In these cases, ZoL's implementation of taskqs
requires the entries to be initialized first with taskq_init_ent() in
order that, among other things, the embedded spinlock is initialized
properly.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3419
2015-05-18 11:39:14 -07:00
Max Grossman 5dc8b7365f Illumos 5765 - add support for estimating send stream size with lzc_send_space when source is a bookmark
5765 add support for estimating send stream size with lzc_send_space when source is a bookmark
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Approved by: Albert Lee <trisk@nexenta.com>

References:
  https://www.illumos.org/issues/5765
  https://github.com/illumos/illumos-gate/commit/643da460

Porting notes:
* Unused variable 'recordsize' in dmu_send_estimate() dropped

Ported-by: DHE <git@dehacked.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3397
2015-05-13 09:03:59 -07:00
Matthew Ahrens 252e1a54ab Illumos 5810 - zdb should print details of bpobj
5810 zdb should print details of bpobj
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Will Andrews <will@freebsd.org>
Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
Approved by: Gordon Ross <gwr@nexenta.com>

References:
  https://www.illumos.org/issues/5810
  https://github.com/illumos/illumos-gate/commit/732885fc

Ported-by: DHE <git@dehacked.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3387
2015-05-11 15:10:24 -07:00
Tim Chase e48533383b Linux 2.6.36 compat, use REQ_FAILFAST_MASK and remove pre-2.6.36 support
Commit f4af6bb783 which added support
for REQ_FAILFAST_MASK but the new autoconf test didn't use the same
preprocessor macro name as the code did.

The effect is that FAILFAST mode has not been enabled for ZoL in any
post-2.6.35 kernel.

Retire the HAVE_BIO_RW_FAILFAST interface used in pre-2.6.28 kernels.

Raise an error condition if the FAILFAST interface can't be detected.

Signed-off-by: Tim Chase <tim@onlight.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3386
2015-05-11 15:07:00 -07:00
Matthew Ahrens f1512ee61e Illumos 5027 - zfs large block support
5027 zfs large block support
Reviewed by: Alek Pinchuk <pinchuk.alek@gmail.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Dan McDonald <danmcd@omniti.com>

References:
  https://www.illumos.org/issues/5027
  https://github.com/illumos/illumos-gate/commit/b515258

Porting Notes:

* Included in this patch is a tiny ISP2() cleanup in zio_init() from
Illumos 5255.

* Unlike the upstream Illumos commit this patch does not impose an
arbitrary 128K block size limit on volumes.  Volumes, like filesystems,
are limited by the zfs_max_recordsize=1M module option.

* By default the maximum record size is limited to 1M by the module
option zfs_max_recordsize.  This value may be safely increased up to
16M which is the largest block size supported by the on-disk format.
At the moment, 1M blocks clearly offer a significant performance
improvement but the benefits of going beyond this for the majority
of workloads are less clear.

* The illumos version of this patch increased DMU_MAX_ACCESS to 32M.
This was determined not to be large enough when using 16M blocks
because the zfs_make_xattrdir() function will fail (EFBIG) when
assigning a TX.  This was immediately observed under Linux because
all newly created files must have a security xattr created and
that was failing.  Therefore, we've set DMU_MAX_ACCESS to 64M.

* On 32-bit platforms a hard limit of 1M is set for blocks due
to the limited virtual address space.  We should be able to relax
this one the ABD patches are merged.

Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #354
2015-05-11 12:23:16 -07:00
Matthew Ahrens 63e3a8616b Illumos 5349 - verify that block pointer is plausible before reading
5349 verify that block pointer is plausible before reading
Reviewed by: Alex Reece <alex.reece@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Reviewed by: Xin Li <delphij@FreeBSD.org>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Approved by: Gordon Ross <gwr@nexenta.com>

References:
  https://www.illumos.org/issues/5349
  https://github.com/illumos/illumos-gate/commit/f63ab3d5

Porting notes:
* Several variable declarations were moved due to C style needs

Ported-by: DHE <git@dehacked.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3373
2015-05-08 14:09:15 -07:00
Christopher Siden 0c60cc326b Illumos 4951 - ZFS administrative commands (fix)
4951 ZFS administrative commands should use reserved space, not fail with ENOSPC
Approved by: Christopher Siden <christopher.siden@delphix.com>

References:
  https://www.illumos.org/issues/4951
  https://github.com/illumos/illumos-gate/commit/c39f2c8

Ported by: Brian Behlendorf <behlendorf1@llnl.gov>
2015-05-04 09:41:10 -07:00
Matthew Ahrens 3d45fdd6c0 Illumos 4951 - ZFS administrative commands should use reserved space
4951 ZFS administrative commands should use reserved space, not with ENOSPC
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Garrett D'Amore <garrett@damore.org>

References:
  https://www.illumos.org/issues/4373
  https://github.com/illumos/illumos-gate/commit/7d46dc6

Ported by: Brian Behlendorf <behlendorf1@llnl.gov>
2015-05-04 09:41:10 -07:00
Matthew Ahrens 83017311e4 Illumos 3654,3656
3654 zdb should print number of ganged blocks
3656 remove unused function zap_cursor_move_to_key()
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Dan McDonald <danmcd@nexenta.com>
Approved by: Garrett D'Amore <garrett@damore.org>

References:
  https://www.illumos.org/issues/3654
  https://www.illumos.org/issues/3656
  https://github.com/illumos/illumos-gate/commit/d5ee8a1

Porting Notes:

3655 and 3657 were part of this commit but those hunks were dropped
since they apply to mdb.

Ported by: Brian Behlendorf <behlendorf1@llnl.gov>
2015-05-04 09:41:09 -07:00
George Wilson 98b254188a Illumos #5244 - zio pipeline callers should explicitly invoke next stage
5244 zio pipeline callers should explicitly invoke next stage
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Alex Reece <alex.reece@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Richard Elling <richard.elling@gmail.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Approved by: Gordon Ross <gwr@nexenta.com>

References:
  https://www.illumos.org/issues/5244
  https://github.com/illumos/illumos-gate/commit/738f37b

Porting Notes:

1. The unported "2932 support crash dumps to raidz, etc. pools"
   caused a merge conflict due to a copyright difference in
   module/zfs/vdev_raidz.c.
2. The unported "4128 disks in zpools never go away when pulled"
   and additional Linux-specific changes caused merge conflicts in
   module/zfs/vdev_disk.c.

Ported-by: Richard Yao <richard.yao@clusterhq.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2828
2015-04-30 15:07:47 -07:00
Justin T. Gibbs 6ebebaceb1 Illumos 5531 - NULL pointer dereference in dsl_prop_get_ds()
5531 NULL pointer dereference in dsl_prop_get_ds()
Author: Justin T. Gibbs <justing@spectralogic.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Approved by: Robert Mustacchi <rm@joyent.com>

References:
  https://www.illumos.org/issues/5531
  https://github.com/illumos/illumos-gate/commit/e57a022

Ported-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2015-04-28 16:25:44 -07:00
Justin T. Gibbs 0c66c32d1d Illumos 5056 - ZFS deadlock on db_mtx and dn_holds
5056 ZFS deadlock on db_mtx and dn_holds
Author: Justin Gibbs <justing@spectralogic.com>
Reviewed by: Will Andrews <willa@spectralogic.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>

References:
  https://www.illumos.org/issues/5056
  https://github.com/illumos/illumos-gate/commit/bc9014e

Porting Notes:

sa_handle_get_from_db():
  - the original patch includes an otherwise unmentioned fix for a
    possible usage of an uninitialised variable

dmu_objset_open_impl():
  - Under Illumos list_link_init() is the same as filling a list_node_t
    with NULLs, so they don't notice if they miss doing list_link_init()
    on a zero'd containing structure (e.g. allocated with kmem_zalloc as
    here). Under Linux, not so much: an uninitialised list_node_t goes
    "Boom!" some time later when it's used or destroyed.

dmu_objset_evict_dbufs():
  - reduce stack usage using kmem_alloc()

Ported-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2015-04-28 16:25:34 -07:00
Justin T. Gibbs d683ddbb72 Illumos 5314 - Remove "dbuf phys" db->db_data pointer aliases in ZFS
5314 Remove "dbuf phys" db->db_data pointer aliases in ZFS
Author: Justin T. Gibbs <justing@spectralogic.com>
Reviewed by: Andriy Gapon <avg@freebsd.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Will Andrews <willa@spectralogic.com>
Approved by: Dan McDonald <danmcd@omniti.com>

References:
  https://www.illumos.org/issues/5314
  https://github.com/illumos/illumos-gate/commit/c137962

Ported-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2015-04-28 16:25:20 -07:00
Alex Reece 9925c28cde Illumos 5095 - panic when adding a duplicate dbuf to dn_dbufs
5095 panic when adding a duplicate dbuf to dn_dbufs
Author: Alex Reece <alex@delphix.com>
Reviewed by: Adam Leventhal <adam.leventhal@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Mattew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Josef Sipek <jeffpc@josefsipek.net>
Approved by: Robert Mustacchi <rm@joyent.com>

References:
  https://www.illumos.org/issues/5095
  https://github.com/illumos/illumos-gate/commit/86bb58a

Ported-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2015-04-28 16:24:49 -07:00
Justin T. Gibbs 5aea3644d6 Illumos 5038 - Remove "old-style" flexible array usage in ZFS.
5038 Remove "old-style" flexible array usage in ZFS.
Author: Justin T. Gibbs <justing@spectralogic.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Approved by: Richard Lowe <richlowe@richlowe.net>

References:
  https://www.illumos.org/issues/5038
  https://github.com/illumos/illumos-gate/commit/7f18da4

Ported-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2015-04-28 16:24:24 -07:00
Alex Reece 8951cb8dfb Illumos 4873 - zvol unmap calls can take a very long time for larger datasets
4873 zvol unmap calls can take a very long time for larger datasets
Author: Alex Reece <alex@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
Reviewed by: Basil Crow <basil.crow@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Robert Mustacchi <rm@joyent.com>

References:
  https://www.illumos.org/issues/4873
  https://github.com/illumos/illumos-gate/commit/0f6d88a

Porting Notes:

dbuf_free_range():
  - reduce stack usage using kmem_alloc()
  - the sorted AVL tree will handle the spill block case correctly
    without all the special handling in the for() loop

Ported-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2015-04-28 16:24:03 -07:00
Jerry Jelinek 788eb90c4c Illumos 3897 - zfs filesystem and snapshot limits
3897 zfs filesystem and snapshot limits
Author: Jerry Jelinek <jerry.jelinek@joyent.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Christopher Siden <christopher.siden@delphix.com>

References:
  https://www.illumos.org/issues/3897
  https://github.com/illumos/illumos-gate/commit/a2afb61

Porting Notes:

dsl_dataset_snapshot_check(): reduce stack usage using kmem_alloc().

Ported-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2015-04-28 16:22:51 -07:00
Isaac Huang 0336f3d001 Remove useless variable spa_active_count
This isn't required for the Linux port because the kernel tracks
if a module is busy.  The prototype for spa_busy() is also removed
since its definition was already removed.

Signed-off-by: Isaac Huang <he.huang@intel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3262
2015-04-27 09:22:05 -07:00
Justin T. Gibbs ec8501ee12 5313 Allow I/Os to be aggregated across ZIO priority classes
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Will Andrews <willa@SpectraLogic.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>

References:
  https://www.illumos.org/issues/5313
  https://github.com/illumos/illumos-gate/commit/fe319232

Ported-by: DHE <git@dehacked.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3280
2015-04-24 15:16:56 -07:00
Ned Bass 4eb30c6864 Serialize access to spa->spa_feat_stats nvlist
The function spa_add_feature_stats() manipulates the shared nvlist
spa->spa_feat_stats in an unsafe concurrent manner. Add a mutex to
protect the list.

Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3335
2015-04-24 15:04:43 -07:00
Chunwei Chen 07012da668 Fix kernel panic due to tsd_exit in ZFS_EXIT(zsb)
The following panic would occur under certain heavy load:
[ 4692.202686] Kernel panic - not syncing: thread ffff8800c4f5dd60 terminating with rrw lock ffff8800da1b9c40 held
[ 4692.228053] CPU: 1 PID: 6250 Comm: mmap_deadlock Tainted: P           OE  3.18.10 #7

The culprit is that ZFS_EXIT(zsb) would call tsd_exit() every time, which
would purge all tsd data for the thread. However, ZFS_ENTER is designed to be
reentrant, so we cannot allow ZFS_EXIT to blindly purge tsd data.

Instead, we rely on the new behavior of tsd_set. When NULL is passed as the
new value to tsd_set, it will automatically remove the tsd entry specified the
the key for the current thread.

rrw_tsd_key and zfs_allow_log_key already calls tsd_set(key, NULL) when
they're done. The zfs_fsyncer_key relied on ZFS_EXIT(zsb) to call tsd_exit() to
do clean up. Now we explicitly call tsd_set(key, NULL) on them.

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3247
2015-04-24 14:57:54 -07:00
Richard Yao d3c677bcd3 Implement areleasef()
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #449
2015-04-24 13:02:37 -07:00
Tim Chase 40d06e3c78 Mark all ZPL and ioctl functions as PF_FSTRANS
Prevent deadlocks by disabling direct reclaim during all ZPL and ioctl
calls as well as the l2arc and adapt ARC threads.

This obviates the need for MUTEX_FSTRANS so its previous uses and
definition have been eliminated.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3225
2015-04-03 11:38:59 -07:00
Tim Chase ae26dd0039 Don't allow shrinking a PF_FSTRANS context
Avoid deadlocks when entering the shrinker from a PF_FSTRANS context.

This patch also reverts commit d0d5dd7 which added MUTEX_FSTRANS.  Its
use has been deprecated within ZFS as it was an ineffective mechanism
to eliminate deadlocks.  Among other things, it introduced the need for
strict ordering of mutex locking and unlocking in order that the
PF_FSTRANS flag wouldn't set incorrectly.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #446
2015-04-03 11:32:31 -07:00
Chris Dunlop c089961110 Add crgetzoneid() stub
Illumos 3897 introduces a dependency on crgetzoneid(). Stub it out until
such time as zones are implemented.

References:
  https://www.illumos.org/issues/3897
  https://github.com/illumos/illumos-gate/commit/fb7001f

Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #444
2015-04-02 09:49:55 -07:00
Prakash Surya a4069eef2e Illumos 5695 - dmu_sync'ed holes do not retain birth time
5695 dmu_sync'ed holes do not retain birth time
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>

References:
  https://www.illumos.org/issues/5695
  https://github.com/illumos/illumos-gate/commit/70163ac

Ported-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3229
2015-03-27 14:51:34 -07:00
Ned Bass 95a6990d9a Add NULL guard in zfs_zrlock_class event class
The owner field could be NULL in some cases, so add a guard.  Shorten
__entry field names to fit assignment statements in 80 columns.

Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Fixes #3220
2015-03-27 14:45:32 -07:00
Brian Behlendorf 7d90f569b3 Check all vdev labels in 'zpool import'
When using 'zpool import' to scan for available pools prefer vdev names
which reference vdevs with more valid labels.  There should be two labels
at the start of the device and two labels at the end of the device.  If
labels are missing then the device has been damaged or is in some other
way incomplete.  Preferring names with fully intact labels helps weed out
bad paths and improves the likelihood of being able to import the pool.

This behavior only applies when scanning /dev/ for valid pools.  If a
cache file exists the pools described by the cache file will be used.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Closes #3145
Closes #2844
Closes #3107
2015-03-25 14:52:52 -07:00
Chris Dunlop d07b7c7f21 Reduce size of zfs_sb_t: allocate z_hold_mtx separately
zfs_sb_t has grown to the point where using kmem_zalloc() for allocations
is triggering the 32k warning threshold.

We can't safely convert this entire allocation to use vmem_alloc() instead
of kmem_alloc() because the backing_dev_info structure is embedded here.
It depends on the bit_waitqueue() function which won't behave properly
when given a virtual address.

Instead, use vmem_alloc() to allocate the z_hold_mtx array separately.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Closes #3178
2015-03-24 13:17:44 -07:00
Tim Chase 79a0056e13 Add mutex_enter_nested() which maps to mutex_lock_nested()
Also add support for the "name" parameter in mutex_init().  The name
allows for better diagnostics, namely in /proc/lock_stats when
lock debugging is enabled.  Nested mutexes are necessary to support
CONFIG_PROVE_LOCKING. ZoL can use mutex_enter_nested()'s "class" argument
to to convey the locking hierarchy.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes #439
2015-03-20 13:53:31 -07:00
Brian Behlendorf 2cbb06b561 Restructure per-filesystem reclaim
Originally when the ARC prune callback was introduced the idea was
to register a single callback for the ZPL.  The ARC could invoke this
call back if it needed the ZPL to drop dentries, inodes, or other
cache objects which might be pinning buffers in the ARC.  The ZPL
would iterate over all ZFS super blocks and perform the reclaim.

For the most part this design has worked well but due to limitations
in 2.6.35 and earlier kernels there were some problems.  This patch
is designed to address those issues.

1) iterate_supers_type() is not provided by all kernels which makes
it impossible to safely iterate over all zpl_fs_type filesystems in
a single callback.  The most straight forward and portable way to
resolve this is to register a callback per-filesystem during mount.
The arc_*_prune_callback() functions have always supported multiple
callbacks so this is functionally a very small change.

2) Commit 050d22b removed the non-portable shrink_dcache_memory()
and shrink_icache_memory() functions and didn't replace them with
equivalent functionality.  This meant that for Linux 3.1 and older
kernels the ARC had no mechanism to drop dentries and inodes from
the caches if needed.  This patch adds that missing functionality
by calling shrink_dcache_parent() to release dentries which may be
pinning inodes.  This will result in all unused cache entries being
dropped which is a bit heavy handed but it's the only interface
available for old kernels.

3) A zpl_drop_inode() callback is registered for kernels older than
2.6.35 which do not support the .evict_inode callback.  This ensures
that when the last reference on an inode is dropped it is immediately
removed from the cache.  If this isn't done than inode can end up on
the global unused LRU with no mechanism available to ZFS to drop them.
Since the ARC buffers are not dropped the hottest inodes can still
be recreated without performing disk IO.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Issue #3160
2015-03-20 10:35:20 -07:00
Justin T. Gibbs 4c7b7eedcd Illumos 5630 - stale bonus buffer in recycled dnode_t leads to data corruption
5630 stale bonus buffer in recycled dnode_t leads to data corruption
Author: Justin T. Gibbs <justing@spectralogic.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Will Andrews <will@freebsd.org>
Approved by: Robert Mustacchi <rm@joyent.com>

References:
  https://www.illumos.org/issues/5630
  https://github.com/illumos/illumos-gate/commit/cd485b4

Ported-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Issue #3172
2015-03-12 15:40:39 -07:00
Ned Bass 417104bdd3 Use cached feature info in spa_add_feature_stats()
Avoid issuing I/O to the pool when retrieving feature flags information.
Trying to read the ZAPs from disk means that zpool clear would hang if
the pool is suspended and recovery would require a reboot. To keep the
feature stats resident in memory, we hang a cached nvlist off of the
spa.  It is built up from disk the first time spa_add_feature_stats() is
called, and refreshed thereafter using the cached feature reference
counts. spa_add_feature_stats() gets called at pool import time so we
can be sure the cached nvlist will be available if the pool is later
suspended.

Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3082
2015-03-05 14:11:10 -08:00
Brian Behlendorf 8c45def24a Linux 4.0 compat: bdi_setup_and_register()
The 'capabilities' argument which was passed to bdi_setup_and_register()
has been removed.  File systems should no longer pass BDI_CAP_MAP_COPY.
For our purposes this means there are now three different interfaces
which must be handled.  A zpl_bdi_setup_and_register() wrapper function
has been introduced to provide a single interface to the ZPL code.

* 2.6.32 - 2.6.33, bdi_setup_and_register() is not exported.
* 2.6.34 - 3.19, bdi_setup_and_register() takes 3 arguments.
* 4.0 - x.y, bdi_setup_and_register() takes 2 arguments.

I've also taken this opportunity to remove HAVE_BDI because kernels
older then 2.6.32 are no longer supported.  All kernels newer than
this will have one of the above interfaces.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Closes #3128
2015-03-03 10:49:45 -08:00
Brian Behlendorf 4ec15b8dcf Use MUTEX_FSTRANS mutex type
There are regions in the ZFS code where it is desirable to be able
to be set PF_FSTRANS while a specific mutex is held.  The ZFS code
could be updated to set/clear this flag in all the correct places,
but this is undesirable for a few reasons.

1) It would require changes to a significant amount of the ZFS
   code.  This would complicate applying patches from upstream.

2) It would be easy to accidentally miss a critical region in
   the initial patch or to have an future change introduce a
   new one.

Both of these concerns can be addressed by using a new mutex type
which is responsible for managing PF_FSTRANS, support for which was
added to the SPL in commit zfsonlinux/spl@9099312 - Merge branch
'kmem-rework'.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes #3050
Closes #3055
Closes #3062
Closes #3132
Closes #3142
Closes #2983
2015-03-03 10:46:40 -08:00
Brian Behlendorf d0d5dd7144 Add MUTEX_FSTRANS mutex type
There are regions in the ZFS code where it is desirable to be able
to be set PF_FSTRANS while a specific mutex is held.  The ZFS code
could be updated to set/clear this flag in all the correct places,
but this is undesirable for a few reasons.

1) It would require changes to a significant amount of the ZFS
   code.  This would complicate applying patches from upstream.

2) It would be easy to accidentally miss a critical region in
   the initial patch or to have an future change introduce a
   new one.

Both of these concerns can be addressed by adding a new mutex type
which is responsible for managing PF_FSTRANS, support for which was
added to the SPL in commit 9099312 - Merge branch 'kmem-rework'.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Issue #435
2015-03-03 10:18:24 -08:00
Brian Behlendorf 5f920fbee1 Retire MUTEX_OWNER checks
To minimize the size of a kmutex_t a MUTEX_OWNER check was added.
It allowed the kmutex_t wrapper to leverage the mutex owner which was
already stored in the mutex for certain kernel configurations.

The upside to this was that it reduced the size of the kmutex_t wrapper
structure by the size of a task_struct pointer (4/8 bytes).  The
downside was that two mutex implementations needed to be maintained.
Depending on your exact kernel configuration the correct one would
be selected.

Over the years this solution worked but it could be fragile since it
depending heavily on assumed kernel mutex implementation details.  For
example the SPL_AC_MUTEX_OWNER_TASK_STRUCT configure check needed to
be added when the kernel changed how the owner was stored.  It also
made the code more complicated than it needed to be.

Therefore, in the name of simplicity and portability this optimization
is being retired.  It will slightly increase the memory requirements
for a kmutex_t but only very slightly.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Issue #435
2015-03-03 10:13:33 -08:00
Brian Behlendorf a900e28e71 Fix cstyle issue in mutex.h
This patch only addresses the issues identified by the style checker
in mutex.h.  It contains no functional changes.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Issue #435
2015-03-03 10:13:25 -08:00
Brian Behlendorf c1bc8e610b Retire spl_module_init()/spl_module_fini()
In the original implementation of the SPL wrappers were provided
for module initialization and cleanup.  This was done to abstract
away any compatibility code which might be needed for the SPL.

As it turned out the only significant compatibility issue was that
the default pwd during module load differed under Illumos and Linux.
Since this is such as minor thing and the wrappers complicate the
code they are being retired.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue zfsonlinux/zfs#2985
2015-02-27 13:43:39 -08:00
Jörg Thalheim 534759fad3 Linux 3.19 compat: file_inode was added
struct access f->f_dentry->d_inode was replaced by accessor function
file_inode(f)

Signed-off-by: Joerg Thalheim <joerg@higgsboson.tk>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3084
2015-02-10 11:24:51 -08:00